🟣Structural Bioinformatics
Structural Bioinformatics — FAIR Data Management Summary Table
Section
Subsection
Key Considerations / Description
Solutions / Best Practices
Direct External Links
Introduction
Overview
Structural bioinformatics studies the 3D structures of macromolecules (proteins, RNA, DNA, carbohydrates) and their bound ligands, using experimental and computational approaches.
Apply FAIR principles; capture metadata during modelling; deposit models with clear provenance and licensing.
—
Storing & Sharing Structure Predictions
Model type
Is your model based on experimental data (integrative/hybrid modelling) or purely computational (in silico)?
The answer determines the appropriate deposition system.
—
Prediction purpose
Is this a large-scale automated effort or manual, application-specific modelling?
Select suitable repository or service.
—
Modelling steps
Were modelling steps/software documented clearly?
Ensure reproducibility; cite software/methods.
—
Input data
What alignments, homologues and templates were used?
Recording them improves transparency and downstream reuse.
—
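The "Modelling steps" and "Input data" rows ask for documented software and inputs. One way to make that record machine-readable is a small JSON manifest that lists each step with its software version, parameters and a SHA-256 checksum of every input file, so a reader can confirm they are reusing exactly the same alignments and templates. This is a minimal sketch: the `ModellingStep` fields, the manifest layout and the tool name are hypothetical, not a community schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModellingStep:
    """One documented modelling step (hypothetical field layout, not a standard)."""
    name: str
    software: str
    version: str
    parameters: dict = field(default_factory=dict)
    # Maps an input label (alignment, template, sequence) to its SHA-256 digest.
    inputs: dict = field(default_factory=dict)


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw file contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(steps: list) -> str:
    """Serialise the steps, in order, as deterministic JSON."""
    return json.dumps({"steps": [asdict(s) for s in steps]}, indent=2, sort_keys=True)


if __name__ == "__main__":
    query = b">query\nMKTAYIAKQR\n"  # placeholder sequence, illustration only
    step = ModellingStep(
        name="template search",
        software="example-search-tool",  # hypothetical tool name
        version="1.0",
        parameters={"max_templates": 4},
        inputs={"query.fasta": sha256_hex(query)},
    )
    print(build_manifest([step]))
```

Sorting keys keeps the manifest byte-stable, so the manifest itself can be checksummed and deposited alongside the model.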
Model accuracy
Does the model include local/global confidence metrics?
Use recognised tools to estimate accuracy.
Model deposition
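1. ModelArchive – theoretical models
Deposit with quality metrics and metadata.
2. PDB-Dev – integrative/hybrid models
Deposit models built from combined experimental and computational data.
3. MineProt – large-scale AlphaFold-style pipelines
Deploy a self-hosted model server.
4. 3D-Beacons – unify model access across sources
Network providing unified access to predicted and experimental models.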
Model metadata
Store metadata using the ModelCIF extension of the PDBx/mmCIF dictionary.
Community-maintained format for prediction metadata.
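ModelCIF files are plain mmCIF text, so a quick pre-deposition check can confirm that key metadata items are present. A minimal sketch that reads single-line `_category.item value` pairs and reports which items from a checklist are missing. The checklist is hypothetical; the item names `_ma_qa_metric_global.metric_value` and `_ma_target_ref_db_details.db_accession` are assumed to follow the ModelCIF dictionary and should be checked against it; the example values are illustrative. Loops, semicolon-delimited text fields and full CIF quoting rules are not handled, so a dedicated mmCIF library is the right tool for real files.

```python
from typing import Dict, List

# Illustrative ModelCIF-style excerpt (values are illustrative only).
EXAMPLE = """\
data_example_model
_entry.id example_model
_ma_qa_metric.name pLDDT
_ma_qa_metric.mode global
_ma_qa_metric_global.metric_value 87.3
_ma_target_ref_db_details.db_name UNP
_ma_target_ref_db_details.db_accession P69905
"""

# Hypothetical minimum checklist for a deposition-ready model.
REQUIRED = [
    "_entry.id",
    "_ma_qa_metric_global.metric_value",
    "_ma_target_ref_db_details.db_accession",
]


def read_single_items(text: str) -> Dict[str, str]:
    """Collect '_category.item value' pairs written on a single line."""
    items: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("_"):
            continue  # skips data_ headers, comments, loop_ keywords and loop rows
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # loop column names or values on the next line: not handled
        key, value = parts
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop simple surrounding quotes
        items[key] = value
    return items


def missing_items(items: Dict[str, str], required: List[str]) -> List[str]:
    """Return required item names that are absent from the parsed file."""
    return [name for name in required if name not in items]


if __name__ == "__main__":
    parsed = read_single_items(EXAMPLE)
    print(missing_items(parsed, REQUIRED))
```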
Quality estimation
Benchmark confidence using community tools.
E.g., global and per-residue metrics.
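Global and per-residue metrics are related: for AlphaFold-style models the global pLDDT is the mean of the per-residue values. A minimal sketch, assuming per-residue pLDDT scores (0 to 100) have already been extracted from the model file. The bands follow the AlphaFold confidence colouring (90, 70, 50 cut-offs); treating each lower bound as inclusive is an assumption of this sketch.

```python
from statistics import mean
from typing import Dict, List

# AlphaFold-style pLDDT bands; inclusive lower bounds are an assumption here.
BANDS = [(90.0, "very high"), (70.0, "confident"), (50.0, "low"), (0.0, "very low")]


def band(score: float) -> str:
    """Map one per-residue pLDDT value (0-100) to its confidence band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT out of range: {score}")
    for lower, label in BANDS:
        if score >= lower:
            return label
    return BANDS[-1][1]  # not reached for in-range scores


def summarise(per_residue: List[float]) -> Dict[str, object]:
    """Derive global statistics from per-residue confidence scores."""
    if not per_residue:
        raise ValueError("no per-residue scores supplied")
    counts = {label: 0 for _, label in BANDS}
    for score in per_residue:
        counts[band(score)] += 1
    return {
        "global_mean": mean(per_residue),
        "fraction_at_least_70": sum(s >= 70.0 for s in per_residue) / len(per_residue),
        "band_counts": counts,
    }
```

Reporting the band counts next to the mean is useful because a high global score can hide long low-confidence regions such as disordered termini.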
Storing & Sharing Experimentally Solved Structures
Structure sources
X-ray, NMR, cryo-EM methods produce different data types/formats.
Deposit models + raw data separately.
—
Cryo-EM raw data
Raw EM images (movies, micrographs, tilt series) go to EMPIAR; reconstructed volumes/tomograms to EMDB; atomic models to PDB.
Use MRC/CCP4 format and XML metadata.
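Before depositing a map it is worth confirming that the file really is a well-formed MRC file. In the MRC2014 layout the header is 1024 bytes, the first four 32-bit integers are NX, NY, NZ and MODE, bytes 208-211 hold the identifier "MAP " and bytes 212-215 hold the machine stamp (0x44 0x44 or 0x44 0x41 for little-endian, 0x11 0x11 for big-endian). A minimal sketch that checks only these fields; `make_test_header` builds a synthetic header purely for testing, and extended headers (NSYMBT) are ignored.

```python
import struct

HEADER_BYTES = 1024
# Bytes per voxel for common MRC2014 modes (0 int8, 1 int16, 2 float32, 6 uint16, 12 float16).
BYTES_PER_VOXEL = {0: 1, 1: 2, 2: 4, 6: 2, 12: 2}


def make_test_header(nx: int, ny: int, nz: int, mode: int = 2) -> bytes:
    """Build a synthetic, mostly empty little-endian header (testing only)."""
    header = bytearray(HEADER_BYTES)
    struct.pack_into("<4i", header, 0, nx, ny, nz, mode)
    header[208:212] = b"MAP "                    # file-type identifier
    header[212:216] = bytes([0x44, 0x44, 0, 0])  # machine stamp: little-endian
    return bytes(header)


def read_basic_header(data: bytes) -> dict:
    """Return grid dimensions and data mode from an MRC2014 header."""
    if len(data) < HEADER_BYTES:
        raise ValueError("shorter than the 1024-byte MRC header")
    if data[208:212] != b"MAP ":
        raise ValueError("missing 'MAP ' identifier at byte 208")
    # 0x44 in the first stamp byte means little-endian; 0x11 means big-endian.
    endian = "<" if data[212] == 0x44 else ">"
    nx, ny, nz, mode = struct.unpack_from(endian + "4i", data, 0)
    return {"nx": nx, "ny": ny, "nz": nz, "mode": mode}


def expected_data_bytes(info: dict) -> int:
    """Voxel data size implied by the header (extended header not included)."""
    return info["nx"] * info["ny"] * info["nz"] * BYTES_PER_VOXEL[info["mode"]]
```

For a real file, pass the first 1024 bytes read in binary mode to `read_basic_header`, then compare `expected_data_bytes` with the file size minus the header length.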
X-ray raw data
Store in IRRMC or SBGrid Data Bank.
Cross-reference to PDB entries.
Metadata formats
Follow repository-specific schema and file formats.
E.g., EMDB XML/XSD, EMPIAR Schematron.
Workflow tools
Use Scipion for image processing and deposition.
FAIR-compatible workflow engine for EM.
Infrastructure tools
Use ARIA for managing proposal access & linking data.
Link metadata to equipment, projects, outputs.
Structure annotation
Visualise and annotate models with 3DBioNotes.
Integrate biomedical + biochemical annotations.
COVID-19 special collection
Explore published SARS-CoV-2 structures.
Use dedicated structural hub.
Expanded Table for the “Structural Bioinformatics” Domain (RDMkit)
Includes direct external links for items referenced on the page.
Introduction
• Structural bioinformatics covers 3D structures of macromolecules (proteins, RNA, DNA, carbohydrates + bound small molecules). • Both experimental and computational methods apply. • FAIR deposition of models + metadata is required. (RDMkit)
Encourage metadata capture during modelling/acquisition; apply FAIR principles across structure/data flow.
—
Storing & Sharing Structure Predictions
Considerations include: • Is the model based on experimental or purely computational data? • Purpose: large-scale automated vs individual manual modelling? • Source sequence origins (link to UniProt etc.). • Document modelling steps, input data, tools. • Provide model quality/confidence estimates. • Clear licensing/usage terms. (RDMkit)
Use appropriate repository depending on model type; ensure metadata, quality and licence are present; adopt standard formats; cross‑link to source sequence/databases.
• ModelArchive – theoretical/computational models
• PDB‑Dev – integrative/hybrid models
• 3D‑Beacons Network – unified access to predicted + experimental models
• MineProt – lightweight self‑hosted model sharing platform
• ModelCIF dictionary – metadata extension for computational models
• PDBx/mmCIF format tools – standard format for structural models
• UniProtKB – sequence cross‑reference
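Cross-links to UniProtKB only help if the accession is well-formed. UniProt documents a regular expression for its 6- and 10-character accession numbers, and isoforms are written with a "-n" suffix (e.g. P69905-2). A minimal sketch that validates a reference before it is written into model metadata; the function name and return shape are our own.

```python
import re

# Accession format documented by UniProt (6 or 10 characters).
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def parse_uniprot_reference(text: str):
    """Split 'P69905' or 'P69905-2' into (accession, isoform number or None).

    Raises ValueError for strings that do not look like UniProtKB accessions.
    """
    base, sep, isoform = text.strip().partition("-")
    if not ACCESSION.fullmatch(base):
        raise ValueError(f"not a UniProtKB accession: {text!r}")
    if sep and not isoform.isdigit():
        raise ValueError(f"malformed isoform suffix: {text!r}")
    return base, (int(isoform) if sep else None)
```

A format check cannot tell whether an accession is current or obsolete; that still needs a lookup against UniProtKB itself.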
Storing & Sharing Experimentally‑Solved Structures
Challenges: • Raw/intermediate data (maps, volumes, tomograms) vs final models. • Multiple experimental modalities (X-ray, NMR, cryo-EM), each with specialized metadata/format requirements. • Traceability between raw data, intermediate steps, processed models and publications.
Deposit final atomic models and raw/intermediate data in domain‑specific archives; use standard formats; document workflow provenance; link everything (data, model, publication).
• wwPDB (Worldwide Protein Data Bank) – PDB archive of final atomic models
• EMDB (Electron Microscopy Data Bank) – cryo‑EM reconstructions
• EMPIAR (Electron Microscopy Public Image Archive) – raw EM data
• BMRB (Biological Magnetic Resonance Data Bank) – NMR data
• IRRMC (Integrated Resource for Reproducibility in Macromolecular Crystallography) – raw crystallography data
• Scipion / ScipionCloud – workflow & provenance management for cryo‑EM
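Linking data, model and publication is easier to audit when the identifiers live in one record. A minimal sketch, assuming a hypothetical record with one field per archive; the regular expressions encode the identifier formats as commonly written (classic four-character PDB IDs, "EMD-" plus 4 or 5 digits, "EMPIAR-" plus 5 digits, DOIs beginning "10."). Extended "pdb_" identifiers are not handled. Not every deposition has all four (an X-ray structure has no EMDB or EMPIAR entry), so a "missing" flag is a prompt to check, not necessarily an error.

```python
import re
from dataclasses import dataclass, fields
from typing import List, Optional

# Identifier formats assumed for this sketch.
PATTERNS = {
    "pdb_id": re.compile(r"[0-9][A-Za-z0-9]{3}"),
    "emdb_id": re.compile(r"EMD-[0-9]{4,5}"),
    "empiar_id": re.compile(r"EMPIAR-[0-9]{5}"),
    "publication_doi": re.compile(r"10\.[0-9]{4,9}/\S+"),
}


@dataclass
class ExperimentalStructureLinks:
    """Hypothetical record tying raw data, map, model and paper together."""
    pdb_id: Optional[str] = None
    emdb_id: Optional[str] = None
    empiar_id: Optional[str] = None
    publication_doi: Optional[str] = None

    def problems(self) -> List[str]:
        """List missing or malformed identifiers, in field order."""
        issues = []
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None:
                issues.append(f"{f.name}: missing")
            elif not PATTERNS[f.name].fullmatch(value):
                issues.append(f"{f.name}: malformed {value!r}")
        return issues
```

Placeholder identifiers in the checks below only illustrate the format; they are not meant to point at real entries.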
Licensing & Metadata
Issues: • Lack of clarity about model/data licences limits reuse. • Metadata are often missing or incompatible across resources.
Use open licences (e.g., CC0, CC BY‑SA); adopt standard metadata models; ensure metadata includes modelling provenance and quality metrics.
• CC0 1.0 Universal – public domain dedication
• CC BY‑SA 4.0 – share‑alike licence
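A declared licence is only useful if it is machine-readable. One approach is to store an SPDX identifier (CC0-1.0, CC-BY-4.0 and CC-BY-SA-4.0 are the SPDX forms of the licences above) and reject records without one. A minimal sketch: the accepted list and the metadata key `license` are assumptions for illustration, not a repository requirement.

```python
from typing import Dict, List

# Assumption: the project accepts these SPDX licence identifiers.
ACCEPTED_LICENCES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}


def licence_issues(metadata: Dict[str, str]) -> List[str]:
    """Flag a metadata record whose licence is absent or not on the accepted list."""
    licence = metadata.get("license", "").strip()  # 'license' key is hypothetical
    if not licence:
        return ["no licence declared"]
    if licence not in ACCEPTED_LICENCES:
        return [f"licence {licence!r} is not on the accepted list"]
    return []


def batch_report(records: Dict[str, Dict[str, str]]) -> Dict[str, List[str]]:
    """Return only the records (keyed by model ID) that have licence problems."""
    report = {}
    for record_id, metadata in records.items():
        issues = licence_issues(metadata)
        if issues:
            report[record_id] = issues
    return report
```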
Standards & Metadata Models
Structural bioinformatics lacks a single universal metadata model for both experimental and predicted structures; interoperability issues persist.
Use standard formats (PDBx/mmCIF, ModelCIF); adopt community metadata schemas; cross‑link identifiers (sequence, structure, model).
• PDBx/mmCIF – standard structure format
• ModelCIF – extension for computational models
Validation & Benchmarking
Independent validation of models and benchmarking of prediction methods are needed to establish confidence in structure predictions.
Participate in benchmarking initiatives; publish reproducible model evaluation results; adopt community standard assessment frameworks.
• CAMEO – continuous evaluation of protein structure prediction servers
• CASP (Critical Assessment of Structure Prediction) – biennial structure prediction competition
• CAPRI (Critical Assessment of PRedicted Interactions) – blind assessment of protein-protein docking and interaction prediction
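Assessments such as CAMEO and CASP use superposition-free scores, lDDT among them, which compare inter-atomic distances in the model with those in the reference structure. A simplified sketch of the idea, assuming one Cα atom per residue with model and reference residues already aligned one-to-one: every residue pair closer than 15 Å in the reference is checked at the four tolerance thresholds (0.5, 1, 2, 4 Å), and the score is the fraction of preserved distances. This is a CA-only illustration, not a substitute for the reference implementation used by the assessments.

```python
import math
from itertools import combinations
from typing import List, Tuple

Coord = Tuple[float, float, float]

THRESHOLDS = (0.5, 1.0, 2.0, 4.0)  # distance tolerances in Angstrom
INCLUSION_RADIUS = 15.0            # only pairs closer than this in the reference count


def ca_lddt(reference: List[Coord], model: List[Coord]) -> float:
    """Simplified CA-only lDDT: fraction of preserved reference distances."""
    if len(reference) != len(model):
        raise ValueError("reference and model must list the same residues in order")
    preserved = 0
    total = 0
    for i, j in combinations(range(len(reference)), 2):
        d_ref = math.dist(reference[i], reference[j])
        if d_ref >= INCLUSION_RADIUS:
            continue
        diff = abs(math.dist(model[i], model[j]) - d_ref)
        for t in THRESHOLDS:
            total += 1
            if diff < t:
                preserved += 1
    if total == 0:
        raise ValueError("no residue pairs within the inclusion radius")
    # Equivalent to averaging the preserved fraction over the four thresholds.
    return preserved / total
```

Because only internal distances are compared, translating or rotating the model leaves the score unchanged, which is why no superposition step is needed.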
Tool Registries & Discovery
Finding and selecting appropriate resources/tools is challenging; interoperability of software and metadata is often weak.
Use registries of tools and resources; publish your tools to registries; link to standard resources for interoperability.
• bio.tools registry – catalogue of bioinformatics tools
• FAIRsharing.org – registry of standards, repositories, policies
Training & Capacity Building
Many practitioners may lack awareness of FAIR deposition, metadata standards, model archiving.
Use training portals, courses, community networks; integrate training into workflow planning.
• TeSS: Training e‑Support System – ELIXIR training portal
• ELIXIR 3D‑BioInfo Community – community for 3D data & structural bioinformatics
• PDBe Online Course: “Exploring 3D Macromolecular Structure Data” – focused tutorial on structure deposition & exploration
National & Infrastructure Resources
Many national nodes and infrastructures provide specialized services and support that are not always visible; coordination is required.
Identify the national node/infrastructure relevant to your country; link to imaging/structure facilities and data management services early.
(Note: The RDMkit page lists many national nodes but not always external links).