🟣Structural Bioinformatics
Structural Bioinformatics — FAIR Data Management Summary Table
The table below provides an overview of best practices for managing and sharing macromolecular structure data in structural bioinformatics. It summarizes key considerations, recommended deposition strategies, metadata standards, and external tools for predicted and experimentally solved structures, with direct links to authoritative resources.
Section
Subsection
Key Considerations / Description
Solutions / Best Practices
Direct External Links
Introduction
Overview
Structural bioinformatics studies the 3D structures of macromolecules (proteins, RNA, DNA, carbohydrates) and their bound ligands using experimental and computational approaches.
Apply FAIR principles; capture metadata during modelling; deposit models with clear provenance and licensing.
—
Storing & Sharing Structure Predictions
Model type
Is your prediction based on experimental data (integrative/hybrid) or purely computational (in silico)?
Determines appropriate deposition system.
—
Prediction purpose
Is this a large-scale automated effort or manual, application-specific modelling?
Select suitable repository or service.
—
Modelling steps
Were modelling steps/software documented clearly?
Ensure reproducibility; cite software/methods.
—
Input data
What alignments, homologues, templates were used?
Improve transparency and downstream reuse.
—
Model accuracy
Does the model include local/global confidence metrics?
Use recognised tools to compute accuracy.
Model deposition
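1. ModelArchive – theoretical models
Deposit with quality metrics and metadata.
2. PDB-Dev – integrative/hybrid models
Deposit models that combine experimental data with modelling.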
3. MineProt – large-scale AlphaFold-style pipelines
Deploy self-hosted model server.
4. 3D-Beacons – unified access to models across sources
Network that gives one point of access to predicted and experimental models.
Model metadata
Store metadata using ModelCIF extension dictionary.
Community-maintained format for prediction metadata.
Quality estimation
Benchmark confidence using community tools.
E.g., global and per-residue metrics.
Storing & Sharing Experimentally Solved Structures
Structure sources
X-ray, NMR, cryo-EM methods produce different data types/formats.
Deposit models and raw data separately, each in its dedicated archive.
—
Cryo-EM raw data
Raw EM images are stored in EMPIAR, reconstructed volumes/tomograms in EMDB, and atomic models in the PDB.
Use MRC/CCP4 format and XML metadata.
X-ray raw data
Store in IRRMC or SBGrid Data Bank.
Cross-reference to PDB entries.
Metadata formats
Follow repository-specific schema and file formats.
E.g., EMDB XML/XSD, EMPIAR Schematron.
Workflow tools
Use Scipion for image processing and deposition.
FAIR-compatible workflow engine for EM.
Infrastructure tools
Use ARIA to manage access proposals and link the resulting data.
Link metadata to equipment, projects, outputs.
Structure annotation
Visualise and annotate models with 3DBioNotes.
Integrate biomedical + biochemical annotations.
COVID-19 special collection
Explore published SARS-CoV-2 structures.
Use dedicated structural hub.
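The "Model accuracy" and "Quality estimation" rows above distinguish per-residue (local) from global confidence metrics. The following is a minimal Python sketch of how a global summary might be derived from per-residue scores. It assumes the scores sit in the B-factor field (columns 61-66) of PDB-format ATOM records, which is a convention of some predictors and not a rule. The helper `atom_line`, the threshold of 70, and the toy values are all hypothetical and exist only to make the sketch self-contained.

```python
from statistics import mean


def atom_line(serial, atom, res_name, chain, res_seq, score):
    """Build a toy PDB-format ATOM record (hypothetical helper for this demo)."""
    return (
        f"ATOM  {serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{score:6.2f}"
    )


def per_residue_confidence(pdb_text):
    """Return {(chain, residue number): score} read from C-alpha ATOM records.

    Assumption: the per-residue confidence is stored in the B-factor field
    (columns 61-66 of the fixed-width PDB format).
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            res_seq = int(line[22:26])
            scores[(chain, res_seq)] = float(line[60:66])
    return scores


def summarise(scores, threshold=70.0):
    """Collapse per-residue scores into simple global metrics."""
    values = list(scores.values())
    confident = sum(v >= threshold for v in values)
    return {
        "n_residues": len(values),
        "global_mean": round(mean(values), 2),
        "fraction_confident": round(confident / len(values), 2),
    }


# Toy three-residue model; only the C-alpha records contribute a score.
toy_model = "\n".join([
    atom_line(1, "N", "MET", "A", 1, 91.0),
    atom_line(2, "CA", "MET", "A", 1, 91.0),
    atom_line(3, "CA", "GLY", "A", 2, 75.0),
    atom_line(4, "CA", "SER", "A", 3, 50.0),
])
summary = summarise(per_residue_confidence(toy_model))
print(summary)
```

Reporting both the mean and the fraction of residues above a threshold keeps one poorly modelled region from being hidden by a high average. For real deposits, a dedicated mmCIF/ModelCIF parser would be a better fit than fixed-column parsing.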
Expanded Table for the “Structural Bioinformatics” Domain (RDMkit)
Includes direct external links for items referenced on the page.
Introduction
• Structural bioinformatics covers the 3D structures of macromolecules (proteins, RNA, DNA, carbohydrates) and their bound small molecules.
• Both experimental and computational methods apply.
• FAIR deposition of models and metadata is required. (RDMkit)
Encourage metadata capture during modelling/acquisition; apply FAIR principles across structure/data flow.
—
Storing & Sharing Structure Predictions
Considerations include:
• Is the model based on experimental or purely computational data?
• Purpose: large‑scale automated vs individual manual modelling?
• Source sequence origins (link to UniProt etc.).
• Document modelling steps, input data, tools.
• Provide model quality/confidence estimates.
• Clear licensing/usage terms. (RDMkit)
Use appropriate repository depending on model type; ensure metadata, quality and licence are present; adopt standard formats; cross‑link to source sequence/databases.
• ModelArchive – theoretical/computational models
• PDB‑Dev – integrative/hybrid models
• 3D‑Beacons Network – unified access to predicted + experimental models
• MineProt – lightweight self‑hosted model sharing platform
• ModelCIF dictionary – metadata extension for computational models
• PDBx/mmCIF format tools – standard format for structural models
• UniProtKB – sequence cross‑reference
Storing & Sharing Experimentally‑Solved Structures
Challenges:
• Raw/intermediate data (maps, volumes, tomograms) vs final models.
• Multiple experimental modalities (X‑ray, NMR, cryo‑EM), each with specialized metadata/format requirements.
• Traceability between raw data, intermediate steps, processed models and publications.
Deposit final atomic models and raw/intermediate data in domain‑specific archives; use standard formats; document workflow provenance; link everything (data, model, publication).
• PDB (Protein Data Bank, managed by the Worldwide Protein Data Bank, wwPDB) – final atomic models
• EMDB (Electron Microscopy Data Bank) – cryo‑EM reconstructions
• EMPIAR (Electron Microscopy Public Image Archive) – raw EM data
• BMRB (Biological Magnetic Resonance Bank) – NMR data
• IRRMC (Integrated Resource for Reproducibility in Macromolecular Crystallography) – raw crystallography data
• Scipion / ScipionCloud – workflow & provenance management for cryo‑EM
Licensing & Metadata
Issues:
• Lack of clarity about model/data licences limits reuse.
• Metadata often missing or incompatible across resources.
Use open licences (e.g., CC0, CC BY‑SA); adopt standard metadata models; ensure metadata includes modelling provenance and quality metrics.
• CC0 1.0 Universal – public domain dedication
• CC BY‑SA 4.0 – share‑alike licence
Standards & Metadata Models
Structural bioinformatics lacks a single universal metadata model for both experimental and predicted structures; interoperability issues persist.
Use standard formats (PDBx/mmCIF, ModelCIF); adopt community metadata schemas; cross‑link identifiers (sequence, structure, model).
• PDBx/mmCIF – standard structure format
• ModelCIF – extension for computational models
Validation & Benchmarking
Confidence in structure predictions depends on assessing both the quality of individual models and the performance of the prediction methods.
Participate in benchmarking initiatives; publish reproducible model evaluation results; adopt community standard assessment frameworks.
• CAMEO – continuous evaluation of protein structure prediction servers
• CASP (Critical Assessment of Structure Prediction) – biennial structure prediction competition
• CAPRI (Critical Assessment of PRediction of Interactions) – evaluation of docking/prediction of protein‑protein interactions
Tool Registries & Discovery
Finding and selecting appropriate resources/tools is challenging; interoperability of software and metadata is often weak.
Use registries of tools and resources; publish your tools to registries; link to standard resources for interoperability.
• bio.tools registry – catalogue of bioinformatics tools
• FAIRsharing.org – registry of standards, repositories, policies
Training & Capacity Building
Many practitioners may be unaware of FAIR deposition practices, metadata standards, and model archiving options.
Use training portals, courses, community networks; integrate training into workflow planning.
• TeSS: Training e‑Support System – ELIXIR training portal
• ELIXIR 3D‑BioInfo Community – community for 3D data & structural bioinformatics
• PDBe Online Course: “Exploring 3D Macromolecular Structure Data” – focused tutorial on structure deposition & exploration
National & Infrastructure Resources
Many national nodes and infrastructures provide specialized services and support that are not always easy to discover, so coordination is required.
Identify the national node/infrastructure relevant to your country; link to imaging/structure facilities and data management services early.
(Note: The RDMkit page lists many national nodes but does not always provide external links.)
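The considerations listed for predicted structures in the table above (source sequence, modelling steps, input data, quality estimates, licence) could be checked before deposition with a small script. The sketch below is one possible way to do this in Python. Every field name, the placeholder accession, and the software name are hypothetical and do not correspond to ModelCIF data items or to any repository's submission schema.

```python
# Hypothetical checklist: field name -> what the field is expected to hold.
REQUIRED_FIELDS = {
    "source_sequence_id": "cross-reference to the source sequence (e.g. UniProtKB)",
    "modelling_steps": "ordered description of the modelling procedure",
    "software": "tools and versions used",
    "input_data": "alignments, homologues, templates",
    "quality_metrics": "global and/or per-residue confidence estimates",
    "licence": "usage terms, e.g. CC0 or CC BY-SA",
}


def missing_metadata(record):
    """Return the required field names that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Draft record with placeholder values (not real identifiers or tools).
draft = {
    "source_sequence_id": "ACCESSION_PLACEHOLDER",
    "modelling_steps": ["template search", "model building"],
    "software": ["hypothetical-modeller 1.0"],
    "input_data": [],                      # templates not yet recorded
    "quality_metrics": {"global_score": 0.8},
    "licence": "",                         # licence not yet chosen
}

gaps = missing_metadata(draft)
for name in gaps:
    print(f"missing: {name} ({REQUIRED_FIELDS[name]})")
```

Running such a check while modelling, not only at submission time, fits the table's advice to capture metadata as the work proceeds.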