🟣 Structural Bioinformatics

Structural Bioinformatics — FAIR Data Management Summary Table

The table below provides an overview of best practices for managing and sharing macromolecular structure data in structural bioinformatics. It summarizes key considerations, recommended deposition strategies, metadata standards, and external tools for predicted and experimentally solved structures, with direct links to authoritative resources.

| Section | Subsection | Key Considerations / Description | Solutions / Best Practices | Direct External Links |
|---|---|---|---|---|
| Introduction | Overview | Structural bioinformatics studies the 3D structures of macromolecules (proteins, RNA, DNA, carbohydrates, ligands) using experimental and computational approaches. | Apply FAIR principles; capture metadata during modelling; deposit models with clear provenance and licensing. | |
| Storing & Sharing Structure Predictions | Model type | Is your prediction based on experimental data (integrative/hybrid) or purely computational (in silico)? | The answer determines the appropriate deposition system. | |
| | Prediction purpose | Is this a large-scale automated effort or manual, application-specific modelling? | Select a suitable repository or service. | |
| | Sequence source | What is the source of the modelled sequences? | Link to UniProtKB. | |
| | Modelling steps | Were the modelling steps and software documented clearly? | Ensure reproducibility; cite software and methods. | |
| | Input data | Which alignments, homologues and templates were used? | Documenting them improves transparency and downstream reuse. | |
| | Model accuracy | Does the model include local and global confidence metrics? | Use recognised tools to compute accuracy. | |
| | Licensing | Are usage terms clear and permissive? | | |
| | Model deposition | 1. ModelArchive – theoretical models | Deposit with quality metrics and metadata. | |
| | | 2. PDB-Dev – integrative/hybrid models | Deposit under CC0 licence. | |
| | | 3. MineProt – large-scale AlphaFold-style pipelines | Deploy a self-hosted model server. | |
| | | 4. 3D-Beacons – unify model access across sources | Network providing unified access to predicted and experimental models. | |
| | Model format | Use PDBx/mmCIF format. | Preferred over legacy PDB format. | |
| | Model metadata | Store metadata using the ModelCIF extension dictionary. | Community-maintained format for prediction metadata. | |
| | Quality estimation | Benchmark confidence using community tools. | E.g., global and per-residue metrics. | |
| Storing & Sharing Experimentally Solved Structures | Structure sources | X-ray, NMR and cryo-EM methods produce different data types and formats. | Deposit models and raw data separately. | |
| | Atomic models | Use the standard PDBx/mmCIF format. | Store models in PDB. | |
| | Cryo-EM raw data | Raw EM images are stored in EMPIAR, reconstructed volumes and tomograms in EMDB, and atomic models in PDB. | Use MRC/CCP4 format and XML metadata. | |
| | NMR raw data | Store in BMRB using NMR-STAR format. | Support STAR-based metadata. | |
| | X-ray raw data | Store in IRRMC or SBGrid Data Bank. | Cross-reference to PDB entries. | |
| | Metadata formats | Follow repository-specific schemas and file formats. | E.g., EMDB XML/XSD, EMPIAR Schematron. | |
| | Workflow tools | Use Scipion for image processing and deposition. | FAIR-compatible workflow engine for EM. | |
| | Infrastructure tools | Use ARIA for managing proposal access and linking data. | Link metadata to equipment, projects and outputs. | |
| | Structure annotation | Visualise and annotate models with 3DBioNotes. | Integrate biomedical and biochemical annotations. | |
| | COVID-19 special collection | Explore published SARS-CoV-2 structures. | Use the dedicated structural hub. | |
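
The considerations for predicted models listed above can be turned into a simple pre-deposition check. The following is a minimal Python sketch, not an official tool: the `PredictedModelRecord` class, its field names, and the `missing_items` and `suggest_repository` helpers are hypothetical names introduced here for illustration. The repository mapping only mirrors the table (ModelArchive for theoretical models, PDB-Dev for integrative/hybrid models), and the accession and software name in the usage example are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Mapping taken from the table above; the keys are hypothetical labels.
REPOSITORY_BY_MODEL_TYPE = {
    "theoretical": "ModelArchive",
    "integrative": "PDB-Dev",
}


@dataclass
class PredictedModelRecord:
    """Hypothetical summary of what accompanies a predicted model."""
    model_type: str                                   # "theoretical" or "integrative"
    uniprot_accession: str = ""                       # sequence source (UniProtKB)
    software: List[str] = field(default_factory=list)     # modelling tools/methods
    input_data: List[str] = field(default_factory=list)   # alignments, homologues, templates
    global_confidence: Optional[float] = None         # one score for the whole model
    per_residue_confidence: List[float] = field(default_factory=list)
    licence: str = ""                                 # e.g. "CC0" or "CC BY-SA 4.0"
    file_format: str = ""                             # expected: "PDBx/mmCIF"


def missing_items(record: PredictedModelRecord) -> List[str]:
    """Return the checklist items that are still missing from the record."""
    checks = [
        ("sequence source", bool(record.uniprot_accession)),
        ("modelling software", bool(record.software)),
        ("input data", bool(record.input_data)),
        ("global confidence", record.global_confidence is not None),
        ("per-residue confidence", bool(record.per_residue_confidence)),
        ("licence", bool(record.licence)),
        ("PDBx/mmCIF format", record.file_format == "PDBx/mmCIF"),
    ]
    return [label for label, ok in checks if not ok]


def suggest_repository(record: PredictedModelRecord) -> str:
    """Suggest a deposition system from the model type, or 'unknown'."""
    return REPOSITORY_BY_MODEL_TYPE.get(record.model_type, "unknown")


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = PredictedModelRecord(
        model_type="theoretical",
        uniprot_accession="P69905",
        software=["ExampleModeller 1.0"],
        licence="CC BY-SA 4.0",
        file_format="PDBx/mmCIF",
    )
    print("Suggested repository:", suggest_repository(example))
    print("Still missing:", missing_items(example))
```

Running such a check before deposition makes gaps in provenance, confidence metrics or licensing visible early, while they are still cheap to fill.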


Expanded Table for the “Structural Bioinformatics” Domain (RDMkit)

Includes direct external links for items referenced on the page.

| Section | Key Considerations / Challenges | Solutions / Best Practices | External Resources & Links |
|---|---|---|---|
| Introduction | • Structural bioinformatics covers 3D structures of macromolecules (proteins, RNA, DNA, carbohydrates and bound small molecules). • Both experimental and computational methods apply. • FAIR deposition of models and metadata is required. (RDMkit) | Encourage metadata capture during modelling and acquisition; apply FAIR principles across the structure and data flow. | |
| Storing & Sharing Structure Predictions | Considerations include: • Is the model based on experimental or purely computational data? • Purpose: large-scale automated vs individual manual modelling? • Origin of the source sequences (link to UniProt etc.). • Document modelling steps, input data and tools. • Provide model quality/confidence estimates. • State clear licensing and usage terms. (RDMkit) | Use the appropriate repository for the model type; ensure metadata, quality estimates and a licence are present; adopt standard formats; cross-link to source sequences and databases. | ModelArchive – theoretical/computational models; PDB-Dev – integrative/hybrid models; 3D-Beacons Network – unified access to predicted and experimental models; MineProt – lightweight self-hosted model sharing platform; ModelCIF dictionary – metadata extension for computational models; PDBx/mmCIF format tools – standard format for structural models; UniProtKB – sequence cross-reference |
| Storing & Sharing Experimentally-Solved Structures | Challenges: • Raw/intermediate data (maps, volumes, tomograms) vs final models. • Multiple experimental modalities (X-ray, NMR, cryo-EM), each with specialized metadata and format requirements. • Traceability between raw data, intermediate steps, processed models and publications. | Deposit final atomic models and raw/intermediate data in domain-specific archives; use standard formats; document workflow provenance; link everything (data, model, publication). | PDB (Worldwide Protein Data Bank) – final atomic models; EMDB (Electron Microscopy Data Bank) – cryo-EM reconstructions; EMPIAR (Electron Microscopy Public Image Archive) – raw EM data; BMRB (Biological Magnetic Resonance Bank) – NMR data; IRRMC (Integrated Resource for Reproducibility in Macromolecular Crystallography) – raw crystallography data; Scipion / ScipionCloud – workflow and provenance management for cryo-EM |
| Licensing & Metadata | Issues: • Lack of clarity about model/data licences limits reuse. • Metadata are often missing or incompatible across resources. | Use open licences (e.g., CC0, CC BY-SA); adopt standard metadata models; ensure metadata include modelling provenance and quality metrics. | CC0 1.0 Universal – public domain dedication; CC BY-SA 4.0 – share-alike licence |
| Standards & Metadata Models | Structural bioinformatics lacks a single universal metadata model for both experimental and predicted structures; interoperability issues persist. | Use standard formats (PDBx/mmCIF, ModelCIF); adopt community metadata schemas; cross-link identifiers (sequence, structure, model). | PDBx/mmCIF – standard structure format; ModelCIF – extension for computational models |
| Validation & Benchmarking | Assessing model quality and evaluating methods are both needed for confidence in structure predictions. | Participate in benchmarking initiatives; publish reproducible model evaluation results; adopt community-standard assessment frameworks. | CAMEO – continuous evaluation of protein structure prediction servers; CASP (Critical Assessment of Structure Prediction) – biennial structure prediction competition; CAPRI (Critical Assessment of PRediction of Interactions) – evaluation of docking/prediction of protein-protein interactions |
| Tool Registries & Discovery | Finding and selecting appropriate resources and tools is challenging; interoperability of software and metadata is often weak. | Use registries of tools and resources; publish your tools to registries; link to standard resources for interoperability. | bio.tools registry – catalogue of bioinformatics tools; FAIRsharing.org – registry of standards, repositories, policies |
| Training & Capacity Building | Many practitioners may lack awareness of FAIR deposition, metadata standards and model archiving. | Use training portals, courses and community networks; integrate training into workflow planning. | TeSS: Training e-Support System – ELIXIR training portal; ELIXIR 3D-BioInfo Community – community for 3D data and structural bioinformatics; PDBe Online Course: “Exploring 3D Macromolecular Structure Data” – focused tutorial on structure deposition and exploration |
| National & Infrastructure Resources | Many national nodes and infrastructures provide specialized services and support that are not always evident; coordination is required. | Identify the national node or infrastructure relevant to your country; link to imaging/structure facilities and data management services early. | (Note: The RDMkit page lists many national nodes but does not always provide external links.) |
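
The archive choices for experimentally solved structures in the table above amount to a lookup from data kind to archive. The sketch below illustrates that idea in Python under stated assumptions: the data-kind keys (`atomic_model`, `em_volume`, and so on) and the `deposition_plan` helper are hypothetical names introduced here, and the format notes are filled in only where this page names a format. It does not contact any archive; it only organises the decision.

```python
# Lookup from a (hypothetical) data-kind label to the archive named on this
# page and, where the page states one, the expected format. None means the
# page does not name a format for that kind of data.
ARCHIVE_BY_DATA_KIND = {
    "atomic_model": ("PDB", "PDBx/mmCIF"),
    "em_volume": ("EMDB", "MRC/CCP4 map with XML metadata"),
    "em_raw_images": ("EMPIAR", None),
    "nmr_data": ("BMRB", "NMR-STAR"),
    "xray_raw_data": ("IRRMC or SBGrid Data Bank", None),
}


def deposition_plan(data_kinds):
    """Return one plan entry (data kind, archive, format) per requested kind.

    Raises ValueError for a data kind that is not in the lookup, so that
    unrecognised outputs are noticed rather than silently skipped.
    """
    plan = []
    for kind in data_kinds:
        if kind not in ARCHIVE_BY_DATA_KIND:
            raise ValueError(f"No archive known for data kind: {kind!r}")
        archive, file_format = ARCHIVE_BY_DATA_KIND[kind]
        plan.append({"data_kind": kind, "archive": archive, "format": file_format})
    return plan


if __name__ == "__main__":
    # A cryo-EM project typically produces all three of these outputs.
    for entry in deposition_plan(["atomic_model", "em_volume", "em_raw_images"]):
        print(entry["data_kind"], "->", entry["archive"], "| format:", entry["format"])
```

Keeping the final model, the reconstruction and the raw data as separate plan entries reflects the recommendation to deposit them separately and then cross-link the resulting entries.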
