Biomni Lab provides access to 60+ biomedical databases, HPC-accelerated analysis tools, and curated software packages. Browse the Resources panel in the sidebar to explore what’s available.

Databases

Protein Structures

| Database | Description |
|---|---|
| PDB | Protein Data Bank - experimental 3D structures |
| AlphaFold DB | Predicted protein structures for 200M+ proteins |
| UniProt | Protein sequences, function, and annotations |
| InterPro | Protein families, domains, and functional sites |

Genomics & Variants

| Database | Description |
|---|---|
| Ensembl | Genome browser, gene annotations, and comparative genomics |
| NCBI Gene | Gene-centric information across species |
| GEO | Gene Expression Omnibus - public expression datasets |
| dbSNP | Single nucleotide polymorphisms database |
| ClinVar | Clinical variant interpretations and pathogenicity |
| gnomAD | Genome/exome aggregation database - population variants |
| GTEx | Genotype-Tissue Expression - tissue-specific expression |
| RefSeq | Reference sequences for genomes, transcripts, proteins |
| 1000 Genomes | Human genetic variation from global populations |
| TOPMed | Trans-Omics for Precision Medicine variants |
| UK Biobank | Large-scale biomedical database (requires access) |
| ClinGen | Clinical genome resource for variant curation |
| LOVD | Locus-specific variant databases |
| HGMD | Human Gene Mutation Database |

Literature

| Database | Description |
|---|---|
| PubMed | Biomedical literature citations and abstracts |

Pathways & Ontologies

| Database | Description |
|---|---|
| KEGG | Pathway maps, genome, and disease information |
| Reactome | Curated pathway database |
| Gene Ontology | Standardized gene and protein function annotations |
| MSigDB | Molecular Signatures Database - gene sets for GSEA |

Cancer

| Database | Description |
|---|---|
| COSMIC | Catalogue of Somatic Mutations in Cancer |
| TCGA | The Cancer Genome Atlas - multi-omics cancer data |
| cBioPortal | Cancer genomics data visualization and analysis |
| DepMap | Cancer dependency map - gene essentiality |

Drugs & Compounds

| Database | Description |
|---|---|
| ChEMBL | Bioactivity data for drug-like molecules |
| PubChem | Chemical structures and biological activities |
| BindingDB | Protein-ligand binding affinities |
| ClinicalTrials.gov | Registry of clinical studies |
| DrugBank | Drug and drug target information |
| DGIdb | Drug-gene interaction database |
| PharmGKB | Pharmacogenomics knowledge base |
| TTD | Therapeutic Target Database |

Single Cell

| Database | Description |
|---|---|
| CZI Cell Census | Chan Zuckerberg Initiative single-cell atlas |
| Human Cell Atlas | Reference maps of all human cells |
| CellMarker 2.0 | Cell type marker genes |

Disease & Phenotype

| Database | Description |
|---|---|
| DisGeNET | Gene-disease associations |
| OMIM | Online Mendelian Inheritance in Man |
| GWAS Catalog | Published genome-wide association studies |
| OpenTargets | Drug target identification and validation |
| HPO | Human Phenotype Ontology |
| Orphanet | Rare disease information |
| MalaCards | Human disease database |

Protein Interactions & Networks

| Database | Description |
|---|---|
| STRING | Protein-protein interaction networks |
| BioGRID | Biological interaction repository |
| Human Protein Atlas | Protein expression across tissues and cells |
| PrimeKG | Precision medicine knowledge graph |

RNA & Regulatory

| Database | Description |
|---|---|
| miRTarBase | microRNA-target interactions |
| ENCODE | Encyclopedia of DNA Elements - regulatory data |
| JASPAR | Transcription factor binding profiles |
| ReMap | Regulatory atlas of DNA-binding proteins |
| RNAcentral | Non-coding RNA sequences |
| lncRNAdb | Long non-coding RNA database |

Additional Resources

| Database | Description |
|---|---|
| Addgene | Plasmid repository |
| PRIDE | Proteomics data repository |
| MouseMine | Mouse genome informatics |
| TxGNN | Therapeutic target prediction |
| ZINC | Purchasable compounds for virtual screening |
| ChEBI | Chemical Entities of Biological Interest |
| Pfam | Protein domain families |
| PROSITE | Protein domains and motifs |
| IntAct | Molecular interaction database |
| Metabolomics Workbench | Metabolomics data repository |

Database query results are cached, so repeated lookups return faster. The cache is refreshed regularly to keep results up to date.
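
The caching behavior described above follows the common time-to-live (TTL) pattern. Biomni Lab's actual cache implementation and refresh interval are not documented here, so this is only a generic sketch. `TTLCache`, `fetch`, and `ttl` are hypothetical names: a cached result is reused until it is older than `ttl` seconds, and then the query runs again.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Reuse lookup results until they are older than `ttl` seconds (hypothetical sketch)."""

    def __init__(self, fetch: Callable[[str], Any], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch  # function that performs the real (slow) database query
        self.ttl = ttl      # hypothetical refresh interval, in seconds
        self.clock = clock  # injectable clock so expiry can be tested
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, query: str) -> Any:
        now = self.clock()
        entry = self._store.get(query)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]        # fresh hit: skip the remote query
        value = self.fetch(query)  # miss or stale entry: query again
        self._store[query] = (now, value)
        return value
```

Injecting the clock keeps expiry deterministic in tests. In normal use, the default monotonic clock is unaffected by wall-clock adjustments.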

Tools

Structure Prediction (HPC-Accelerated)

| Tool | Description | GPU |
|---|---|---|
| AlphaFold | Protein structure prediction | Yes |
| Boltz-2 | Fast structure prediction | Yes |
| RFDiffusion | Protein design with diffusion models | Yes |
| ColabFold | Fast AlphaFold with MSA server | Yes |
| ESMFold | Language model-based structure prediction | Yes |
| OpenFold | Open-source AlphaFold implementation | Yes |
| RoseTTAFold | Alternative structure prediction | Yes |

Genome Assembly

| Tool | Description | Best For |
|---|---|---|
| Canu | Long-read assembly | PacBio, ONT |
| Flye | Fast long-read assembly | ONT, PacBio HiFi |
| SPAdes | Short-read assembly | Illumina |
| Trinity | Transcriptome assembly | RNA-seq |
| Hifiasm | HiFi read assembly | PacBio HiFi |
| wtdbg2 | Fast long-read assembly | Large genomes |
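
Assemblies produced by these tools are commonly compared using contig N50. N50 is the length L such that contigs of length L or longer together cover at least half of the total assembly. This minimal sketch computes it from a list of contig lengths. It is a generic metric, not a documented Biomni Lab feature, and the lengths in the test are made up.

```python
from typing import List


def n50(contig_lengths: List[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    if not contig_lengths:
        raise ValueError("no contigs")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always covers the full assembly")
```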

Alignment

| Tool | Best For | Speed |
|---|---|---|
| minimap2 | Long reads (PacBio, ONT) | Very fast |
| STAR | RNA-seq, splice-aware | Fast |
| bowtie2 | DNA-seq, ChIP-seq | Fast |
| BWA-MEM | DNA-seq, whole genome | Fast |
| HISAT2 | RNA-seq, low memory | Medium |
| Salmon | Transcript quantification | Very fast |
| kallisto | Transcript quantification | Very fast |
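
Salmon and kallisto both report transcript abundance in transcripts per million (TPM). Their estimated counts come from a statistical model, but the final normalization step is simple. First, divide each transcript's count by its effective length. Then rescale so that all values sum to one million. This minimal sketch shows only that normalization and uses hypothetical counts and lengths.

```python
from typing import Dict


def tpm(counts: Dict[str, float], effective_lengths: Dict[str, float]) -> Dict[str, float]:
    """Convert per-transcript read counts to transcripts per million (TPM)."""
    # Reads per base of effective length: longer transcripts yield more reads.
    rates = {t: counts[t] / effective_lengths[t] for t in counts}
    total = sum(rates.values())
    # Rescale so the TPM values of all transcripts sum to one million.
    return {t: rate / total * 1e6 for t, rate in rates.items()}
```

For example, two transcripts with equal counts but effective lengths of 1000 and 500 receive TPMs in a 1:2 ratio.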

Variant Calling

| Tool | Variant Types | Use Case |
|---|---|---|
| BCFtools | SNPs, indels | Fast variant calling |
| Clair3 | SNPs, indels | Long-read variants |
| Sniffles | Structural variants | Long-read SV calling |
| GATK HaplotypeCaller | SNPs, indels | Germline variants |
| GATK Mutect2 | SNPs, indels | Somatic variants |
| DeepVariant | SNPs, indels | Deep learning-based |
| FreeBayes | SNPs, indels, MNPs | Haplotype-based |
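
The "Variant Types" column distinguishes SNPs, indels, and MNPs. In a VCF record these can usually be told apart by comparing the lengths of the REF and ALT alleles. This minimal sketch assumes normalized records. Complex substitutions that change length are lumped together with indels. Symbolic alleles such as `<DEL>`, which structural-variant callers may emit, get their own label. `classify_variant` and `classify_vcf_line` are hypothetical helpers and are not part of any tool listed above.

```python
from typing import List


def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair from a VCF record."""
    ref, alt = ref.upper(), alt.upper()
    if alt.startswith("<"):
        return "symbolic"  # e.g. <DEL>, <INS>: structural variant without explicit sequence
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) == len(alt):
        return "MNP"       # several adjacent substituted bases
    return "indel"         # length change (insertion, deletion, or complex)


def classify_vcf_line(line: str) -> List[str]:
    """Classify every ALT allele of a tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")  # REF is column 4, ALT is column 5
    return [classify_variant(ref, alt) for alt in alts]
```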

Biomni Package Tools

| Tool | Description |
|---|---|
| Primer Design | Design PCR and sequencing primers |
| Cloning Assistant | Plan molecular cloning strategies |
| CRISPR Guide Design | Design sgRNAs for gene editing |
| Sequence Analysis | Analyze DNA/protein sequences |

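
The methods used by the Primer Design and Sequence Analysis tools are not documented here. To illustrate the kind of checks involved, this minimal sketch computes a reverse complement, GC content, and a rough melting temperature using the Wallace rule, Tm = 2(A+T) + 4(G+C) °C. The Wallace rule is a rule of thumb for short oligonucleotides. Nearest-neighbor models are more accurate. The thresholds in `passes_basic_checks` are hypothetical defaults, not Biomni Lab settings.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    s = primer.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def passes_basic_checks(primer: str, gc_range=(0.40, 0.60), tm_range=(52, 58)) -> bool:
    """Hypothetical screening thresholds; adjust them to your protocol."""
    return (gc_range[0] <= gc_fraction(primer) <= gc_range[1]
            and tm_range[0] <= wallace_tm(primer) <= tm_range[1])
```

An 18-nt primer with nine G/C bases, such as `ACGTACGTACGTACGTAC`, has 50% GC and a Wallace Tm of 54 °C, so it passes these checks.
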
GPU-accelerated tools run on HPC infrastructure. Structure predictions for large proteins may take 30-60 minutes.
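
SpCas9 guides target a 20-nt protospacer that sits immediately 5' of an NGG PAM on the same strand. The CRISPR Guide Design tool may apply extra on-target and off-target scoring that is not shown here. This minimal sketch covers only the candidate-finding step, scanning both strands. `find_spcas9_guides` is a hypothetical helper.

```python
import re
from typing import List, Tuple

_COMP = str.maketrans("ACGT", "TGCA")


def find_spcas9_guides(seq: str, guide_len: int = 20) -> List[Tuple[str, int, str]]:
    """Return (guide, start, strand) for every protospacer directly 5' of an NGG PAM.

    `start` is the 0-based leftmost position of the protospacer on the input (+) strand.
    """
    seq = seq.upper()
    # A lookahead makes overlapping sites match too; N and other symbols never match.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})[ACGT]GG)")
    hits = [(m.group(1), m.start(), "+") for m in pattern.finditer(seq)]
    rc = seq.translate(_COMP)[::-1]
    for m in pattern.finditer(rc):
        # Map the reverse-strand match back to + strand coordinates.
        hits.append((m.group(1), len(seq) - m.start() - guide_len, "-"))
    return hits
```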

Packages

Python

| Package | Description |
|---|---|
| scanpy | Single-cell analysis |
| biopython | Biological computation |
| rdkit | Cheminformatics |
| pandas | Data manipulation |
| numpy | Numerical computing |
| scipy | Scientific computing |
| scikit-learn | Machine learning |
| torch | Deep learning |
| tensorflow | Deep learning |
| matplotlib | Visualization |
| seaborn | Statistical visualization |
| plotly | Interactive plots |
| anndata | Annotated data matrices |
| pyBigWig | BigWig file handling |
| pysam | SAM/BAM file handling |

R

| Package | Description |
|---|---|
| DESeq2 | Differential expression analysis |
| edgeR | Differential expression |
| limma | Linear models for microarray |
| ggplot2 | Data visualization |
| Seurat | Single-cell analysis |
| clusterProfiler | Enrichment analysis |
| ComplexHeatmap | Advanced heatmaps |
| GenomicRanges | Genomic intervals |
| Biostrings | Biological sequences |
| BSgenome | Reference genomes |
| VariantAnnotation | VCF handling |
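
Over-representation analysis, as offered by enrichment tools such as clusterProfiler, is commonly scored with a one-sided hypergeometric test. The test asks whether a gene list overlaps a gene set (for example, a GO or MSigDB set) more than chance would predict. This minimal sketch computes that p-value in plain Python. It omits the multiple-testing correction that a real analysis across many gene sets requires, and `overrepresentation_p` is a hypothetical name.

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background universe
    K: background genes annotated to the gene set
    n: genes in the list of interest
    k: genes in the list that are also in the gene set
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)  # the overlap cannot exceed the list size or the set size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

Exact integer binomials keep the tail sum precise even for genome-sized backgrounds. The final division is the only floating-point step.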

CLI Tools

| Tool | Description |
|---|---|
| samtools | SAM/BAM manipulation |
| bedtools | BED file operations |
| bcftools | VCF/BCF manipulation |
| blast | Sequence alignment |
| gatk | Genome analysis toolkit |
| fastqc | Quality control |
| trimmomatic | Read trimming |
| cutadapt | Adapter removal |
| featureCounts | Read counting |
| htseq | Read counting |
| vcftools | VCF manipulation |
| tabix | Index TAB-delimited files |

Reference Genomes

Pre-indexed reference genomes are available for the following organisms:

| Organism | Assemblies |
|---|---|
| Human | GRCh38 (hg38), GRCh37 (hg19) |
| Mouse | GRCm39 (mm39), GRCm38 (mm10) |
| Rat | mRatBN7.2 |
| Zebrafish | GRCz11 |
| Drosophila | BDGP6 |
| C. elegans | WBcel235 |
| Yeast | R64 |
| E. coli | K-12 MG1655 |

For organisms not listed, you can provide your own reference FASTA file. Biomni Lab will index it automatically.
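
Which indexes Biomni Lab builds for a custom FASTA is not specified here. Aligners such as BWA, STAR, and HISAT2 each need their own index. As one concrete illustration, the `.fai` index written by `samtools faidx` stores five columns per sequence: name, length, byte offset of the first base, bases per line, and bytes per line. This minimal sketch computes those rows. It assumes each sequence uses a fixed line width, as faidx requires, and that the file has no blank lines. `fai_entries` is a hypothetical helper.

```python
from typing import List, Tuple


def fai_entries(fasta: bytes) -> List[Tuple[str, int, int, int, int]]:
    """Compute samtools-faidx-style rows: (name, length, offset, linebases, linewidth)."""
    entries = []
    record = None  # [name, length, offset, linebases, linewidth] of the current sequence
    pos = 0        # byte offset of the current line within the file
    for line in fasta.splitlines(keepends=True):
        if line.startswith(b">"):
            if record is not None:
                entries.append(tuple(record))
            name = line[1:].split()[0].decode()  # name = header text up to first whitespace
            record = [name, 0, pos + len(line), 0, 0]
        elif record is not None:
            bases = len(line.rstrip(b"\r\n"))
            if record[3] == 0:  # the first sequence line fixes the line geometry
                record[3], record[4] = bases, len(line)
            record[1] += bases
        pos += len(line)
    if record is not None:
        entries.append(tuple(record))
    return entries
```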

Requesting New Resources

Don’t see a tool or database you need? Request it:
  1. Click the Help button in the sidebar
  2. Select Live Chat or Join Slack to connect with other researchers and our team
  3. Describe the tool or database and your use case

Our team evaluates requests weekly, and popular requests are prioritized for addition.