Foundation Models¶
Pre-trained models for specialized omics domains.
Overview¶
Foundation models are large pre-trained models specialized for specific data types. Agentomics-ML can pre-download these models to speed up agent runs.
Available Types¶
| Type | Domain | Use Case |
|---|---|---|
| dna | Genomics | DNA sequences, variants |
| rna | Transcriptomics | RNA sequences, expression |
| protein | Proteomics | Protein sequences, structure |
| molecule | Chemistry | Small molecules, drugs |
Pre-downloading Models¶
Download foundation models before running the agent by setting --foundation-model-type to one of the types listed above.
This downloads the relevant models into the Docker image, avoiding download delays during agent execution.
To include every type, use --foundation-model-type all.
Multiple Types¶
To download several types, run the download once per type.
In local mode (--local), models are downloaded into the workspace instead of
being baked into a Docker image.
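One way to script the repeated runs is to generate one argument list per type. The sketch below is illustrative only: `base_command` is a placeholder for whatever download command your installation provides (it is not specified on this page), and combining `--local` with `--foundation-model-type` in one invocation is an assumption. Only the flag names and type names come from this page.

```python
from typing import List, Sequence

# Type names taken from the "Available Types" table.
KNOWN_TYPES = ("dna", "rna", "protein", "molecule")


def build_download_commands(
    base_command: Sequence[str],
    model_types: Sequence[str],
    local: bool = False,
) -> List[List[str]]:
    """Return one argument list per requested foundation model type.

    base_command is a placeholder: supply the real download command
    for your setup. This helper only appends the documented flags.
    """
    commands = []
    for model_type in model_types:
        if model_type not in KNOWN_TYPES and model_type != "all":
            raise ValueError(f"unknown foundation model type: {model_type}")
        argv = list(base_command) + ["--foundation-model-type", model_type]
        if local:
            # Assumption: --local can be combined with the type flag.
            argv.append("--local")
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # "<download-command>" is a stand-in, not a real executable name.
    for argv in build_download_commands(["<download-command>"], ["dna", "protein"]):
        print(" ".join(argv))
```

Returning argument lists rather than shell strings keeps the result safe to pass to `subprocess.run` once the real command is substituted.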
DNA Models¶
For genomic sequence data:
- Variant effect prediction
- Regulatory element detection
- Sequence classification
Example datasets:
- Gene expression from DNA features
- SNP effect prediction
- Promoter classification
RNA Models¶
For transcriptomic data:
- RNA sequence analysis
- Secondary structure prediction
- Expression-based classification
Example datasets:
- RNA-seq classification
- Splice site prediction
- RNA modification detection
Protein Models¶
For protein sequence data:
- Protein function prediction
- Structure-based classification
- Interaction prediction
Example datasets:
- Protein family classification
- Enzyme activity prediction
- Binding site detection
Molecule Models¶
For small molecule/chemical data:
- Drug property prediction
- Molecular classification
- Activity prediction
Example datasets:
- Drug-target interaction
- Toxicity prediction
- ADMET properties
How the Agent Uses Foundation Models¶
- Discovery: the agent queries the available foundation models.
- Selection: the agent chooses a model appropriate for the data.
- Embedding: the model extracts features (embeddings) from the data.
- Training: the embeddings serve as input to a downstream ML model.
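The four steps can be sketched end to end in plain Python. This is a toy illustration, not the agent's actual code: the registry, `select_model`, and the k-mer frequency embedder are all hypothetical stand-ins (a real foundation model would return learned embeddings), and the nearest-centroid classifier is just one simple choice of downstream ML model.

```python
import math
from collections import Counter
from itertools import product
from typing import Callable, Dict, List

Embedding = List[float]


def kmer_embedding(sequence: str, alphabet: str = "ACGT", k: int = 2) -> Embedding:
    """Stand-in embedder: normalized k-mer frequencies over the alphabet."""
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[v] for v in vocab) or 1  # avoid division by zero
    return [counts[v] / total for v in vocab]


# Hypothetical registry mapping a data type to an embedding function.
REGISTRY: Dict[str, Callable[[str], Embedding]] = {"dna": kmer_embedding}


def discover_models() -> List[str]:
    """Discovery: list the data types that have an embedder."""
    return sorted(REGISTRY)


def select_model(data_type: str) -> Callable[[str], Embedding]:
    """Selection: pick the embedder that matches the data type."""
    return REGISTRY[data_type]


def train_centroids(embeddings: List[Embedding], labels: List[str]) -> Dict[str, Embedding]:
    """Training: average the embeddings of each class."""
    sums: Dict[str, Embedding] = {}
    counts = Counter(labels)
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, Embedding], vec: Embedding) -> str:
    """Return the label of the nearest class centroid."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


if __name__ == "__main__":
    train_x = ["ATATATAT", "TATATATA", "GCGCGCGC", "CGCGCGCG"]
    train_y = ["AT-rich", "AT-rich", "GC-rich", "GC-rich"]
    embed = select_model(discover_models()[0])       # discovery + selection
    features = [embed(seq) for seq in train_x]       # embedding
    model = train_centroids(features, train_y)       # training
    print(predict(model, embed("ATATTATA")))
```

The point of the split is that only the embedding function changes between data types; the downstream training code stays the same.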
Configuration¶
Foundation model configurations are in:
Each type has a configuration specifying:
- Model names and sources
- Download locations
- Usage instructions for the agent
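To make the three items concrete, here is a hypothetical record with one field per item and a small completeness check. The actual configuration format is not shown on this page, so the class name, field names, and example values below are assumptions for illustration only.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class FoundationModelEntry:
    """Hypothetical record mirroring the items a configuration specifies."""

    name: str                # model name
    source: str              # where the model is fetched from
    download_location: str   # where the downloaded files are stored
    usage_instructions: str  # guidance passed to the agent


def find_missing_fields(entry: FoundationModelEntry) -> List[str]:
    """Return the names of fields that are empty or whitespace-only."""
    return [
        f.name for f in fields(entry)
        if not str(getattr(entry, f.name)).strip()
    ]


if __name__ == "__main__":
    entry = FoundationModelEntry(
        name="example-protein-model",        # placeholder value
        source="https://example.org/model",  # placeholder value
        download_location="",                # deliberately left empty
        usage_instructions="Embed sequences, then train a classifier.",
    )
    print(find_missing_fields(entry))
```

A check like this catches an incomplete entry before a download is attempted.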
Without Pre-downloading¶
If you don't pre-download, the agent can still use foundation models, but it downloads them during execution, which makes the first run slower.
Storage Requirements¶
Foundation models can be large:
| Type | Approximate Size |
|---|---|
| DNA | 1-5 GB |
| RNA | 1-5 GB |
| Protein | 2-10 GB |
| Molecule | 0.5-2 GB |
Ensure sufficient disk space in the Docker volume or local environment.
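A quick pre-flight check can compare free disk space against the upper bounds in the table above. This is a minimal sketch: the sizes are the approximate maxima from the table, and the path to check (the Docker volume mount or local workspace) is something you supply.

```python
import shutil
from typing import Iterable

# Upper bounds in GB, taken from the approximate sizes in the table above.
MAX_SIZE_GB = {"dna": 5, "rna": 5, "protein": 10, "molecule": 2}


def required_gb(model_types: Iterable[str]) -> int:
    """Worst-case space needed for the requested types, in GB."""
    return sum(MAX_SIZE_GB[t] for t in model_types)


def has_enough_space(path: str, model_types: Iterable[str]) -> bool:
    """Check whether the filesystem containing path has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return free_gb >= required_gb(model_types)


if __name__ == "__main__":
    wanted = ["dna", "protein"]
    print(f"Need up to {required_gb(wanted)} GB")
    print("Enough space:", has_enough_space(".", wanted))
```

Because the table gives ranges, using the upper bound errs on the side of reserving too much rather than too little.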
GPU Acceleration¶
Foundation models benefit significantly from a GPU:
- With GPU: Fast embedding generation
- CPU only: Much slower, but functional
Use --cpu-only if no GPU is available, but expect longer run times for approaches based on foundation models.
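If you launch runs from a script, the flag can be added automatically. The sketch below rests on an assumption: it treats the presence of `nvidia-smi` on the PATH as a rough proxy for an NVIDIA GPU, which is a heuristic and not how Agentomics-ML itself detects hardware. The helper names are hypothetical.

```python
import shutil
from typing import List


def gpu_probably_available() -> bool:
    """Heuristic: an NVIDIA driver install usually puts nvidia-smi on PATH."""
    return shutil.which("nvidia-smi") is not None


def cpu_flag(gpu_available: bool) -> List[str]:
    """Return the extra arguments to pass when no GPU is available."""
    return [] if gpu_available else ["--cpu-only"]


if __name__ == "__main__":
    extra_args = cpu_flag(gpu_probably_available())
    print("Extra arguments:", extra_args)
```

The heuristic can be wrong in both directions (for example, a non-NVIDIA GPU, or a driver tool installed without usable hardware), so treat the result as a default that can be overridden.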
Custom Foundation Models¶
To add custom foundation models:
- Add a configuration to foundation_models/
- Update the download script
- Add usage instructions for the agent
Related¶
- Agent Architecture - How models are used
- GPU Settings - GPU configuration