https://github.com/braceal/bio_ai_agent_example
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: braceal
- Language: Python
- Default Branch: main
- Size: 15.6 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Bio AI Agent Example
A Python application for automated phylogenetic analysis using local bioinformatics tools. This workflow fetches protein sequences directly from NCBI, performs multiple sequence alignment, and builds phylogenetic trees.
Overview
This workflow performs comparative analysis of flagellin proteins across different bacterial species:
- Sequence Retrieval: Fetches protein sequences from NCBI using Biopython
- Multiple Sequence Alignment: Aligns sequences using MAFFT, ClustalO, or MUSCLE
- Phylogenetic Analysis: Builds trees using FastTree, RAxML, and IQ-TREE
- Output: Generates phylogenetic trees in Newick format
Prerequisites
Python Requirements
- Python 3.8 or higher
- Internet connection for NCBI database queries
External Bioinformatics Tools
You'll need to install the following command-line tools (install instructions below):
Alignment Tools:
- MAFFT: multiple sequence alignment
- Clustal Omega: multiple sequence alignment
- MUSCLE: multiple sequence alignment

Phylogenetic Tools:
- FastTree: fast phylogenetic tree construction
- RAxML: maximum likelihood phylogenetic analysis
- IQ-TREE: efficient phylogenetic inference
Setup Instructions
1. Clone the Repository
```bash
git clone git@github.com:braceal/bio_ai_agent_example.git
cd bio_ai_agent_example
```
2. Create a Virtual Environment
Create and activate a Python virtual environment:
```bash
# Create virtual environment
python3 -m venv venv

# Activate virtual environment
source venv/bin/activate
```
3. Install Python Dependencies
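Install Biopython and the external command-line tools. The tools are distributed through the bioconda channel, so this step uses a conda environment (which can be used instead of the venv from step 2):
```bash
conda create -n bio_ai_agent_example python=3.12
conda activate bio_ai_agent_example
pip install biopython
conda install -c conda-forge -c bioconda mafft clustalo muscle fasttree raxml iqtree -y
```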
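To confirm the installation worked, a quick check that each tool is reachable on PATH can help. This is a minimal sketch using only the standard library; the executable names listed per tool are assumptions about what the bioconda packages install (for example, IQ-TREE 2 usually installs `iqtree2`), so adjust them to your environment.

```python
"""Check that the external command-line tools are reachable on PATH."""
import shutil
from typing import Dict, List, Optional

# Candidate executable names per tool. These names are assumptions about what
# the bioconda packages install; edit them to match your environment.
TOOL_CANDIDATES: Dict[str, List[str]] = {
    "MAFFT": ["mafft"],
    "Clustal Omega": ["clustalo"],
    "MUSCLE": ["muscle"],
    "FastTree": ["FastTree", "fasttree"],
    "RAxML": ["raxmlHPC", "raxmlHPC-PTHREADS"],
    "IQ-TREE": ["iqtree2", "iqtree", "iqtree3"],
}


def find_tools(candidates: Dict[str, List[str]] = TOOL_CANDIDATES) -> Dict[str, Optional[str]]:
    """Map each tool to the path of the first candidate executable found, or None."""
    found: Dict[str, Optional[str]] = {}
    for tool, names in candidates.items():
        # shutil.which returns None when a name is not on PATH.
        found[tool] = next((path for path in map(shutil.which, names) if path), None)
    return found


if __name__ == "__main__":
    for tool, path in find_tools().items():
        print(f"{tool:15} {path or 'NOT FOUND'}")
```

Any tool reported as `NOT FOUND` either is not installed in the active environment or installs under a different executable name.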
4. Configure Email for NCBI
Edit workflow.py and update the email address used for NCBI Entrez queries:
```python
from Bio import Entrez

Entrez.email = "your.email@example.com"  # Replace with your actual email
```
Note: The workflow also runs with the placeholder address, but NCBI asks Entrez users to supply a real contact email.
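For readers curious how a single sequence can be retrieved with Biopython's Entrez module, here is a minimal sketch: search the NCBI protein database, then fetch the first hit as FASTA text. The query fields (`[Gene]`, `[Organism]`), the helper names, and taking only the first hit are assumptions for illustration; workflow.py may build its queries differently.

```python
"""Sketch: fetch one protein sequence from NCBI with Biopython's Entrez module."""
from typing import Optional


def build_query(gene: str, organism: str) -> str:
    # Hypothetical query format: restrict by gene name and by organism.
    return f'{gene}[Gene] AND "{organism}"[Organism]'


def fetch_protein_fasta(gene: str, organism: str, email: str) -> Optional[str]:
    """Return FASTA text for the first matching protein record, or None if no hit."""
    from Bio import Entrez  # imported here so build_query works without Biopython

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="protein", term=build_query(gene, organism), retmax=1)
    result = Entrez.read(handle)
    handle.close()
    if not result["IdList"]:
        return None
    handle = Entrez.efetch(db="protein", id=result["IdList"][0],
                           rettype="fasta", retmode="text")
    fasta_text = handle.read()
    handle.close()
    return fasta_text
```

Usage (requires network access): `fetch_protein_fasta("fliC", "Escherichia coli", "your.email@example.com")`.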
Usage
Run the complete phylogenetic analysis workflow:
```bash
python workflow.py
```
What the workflow does:
- Fetches protein sequences for flagellin genes from 10 bacterial species
- Creates individual FASTA files in the `fasta_seqs/` directory
- Combines and aligns sequences using MAFFT
- Builds three phylogenetic trees using different methods:
  - FastTree (fast approximate method)
  - RAxML (maximum likelihood with bootstrap)
  - IQ-TREE (model selection + ultrafast bootstrap)
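The "combine" step before alignment amounts to concatenating the per-species FASTA files into one multi-FASTA file. A minimal standard-library sketch, assuming each input file holds plain FASTA text; the function name is hypothetical and not taken from workflow.py:

```python
"""Concatenate per-species FASTA files into one multi-FASTA file."""
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]


def combine_fasta(fasta_files: Iterable[PathLike],
                  out_path: PathLike = "combined.fasta") -> Path:
    out = Path(out_path)
    with out.open("w") as handle:
        for path in fasta_files:
            text = Path(path).read_text().strip()
            if text:  # skip empty files, e.g. a species with no NCBI hit
                handle.write(text + "\n")  # normalize to exactly one trailing newline
    return out
```

Stripping and re-adding the newline keeps records from running together when a source file lacks a trailing newline.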
Output Files
The workflow generates several output files:
- `fasta_seqs/` - Directory containing individual FASTA files for each species
- `combined.fasta` - All sequences combined into one file
- `alignment_mafft.fasta` - Multiple sequence alignment
- `alignment_mafft.fasttree.nwk` - FastTree phylogeny
- `RAxML_bestTree.raxml_tree` - RAxML phylogeny
- `iqtree_out.treefile` - IQ-TREE phylogeny
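After a run, it can be handy to confirm which of these outputs were produced. A small sketch, assuming the files are written to the directory the workflow was run from; the file names come from the output list, while the helper itself is hypothetical:

```python
"""Report which expected output files from a workflow run are present."""
from pathlib import Path
from typing import Dict, Union

# Names from the README's output list; the run directory is an assumption.
EXPECTED_OUTPUTS = [
    "fasta_seqs",
    "combined.fasta",
    "alignment_mafft.fasta",
    "alignment_mafft.fasttree.nwk",
    "RAxML_bestTree.raxml_tree",
    "iqtree_out.treefile",
]


def check_outputs(run_dir: Union[str, Path] = ".") -> Dict[str, bool]:
    base = Path(run_dir)
    return {name: (base / name).exists() for name in EXPECTED_OUTPUTS}


if __name__ == "__main__":
    for name, present in check_outputs().items():
        print(f"{'ok' if present else 'missing':8} {name}")
```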
To view the trees, upload one of the Newick files to the iTOL (Interactive Tree Of Life) web viewer: https://itol.embl.de/upload.cgi
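Before opening a tree in a viewer, a quick sanity check is to list its leaf names and confirm every species made it into the tree. This is a deliberately small Newick reader, a sketch that assumes unquoted labels and no `[comments]`; for full Newick support, a proper parser such as Biopython's `Bio.Phylo` is the better choice.

```python
"""Extract leaf (taxon) names from a Newick tree string."""
import re
from pathlib import Path
from typing import List, Union

DELIMITERS = set("(),:;")


def newick_leaves(newick: str) -> List[str]:
    """Return leaf labels in order of appearance.

    Labels directly after '(' or ',' are leaves; labels after ')' are internal
    node labels (e.g. bootstrap support); text after ':' is a branch length.
    """
    leaves: List[str] = []
    previous = None
    for token in re.findall(r"[(),:;]|[^(),:;]+", newick):
        if token in DELIMITERS:
            previous = token
            continue
        label = token.strip()
        if label and previous in ("(", ","):
            leaves.append(label)
    return leaves


def tree_file_leaves(path: Union[str, Path]) -> List[str]:
    return newick_leaves(Path(path).read_text())
```

For example, `len(tree_file_leaves("iqtree_out.treefile"))` should match the number of sequences that were aligned.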
Customization
Modify Species List
Edit the species list in workflow.py to analyze different genes/organisms:
```python
species = [
    # Each entry is a ("gene_name", "Organism name") tuple
    ("fliC", "Escherichia coli"),
    # Add more species...
]
```
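When adding entries, it helps to know how each (gene, organism) pair might map to a file under `fasta_seqs/`. The naming scheme below is hypothetical (workflow.py may name its files differently); it only illustrates turning an entry into a filesystem-safe path and guarding against duplicate entries.

```python
"""Hypothetical helper: map a (gene, organism) entry to a FASTA file path."""
import re
from pathlib import Path
from typing import List, Tuple

species: List[Tuple[str, str]] = [
    ("fliC", "Escherichia coli"),
    ("fliC", "Salmonella enterica"),  # example of an added entry
]


def fasta_path(gene: str, organism: str, out_dir: str = "fasta_seqs") -> Path:
    # Replace runs of characters other than letters, digits, '.' and '-' with '_'.
    safe = re.sub(r"[^A-Za-z0-9.-]+", "_", f"{gene}_{organism}")
    return Path(out_dir) / f"{safe}.fasta"


# Duplicate entries would silently overwrite the same file, so reject them.
assert len(set(species)) == len(species), "duplicate (gene, organism) entries"
```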
Change Alignment Method
Modify the alignment method call:
```python
msa_file = run_alignment(fasta_files, method="clustalo")  # or "muscle"
```
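The three aligners take different command-line conventions, which is the main thing a `method` switch has to handle. A sketch of one way to do it, with the flags as assumptions rather than what workflow.py necessarily passes: MAFFT writes the alignment to stdout, while Clustal Omega and MUSCLE write to an output file (MUSCLE v5 syntax shown; MUSCLE v3 uses `-in`/`-out`).

```python
"""Build and run command lines for the three alignment tools (flags assumed)."""
import subprocess
from pathlib import Path
from typing import List, Tuple


def alignment_command(method: str, in_fasta: str, out_fasta: str) -> Tuple[List[str], bool]:
    """Return (argv, writes_to_stdout) for the chosen aligner."""
    if method == "mafft":
        return ["mafft", "--auto", in_fasta], True  # alignment goes to stdout
    if method == "clustalo":
        return ["clustalo", "-i", in_fasta, "-o", out_fasta, "--force"], False
    if method == "muscle":
        # MUSCLE v5 syntax; MUSCLE v3 uses -in/-out instead.
        return ["muscle", "-align", in_fasta, "-output", out_fasta], False
    raise ValueError(f"unknown alignment method: {method!r}")


def align(method: str, in_fasta: str, out_fasta: str) -> Path:
    argv, to_stdout = alignment_command(method, in_fasta, out_fasta)
    if to_stdout:
        with open(out_fasta, "w") as handle:
            subprocess.run(argv, stdout=handle, check=True)
    else:
        subprocess.run(argv, check=True)
    return Path(out_fasta)
```

Separating command construction from execution keeps the flag choices easy to inspect and test without the tools installed.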
Select Different Tree Methods
Comment out tree-building methods you don't need:
```python
# fasttree_out = run_fasttree(msa_file)  # Skip FastTree
raxml_out = run_raxml(msa_file)  # Keep RAxML
iqtree_out = run_iqtree(msa_file)  # Keep IQ-TREE
```
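As an alternative to commenting out calls, the tree builders can be selected from a set. The sketch below shows plausible command lines that would produce the output names listed above; the flags (RAxML 8 `-f a` with 100 rapid bootstraps and `PROTGAMMAAUTO`, IQ-TREE 2 `-m MFP -B 1000 --prefix`) and executable names are assumptions, not necessarily what workflow.py runs.

```python
"""Command lines for the three tree builders (flags and binaries assumed)."""
import subprocess
from pathlib import Path
from typing import Dict, FrozenSet, List

# Only these methods run; FastTree is skipped, as in the example above.
ENABLED: FrozenSet[str] = frozenset({"raxml", "iqtree"})


def fasttree_output(msa_file: str) -> str:
    # alignment_mafft.fasta -> alignment_mafft.fasttree.nwk
    return str(Path(msa_file).with_suffix("")) + ".fasttree.nwk"


def tree_commands(msa_file: str) -> Dict[str, List[str]]:
    return {
        # FastTree treats input as protein by default and prints Newick to stdout.
        "fasttree": ["FastTree", msa_file],
        # RAxML: rapid bootstrap + ML search; writes RAxML_bestTree.raxml_tree.
        "raxml": ["raxmlHPC", "-f", "a", "-m", "PROTGAMMAAUTO", "-p", "12345",
                  "-x", "12345", "-#", "100", "-s", msa_file, "-n", "raxml_tree"],
        # IQ-TREE 2: ModelFinder + ultrafast bootstrap; writes iqtree_out.treefile.
        "iqtree": ["iqtree2", "-s", msa_file, "-m", "MFP", "-B", "1000",
                   "--prefix", "iqtree_out"],
    }


def run_trees(msa_file: str, enabled: FrozenSet[str] = ENABLED) -> None:
    for name, argv in tree_commands(msa_file).items():
        if name not in enabled:
            continue  # same effect as commenting the call out
        if name == "fasttree":
            with open(fasttree_output(msa_file), "w") as handle:
                subprocess.run(argv, stdout=handle, check=True)
        else:
            subprocess.run(argv, check=True)
```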
Project Structure
- `workflow.py` - Main phylogenetic analysis workflow
- `requirements.txt` - Python dependencies
- `README.md` - This documentation
- `fasta_seqs/` - Output directory for individual sequences (created during run)
Deactivating the Virtual Environment
When finished, deactivate the virtual environment:
```bash
deactivate
```
If you used the conda environment from step 3, run `conda deactivate` instead.
Owner
- Name: Alex Brace
- Login: braceal
- Kind: user
- Company: University of Chicago
- Repositories: 11
- Profile: https://github.com/braceal
GitHub Events
Total
- Push event: 1
- Public event: 1
Last Year
- Push event: 1
- Public event: 1