fasta-ai_csi-r1
Fasta AI_CSI R1 – GPT-based Country Semantic Inference Module
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 2 DOI reference(s) in README -
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (14.9%) to scientific vocabulary
Repository
Fasta AI_CSI R1 – GPT-based Country Semantic Inference Module
Basic Info
- Host: GitHub
- Owner: Bambusaoldhamii
- License: mit
- Language: HTML
- Default Branch: main
- Size: 249 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Fasta AI_CSI R1: AI-Powered Country Inference for Avian Influenza FASTA files
🔍 Project Overview
Fasta AI_CSI R1 is a semantic inference module designed to resolve ambiguous or inconsistent location strings embedded in avian influenza virus FASTA metadata. When rule-based ISO 3166-1 assignment fails, this tool leverages OpenAI’s GPT models (gpt-3.5 / gpt-4o) to infer the most likely country of origin.
It is designed for scalable metadata normalization in large-scale surveillance of avian influenza viruses, ensuring high-resolution geographic mapping for downstream analysis.
⚙️ Features
- Language-model-driven inference via OpenAI GPT (model selectable)
- Batch processing with progress tracking (
tqdm) - Fault tolerance with autosave/resume and retry mechanism
- Live dictionary updates (
location_to_country_AI.json) - Export of unresolved entries for manual review (
other_locations.csv)
📂 Files
Fasta AI_CSI R1.ipynb: The main executable notebooklocation_to_country_AI.json: Output mapping of AI-inferred countriesother_locations.csv: Unresolved entries labeled as "Other"country_stat.csv: Final country-level summary table
📥 Installation
bash
pip install -r requirements.txt
🔐 OpenAI API Key Setup
This module requires access to OpenAI's API in order to perform country inference using GPT models. Please follow the steps below to obtain and set up your API key:
Step 1. Create an OpenAI account
Visit https://platform.openai.com/signup to create a free or paid OpenAI account.
Step 2. Generate your API key
- Go to your account dashboard: https://platform.openai.com/account/api-keys
- Click “Create new secret key”
- Copy and securely store the generated key (it will only be shown once)
Step 3. Set the API key as an environment variable
You can store your key as an environment variable called OPENAI_API_KEY. For example:
On Linux/macOS:
bash
export OPENAI_API_KEY="your-api-key-here"
To make this permanent, add the above line to your ~/.bashrc or ~/.zshrc.
On Windows (Command Prompt):
cmd
set OPENAI_API_KEY=your-api-key-here
On Windows (PowerShell):
powershell
$env:OPENAI_API_KEY="your-api-key-here"
Step 4. Verify installation
After setting the key, run the notebook. The OpenAI SDK will automatically access OPENAI_API_KEY from the environment.
If the key is missing, the script will raise the following error:
ValueError: ❗ OPENAI_API_KEY not found. Please set your API key as an environment variable.
🔒 Keep your API key private. Do not share or upload it to GitHub.
🧪 Prompt Format
The system uses a standardized prompt:
Determine the country corresponding to each of the following locations...
[omitted here, see full prompt in paper]
📜 Citation
He, Jie-Long (2025). Fasta AI_CSI R1: AI-Powered Country Inference for Avian Influenza FASTA files. Zenodo. https://doi.org/10.5281/zenodo.15344823
Owner
- Login: Bambusaoldhamii
- Kind: user
- Repositories: 1
- Profile: https://github.com/Bambusaoldhamii
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "Fasta AI_CSI R1: AI-Powered Country Inference"
authors:
- family-names: He
given-names: Jie-Long
affiliation: Asia University, Department of Veterinary Medicine
orcid: https://orcid.org/0000-0002-4301-0829
date-released: 2025-05-05
version: 1.0.3
doi: 10.5281/zenodo.15344824
license: MIT
message: "If you use this software, please cite it as below."
GitHub Events
Total
- Release event: 3
- Push event: 5
- Create event: 5
Last Year
- Release event: 3
- Push event: 5
- Create event: 5
Dependencies
- biopython ==1.85
- openai ==1.76.0
- pandas ==2.2.3
- tqdm ==4.67.1