fasta-rbca-resolver
Fasta Automated Rule-Based Country Assignment (RBCA) for influenza
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Bambusaoldhamii
- License: other
- Language: Jupyter Notebook
- Default Branch: main
- Size: 1.69 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Rule-Based Country Assignment (RBCA) for Avian Influenza FASTA files
This repository contains the implementation of a rule-based pipeline for assigning standardized ISO 3166-1 country names to geographic metadata embedded in avian influenza FASTA headers. The main script is implemented in Fasta RBCA R1.ipynb.
Overview
This study uses a deterministic string-matching strategy to resolve sampling locations from HA segment FASTA headers. Virus names are expected to follow the GISAID-recommended format:
A/host/location/isolate/year.
The third component (location) is extracted using regular expressions and matched to a standardized dictionary (location_to_country_ISO_3166_1.json) that maps known location strings to ISO 3166-1 short English country names. Unmatched locations are labeled as Other.
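The extraction and lookup described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the regular expression, the function names `extract_location` and `assign_country`, the case-insensitive lookup, and the three-entry dictionary (a stand-in for location_to_country_ISO_3166_1.json) are all assumptions made for the example.

```python
import re

# Assumed pattern for A/host/location/isolate/year; the notebook's real
# expression may differ. "|" is excluded so trailing header fields are ignored.
VIRUS_NAME = re.compile(r"A/([^/|]+)/([^/|]+)/([^/|]+)/(\d{4})")

# Tiny illustrative stand-in for location_to_country_ISO_3166_1.json.
LOCATION_TO_COUNTRY = {
    "hokkaido": "Japan",
    "guangdong": "China",
    "netherlands": "Netherlands",
}


def extract_location(header):
    """Return the location component of a FASTA header, or None if no match."""
    match = VIRUS_NAME.search(header)
    if match is None:
        return None
    return match.group(2).strip()


def assign_country(header, mapping=LOCATION_TO_COUNTRY):
    """Map a header to a country name; unmatched locations become 'Other'."""
    location = extract_location(header)
    if location is None:
        return "Other"
    # Lower-casing the key is an assumption about how the dictionary is keyed.
    return mapping.get(location.lower(), "Other")
```

Because the lookup is a plain dictionary access, the same header always yields the same country, and any header that does not fit the five-part pattern falls through to Other rather than raising an error.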
Features
- Deterministic parsing using regular expressions
- ISO 3166-1-based country mapping (no AI inference)
- Robust handling of location extraction errors
- Export of location list and country sample counts
- Compatible with Jupyter Notebook and Python 3.12+
Output
- location_list.csv: Extracted location names and frequencies
- country_stat.csv: Country-level sample count distribution
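One way the two output files could be produced is sketched below, using only the standard library. The column names (`location`, `country`, `count`), the helper `write_counts`, and the sample `assignments` list are hypothetical; the notebook may use pandas and a different layout.

```python
import csv
from collections import Counter


def write_counts(path, counts, label_column):
    """Write a Counter to a two-column CSV, most frequent entries first."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([label_column, "count"])
        writer.writerows(counts.most_common())


# Illustrative (location, country) pairs, as a parsing step might produce them.
assignments = [
    ("Hokkaido", "Japan"),
    ("Guangdong", "China"),
    ("Hokkaido", "Japan"),
    ("Atlantis", "Other"),
]

location_counts = Counter(location for location, _ in assignments)
country_counts = Counter(country for _, country in assignments)

# File names follow the README; the column layout is an assumption.
write_counts("location_list.csv", location_counts, "location")
write_counts("country_stat.csv", country_counts, "country")
```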
Requirements
Install dependencies with:
```bash
pip install -r requirements.txt
```
Or create a dedicated Conda environment:
```bash
conda create -n rbca-env python=3.12
conda activate rbca-env
pip install -r requirements.txt
```
🚀 How to Run
- Launch Jupyter:
```bash
jupyter notebook
```
- Open the file: Fasta RBCA R1.ipynb
- Follow the notebook cells step-by-step.
📂 Note: Make sure your FASTA file is placed in the same directory as the notebook. The script automatically detects the latest .fasta file for processing.
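The automatic file detection mentioned in the note could look like the sketch below. It assumes that "latest" means the most recently modified .fasta file in the notebook's directory; the function name `find_latest_fasta` is hypothetical, and the notebook may select the file differently.

```python
from pathlib import Path


def find_latest_fasta(directory="."):
    """Return the most recently modified .fasta file in directory, or None."""
    candidates = [p for p in Path(directory).glob("*.fasta") if p.is_file()]
    if not candidates:
        return None
    # Assumption: "latest" is judged by modification time, not by file name.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Calling `find_latest_fasta()` with no argument searches the current working directory, which matches the requirement that the FASTA file sit next to the notebook. If several .fasta files are present, only the newest one is picked up under this assumption, so older files may need to be moved aside.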
Citation
He, Jie-Long. (2025). Fasta RBCA R1: Rule-Based Country Assignment (RBCA) for Avian Influenza FASTA files (v1.0.2). Zenodo. https://doi.org/10.5281/zenodo.15342773
License
MIT License
Owner
- Login: Bambusaoldhamii
- Kind: user
- Repositories: 1
- Profile: https://github.com/Bambusaoldhamii
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this notebook, please cite it as below:"
title: "Fasta RBCA R1: Rule-Based Country Assignment (RBCA) for Avian Influenza FASTA files"
version: 1.0.2
authors:
  - family-names: He
    given-names: Jie-Long
    affiliation: Asia University
date-released: 2025-05-05
doi: 10.5281/zenodo.15342773
url: https://github.com/Bambusaoldhamii/fasta-rbca-resolver
GitHub Events
Total
- Release event: 4
- Push event: 10
- Create event: 5
Last Year
- Release event: 4
- Push event: 10
- Create event: 5
Dependencies
- biopython ==1.85
- geopandas ==0.14.3
- matplotlib ==3.8.4
- pandas ==2.2.3
- plotly ==5.21.0
- tqdm ==4.67.1