kalamari
:octopus: A curated database of completed assemblies with taxonomy IDs
Science Score: 67.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to ncbi.nlm.nih.gov, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.8%) to scientific vocabulary
Repository
Statistics
- Stars: 46
- Watchers: 5
- Forks: 4
- Open Issues: 8
- Releases: 17
Metadata Files
README.md
Kalamari
Synopsis
Kalamari is a database of completed and public assemblies, backed by trusted institutions. These assemblies can then be formatted into databases for tools such as Kraken or BLAST.
Prerequisites & Recommendations
Requirements:
- Clone this repo locally:
  git clone https://github.com/lskatz/Kalamari.git
- The NCBI Entrez Direct set of tools (edirect, esearch, etc.). Install them via your package manager, for example on Debian/Ubuntu:
  apt install ncbi-entrez-direct
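One quick way to confirm these requirements is to check that the commands are on your PATH. The sketch below is a hypothetical helper: check_kalamari_tools is not part of Kalamari. It assumes the Entrez Direct package installs esearch and efetch as commands, and by default it checks for git, esearch, and efetch.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper (not part of Kalamari): report every
# requested command that cannot be found on PATH.
check_kalamari_tools() {
  local missing=0 tool
  # With no arguments, check the tools named in the requirements.
  if [ "$#" -eq 0 ]; then
    set -- git esearch efetch
  fi
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed (status 0) only when every tool was found.
  [ "$missing" -eq 0 ]
}
```

You can also pass your own list. For example, check_kalamari_tools taxonkit checks for taxonkit, which filterTaxonomy.sh needs later. Using the shell builtin command -v avoids relying on an external which.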
Optional, but recommended:
- NCBI_API_KEY environment variable
- EMAIL environment variable
Make sure you have an NCBI API key. The key associates your edirect requests with your NCBI account. Without it, NCBI applies a stricter request-rate limit (3 requests per second instead of 10), and edirect requests are more likely to fail. After obtaining an NCBI API key, add it to your environment with
export NCBI_API_KEY=unique_api_key_goes_here
where unique_api_key_goes_here is your key, a hexadecimal string containing only the characters 0-9 and a-f.
You should also set your email address in the EMAIL environment variable. Otherwise, edirect tries to guess it, which is error-prone.
Add this variable to your environment with
export EMAIL=my@email.address
using your own email address instead of my@email.address.
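You can sanity-check both variables before calling edirect. This is a minimal sketch with a hypothetical function name (check_ncbi_env). It only checks that NCBI_API_KEY is non-empty and hexadecimal, as described above, and that EMAIL loosely resembles an address. It does not confirm that NCBI accepts either value.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kalamari): warn about a missing or
# malformed NCBI_API_KEY or EMAIL. Returns 1 if either looks wrong.
check_ncbi_env() {
  local status=0
  case "${NCBI_API_KEY:-}" in
    "")
      echo "warning: NCBI_API_KEY is not set" >&2
      status=1 ;;
    *[!0-9a-f]*)
      # The key should contain only the hexadecimal characters 0-9 and a-f.
      echo "warning: NCBI_API_KEY has non-hexadecimal characters" >&2
      status=1 ;;
  esac
  # Deliberately loose pattern: something@something.something
  case "${EMAIL:-}" in
    ?*@?*.?*) ;;
    *)
      echo "warning: EMAIL is unset or does not look like an address" >&2
      status=1 ;;
  esac
  return "$status"
}
```

The glob patterns in case statements keep the check portable across bash and other POSIX-style shells, without needing [[ ]] regular expressions.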
Download instructions
First, build the taxonomy.
The script buildTaxonomy.sh applies the diffs in Kalamari to the default NCBI taxonomy.
Next, filterTaxonomy.sh reduces the taxonomy files to only the taxa found in Kalamari.
filterTaxonomy.sh uses taxonkit, so taxonkit must be installed and on your PATH
before you start.
bash bin/buildTaxonomy.sh
bash bin/filterTaxonomy.sh
To download the chromosomes and plasmids, pass the respective .tsv files to downloadKalamari.pl.
Run downloadKalamari.pl --help for usage.
However, to download the files to a standard location,
simply use downloadKalamari.sh, which calls
downloadKalamari.pl internally.
bash bin/downloadKalamari.sh
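The sketch below runs the taxonomy and download scripts in the documented order from a Kalamari checkout and stops at the first step that fails. The function run_kalamari_setup is hypothetical and not part of the repository. It assumes the scripts live in bin/ as shown above and that downloadKalamari.sh needs no arguments to download to its standard location.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Kalamari): build and filter the
# taxonomy, then download, stopping at the first failing step.
run_kalamari_setup() {
  local repo="${1:-.}"
  # filterTaxonomy.sh needs taxonkit, so fail early if it is missing.
  if ! command -v taxonkit >/dev/null 2>&1; then
    echo "error: taxonkit not found on PATH" >&2
    return 1
  fi
  # Run in a subshell so the cd does not change the caller's directory.
  (
    cd "$repo" || exit 1
    # Each step must succeed before the next one starts.
    bash bin/buildTaxonomy.sh &&
      bash bin/filterTaxonomy.sh &&
      bash bin/downloadKalamari.sh
  )
}
```

Call it as run_kalamari_setup /path/to/Kalamari. The function returns the exit status of the first failing step, or 0 if all three steps succeed.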
Database formatting instructions
How to format and query databases
Further description
Kalamari is a database of completed and public assemblies, backed by trusted institutions. Because the assemblies are complete, you do not have to worry about the database itself being contaminated with "rogue" contigs. Additionally, most assemblies were obtained by subject matter experts (SMEs) at the Centers for Disease Control and Prevention (CDC). Those not from CDC come from other trusted institutions or projects, such as FDA-ARGOS. Most genomes are from species that are either studied in, or are common contaminants in, the Enteric Diseases Laboratory Branch (EDLB) at CDC.
Kalamari also comes with a custom taxonomy database, which, for example, defines Shigella as a subspecies of Escherichia coli and defines the four lineages of Listeria monocytogenes. These changes have been backed by trusted SMEs in EDLB.
Contributing
Please see CONTRIBUTING.md
Citation
Katz LS, Griswold T, Lindsey RL, Lauer AC, Im MS, Williams G, Halpin JL, Gómez GA, Kucerova Z, Morrison S, Page A, Den Bakker HC, Carleton HA. 2025. "Kalamari: a representative set of genomes of public health concern." Microbiol Resour Announc 14:e00963-24. https://doi.org/10.1128/mra.00963-24
Owner
- Name: Lee Katz
- Login: lskatz
- Kind: user
- Location: Atlanta, GA
- Company: CDC (work) + personal projects
- Website: https://lskatz.github.io
- Twitter: lskatz
- Repositories: 138
- Profile: https://github.com/lskatz
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Katz"
    given-names: "Lee S."
    orcid: "https://orcid.org/0000-0002-2533-9161"
  - family-names: "Griswold"
    given-names: "Taylor"
  - family-names: "Lindsey"
    given-names: "Rebecca"
  - family-names: "Lauer"
    given-names: "Ana"
  - family-names: "Im"
    given-names: "Monica S."
  - family-names: "Williams"
    given-names: "Grant"
  - family-names: "Halpin"
    given-names: "Jessica L."
  - family-names: "Gómez"
    given-names: "Gerardo A."
  - family-names: "Roache"
    given-names: "Katie"
  - family-names: "Kucerova"
    given-names: "Zuzana"
  - family-names: "Tarr"
    given-names: "Cheryl L."
  - family-names: "Page"
    given-names: "Andrew"
  - family-names: "Henk"
    given-names: "den Bakker"
  - family-names: "Carleton"
    given-names: "Heather"
title: "Kraken with Kalamari: Contamination Detection"
version: 5.0
date-released: 2021-07-28
url: "https://github.com/lskatz/Kalamari"
GitHub Events
Total
- Create event: 3
- Release event: 2
- Issues event: 2
- Watch event: 8
- Delete event: 1
- Issue comment event: 1
- Push event: 22
- Pull request event: 4
- Fork event: 1
Last Year
- Create event: 3
- Release event: 2
- Issues event: 2
- Watch event: 8
- Delete event: 1
- Issue comment event: 1
- Push event: 22
- Pull request event: 4
- Fork event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 30
- Total pull requests: 27
- Average time to close issues: over 1 year
- Average time to close pull requests: 3 days
- Total issue authors: 7
- Total pull request authors: 3
- Average comments per issue: 2.03
- Average comments per pull request: 0.07
- Merged pull requests: 25
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 5
- Pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: 6 days
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- lskatz (24)
- hcdenbakker (1)
- tseemann (1)
- jessicarowell (1)
- miguelpmachado (1)
- kapsakcj (1)
- CreatorOfMoon (1)
Pull Request Authors
- lskatz (35)
- kapsakcj (1)
- SVN-PhD (1)
Dependencies
- actions/checkout v2 composite
- shogo82148/actions-setup-perl v1 composite
- actions/checkout v2 composite
- shogo82148/actions-setup-perl v1 composite
- actions/checkout v2 composite
- shogo82148/actions-setup-perl v1 composite
- actions/checkout v2 composite
- shogo82148/actions-setup-perl v1 composite