mcaat
Finding all CRISPR Arrays in Metagenomic Datasets using Graph-Based Strategies
Repository
Basic Info
- Host: GitHub
- Owner: RNABioInfo
- License: gpl-3.0
- Language: C++
- Default Branch: master
- Size: 52.8 MB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 1
- Releases: 2
Metadata Files
readme.md
😸 MCAAT - Metagenomic CRISPR Array Analysis Tool
- CRISPR-Cas is a bacterial immune system also famous for its use in genome editing. The diversity of known systems could be significantly increased by metagenomic data.
- Here we present the Metagenomic CRISPR Array Analysis Tool (MCAAT), a highly sensitive algorithm for finding CRISPR arrays in unassembled metagenomic data.
- It exploits the property that CRISPR arrays form multicycles in de Bruijn graphs.
- MCAAT's assembly-free graph-based strategy outperforms assembly-based workflows and other assembly-free methods on synthetic and real metagenomes.
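The multicycle property can be illustrated on a toy example. The sketch below is not MCAAT's implementation: the repeat and spacer sequences, the choice of k = 4, and all function names are made up for illustration. It builds a small de Bruijn graph from one invented array and counts how many distinct cycles leave the last k-mer of the repeat and return to it, one per spacer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <set>
#include <string>

// Node-centric de Bruijn graph: k-mer -> set of k-mers that follow it.
using Graph = std::map<std::string, std::set<std::string>>;

// Add an edge from every k-mer to the k-mer starting one position later.
Graph build_de_bruijn(const std::string& seq, std::size_t k) {
    Graph g;
    if (k == 0 || seq.size() < k) return g;
    for (std::size_t i = 0; i + k < seq.size(); ++i) {
        g[seq.substr(i, k)].insert(seq.substr(i + 1, k));
    }
    g[seq.substr(seq.size() - k, k)];  // register the final k-mer as a node
    return g;
}

// Breadth-first search: can `target` be reached from `start`?
bool reaches(const Graph& g, const std::string& start, const std::string& target) {
    std::set<std::string> seen{start};
    std::queue<std::string> todo;
    todo.push(start);
    while (!todo.empty()) {
        const std::string cur = todo.front();
        todo.pop();
        if (cur == target) return true;
        const auto it = g.find(cur);
        if (it == g.end()) continue;
        for (const auto& next : it->second) {
            if (seen.insert(next).second) todo.push(next);
        }
    }
    return false;
}

// Count the outgoing edges of `node` that lie on a cycle back to `node`.
std::size_t cycles_through(const Graph& g, const std::string& node) {
    const auto it = g.find(node);
    if (it == g.end()) return 0;
    std::size_t count = 0;
    for (const auto& next : it->second) {
        if (reaches(g, next, node)) ++count;
    }
    return count;
}

// Invented toy array: repeat, spacer, repeat, spacer, repeat, spacer, repeat.
std::string toy_array() {
    const std::string repeat = "GTTTCAGA";
    const std::string spacers[] = {"CCTAGGAC", "ATGCCGTA", "TTGACCAT"};
    std::string seq = repeat;
    for (const auto& s : spacers) seq += s + repeat;
    return seq;
}
```

With `build_de_bruijn(toy_array(), 4)`, the node `CAGA` (the last 4-mer of the repeat) has three successors, and each leads back to `CAGA` through a different spacer. Real reads add sequencing errors, reverse complements, and unrelated cycles, none of which this sketch handles.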
🥳 NEWS
Docker container available under: https://hub.docker.com/r/feeka94/mcaat
Installation using docker
Docker Build
bash
docker build -t mcaat .
Run the Tool Using Docker
Mount your working directory to access input/output files:
bash
docker run --rm -v $(pwd):/data mcaat \
--input_files /data/reads_R1.fastq /data/reads_R2.fastq \
--output-folder /data/results
Final Image Size
The final image is based on debian:bookworm-slim and includes only:
- The mcaat binary
- Runtime libraries: libomp5, zlib1g
This keeps the image small and portable.
Clean Up
To remove the image:
bash
docker rmi mcaat
Compiling the project
🔧 Build the Project
To make ./install.sh executable, run the following command:
bash
chmod +x ./install.sh
Then build the project. The working binary is saved in the build folder.
bash
./install.sh
To also install the tool, pass the --install flag.
bash
./install.sh --install
To clean up, use the --clean flag.
Usage
bash
./mcaat --input_files <file1> [file2] [--ram <amount>] [--threads <num>] [--output-folder <path>] [--help]
🧾 Command-Line Arguments
✅ Required
| Argument | Description |
|---------------------------|-----------------------------------------------------------------------------|
| --input_files <file1> [file2] | One or two input FASTA/FASTQ files. If one file is provided, it is treated as single-end data. If two files are provided, they are treated as paired-end reads. |
⚙️ Optional
| Argument | Description |
|---------------------------|-----------------------------------------------------------------------------|
| --ram <amount> | Maximum RAM to use. Units: B, K, M, G. Default: 95% of system RAM. Example: --ram 4G |
| --threads <num> | Number of threads to use. Default: total CPU cores minus 2. |
| --output-folder <path> | Output directory for results. If not provided, a timestamped folder is created automatically. If provided, the folder is used exactly as given. |
| --help, -h | Show usage information and exit. |
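The --ram units and the --threads default can be sketched as follows. This is an illustration only, not MCAAT's parser: the function names are hypothetical, the binary multiples (1K = 1024 B) are an assumption the README does not confirm, and the floor of one thread on small machines is also assumed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper: turn a size such as "4G" or "512M" into bytes.
// Assumes binary multiples and requires exactly one unit letter.
// No overflow check is done in this sketch.
std::optional<std::uint64_t> parse_ram(const std::string& text) {
    std::size_t pos = 0;
    std::uint64_t value = 0;
    while (pos < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[pos]))) {
        value = value * 10 + static_cast<std::uint64_t>(text[pos] - '0');
        ++pos;
    }
    if (pos == 0) return std::nullopt;                // no leading digits
    if (pos + 1 != text.size()) return std::nullopt;  // need one unit letter
    switch (std::toupper(static_cast<unsigned char>(text[pos]))) {
        case 'B': return value;
        case 'K': return value << 10;
        case 'M': return value << 20;
        case 'G': return value << 30;
        default:  return std::nullopt;                // unknown unit
    }
}

// Hypothetical helper: "total CPU cores minus 2", floored at 1 (assumption).
unsigned default_threads(unsigned cores) {
    return cores > 2 ? cores - 2 : 1;
}
```

A caller could pass `std::thread::hardware_concurrency()` to `default_threads`; that function may return 0 when the core count is unknown, in which case this sketch falls back to one thread.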
📁 Output Structure
The tool creates the following directory structure inside the specified output folder:
<output-folder>/
├── CRISPR_Arrays.txt # Raw CRISPR array output
🧪 Example Usage
| Scenario | Command |
|-----------------------------|-------------------------------------------------------------------------|
| Paired-end input with custom output | ./mcaat --input_files reads_R1.fastq reads_R2.fastq --ram 8G --threads 12 --output-folder results/my_run |
| Single-end input with default output | ./mcaat --input_files reads.fastq (creates a folder like mcaat_run_2025-07-07_15-30-00/) |
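The default folder name in the example follows a `mcaat_run_<date>_<time>` pattern. A minimal sketch of producing such a name is shown below. The function names are hypothetical, and the format string is inferred from the single example above rather than taken from MCAAT's source.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <optional>
#include <string>

// Format a broken-down time as mcaat_run_YYYY-MM-DD_HH-MM-SS.
// The pattern is inferred from the README example, not from MCAAT's code.
std::string run_folder_name(const std::tm& when) {
    char buf[64];
    const std::size_t n =
        std::strftime(buf, sizeof buf, "mcaat_run_%Y-%m-%d_%H-%M-%S", &when);
    return std::string(buf, n);
}

// Use the requested folder exactly as given, else fall back to a timestamp.
std::string resolve_output_folder(const std::optional<std::string>& requested,
                                  const std::tm& now) {
    return requested ? *requested : run_folder_name(now);
}
```

To use the current local time, a caller could obtain it with `std::time(nullptr)` and `std::localtime` and pass the resulting `std::tm` in.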
Notes
- Input files must exist and be accessible.
- If RAM is set below 1 GB or above system capacity, the program will exit with an error.
- If only one input file is provided, the tool assumes single-end data.
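The checks described in these notes could look roughly like the sketch below. It is an assumption-laden illustration, not MCAAT's code: the names are hypothetical, "1 GB" is taken to mean 2^30 bytes, system capacity is passed in by the caller instead of being queried, and "accessible" is approximated by being able to open the file for reading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

enum class ReadLayout { SingleEnd, PairedEnd };

// Return an empty string if the RAM limit is acceptable, else a message.
// Assumes 1 GB means 2^30 bytes; the README does not specify.
std::string check_ram(std::uint64_t requested, std::uint64_t system_bytes) {
    constexpr std::uint64_t one_gb = 1ULL << 30;
    if (requested < one_gb) return "RAM limit is below 1 GB";
    if (requested > system_bytes) return "RAM limit exceeds system capacity";
    return "";
}

// Return an empty string if one or two readable files were given.
std::string check_inputs(const std::vector<std::string>& files) {
    if (files.empty() || files.size() > 2) {
        return "expected one or two input files";
    }
    for (const auto& path : files) {
        std::ifstream in(path);  // opening for reading tests accessibility
        if (!in.is_open()) return "cannot open input file: " + path;
    }
    return "";
}

// One file is treated as single-end, two files as paired-end.
ReadLayout layout_for(std::size_t n_files) {
    return n_files == 2 ? ReadLayout::PairedEnd : ReadLayout::SingleEnd;
}
```

A driver would call these before starting any work and exit with the returned message if it is non-empty.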
Requirements
- C++17 compiler
- RapidFuzz (for fuzzy string matching)
- Filesystem support (<filesystem>)
Support
If you encounter issues or have questions, feel free to open an issue.
Owner
- Name: RNABioInfo
- Login: RNABioInfo
- Kind: organization
- Repositories: 1
- Profile: https://github.com/RNABioInfo
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "MCAAT"
authors:
- family-names: "Talibli"
given-names: "Fikrat"
- family-names: "Voß"
given-names: "Björn"
version: "0.1.0"
date-released: "2025-02-20"
url: "https://github.com/RNABioInfo/mcaat.git"
GitHub Events
Total
- Release event: 2
- Issue comment event: 1
- Push event: 11
- Public event: 1
- Fork event: 1
- Create event: 1
Last Year
- Release event: 2
- Issue comment event: 1
- Push event: 11
- Public event: 1
- Fork event: 1
- Create event: 1
Dependencies
- ubuntu latest build