mcaat

Finding all CRISPR Arrays in Metagenomic Datasets using Graph-Based Strategies

https://github.com/rnabioinfo/mcaat

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low (13.7%)

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: RNABioInfo
License: gpl-3.0
Language: C++
Default Branch: master
Size: 52.8 MB

Statistics

Stars: 1
Watchers: 2
Forks: 1
Open Issues: 1
Releases: 2

Created over 2 years ago · Last pushed 10 months ago

Metadata Files

Readme, License, Citation

😸 MCAAT - Metagenomic CRISPR Array Analysis Tool

CRISPR-Cas is an adaptive immune system of bacteria and archaea, also widely known for its use in genome editing. Metagenomic data could substantially expand the known diversity of these systems.
Here we present the Metagenomic CRISPR Array Analysis Tool (MCAAT), a highly sensitive algorithm for finding CRISPR arrays in unassembled metagenomic data.
It exploits the fact that CRISPR arrays, in which copies of a repeat alternate with distinct spacers, form multicycles in de Bruijn graphs.

- MCAAT's assembly-free graph-based strategy outperforms assembly-based workflows and other assembly-free methods on synthetic and real metagenomes.
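To illustrate why repeats produce cycles, the toy C++ sketch below builds a de Bruijn graph from a single sequence and checks whether it contains a directed cycle. It is not MCAAT's algorithm: MCAAT searches for multicycles, whereas this sketch only detects whether any cycle exists. Nodes are (k-1)-mers, each k-mer adds an edge, and the example sequences and k = 4 are made-up assumptions. Because the repeat occurs several times, its k-mers map onto the same nodes, so the walk through the array returns to a node it has already visited.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Toy de Bruijn graph: nodes are (k-1)-mers, and every k-mer in the
// sequence adds a directed edge prefix -> suffix.
using Graph = std::map<std::string, std::set<std::string>>;

Graph build_de_bruijn(const std::string& seq, std::size_t k) {
    assert(k >= 2);
    Graph g;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        const std::string prefix = seq.substr(i, k - 1);
        const std::string suffix = seq.substr(i + 1, k - 1);
        g[prefix].insert(suffix);
        g[suffix];  // make sure sink nodes also appear as keys
    }
    return g;
}

// Depth-first search with three colours: reaching a node that is still on
// the current DFS path (colour 1) means the graph has a directed cycle.
bool dfs_has_cycle(const Graph& g, const std::string& node,
                   std::map<std::string, int>& colour) {
    colour[node] = 1;  // on the current path
    for (const std::string& next : g.at(node)) {
        if (colour[next] == 1) return true;
        if (colour[next] == 0 && dfs_has_cycle(g, next, colour)) return true;
    }
    colour[node] = 2;  // fully explored
    return false;
}

bool has_cycle(const Graph& g) {
    std::map<std::string, int> colour;  // missing entries default to 0 (unvisited)
    for (const auto& entry : g) {
        if (colour[entry.first] == 0 && dfs_has_cycle(g, entry.first, colour)) {
            return true;
        }
    }
    return false;
}
```

For example, a sequence built as repeat, spacer, repeat, spacer, repeat (such as "GTTTGAG" + "ACCA" + "GTTTGAG" + "CTTC" + "GTTTGAG") yields a cyclic graph, while "ACGTTGCA", whose 3-mers are all distinct, yields a simple acyclic path.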

🥳 NEWS

A Docker image is available at: https://hub.docker.com/r/feeka94/mcaat

Installation using Docker

Docker Build

```bash
docker build -t mcaat .
```

Run the Tool Using Docker

Mount your working directory into the container so the tool can read the input files and write its results:

```bash
docker run --rm -v "$(pwd)":/data mcaat \
  --input_files /data/reads_R1.fastq /data/reads_R2.fastq \
  --output-folder /data/results
```

Final Image Size

The final image is based on debian:bookworm-slim and includes only:

- The mcaat binary
- Runtime libraries: libomp5, zlib1g

This keeps the image small and portable.

Clean Up

To remove the image:

```bash
docker rmi mcaat
```

Compiling the project

🔧 Build the Project

First, make the install script executable:

```bash
chmod +x ./install.sh
```

Then build the project; the working version is saved in the build folder:

```bash
./install.sh
```

To also install the library, add the --install flag:

```bash
./install.sh --install
```

To clean up, use the --clean flag:

```bash
./install.sh --clean
```

Usage

```bash
./mcaat --input-files <file1> [file2] [--ram <amount>] [--threads <num>] [--output-folder <path>] [--help]
```
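The usage line describes a small option grammar. The C++17 sketch below shows one way such arguments could be collected; the Options struct and parse_args function are hypothetical and not taken from MCAAT's source. Because this README shows both --input_files and --input-files, the sketch accepts either spelling.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the options listed in the usage line.
struct Options {
    std::vector<std::string> input_files;      // one (single-end) or two (paired-end)
    std::optional<std::string> ram;            // e.g. "8G"; unset means "use the default"
    std::optional<int> threads;                // unset means "use the default"
    std::optional<std::string> output_folder;  // unset means "timestamped folder"
    bool help = false;
};

Options parse_args(const std::vector<std::string>& args) {
    Options opt;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        // Returns the value that follows a flag, or throws if it is missing.
        auto value = [&]() -> std::string {
            if (i + 1 >= args.size()) throw std::invalid_argument("missing value for " + a);
            return args[++i];
        };
        if (a == "--input_files" || a == "--input-files") {
            // Consume up to two following arguments that do not look like flags.
            while (i + 1 < args.size() && args[i + 1].rfind("-", 0) != 0 &&
                   opt.input_files.size() < 2) {
                opt.input_files.push_back(args[++i]);
            }
        } else if (a == "--ram") {
            opt.ram = value();
        } else if (a == "--threads") {
            opt.threads = std::stoi(value());
        } else if (a == "--output-folder") {
            opt.output_folder = value();
        } else if (a == "--help" || a == "-h") {
            opt.help = true;
        } else {
            throw std::invalid_argument("unknown argument: " + a);
        }
    }
    assert(opt.input_files.size() <= 2);
    if (!opt.help && opt.input_files.empty()) {
        throw std::invalid_argument("--input_files requires one or two files");
    }
    return opt;
}
```

In a real program, the vector would be built from argv (skipping argv[0]) before calling parse_args.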

🧾 Command-Line Arguments

✅ Required

| Argument | Description |
|---|---|
| `--input_files <file1> [file2]` | One or two input FASTA/FASTQ files. If one file is provided, it is treated as single-end data. If two files are provided, they are treated as paired-end reads. |

⚙️ Optional

| Argument | Description |
|---|---|
| `--ram <amount>` | Maximum RAM to use. Units: B, K, M, G. Default: 95% of system RAM. Example: `--ram 4G` |
| `--threads <num>` | Number of threads to use. Default: total CPU cores minus 2 |
| `--output-folder <path>` | Output directory for results. If not provided, a timestamped folder is created automatically. If provided, the folder is used exactly as given. |
| `--help`, `-h` | Show usage information and exit |
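The --ram and --threads defaults amount to simple arithmetic. The sketch below is hypothetical: parse_ram, default_ram and default_threads are not MCAAT's functions, and the choice of binary multiples (1K = 1024 bytes) and of a bare number meaning bytes are assumptions the README does not specify. Keeping at least one thread on machines with two or fewer cores is also an added assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <thread>

// Converts strings such as "4G", "512M" or "1024" into bytes.
// Assumption: binary multiples (1K = 1024 B); a missing unit means bytes.
std::uint64_t parse_ram(const std::string& text) {
    if (text.empty() || !std::isdigit(static_cast<unsigned char>(text[0]))) {
        throw std::invalid_argument("invalid RAM amount: " + text);
    }
    std::size_t pos = 0;
    const std::uint64_t number = std::stoull(text, &pos);
    const std::string unit = text.substr(pos);
    std::uint64_t factor = 1;
    if (unit.empty() || unit == "B") factor = 1;
    else if (unit == "K") factor = 1ULL << 10;
    else if (unit == "M") factor = 1ULL << 20;
    else if (unit == "G") factor = 1ULL << 30;
    else throw std::invalid_argument("unknown RAM unit: " + unit);
    return number * factor;
}

// Documented default: 95% of the system's RAM (integer arithmetic).
std::uint64_t default_ram(std::uint64_t system_ram_bytes) {
    return system_ram_bytes / 100 * 95;
}

// Documented default: total CPU cores minus 2 (kept at least 1 here).
unsigned default_threads() {
    const unsigned cores = std::thread::hardware_concurrency();  // 0 if unknown
    return cores > 2 ? cores - 2 : 1;
}
```

With these assumptions, parse_ram("4G") yields 4 × 2^30 = 4,294,967,296 bytes.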

📁 Output Structure

The tool creates the following structure inside the specified output folder:

```text
<output-folder>/
├── CRISPR_Arrays.txt   # Raw CRISPR array output
```

🧪 Example Usage

| Scenario | Command |
|---|---|
| Paired-end input with custom output | `./mcaat --input_files reads_R1.fastq reads_R2.fastq --ram 8G --threads 12 --output-folder results/my_run` |
| Single-end input with default output | `./mcaat --input_files reads.fastq` (creates a folder like `mcaat_run_2025-07-07_15-30-00/`) |
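When --output-folder is omitted, the folder name follows the pattern shown in the single-end example. The sketch below formats such a name with std::strftime; the function names and the format string "mcaat_run_%Y-%m-%d_%H-%M-%S" are inferred from that one example and are assumptions, not MCAAT's code.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Builds a name like "mcaat_run_2025-07-07_15-30-00" from a broken-down time.
// The format string is inferred from the README example (an assumption).
std::string run_folder_name(const std::tm& when) {
    char buffer[64];
    const std::size_t written =
        std::strftime(buffer, sizeof buffer, "mcaat_run_%Y-%m-%d_%H-%M-%S", &when);
    assert(written > 0);  // 0 would mean the buffer was too small
    return std::string(buffer, written);
}

// Convenience wrapper for the current local time.
std::string run_folder_name_now() {
    const std::time_t now = std::time(nullptr);
    const std::tm local = *std::localtime(&now);  // not thread-safe; fine for a one-off call
    return run_folder_name(local);
}
```

Zero-padded fields keep such names sortable in chronological order.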

Notes

- Input files must exist and be accessible.
- If RAM is set below 1 GB or above system capacity, the program exits with an error.
- If only one input file is provided, the tool assumes single-end data.
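The notes above describe checks that run before the analysis starts. A C++17 sketch using std::filesystem might look as follows; the function names, reading "1 GB" as 1 GiB (2^30 bytes), and passing the system capacity in as a parameter are assumptions for illustration. The existence check does not verify read permissions.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <stdexcept>
#include <string>
#include <vector>

enum class ReadMode { SingleEnd, PairedEnd };

// One file means single-end data, two files mean paired-end reads.
ReadMode detect_read_mode(const std::vector<std::string>& files) {
    if (files.size() == 1) return ReadMode::SingleEnd;
    if (files.size() == 2) return ReadMode::PairedEnd;
    throw std::invalid_argument("expected one or two input files");
}

// Every input file must exist as a regular file (readability is not checked).
void check_inputs_exist(const std::vector<std::string>& files) {
    for (const std::string& f : files) {
        if (!std::filesystem::is_regular_file(f)) {
            throw std::runtime_error("input file not found: " + f);
        }
    }
}

// RAM below 1 GB or above system capacity is rejected.
// Assumption: "1 GB" is taken as 1 GiB (2^30 bytes).
bool ram_is_valid(std::uint64_t requested, std::uint64_t system_capacity) {
    const std::uint64_t one_gib = 1ULL << 30;
    assert(one_gib > 0);
    return requested >= one_gib && requested <= system_capacity;
}
```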

Requirements

- C++17 compiler
- RapidFuzz (for fuzzy string matching)
- Filesystem support (`<filesystem>`)
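RapidFuzz scores how similar two strings are. To show the kind of score involved, here is a plain C++ normalized Levenshtein similarity on a 0 to 100 scale. It does not use the RapidFuzz API and is not identical to RapidFuzz's scorers. Applying it to near-identical repeat copies, as in the example strings, is an assumption made for illustration; the README does not say how MCAAT uses fuzzy matching.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Classic dynamic-programming edit distance (insertion, deletion and
// substitution each cost 1), keeping only two rows of the table.
std::size_t levenshtein(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, curr);  // after the swap, prev holds row i
    }
    return prev[b.size()];
}

// Similarity in [0, 100]; 100 means the strings are identical.
double similarity(const std::string& a, const std::string& b) {
    const std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0) return 100.0;
    assert(levenshtein(a, b) <= longest);
    return 100.0 * (1.0 - static_cast<double>(levenshtein(a, b)) / longest);
}
```

For example, two 8-nt sequences differing at one position ("GTTTCAAG" and "GTTTGAAG") have distance 1 and similarity 87.5.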

Support

If you encounter issues or have questions, feel free to open an issue.

Owner

Name: RNABioInfo
Login: RNABioInfo
Kind: organization

Repositories: 1
Profile: https://github.com/RNABioInfo

Citation (CITATION.cff)

```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "MCAAT"
authors:
  - family-names: "Talibli"
    given-names: "Fikrat"
  - family-names: "Voß"
    given-names: "Björn"
version: "0.1.0"
date-released: "2025-02-20"
url: "https://github.com/RNABioInfo/mcaat.git"
```

GitHub Events

Total

Release event: 2
Issue comment event: 1
Push event: 11
Public event: 1
Fork event: 1
Create event: 1

Last Year

Release event: 2
Issue comment event: 1
Push event: 11
Public event: 1
Fork event: 1
Create event: 1

Dependencies

Dockerfile (docker): ubuntu latest (build)
