mcaat

Finding all CRISPR Arrays in Metagenomic Datasets using Graph-Based Strategies

https://github.com/rnabioinfo/mcaat

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: RNABioInfo
  • License: gpl-3.0
  • Language: C++
  • Default Branch: master
  • Size: 52.8 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 1
  • Releases: 2
Created about 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

readme.md

😸 MCAAT - Metagenomic CRISPR Array Analysis Tool

  • CRISPR-Cas is a prokaryotic adaptive immune system, best known for its use in genome editing. Mining metagenomic data could substantially expand the diversity of known systems.
  • Here we present the Metagenomic CRISPR Array Analysis Tool (MCAAT), a highly sensitive algorithm for finding CRISPR arrays in unassembled metagenomic data.
  • It exploits a structural property of CRISPR arrays: because the same direct repeat recurs between every pair of spacers, the repeat's k-mers collapse into shared nodes of a de Bruijn graph, and each spacer forms a loop back to them, so an array appears as a multicycle.
  • MCAAT's assembly-free, graph-based strategy outperforms assembly-based workflows and other assembly-free methods on synthetic and real metagenomes.
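To make the multicycle idea concrete, here is a toy C++ sketch (not MCAAT's code) that builds the edge set of an order-k de Bruijn graph from a single read and lists the nodes with more than one successor. The names DeBruijn, build_debruijn and branching_nodes are hypothetical, and a real metagenomic graph is built from many reads and must handle sequencing errors and reverse complements, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Adjacency of an order-k de Bruijn graph: each (k-1)-mer node maps to the
// set of (k-1)-mer nodes that follow it in some k-mer of the read.
using DeBruijn = std::map<std::string, std::set<std::string>>;

// Toy construction from a single read (hypothetical helper, not MCAAT code).
// Ignores reverse complements, coverage and sequencing errors.
DeBruijn build_debruijn(const std::string& read, std::size_t k) {
    assert(k >= 2 && "k must be at least 2");
    DeBruijn adj;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        const std::string kmer = read.substr(i, k);
        // Edge: prefix (k-1)-mer -> suffix (k-1)-mer.
        adj[kmer.substr(0, k - 1)].insert(kmer.substr(1));
    }
    return adj;
}

// Nodes with more than one distinct successor. In a CRISPR array the last
// (k-1)-mer of the repeat gains one outgoing edge per distinct spacer start.
std::set<std::string> branching_nodes(const DeBruijn& adj) {
    std::set<std::string> out;
    for (const auto& [node, successors] : adj)
        if (successors.size() > 1) out.insert(node);
    return out;
}
```

For the read GATTCAAGGATTCCGGGATTC (repeat GATTC, spacers AAG and CGG) with k = 4, the only branching node is TTC, the last 3-mer of the repeat: one outgoing edge enters each spacer, and both spacer paths lead back into the repeat's first node GAT, closing two cycles that share the repeat nodes.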

🥳 NEWS

A Docker image is available at: https://hub.docker.com/r/feeka94/mcaat

Installation using Docker

Docker Build

```bash
docker build -t mcaat .
```


Run the Tool Using Docker

Mount your working directory so the container can read input files and write output files:

```bash
docker run --rm -v "$(pwd)":/data mcaat \
  --input_files /data/reads_R1.fastq /data/reads_R2.fastq \
  --output-folder /data/results
```


Final Image Size

The final image is based on debian:bookworm-slim and includes only:

  • The mcaat binary
  • Runtime libraries: libomp5, zlib1g

This keeps the image small and portable.


Clean Up

To remove the image:

```bash
docker rmi mcaat
```

Compiling the project

🔧 Build the Project

To make ./install.sh executable, run:

```bash
chmod +x ./install.sh
```

Build the project with the command below; the working version is saved in the build folder.

```bash
./install.sh
```

To also install it, add the --install flag:

```bash
./install.sh --install
```

To clean up, use the --clean flag:

```bash
./install.sh --clean
```


Usage

```bash
./mcaat --input_files <file1> [file2] [--ram <amount>] [--threads <num>] [--output-folder <path>] [--help]
```


🧾 Command-Line Arguments

✅ Required

| Argument | Description |
|---|---|
| `--input_files <file1> [file2]` | One or two input FASTA/FASTQ files. If one file is provided, it is treated as single-end data; if two files are provided, they are treated as paired-end reads. |
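The one-or-two-files rule amounts to a small dispatch on the number of inputs. This is a hedged sketch: ReadLayout and classify_inputs are hypothetical names, and rejecting zero or more than two files with an exception is an assumption about how such input would be handled.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types/helpers illustrating the documented rule:
// one file -> single-end data, two files -> paired-end reads.
enum class ReadLayout { SingleEnd, PairedEnd };

ReadLayout classify_inputs(const std::vector<std::string>& files) {
    switch (files.size()) {
        case 1:
            return ReadLayout::SingleEnd;
        case 2:
            return ReadLayout::PairedEnd;  // files[0] = R1, files[1] = R2 (assumed order)
        default:
            // Assumption: any other count is rejected.
            throw std::invalid_argument("--input_files expects one or two files, got " +
                                        std::to_string(files.size()));
    }
}
```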

⚙️ Optional

| Argument | Description |
|---|---|
| `--ram <amount>` | Maximum RAM to use. Units: B, K, M, G. Default: 95% of system RAM. Example: `--ram 4G` |
| `--threads <num>` | Number of threads to use. Default: total CPU cores minus 2. |
| `--output-folder <path>` | Output directory for results. If not provided, a timestamped folder is created automatically; if provided, the folder is used exactly as given. |
| `--help`, `-h` | Show usage information and exit. |
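The optional arguments involve a few small computations: turning a --ram value with a unit suffix into bytes, and deriving the documented defaults. The sketch below assumes K, M and G are 1024-based multiples and that a value without a suffix means bytes; neither is stated above, and parse_ram, default_ram and default_threads are hypothetical helpers, not MCAAT's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <thread>

// Parse a --ram value such as "4G", "512M" or "100B" into bytes.
// ASSUMPTIONS: K/M/G are 1024-based; a bare number means bytes.
std::uint64_t parse_ram(std::string value) {
    std::uint64_t multiplier = 1;
    if (!value.empty() && std::isalpha(static_cast<unsigned char>(value.back()))) {
        switch (std::toupper(static_cast<unsigned char>(value.back()))) {
            case 'B': multiplier = 1; break;
            case 'K': multiplier = 1ULL << 10; break;
            case 'M': multiplier = 1ULL << 20; break;
            case 'G': multiplier = 1ULL << 30; break;
            default: throw std::invalid_argument("unknown --ram unit: " + value);
        }
        value.pop_back();  // strip the unit letter
    }
    const bool digits_only = !value.empty() &&
        std::all_of(value.begin(), value.end(),
                    [](unsigned char c) { return std::isdigit(c) != 0; });
    if (!digits_only) throw std::invalid_argument("invalid --ram value");
    return std::stoull(value) * multiplier;
}

// Documented default: 95% of system RAM (total supplied by the caller).
std::uint64_t default_ram(std::uint64_t total_bytes) {
    return total_bytes - total_bytes / 20;  // total minus 5%
}

// Documented default: total CPU cores minus 2. ASSUMPTION: never below 1
// (hardware_concurrency() may also report 0 when the count is unknown).
unsigned default_threads(unsigned cores = std::thread::hardware_concurrency()) {
    return cores > 2 ? cores - 2 : 1;
}
```

Under the 1024-based assumption, parse_ram("4G") yields 4294967296 bytes, and a 16-core machine gets 14 threads by default.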


📁 Output Structure

The tool creates the following directory structure inside the specified output folder:

```
<output-folder>/
├── CRISPR_Arrays.txt   # Raw CRISPR array output
```


🧪 Example Usage

| Scenario | Command |
|---|---|
| Paired-end input with custom output | `./mcaat --input_files reads_R1.fastq reads_R2.fastq --ram 8G --threads 12 --output-folder results/my_run` |
| Single-end input with default output | `./mcaat --input_files reads.fastq` (creates a folder like `mcaat_run_2025-07-07_15-30-00/`) |
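A timestamped folder name like the one in the example above can be produced with std::strftime. The format string is inferred from that single example name, so treat it as an assumption; run_folder_name is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Format a run folder name such as "mcaat_run_2025-07-07_15-30-00".
// ASSUMPTION: pattern inferred from the single example in the table above.
std::string run_folder_name(const std::tm& t) {
    char buf[64];
    const std::size_t n = std::strftime(buf, sizeof buf, "mcaat_run_%Y-%m-%d_%H-%M-%S", &t);
    return std::string(buf, n);  // n == 0 only if buf were too small
}

// Same pattern, using the current local time.
std::string run_folder_name() {
    const std::time_t now = std::time(nullptr);
    // std::localtime is not thread-safe; acceptable for a one-off call at startup.
    return run_folder_name(*std::localtime(&now));
}
```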


Notes

  • Input files must exist and be accessible.
  • If --ram is set below 1 GB or above the system's capacity, the program exits with an error.
  • If only one input file is provided, the tool assumes single-end data.
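The checks in these notes could look roughly like the following. validate_run is a hypothetical helper; the system RAM total is passed in by the caller rather than queried, "accessible" is approximated as "openable for reading", and reading "1 GB" as 1 GiB (2^30 bytes) is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical pre-flight checks mirroring the notes above.
// ASSUMPTION: "1 GB" is read as 1 GiB; system RAM is supplied by the caller.
void validate_run(const std::vector<std::string>& inputs,
                  std::uint64_t ram_bytes, std::uint64_t system_ram_bytes) {
    for (const auto& path : inputs) {
        std::error_code ec;  // non-throwing overload: missing paths just return false
        const bool is_file = fs::is_regular_file(path, ec);
        std::ifstream probe(path);  // "accessible" approximated as "openable for reading"
        if (!is_file || !probe)
            throw std::runtime_error("input file missing or unreadable: " + path);
    }
    constexpr std::uint64_t kMinRam = 1ULL << 30;
    if (ram_bytes < kMinRam)
        throw std::runtime_error("--ram is below the 1 GB minimum");
    if (ram_bytes > system_ram_bytes)
        throw std::runtime_error("--ram exceeds system capacity");
}
```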

Requirements

  • C++17 compiler
  • RapidFuzz (for fuzzy string matching)
  • Filesystem support (<filesystem>)
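RapidFuzz scores string similarity; its fuzz::ratio, for instance, reports a normalized Indel similarity on a 0 to 100 scale. The dependency-free sketch below computes the same quantity from the longest common subsequence, 100 · 2·LCS(a, b) / (|a| + |b|), to show what such a score means for two nearly identical repeat sequences. Which RapidFuzz functions MCAAT calls, and on which strings, is not stated here, so applying it to repeat copies is an assumption; indel_ratio is a hypothetical name and this is not the RapidFuzz API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Normalized Indel similarity in [0, 100]: 100 * 2 * LCS(a, b) / (|a| + |b|).
// Illustration only; not MCAAT code and not the RapidFuzz API.
double indel_ratio(const std::string& a, const std::string& b) {
    if (a.empty() && b.empty()) return 100.0;  // treat two empty strings as identical
    // Longest-common-subsequence DP with two rolling rows.
    std::vector<std::size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j)
            cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : std::max(prev[j], cur[j - 1]);
        std::swap(prev, cur);  // prev now holds row i
    }
    const std::size_t lcs = prev[b.size()];
    return 100.0 * 2.0 * static_cast<double>(lcs) / static_cast<double>(a.size() + b.size());
}
```

For example, indel_ratio("ACGT", "ACGA") gives 75, since the two strings share the subsequence ACG (2·3/8 = 0.75).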

Support

If you encounter issues or have questions, feel free to open an issue.

Owner

  • Name: RNABioInfo
  • Login: RNABioInfo
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "MCAAT"
authors:
  - family-names: "Talibli"
    given-names: "Fikrat"
  - family-names: "Voß"
    given-names: "Björn"
version: "0.1.0"
date-released: "2025-02-20"
url: "https://github.com/RNABioInfo/mcaat.git"

GitHub Events

Total
  • Release event: 2
  • Issue comment event: 1
  • Push event: 11
  • Public event: 1
  • Fork event: 1
  • Create event: 1
Last Year
  • Release event: 2
  • Issue comment event: 1
  • Push event: 11
  • Public event: 1
  • Fork event: 1
  • Create event: 1

Dependencies

Dockerfile docker
  • ubuntu latest build