Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to biorxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.1%) to scientific vocabulary
Keywords
Repository
NEFF Calculator and MSA File Converter
Basic Info
- Host: GitHub
- Owner: Maryam-Haghani
- License: gpl-3.0
- Language: C++
- Default Branch: main
- Homepage: https://maryam-haghani.github.io/NEFFy/
- Size: 3.67 MB
Statistics
- Stars: 10
- Watchers: 2
- Forks: 1
- Open Issues: 1
- Releases: 1
Topics
Metadata Files
README.md
NEFFy: NEFF Calculator and MSA File Converter
NEFFy is a versatile and efficient tool for bioinformatics research. It calculates NEFF (Normalized Effective Number of Sequences) for Multiple Sequence Alignments (MSAs) of any biological sequences, including protein, RNA, and DNA, across various MSA formats.
Additionally, NEFFy includes built-in support for format conversion, allowing users to seamlessly convert between different MSA formats.
C++ Executable
Installation
To install the NEFFy tool, clone the repository and compile the code using a C++ compiler that supports C++17 or a newer version. You can use the provided Makefile in the repository for this purpose. Navigate to the repository directory and enter the following command in the terminal:
make
If the make command is not available on your operating system, here is how you can install it.
Once the compilation is complete, you can run the program via the command line.
This package is cross-platform and works on Linux, Windows, and macOS without requiring additional compilation.
For more information on installing the executable, please refer to the documentation.
Project Outline
The NEFFy repository is structured as follows:

Usage
1. NEFF Computation
NEFF determines the effective number of homologous sequences within a Multiple Sequence Alignment (MSA). It accounts for sequence similarities and provides a measure of sequence diversity.
To calculate NEFF, use the neff script by providing one or more MSA files and specifying the appropriate flags for NEFF computation. If multiple files are provided, NEFFy will combine them and compute NEFF for the integrated version.
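The idea behind the computation can be sketched in a few lines of Python. This is a minimal illustration of the commonly used NEFF definition (each sequence is weighted by the inverse of the number of sequences similar to it, and the weights are summed), not NEFFy's actual implementation. The function names are hypothetical, and identical gap characters are simply counted as matches, which ignores the gap-handling options that NEFFy exposes.

```python
import math


def sequence_identity(a: str, b: str) -> float:
    """Fraction of aligned columns in which a and b carry the same symbol."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def neff(msa: list[str], threshold: float = 0.8, norm: int = 0) -> float:
    """Sum of sequence weights, normalized as selected by `norm`.

    Assumption: two sequences count as similar when identity >= threshold.
    """
    if not msa:
        raise ValueError("the MSA must contain at least one sequence")
    weights = []
    for a in msa:
        # Every sequence is similar to itself, so cluster_size >= 1.
        cluster_size = sum(1 for b in msa if sequence_identity(a, b) >= threshold)
        weights.append(1.0 / cluster_size)
    total = sum(weights)
    length = len(msa[0])
    if norm == 0:  # normalize by the square root of the sequence length
        return total / math.sqrt(length)
    if norm == 1:  # normalize by the sequence length
        return total / length
    if norm == 2:  # no normalization
        return total
    raise ValueError("norm must be 0, 1, or 2")


toy_msa = ["ACDE", "ACDE", "KLMN"]
print(neff(toy_msa, threshold=0.8, norm=2))
```

In the toy alignment, the two identical sequences share one unit of weight and the third sequence contributes a full unit, so redundant sequences do not inflate the result.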
Flags:
The code accepts the following command-line flags:
| Flag | Description | Required | Default Value | Example |
|------|-------------|----------|---------------|---------|
| --file=<list of filenames> | Input files (comma-separated, no spaces) containing multiple sequence alignments | Yes | N/A | --file=example.fasta |
| --format=<list of file formats> | Input file formats (comma-separated, no spaces) | No | "" | --format=fasta |
| --alphabet=<value> | Alphabet of MSA (0: Protein; 1: RNA; 2: DNA) | No | 0 | --alphabet=1 |
| --check_validation=[true/false] | Validate the input MSA file based on alphabet or not | No | false | --check_validation=true |
| --threshold=<value> | Threshold value of considering two sequences similar (between 0 and 1) | No | 0.8 | --threshold=0.7 |
| --norm=<value> | Normalization option for NEFF (0: normalize by the square root of sequence length; 1: normalize by the sequence length; 2: no normalization) | No | 0 | --norm=2 |
| --omit_query_gaps=[true/false] | Omit gap positions of query sequence from entire sequences for NEFF computation | No | true | --omit_query_gaps=true |
| --is_symmetric=[true/false] | Consider gaps in number of differences when computing sequence similarity cutoff (asymmetric) or not (symmetric) | No | true | --is_symmetric=false |
| --non_standard_option=<value> | Options for handling non-standard letters of the specified alphabet (0: treat them the same as standard letters; 1: consider them as gaps when computing similarity cutoff of sequences, only used in the asymmetric version; 2: consider them as gaps in computing similarity cutoff and checking position of match/mismatch) | No | 0 | --non_standard_option=1 |
| --depth=<value> | Depth of MSA to be considered in computation (starting from the first sequence); a value greater than the original depth is treated as the original depth | No | inf (consider all sequences) | --depth=10 |
| --gap_cutoff=<value>| Threshold for considering a position as gappy and removing that (between 0 and 1) | No | 1 (no gappy position) | --gap_cutoff=0.7 |
| --pos_start=<value>| Start position of each sequence to be considered in NEFF (inclusive) | No | 1 (the first position) | --pos_start=10 |
| --pos_end=<value> | Last position of each sequence to be considered in NEFF (inclusive); a value greater than the length of the MSA sequences is treated as the sequence length | No | inf (consider the whole sequence) | --pos_end=50 |
| --only_weights=[true/false] | Return only sequence weights, rather than the final NEFF | No | false | --only_weights=true |
| --multimer_MSA=[true/false] | Compute NEFF for MSA of a multimer | No | false | --multimer_MSA=true |
| --stoichiom=<value> | Stoichiometry of the multimer | when multimer_MSA=true | | --stoichiom=A2B1 |
| --chain_length=<list of values> | Length of the chains in a heteromer | when multimer_MSA=true and multimer is a heteromer | 0 | --chain_length=17 45 |
| --residue_neff=[true/false] | Compute per-residue (column-wise) NEFF | No | false | --residue_neff=true |
| --skip_lines=<value> | Number of lines to skip at the beginning of the input file | No | 0 | --skip_lines=1 |
For more details about features, please refer to the documentation.
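To make the semantics of --depth, --pos_start, and --pos_end concrete, here is a small Python sketch of the selection they describe: keep the first `depth` sequences, then keep columns `pos_start` through `pos_end` (1-based, inclusive), capping out-of-range values. The function `trim_msa` is a hypothetical helper written for illustration and is not part of NEFFy. How NEFFy orders these steps relative to its other options is not stated here.

```python
def trim_msa(msa, depth=None, pos_start=1, pos_end=None):
    """Return the sub-alignment selected by depth and a 1-based inclusive range.

    depth=None and pos_end=None stand in for the table's "inf" defaults.
    """
    if pos_start < 1:
        raise ValueError("pos_start is 1-based and must be >= 1")
    if depth is not None and depth < 1:
        raise ValueError("depth must be >= 1")
    # Slicing caps a depth larger than the number of sequences.
    rows = msa if depth is None else msa[:depth]
    start = pos_start - 1  # convert the 1-based inclusive start to a 0-based index
    # A 1-based inclusive end equals a 0-based exclusive end;
    # slicing also caps a pos_end beyond the sequence length.
    return [seq[start:pos_end] for seq in rows]


msa = ["ABCDEFGH", "IJKLMNOP", "QRSTUVWX"]
print(trim_msa(msa, depth=2, pos_start=3, pos_end=5))
```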
Example:
neff --file=./MSAs/example.a2m --threshold=0.6 --norm=2 --is_symmetric=false --check_validation=true
As output, it prints the final MSA length, depth, and NEFF to the console, based on the given options.
For more examples on using NEFFy for NEFF calculations with various options and features, please refer to the documentation usage guide.
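When many alignments need the same options, the command line can be assembled programmatically. The sketch below builds the argument list for the example above using only flags from the table. The helper `build_neff_command` is hypothetical, and it assumes the compiled executable is reachable as `neff`. It does not use the NEFFy Python library.

```python
import shlex


def build_neff_command(files, executable="neff", **options):
    """Build an argument list such as ['neff', '--file=a.fasta', '--norm=2']."""
    if not files:
        raise ValueError("at least one input file is required")
    # Multiple input files are comma-separated with no spaces.
    args = [executable, "--file=" + ",".join(files)]
    for name, value in options.items():
        if isinstance(value, bool):
            # Boolean flags are written as true/false on the command line.
            value = "true" if value else "false"
        args.append(f"--{name}={value}")
    return args


args = build_neff_command(
    ["./MSAs/example.a2m"],
    threshold=0.6,
    norm=2,
    is_symmetric=False,
    check_validation=True,
)
print(shlex.join(args))
# To run it (assuming the executable exists on this machine):
# import subprocess; subprocess.run(args, check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting issues with file paths that contain spaces.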
2. MSA File Conversion
The MSA file conversion allows you to convert MSA files between different supported formats.
Run the converter program, specifying the input and output files and, optionally, their formats, and the tool performs the conversion accordingly.
Flags:
The code accepts the following command-line flags:
| Flag | Description | Required | Default Value | Example |
|------|-------------|----------|---------------|---------|
| --in_file=<filename> | Specifies the input MSA file to be converted. Replace <filename> with the path and name of the input file | Yes | N/A | --in_file=input.fasta |
| --in_format=<format> | Specifies the input MSA file format. | No | "" | --in_format=fasta |
| --out_file=<filename> | Specifies the output file where the converted MSA will be saved. Replace <filename> with the desired path and name of the output file | Yes | N/A | --out_file=output.a2m |
| --out_format=<format> | Specifies the output MSA file format. | No | "" | --out_format=a2m |
| --alphabet=<value> | Alphabet of MSA (0: Protein; 1: RNA; 2: DNA) | No | 0 | --alphabet=1 |
| --check_validation=[true/false] | Validate the input MSA file based on alphabet or not | No | true | --check_validation=true |
Please note that the conversion is performed based on the specified input and output file extensions.
For more details about features, please refer to the documentation.
Example:
Suppose you have an MSA file named "example.a2m" in the ./MSAs directory and you want to convert it to the Stockholm format and save it as "example.sto".
converter --in_file=./MSAs/example.a2m --out_file=./MSAs/example.sto
For more examples on using NEFFy for MSA conversion, please refer to the documentation usage guide.
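As a rough picture of what a format conversion involves, the following Python sketch reads an aligned FASTA text and writes it as a Stockholm block (a `# STOCKHOLM 1.0` header, one `name sequence` line per record, and a closing `//`). It is a simplified illustration with hypothetical function names, not the converter's implementation. It does not handle format-specific details such as A2M/A3M insertion states or Stockholm annotation lines.

```python
def parse_aligned_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].strip()
            # Use the first word of the header as the sequence name.
            name = header.split()[0] if header else f"seq{len(records) + 1}"
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_stockholm(records):
    """Serialize (name, sequence) pairs as a minimal Stockholm alignment."""
    if not records:
        raise ValueError("no sequences to write")
    width = max(len(name) for name, _ in records)
    lines = ["# STOCKHOLM 1.0"]
    lines += [f"{name.ljust(width)} {seq}" for name, seq in records]
    lines.append("//")
    return "\n".join(lines) + "\n"


fasta_text = ">s1 first sequence\nAC-E\n>s2\nA-DE\n"
print(to_stockholm(parse_aligned_fasta(fasta_text)), end="")
```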
Python Library
NEFFy also provides a Python library as an interface to the executable files.
Library Installation
From Source
To install the library from the source:
- Clone the repository:
  git clone https://github.com/Maryam-Haghani/Neffy.git
- Navigate to the project directory:
  cd Neffy
- Ensure you have setuptools and wheel installed:
  pip install setuptools wheel
- Build the source distribution and wheel:
  python setup.py sdist bdist_wheel
- Install the package from the root directory of the project:
  pip install .
- Alternatively, you can install the package directly from the built wheel file (in the dist directory):
  pip install dist/neffy-0.1-py3-none-any.whl
From PyPI:
The package is available on PyPI. Install the package via pip:
pip install neffy
From BioConda:
You can also install neffy through BioConda by running the following commands:
conda config --add channels conda-forge
conda install -c bioconda neffy
The first command adds the conda-forge channel to your Conda configuration, which is necessary to access a broader range of packages and dependencies that might not be available on the default channels.
Library Usage
An example of NEFF computation:
cd example
python compute_neff.py
You can find more examples of using the Python library's various methods for NEFF calculations in the example directory. For method parameters and detailed explanations, please refer to the documentation usage guide.
An example of MSA conversion:
cd example
python convert_msa.py
Additional examples of using NEFFy for MSA conversion can be found in the example directory. For further detailed explanations, please refer to the documentation usage guide.
Supported File Formats
- A2M (aligned FASTA-like format)
- A3M (compressed aligned FASTA-like format with lowercase letters for insertions)
- FASTA, AFA, FAS, FST, FSA (FASTA format)
- STO (Stockholm format)
- CLUSTAL (CLUSTAL format)
- ALN (ALN format)
- PFAM (format mostly used for nucleotides)
In the documentation, you will find a brief explanation of each format, along with an illustrative alignment example for each one.
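The A3M entry above mentions lowercase letters for insertions. As an illustration of that convention, removing the lowercase letters from each A3M row leaves only the match columns, so all rows end up with the same length. The helper below is a hypothetical sketch for illustration, not NEFFy code.

```python
def a3m_match_columns(sequence: str) -> str:
    """Drop lowercase insertion letters, keeping uppercase letters and gaps."""
    return "".join(ch for ch in sequence if not ch.islower())


a3m_rows = ["ACDE", "AghC-E", "A-DkE"]
aligned = [a3m_match_columns(row) for row in a3m_rows]
print(aligned)

# After removing insertions, every row covers the same match columns.
assert len({len(row) for row in aligned}) == 1
```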
Error Handling
If any errors occur during the execution of the MSA Processor, an error message will be displayed, describing the issue encountered.
Please refer to the error message for troubleshooting or make necessary corrections to the input.
License
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
Citation
If you use NEFFy in your research, please cite the following:
NEFFy: A Versatile Tool for Computing the Number of Effective Sequences. Haghani, M., Bhattacharya, D., Murali, T. M. (2024). bioRxiv. https://www.biorxiv.org/content/10.1101/2024.12.01.625733v1
Archival DOI
For long-term accessibility and reproducibility, each release of NEFFy is archived with a DOI:
| Version | DOI |
|---------|-----|
| Latest (v0.1.1) | Zenodo DOI: 10.5281/zenodo.14908220 |
For further assistance, please see the documentation.
Citation (CITATION.cff)
cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Haghani"
    given-names: "Maryam"
  - family-names: "Bhattacharya"
    given-names: "Debswapna"
  - family-names: "Murali"
    given-names: "T M"
title: "NEFFy: A Versatile Tool for Computing the Number of Effective Sequences"
version: 1.0.0
date-released: 2024-12
url: "https://github.com/Maryam-Haghani/Neffy"
doi: "10.1093/bioinformatics/btaf222"
GitHub Events
Total
- Create event: 7
- Release event: 1
- Issues event: 3
- Watch event: 11
- Delete event: 8
- Issue comment event: 3
- Push event: 27
- Pull request event: 19
- Fork event: 1
Last Year
- Create event: 7
- Release event: 1
- Issues event: 3
- Watch event: 11
- Delete event: 8
- Issue comment event: 3
- Push event: 27
- Pull request event: 19
- Fork event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 8
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Total issue authors: 1
- Total pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 8
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Issue authors: 1
- Pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 7
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- apcamargo (1)
- oskasf (1)
Pull Request Authors
- Maryam-Haghani (8)
- oskasf (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - pypi 129 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
- Total maintainers: 1
pypi.org: neffy
A Python interface of NEFFy C++ tool: NEFF Calculator and MSA File Converter
- Homepage: https://github.com/Maryam-Haghani/NEFFy
- Documentation: https://neffy.readthedocs.io/
- License: GNU General Public License v3 (GPLv3)
- Latest release: 0.1.1 (published about 1 year ago)