sequenceforge-lite

🧬Simple tool to work with fasta, fastq files and bio seqs

https://github.com/iliapopov17/sequenceforge-lite

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ✓
    CITATION.cff file
    Found CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • ○
    DOI references
  • ○
    Academic publication links
  • ○
    Academic email domains
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary

Keywords

amino-acid-sequences blast dna-sequences fasta fastq randomforestclassifier rna-sequences sequence tool

Repository

Basic Info
  • Host: GitHub
  • Owner: iliapopov17
  • License: MIT
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 7.25 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

SequenceForge-Lite

SequenceForge-Lite is a lightweight tool designed to work with biological sequence data, providing various functionalities for filtering FASTQ files and manipulating FASTA files. Additionally, it offers utilities for parsing BLAST output files.

Features

FASTQ Filtering

  • Filter FASTQ files based on GC content, sequence length, and quality score.
  • Specify custom ranges for GC content and sequence length.
  • Set a minimum quality score threshold for sequences.

FASTA File Manipulation

  • Get quick info on each sequence in FASTA file.
  • Convert multiline FASTA files to one-line format.
  • Shift the start position of one-line FASTA sequences by a specified amount.

BLAST Output Parsing

  • Extract the top hit for each query from BLAST output files.
  • Results are sorted alphabetically for easy analysis.

DNA, RNA & amino acid classes

  • Calculates the GC content of DNA and RNA sequences.
  • Prints the complement sequence for DNA.
  • Transcribes a DNA sequence to RNA.
  • Prints an RNA sequence split into codons.
  • Finds motifs in nucleic acid sequences.
  • Translates an RNA sequence to amino acids (naively, without biological meaning).
  • Calculates the molecular weight of an amino acid sequence.

Custom RandomForestClassifier

  • Self-written implementation of RandomForestClassifier.
  • Supports parallelisation (runs twice as fast when 2 threads are specified).
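
The FASTQ filtering criteria listed above (GC content range, length range, minimum quality) can be illustrated with a minimal sketch. This is not the code shipped in the repository: the names `FastqRecord`, `gc_content`, `mean_quality`, `parse_fastq` and `passes` are hypothetical, and the sketch assumes four-line FASTQ records, Phred+33 quality encoding, and that the quality threshold applies to the mean score of a read.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    """Hypothetical container for one FASTQ read."""
    name: str
    seq: str
    qual: str


def gc_content(seq: str) -> float:
    """Return the GC percentage of a sequence (0.0 for an empty one)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Return the mean Phred score, assuming Phred+33 encoding."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from lines laid out as header, sequence, '+', quality."""
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        # Drop the leading '@' from the header line.
        yield FastqRecord(stripped[i][1:], stripped[i + 1], stripped[i + 3])


def passes(
    record: FastqRecord,
    gc_bounds: Tuple[float, float] = (0.0, 100.0),
    length_bounds: Tuple[int, int] = (0, 2**32),
    quality_threshold: float = 0.0,
) -> bool:
    """Return True if the read satisfies all three criteria (bounds inclusive)."""
    return (
        gc_bounds[0] <= gc_content(record.seq) <= gc_bounds[1]
        and length_bounds[0] <= len(record.seq) <= length_bounds[1]
        and mean_quality(record.qual) >= quality_threshold
    )
```

With these helpers, filtering a file amounts to keeping the records for which `passes(...)` returns True, for example `[r for r in parse_fastq(open(path)) if passes(r, gc_bounds=(40.0, 60.0), quality_threshold=20.0)]`, where `path` is whatever FASTQ file is being processed.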

Installation

    git clone https://github.com/iliapopov17/SequenceForge-Lite.git && cd SequenceForge-Lite

    pip install -r requirements.txt

Usage Guide

  • A demonstration Python notebook is available in the demo.ipynb file.
  • Demonstration data is available in the demo_data folder.

🔗 Visit SequenceForge-Lite wiki page

Troubleshooting

Common Issues and Solutions:

1. File Not Found Error:
  • Issue: The script raises a FileNotFoundError when trying to access the input file.
  • Solution: Verify that the input file path provided to the function is correct and the file exists in the specified location.
2. Incorrect File Format:
  • Issue: The function fails to process the file due to incorrect formatting.
  • Solution: Ensure that the input files are properly formatted according to the specifications mentioned in the function or class descriptions.
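
Both issues can be caught before calling any of the tool's functions. The sketch below is a standalone pre-flight check, not part of SequenceForge-Lite: `check_input_file` is a hypothetical helper. It relies only on the standard conventions that a FASTA file starts with `>` and a FASTQ file starts with `@`.

```python
from pathlib import Path


def check_input_file(path: str, expected_first_char: str) -> Path:
    """Hypothetical helper: fail early with a clear message.

    Raises FileNotFoundError if the path does not point to a file, and
    ValueError if the first line does not start with the expected marker
    ('>' for FASTA, '@' for FASTQ).
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"Input file not found: {file_path.resolve()}")
    with file_path.open() as handle:
        first_line = handle.readline()
    if not first_line.startswith(expected_first_char):
        raise ValueError(
            f"{file_path.name} does not start with {expected_first_char!r}; "
            "check that it is in the expected format"
        )
    return file_path
```

For example, `check_input_file("demo_data/example.fasta", ">")` would return the path if the file exists and looks like FASTA; the file name here is illustrative only.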

Contributing

Contributions are welcome! If you have any ideas, bug fixes, or enhancements, feel free to open an issue or submit a pull request.

Contact

For any inquiries or support, feel free to contact me via email

Happy sequencing! 🧬🔬

Owner

  • Name: Ilia Popov
  • Login: iliapopov17
  • Kind: user
  • Location: Russia

Citation (CITATION.cff)

cff-version: 1.8.0
message: If you use this software, please cite it as below.
authors:
  - family-names: Popov
    given-names: Ilia
title: "SequenceForge Lite"
date-released: 2024-04-28
url: https://github.com/iliapopov17/SequenceForge-Lite
