flupipe

Influenza genome reconstruction

https://github.com/rki-mf1/flupipe

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: rki-mf1
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 98.6 KB
Statistics
  • Stars: 5
  • Watchers: 3
  • Forks: 1
  • Open Issues: 1
  • Releases: 3
Created about 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

FluPipe


1. Introduction

FluPipe provides a fully automated, flexible and reproducible workflow for reconstructing genome sequences from Illumina NGS data. The pipeline is optimized for Influenza data.

2. Setup

The most convenient way to install the pipeline is by using git and conda:

```bash
# installing the pipeline using git
cd designated/path
git clone https://github.com/rki-mf1/FluPipe.git/
cd FluPipe
conda env create -f flupipe.yml -n FluPipe
conda activate FluPipe
```

3. Usage

As a minimum, the pipeline needs the following input:

  • a folder containing gz-compressed FASTQ files (-d)
  • an output folder (-o), in which a subfolder named results is automatically created to store all results
  • a reference sequence (--ref) or an influenza segment database containing representative fasta files for each genome segment (--segmentdb)

```bash
# activate conda environment once before using the pipeline
conda activate FluPipe

flupipe.py -d path/to/myInputFolder \
           -o path/to/myOutputFolder \
           --segmentdb path/to/segmentdb
```

4. Options to customize the workflow

The manual page provides information on all options available.

```bash
flupipe.py --help
```

4.1 Adjusting Read Filtering

4.1.1 Read Length

By default, the minimum read length is set to 50 (-l 50).

4.1.2 Read Quality

fastp filters reads by base quality. By default, the --readfilterqual option uses a Phred score of 20 as the cutoff.

4.2 Taxonomic Read Filtering

If necessary, reads not derived from the Orthomyxoviridae family can be excluded. Read classification is based on k-mer frequencies using a defined kraken2 database (--kraken). A database containing Influenza A, Influenza B and human genome sequences is recommended.

4.3 Find a reference for each segment

For each segment, a multifasta file with any number of reference sequences can be provided. The pipeline compares the sequencing reads to the given references and determines the optimal reference sequence per segment for the given data based on read coverage, read depth, and uniformity of mapping.
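The selection idea can be illustrated with a minimal sketch. It is not FluPipe's implementation: the README does not say how coverage, depth and uniformity are combined, so the ranking rule below (coverage first, then mean depth, then a lower coefficient of variation) is an assumption, and `mapping_summary`, `pick_reference` and the example depths are hypothetical.

```python
from statistics import mean, pstdev


def mapping_summary(depths):
    """Summarise per-position read depths for one candidate reference."""
    avg = mean(depths)
    return {
        # fraction of reference positions covered by at least one read
        "coverage": sum(d > 0 for d in depths) / len(depths),
        "mean_depth": avg,
        # coefficient of variation: lower means more uniform mapping
        "cv": pstdev(depths) / avg if avg > 0 else float("inf"),
    }


def pick_reference(depths_by_ref):
    """Assumed ranking: higher coverage, then higher depth, then lower CV."""
    def key(name):
        s = mapping_summary(depths_by_ref[name])
        return (s["coverage"], s["mean_depth"], -s["cv"])
    return max(depths_by_ref, key=key)


# Hypothetical per-position depths for two candidate references of one segment
candidates = {
    "refA": [30, 32, 0, 0, 28, 31],
    "refB": [25, 27, 26, 24, 25, 26],
}
best = pick_reference(candidates)
print(best)
```

In this toy input, refB is preferred because reads cover all of its positions, whereas refA has a gap.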

4.4 Adapting variant calling

Sites considered for variant calling can be restricted based on the following parameters at the respective position.

  • the minimum sequencing depth (--vvar_mincov; default: 20)
  • the minimum number of reads supporting a variant (--var_call_count; default: 10)
  • the relative number of reads supporting a variant (--var_call_frac; default: 0.1)
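
To illustrate how the three thresholds interact, here is a minimal sketch using the default values listed above. It is not the pipeline's code: the `Site` structure and `passes_thresholds` function are hypothetical, and treating each threshold as inclusive (greater than or equal) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Read counts at one position (hypothetical structure)."""
    depth: int      # total reads covering the position
    alt_count: int  # reads supporting the variant


def passes_thresholds(site, min_cov=20, min_count=10, min_frac=0.1):
    """Return True only if the site satisfies all three thresholds."""
    if site.depth < min_cov:
        return False
    if site.alt_count < min_count:
        return False
    return site.alt_count / site.depth >= min_frac


sites = [Site(100, 15), Site(15, 12), Site(200, 12), Site(50, 5)]
kept = [passes_thresholds(s) for s in sites]
print(kept)
```

Only the first site passes: the second fails the depth threshold, the third has a supporting fraction of 0.06, and the fourth has too few supporting reads.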

CHECK PARAMETERS

4.5 Consensus generation

When generating the consensus sequence, all positions whose read coverage is below a defined threshold can be hard-masked with N (--cns_min_cov; default: 20). In addition, genotypes can be adjusted, meaning that variants supported by a given fraction of all reads covering the respective site are called explicitly (--cns_gt_adjust; default: 0.9). For example, a variant with a read fraction of 0.94 would be set to the full alternate allele, while a variant with a read fraction of only 0.03 would be changed to reference.
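
Both steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline's code: `mask_low_coverage` and `adjust_genotype` are hypothetical names, and the lower bound for resetting a variant to reference is assumed to be one minus the adjustment fraction, which is consistent with the 0.03 example but not stated in the README.

```python
def mask_low_coverage(sequence, depths, min_cov=20):
    """Replace every base whose read depth is below min_cov with N."""
    return "".join(
        base if depth >= min_cov else "N"
        for base, depth in zip(sequence, depths)
    )


def adjust_genotype(alt_fraction, gt_adjust=0.9):
    """Classify a variant by the fraction of reads supporting it.

    Assumption: fractions at or below 1 - gt_adjust are reset to reference.
    """
    if alt_fraction >= gt_adjust:
        return "alt"        # called as full alternate allele
    if alt_fraction <= 1 - gt_adjust:
        return "ref"        # changed back to reference
    return "unchanged"      # genotype left as originally called


masked = mask_low_coverage("ACGT", [25, 5, 20, 19])
calls = [adjust_genotype(f) for f in (0.94, 0.03, 0.5)]
print(masked, calls)
```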

Owner

  • Name: RKI MF1 Bioinformatics
  • Login: rki-mf1
  • Kind: organization
  • Location: Germany

Bioinformatics code of MF1

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: FluPipe
message: >-
  If you use this software, please cite it using these
  metadata.
type: software
authors:
  - name: >-
      Viroinformatics, Genome Competence Centre (MF1),
      Robert Koch Institute
    address: Nordufer 20
    city: Berlin
    country: DE
    post-code: '13353'
    website: 'https://www.rki.de/EN/Home/homepage_node.html'
  - given-names: Katja
    family-names: Winter
    affiliation: Robert Koch Institute (RKI)
  - given-names: Stephan
    family-names: Fuchs
    affiliation: Robert Koch Institute (RKI)
  - given-names: Namuun   
    family-names: Battur
    affiliation: Robert Koch Institute (RKI)
  - given-names: Marie
    family-names: Lataretu
    affiliation: Robert Koch Institute (RKI)
    orcid: 'https://orcid.org/0000-0002-3637-5870'
  - given-names: Dimitri
    family-names: Ternovoj
    affiliation: Robert Koch Institute (RKI)
identifiers:
  - type: doi
    value: 10.5281/zenodo.13684139
    description: All versions of FluPipe
repository-code: 'https://github.com/rki-mf1/FluPipe'
keywords:
  - Influenza
  - genome reconstruction

GitHub Events

Total
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 2
  • Push event: 5
  • Pull request event: 2
  • Fork event: 1
  • Create event: 1
Last Year
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 2
  • Push event: 5
  • Pull request event: 2
  • Fork event: 1
  • Create event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 8 days
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 2.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 8 days
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 2.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • DimitriTernovoj (1)
  • MarieLataretu (1)