any2tsv

Convert various bioinformatic outputs to TSV

https://github.com/rpetit3/any2tsv

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.2%) to scientific vocabulary
Last synced: 8 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: rpetit3
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 32.2 KB
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 4 years ago · Last pushed almost 4 years ago
Metadata Files
Readme · Changelog · License · Code of conduct · Citation

README.md

Note: This repo is under active development, so it will be changing a lot.

any2tsv

Convert various bioinformatic outputs to TSV

Motivation

Well, you see, I have this pipeline called Bactopia for the analysis of bacterial genomes, and it produces a lot of output files. I started making parsers for these outputs, but I didn't want them to be hidden in Bactopia. Instead, I wanted to create a simple tool (like Torsten Seemann's any2fasta) that the community could use.

Please keep in mind that, unless there are outside contributions, the available parsers will reflect the tools I use in Bactopia. I frankly don't have the bandwidth to expand further. But please don't worry: if you would like to add a parser for a tool that you use, by all means, let's get it added!

Installation

I'm too early in the game for this, but you can expect it to be available from pip and Bioconda in due time.

Usage

```{bash}
any2tsv --help

 Usage: any2tsv [OPTIONS]

╭─ Options ────────────────────────────────────────────────╮
│ --version         Show the version and exit.             │
│ --list_tools      List tools with an available parser.   │
│ --help        -h  Show this message and exit.            │
╰──────────────────────────────────────────────────────────╯
```

Example Usage

fastq-scan

Let's start with fastq-scan, which is a simple tool that outputs FASTQ summary statistics in JSON format. Because it's already in JSON format, this is an easy conversion to TSV.

Example fastq-scan Output

```{bash}
cat fastq-scan.json
{
    "qcstats": {
        "totalbp":7500,
        "coverage":0.05,
        "readtotal":75,
        "readmin":100,
        "readmean":100,
        "readstd":0,
        "readmedian":100,
        "readmax":100,
        "read25th":100,
        "read75th":100,
        "qualmean":34.0267,
        "qualstd":0.711306,
        "qualmedian":34,
        "qual25th":34,
        "qual75th":34
    },
    "readlengths": {
        "100":75
    },
    "per_base_quality": {
        "1":30.7467, "2":31.5467, "3":31.5467, "4":35.44, "5":34.24,
        "6":34.12, "7":34.7067, "8":34.24, "9":36.9333, "10":37.0667,
        "11":35.88, "12":36.0667, "13":36.72, "14":38.2667, "15":37.48,
        "16":38.2133, "17":36.7467, "18":37.8267, "19":36.3333, "20":37.2933,
        "21":37.9867, "22":37.1067, "23":37.4133, "24":38.2667, "25":36.6133,
        "26":36.2, "27":36.3067, "28":35.8533, "29":36.5067, "30":37.72,
        "31":37.3333, "32":36.0133, "33":37.4933, "34":36.1067, "35":36.76,
        "36":34.8533, "37":36.3733, "38":35.1867, "39":36.0133, "40":35.3067,
        "41":35.6, "42":36.7867, "43":35.52, "44":37.3333, "45":36.6533,
        "46":36.8, "47":35.9867, "48":35.4533, "49":35.2, "50":37.2533,
        "51":35.04, "52":36, "53":35.28, "54":36.16, "55":35.2,
        "56":33.6133, "57":36.0533, "58":34.4533, "59":35.88, "60":35.3733,
        "61":35.6933, "62":34.8267, "63":35.1067, "64":35.2933, "65":32.2667,
        "66":34.4267, "67":33.9333, "68":33.6667, "69":32.6133, "70":33.4267,
        "71":32.8267, "72":32.96, "73":33.5467, "74":33.1067, "75":31.8667,
        "76":30.72, "77":30.6133, "78":30.2133, "79":31.7467, "80":33.8933,
        "81":32.72, "82":33.1733, "83":31.5867, "84":32.6933, "85":32.0667,
        "86":32.2933, "87":30.7467, "88":30.6933, "89":32.48, "90":31.08,
        "91":31.6133, "92":31.72, "93":30.3867, "94":30.7067, "95":29.9733,
        "96":31.96, "97":32.44, "98":30.2267, "99":31.2533, "100":30.2267
    }
}
```

Converting fastq-scan to TSV

```{bash}
any2tsv fastq-scan fastq-scan.json
filename	total_bp	coverage	read_total	read_min	read_mean	read_std	read_median	read_max	read_25th	read_75th	qual_mean	qual_std	qual_median	qual_25th	qual_75th
fastq-scan.json	7500	0.05	75	100	100	0	100	100	100	100	34.0267	0.711306	34	34	34
```

You might be wondering, "Where'd the read lengths and per-base qualities go?" Well, honestly, I didn't think they were useful in TSV format, so out they went! But if for some reason you think they would be useful, please let me know.
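To make the conversion concrete, here is a minimal Python sketch of how the `qcstats` block could be flattened into the header and row shown in the example. This is not any2tsv's actual code: the function name `fastq_scan_to_tsv`, the `COLUMNS` mapping, and the shortened `SAMPLE` input are all assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical mapping from fastq-scan "qcstats" keys to the TSV column
# names seen in the example output. Not taken from the any2tsv source.
COLUMNS = {
    "totalbp": "total_bp",
    "coverage": "coverage",
    "readtotal": "read_total",
    "readmin": "read_min",
    "readmean": "read_mean",
    "readstd": "read_std",
    "readmedian": "read_median",
    "readmax": "read_max",
    "read25th": "read_25th",
    "read75th": "read_75th",
    "qualmean": "qual_mean",
    "qualstd": "qual_std",
    "qualmedian": "qual_median",
    "qual25th": "qual_25th",
    "qual75th": "qual_75th",
}

# Shortened copy of the example input: "per_base_quality" is cut down to one
# entry because the sketch ignores it anyway.
SAMPLE = """
{
    "qcstats": {
        "totalbp":7500, "coverage":0.05, "readtotal":75, "readmin":100,
        "readmean":100, "readstd":0, "readmedian":100, "readmax":100,
        "read25th":100, "read75th":100, "qualmean":34.0267,
        "qualstd":0.711306, "qualmedian":34, "qual25th":34, "qual75th":34
    },
    "readlengths": {"100":75},
    "per_base_quality": {"1":30.7467}
}
"""


def fastq_scan_to_tsv(json_text: str, filename: str) -> str:
    """Flatten "qcstats" into a header line plus a single data line."""
    qcstats = json.loads(json_text)["qcstats"]
    header = ["filename"] + list(COLUMNS.values())
    row = [filename] + [str(qcstats[key]) for key in COLUMNS]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    print(fastq_scan_to_tsv(SAMPLE, "fastq-scan.json"), end="")
```

Only `qcstats` is read, so `readlengths` and `per_base_quality` drop out simply by never being looked up.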

Naming

I think it's pretty obvious, but the name any2tsv is inspired by Torsten Seemann's any2fasta. any2fasta converts many different formats to FASTA format. I wanted to do the same, except with TSV output. These TSV outputs can then be easily manipulated by the user.
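As one example of that kind of manipulation, here is a small sketch that reads TSV output with Python's standard `csv` module. The `TSV_TEXT` sample is a few columns copied from the fastq-scan example, and the helper name `read_tsv` is made up for this illustration.

```python
import csv
import io

# A few columns copied from the fastq-scan example output (shortened to
# keep the sketch small).
TSV_TEXT = (
    "filename\ttotal_bp\tcoverage\tread_total\tqual_mean\n"
    "fastq-scan.json\t7500\t0.05\t75\t34.0267\n"
)


def read_tsv(handle):
    """Return each TSV row as a dict keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_tsv(io.StringIO(TSV_TEXT))
# Values come back as strings, so convert them before doing arithmetic.
mean_read_length = int(rows[0]["total_bp"]) / int(rows[0]["read_total"])
print(rows[0]["filename"], mean_read_length)
```

With a real file, `open("output.tsv", newline="")` would take the place of the `io.StringIO` wrapper.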

Author

Owner

  • Name: Robert A. Petit III
  • Login: rpetit3
  • Kind: user
  • Location: Cheyenne, WY
  • Company: Wyoming Public Health Laboratory

Bioinformatician at the Wyoming Public Health Laboratory. Developer of Bactopia and other microbial genomics tools.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use any2tsv, please cite it as below."
authors:
- family-names: "Petit III"
  given-names: "Robert A. "
  orcid: "https://orcid.org/0000-0002-1350-9426"
title: "any2tsv: Convert various bioinformatic outputs to TSV (2022)"
url: "https://github.com/rpetit3/any2tsv"
version: 0.0.1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 13
  • Total Committers: 2
  • Avg Commits per committer: 6.5
  • Development Distribution Score (DDS): 0.077
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Robert A. Petit III r****t@g****m 12
Debian r****3@v****t 1
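
The all-time figures above can be reproduced from the two commit counts in the table. The sketch below assumes that DDS is one minus the top committer's share of all commits; this page does not state the formula, so treat that definition as an assumption.

```python
# Commit counts taken from the "Top Committers" table.
commits = {"Robert A. Petit III": 12, "Debian": 1}

total = sum(commits.values())
avg_per_committer = total / len(commits)
# Assumed definition: 1 - (commits by the top committer / total commits).
dds = 1 - max(commits.values()) / total

print(total, avg_per_committer, round(dds, 3))
```

Under that assumption the rounded values match the reported 13 commits, 6.5 commits per committer, and DDS of 0.077.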

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

requirements.txt pypi
  • click *
  • jinja2 *
  • jsonschema >=3.0
  • markdown >=3.3
  • packaging *
  • pytest-workflow *
  • rich >=10.0.0
  • rich-click >=1.0.0