gtf-ops

Filtering GENCODE or Ensembl annotation in GTF format. Annotating missing regions in iCount genomic segmentation.

https://github.com/ulelab/gtf-ops

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.9%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: ulelab
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 49.2 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created about 4 years ago · Last pushed 12 months ago
Metadata Files
Readme License Citation

README.md

gtf-ops


FilterGtf module: Filters GENCODE or Ensembl annotation in GTF format by the "basic" tag (first step) and the "transcript_support_level" attribute (second step).

🔴 Important note: Currently, these tags are only available for Human (Hs) and Mouse (Mm) annotations.
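The two filtering steps above can be sketched with the standard library alone, assuming GENCODE-style attributes such as `tag "basic";` and `transcript_support_level "1";`. This is a minimal illustration, not the code of FilterGtf.py: the function names and the `max_tsl` cutoff are hypothetical, and the real script may treat TSL values (including "NA") differently. Gene lines are passed through unchanged because they carry no transcript-level tags.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional

# One `key "value";` pair from the 9th (attribute) column of a GTF line.
# Unquoted values (e.g. GENCODE `level 2;`) are ignored; only quoted ones are needed here.
ATTR_RE = re.compile(r'([^\s;"]+)\s+"([^"]*)"')


def parse_attributes(attr_field: str) -> Dict[str, List[str]]:
    """Map each attribute key to all of its values ('tag' can repeat)."""
    attrs: Dict[str, List[str]] = {}
    for key, value in ATTR_RE.findall(attr_field):
        attrs.setdefault(key, []).append(value)
    return attrs


def tsl_level(attrs: Dict[str, List[str]]) -> Optional[int]:
    """Return transcript_support_level as an int (1-5), or None if absent or 'NA'."""
    values = attrs.get("transcript_support_level")
    if not values:
        return None
    # Ensembl may append text, e.g. '1 (assigned to previous version 5)'.
    match = re.match(r"\d", values[0])
    return int(match.group()) if match else None


def filter_gtf(lines: Iterable[str], max_tsl: Optional[int] = 2) -> Iterator[str]:
    """Yield GTF lines passing a 'basic' tag filter, then a TSL cutoff.

    max_tsl is a hypothetical parameter; None disables the second step.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header comments
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip malformed lines
        if fields[2] == "gene":
            yield line  # genes have no transcript tags; keep them
            continue
        attrs = parse_attributes(fields[8])
        # Step 1: keep only features of transcripts tagged "basic".
        if "basic" not in attrs.get("tag", []):
            continue
        # Step 2 (assumed rule): drop transcripts with weaker or missing support.
        if max_tsl is not None:
            tsl = tsl_level(attrs)
            if tsl is None or tsl > max_tsl:
                continue
        yield line
```

To filter a file, the generator can be fed an open file handle and its output passed to `writelines` on the output file. Keeping gene lines while dropping their transcripts is what can leave gene regions without any transcript-level annotation after segmentation.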

ResolveUnannotated module: Annotates missing regions in iCount genomic segmentation as "genic_other".

Features

The gtf-ops package contains two functions, each of which can be run as a script, that complement iCount segmentation.

FilterGtf.py filters GENCODE or Ensembl genomic annotation in GTF format. It can be run before iCount segment to improve genome-level segmentation by removing lower-confidence transcripts and by favoring transcripts of full protein-coding genes over ncRNA transcripts where the two overlap.
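The preference for protein-coding genes over overlapping ncRNA can be illustrated as an interval-overlap check. This is a hypothetical sketch, not necessarily how FilterGtf.py achieves it: it assumes that a non-coding gene is dropped whenever it overlaps any `protein_coding` gene on the same chromosome and strand, using GTF-style 1-based inclusive coordinates. The `Gene` tuple and the function name are illustrative.

```python
import itertools
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive (GTF convention)
    end: int    # inclusive
    strand: str
    gene_type: str


def drop_overlapping_noncoding(genes: List[Gene]) -> List[Gene]:
    """Keep all protein-coding genes; drop non-coding genes overlapping one (same strand)."""
    coding: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    for g in genes:
        if g.gene_type == "protein_coding":
            coding[(g.chrom, g.strand)].append((g.start, g.end))

    # Per chrom/strand: sorted starts plus running maximum of ends, so a single
    # bisect answers "does any coding interval overlap [start, end]?".
    index = {}
    for key, intervals in coding.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        max_end = list(itertools.accumulate((e for _, e in intervals), max))
        index[key] = (starts, max_end)

    kept = []
    for g in genes:
        key = (g.chrom, g.strand)
        if g.gene_type == "protein_coding" or key not in index:
            kept.append(g)
            continue
        starts, max_end = index[key]
        # All coding intervals before idx start at or before g.end.
        idx = bisect_right(starts, g.end)
        if idx > 0 and max_end[idx - 1] >= g.start:
            continue  # overlaps a protein-coding gene: drop it
        kept.append(g)
    return kept
```

The running maximum of end coordinates handles nested genes: a short coding gene inside a long one does not hide the long gene's overlap with a later ncRNA. Whether overlaps on the opposite strand should also count is a design choice this sketch leaves out.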

ResolveUnannotated.py annotates genome segments that are left unannotated by iCount segmentation. Missing annotations occur when a region overlaps a gene in the GTF annotation, but the relevant transcripts were removed during filtering. Such a region cannot be annotated as "intergenic", because it overlaps a gene, nor can it be assigned to any other region type (5'UTR, 3'UTR, CDS, intron or ncRNA), because it lacks relevant transcripts.
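The gap-finding idea can be sketched as interval subtraction: walk a gene's span and report the stretches not covered by any annotated segment, labelling them "genic_other". This is an illustrative sketch, not the code of ResolveUnannotated.py; the function names and the half-open, 0-based (BED-style) coordinates are assumptions, and segments are assumed to be already restricted to the gene's chromosome and strand.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based (BED-style)


def uncovered(gene: Interval, segments: Sequence[Interval]) -> List[Interval]:
    """Return the parts of `gene` not covered by any of `segments`."""
    gene_start, gene_end = gene
    gaps: List[Interval] = []
    cursor = gene_start  # everything left of cursor is covered or already reported
    for seg_start, seg_end in sorted(segments):
        if seg_end <= cursor:
            continue  # segment lies entirely in the covered/reported part
        if seg_start >= gene_end:
            break  # remaining segments start past the gene
        if seg_start > cursor:
            gaps.append((cursor, seg_start))  # uncovered stretch before this segment
        cursor = max(cursor, seg_end)
        if cursor >= gene_end:
            break
    if cursor < gene_end:
        gaps.append((cursor, gene_end))  # tail of the gene after the last segment
    return gaps


def label_genic_other(chrom: str, strand: str, gene: Interval,
                      segments: Sequence[Interval]) -> List[Tuple[str, int, int, str, str]]:
    """Emit BED-like records for the unannotated parts of one gene."""
    return [(chrom, s, e, strand, "genic_other") for s, e in uncovered(gene, segments)]
```

For example, a gene spanning [0, 100) with annotated segments [10, 20), [15, 30), [50, 60) and [90, 120) yields the gaps [0, 10), [30, 50) and [60, 90). pybedtools, which is listed among the dependencies, exposes the same kind of operation through `BedTool.subtract`.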

Both scripts can be run from the command line; more details are given in their respective README files.

Owner

  • Name: Ulelab
  • Login: ulelab
  • Kind: organization
  • Location: London

Citation (CITATION.cff)

cff-version: 0.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Kuret"
  given-names: "Klara"
  orcid: "https://orcid.org/0000-0002-8445-8080"
title: "gtf-ops"
version: 0.0.0
doi: https://doi.org/10.5281/zenodo.8386577
date-released: 2022-06-10
url: "https://github.com/ulelab/gtf-ops"

GitHub Events

Total
  • Delete event: 1
  • Push event: 4
Last Year
  • Delete event: 1
  • Push event: 4

Dependencies

environment.yml conda
  • pandas
  • plumbum
  • pybedtools
  • python 3.7.*