https://github.com/biojulia/genomicannotations.jl
Statistics
- Stars: 16
- Watchers: 6
- Forks: 4
- Open Issues: 1
- Releases: 25
README.md
GenomicAnnotations.jl
Description
GenomicAnnotations is a package for reading, modifying, and writing genomic annotations in the GenBank, GFF3, GFF2/GTF, and EMBL file formats.
Installation
```julia
julia>]
pkg> add GenomicAnnotations
```
Usage
GenBank and GFF3 files are read with readgbk(input) and readgff(input), which return vectors of Records. input can be an IOStream or a file path. GZipped data is unzipped automatically if a filename ending in ".gz" is passed as input. If we're only interested in the first chromosome in example.gbk we only need to store the first element.
```julia
chr = readgbk("test/example.gbk")[1]
```
Records have five fields, name, header, genes, genedata, and sequence. The name is read from the header, which is stored as a string. The annotation data is stored in genedata, but generally you should use genes to access that data. For example, it can be used to iterate over annotations, and to modify them.
```julia
for gene in chr.genes
    gene.locus_tag = "$(chr.name)_$(gene.locus_tag)"
end

chr.genes[2].locus_tag = "test123"
```
The locus of a Gene can be retrieved with locus(gene), and updated with locus!(gene, newlocus). The easiest way to create a locus is to use the constructor Locus(s), which takes an AbstractString s and parses it as a GenBank locus string as defined here: https://www.insdc.org/submitting-standards/feature-table/#3.4. Note that remote entry descriptors have not been implemented.
```julia
# The following are all equivalent
locus!(gene, "complement(join(1..100,200..>300))")
locus!(gene, Locus("complement(join(1..100,200..>300))"))
locus!(gene, Complement(Join(ClosedSpan(1:100), OpenRightSpan(200:300))))
```
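To make the structure of such a location string concrete, here is a minimal sketch in plain Julia. It is not the parser used by GenomicAnnotations: `parse_location` is a hypothetical helper that only handles an optional outer `complement(...)`, an optional `join(...)`, and spans of the form `a..b` with optional `<` and `>` markers. Operators such as `order(...)`, nested `complement(...)` inside `join(...)`, and remote entries are deliberately left out.

```julia
# Hypothetical helper, not part of GenomicAnnotations: parse a simplified
# GenBank location string into ranges plus strand and open-end flags.
function parse_location(s::AbstractString)
    s = strip(s)
    # An outer complement(...) marks the reverse strand.
    complement = startswith(s, "complement(") && endswith(s, ")")
    if complement
        s = s[(length("complement(") + 1):(end - 1)]
    end
    # join(...) wraps a comma-separated list of spans.
    if startswith(s, "join(") && endswith(s, ")")
        s = s[(length("join(") + 1):(end - 1)]
    end
    spans = UnitRange{Int}[]
    open_left = false
    open_right = false
    for part in split(s, ',')
        # Matches "n", "a..b", "<a..b" and "a..>b".
        m = match(r"^(<?)(\d+)(?:\.\.(>?)(\d+))?$", strip(part))
        m === nothing && throw(ArgumentError("unsupported location: $part"))
        start = parse(Int, m.captures[2])
        stop = m.captures[4] === nothing ? start : parse(Int, m.captures[4])
        open_left |= m.captures[1] == "<"
        open_right |= m.captures[3] == ">"
        push!(spans, start:stop)
    end
    return (spans = spans, complement = complement,
            open_left = open_left, open_right = open_right)
end

loc = parse_location("complement(join(1..100,200..>300))")
println(loc.spans)       # the two spans as integer ranges
println(loc.complement)  # whether the locus is on the complement strand
```

The returned named tuple keeps the coordinates separate from the strand and open-end flags, which is one possible way to represent the distinction between positions and metadata; the package's own `Locus` types may organise this differently.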
position(gene) can be used as shorthand for locus(gene).position to retrieve the chromosomal positions included in the locus, excluding all metadata such as strandedness. The return type depends on the locus type, but is guaranteed to iterate over the individual positions.
```julia
for i in position(gene)
    print(parent(gene).sequence[i])
end
# is equivalent to
print(sequence(gene))
```
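The idea of iterating over individual positions, even when a locus consists of several spans, can be illustrated without the package. The sketch below uses a toy string as the chromosome and a plain vector of ranges as the locus; both are hypothetical stand-ins and not the types GenomicAnnotations uses.

```julia
# Hypothetical data: a toy chromosome and a locus made of two closed spans.
chromosome = "ATGCGTACGTTAGC"
spans = [1:3, 7:9]

# Flattening the spans yields every individual position in order.
positions = collect(Iterators.flatten(spans))

# Indexing position by position gives the same result as slicing each span.
by_position = join(chromosome[i] for i in positions)
by_span = join(chromosome[span] for span in spans)

println(positions)
println(by_position)
```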
Accessing properties that haven't been stored will return missing. For this reason, it often makes more sense to use get() than to access the property directly.
```julia
# chr.genes[2].pseudo returns missing, so this will throw an error
if chr.genes[2].pseudo
    println("Gene 2 is a pseudogene")
end

# ... but this works:
if get(chr.genes[2], :pseudo, false)
    println("Gene 2 is a pseudogene")
end
```
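The error comes from Julia itself rather than from the package: `missing` is not a `Bool`, so it cannot be used as an `if` condition. The following sketch reproduces this with a plain `Dict` as a hypothetical stand-in for a gene's attributes (a real `Gene` is not a `Dict`), and also shows `coalesce` from Base as an alternative when the value has already been fetched.

```julia
# Hypothetical stand-in for a gene's attributes; a real Gene is not a Dict.
attributes = Dict{Symbol,Any}(:locus_tag => "tag01", :product => "glycoprotein")

# Looking up an absent key with a `missing` default mimics an unset attribute.
pseudo = get(attributes, :pseudo, missing)

# `if missing` raises a TypeError, because `missing` is not a Bool.
threw = try
    if pseudo
        println("pseudogene")
    end
    false
catch err
    err isa TypeError
end

# Supplying a Bool default keeps the condition well-typed.
is_pseudo = get(attributes, :pseudo, false)

# `coalesce` does the same for a value that has already been fetched.
is_pseudo_coalesced = coalesce(pseudo, false)

println((threw, is_pseudo, is_pseudo_coalesced))
```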
The macro @genes can be used to filter through the annotations. The macro takes a Record or a Vector{Record}, followed by any number of expressions that will be evaluated for each gene. The keyword gene is used to refer to the individual Genes. @genes can also be used to modify annotations. Gene attributes can be referred to using Symbols.
```julia
@genes(chr, feature(gene) == "CDS") # Returns all coding regions
@genes(chr, length(gene) > 300) # Returns all features longer than 300 nt
@genes(chr, iscomplement(gene)) # Returns all features on the complement strand
@genes(chr, ismissing(:product)) # Returns all features for which the attribute "product" has not been set

# Some short-hand forms are available to make life easier:
#     CDS expands to feature(gene) == "CDS", and
#     get(s::Symbol, default) expands to get(gene, s, default)
# The following two are thus equivalent:
@genes(chr, feature(gene) == "CDS", occursin("glycoprotein", get(gene, :product, "")))
@genes(chr, CDS, occursin("glycoprotein", get(:product, "")))

# All arguments have to evaluate to true for a gene to be included, so the following expressions are equivalent:
@genes(chr, feature(gene) == "CDS", length(gene) > 300)
@genes(chr, (feature(gene) == "CDS") && (length(gene) > 300))

# @genes returns a Vector{Gene}. Attributes can be accessed with dot-syntax, and can be assigned to:
@genes(chr, :locus_tag == "tag03")[1].pseudo = true
@genes(chr, CDS, ismissing(:gene)).gene .= "unknown"
```
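The rule that every argument must be true can be pictured as an ordinary `filter` in plain Julia. The sketch below is only an analogy and says nothing about how the macro is implemented; the named tuples and their field names (`feature`, `len`, `product`) are hypothetical stand-ins for annotations.

```julia
# Hypothetical stand-ins for annotations; real Genes are not NamedTuples.
genes = [
    (feature = "CDS",  len = 450, product = "glycoprotein B"),
    (feature = "CDS",  len = 120, product = missing),
    (feature = "tRNA", len = 75,  product = "tRNA-Ala"),
]

# A list of conditions that must all hold for a gene to be kept ...
conditions = (g -> g.feature == "CDS", g -> g.len > 300)
by_list = filter(g -> all(c -> c(g), conditions), genes)

# ... selects the same genes as a single condition joined with &&.
by_and = filter(g -> g.feature == "CDS" && g.len > 300, genes)

# Unset attributes are found with ismissing rather than with ==.
unset_product = filter(g -> ismissing(g.product), genes)

println(length(by_list), " ", length(by_and), " ", length(unset_product))
```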
Gene sequences can be accessed with sequence(gene). For example, the following code will write the translated sequences of all complete protein-coding genes to a file:
```julia
using BioSequences
using FASTX
open(FASTA.Writer, "proteins.fasta") do w
    for gene in @genes(chr, CDS, iscomplete(gene))
        aaseq = GenomicAnnotations.sequence(gene; translate = true)
        # The short-hand get(:product, "") is only expanded inside @genes, so pass gene explicitly here
        write(w, FASTA.Record(gene.locus_tag, get(gene, :product, ""), aaseq))
    end
end
```
Genes can be added using addgene!, and sort! can be used to make sure that the resulting annotations are in the correct order for printing. delete! is used to remove genes.
```julia
newgene = addgene!(chr, "regulatory", 670:677)
newgene.locus_tag = "reg02"
sort!(chr.genes)

# Genes can be deleted. This works for all genes where :pseudo is true, and ignores genes where it is false or missing
delete!(@genes(chr, :pseudo))
# Delete all genes 60 nt or shorter
delete!(@genes(chr, length(gene) <= 60))
```
Individual genes, and Vector{Gene}s are printed in GBK format. To include the GBK header and the nucleotide sequence, printgbk(io, chr) can be used to write them to a file. printgff(io, chr) prints the annotations as GFF3, in which case the GenBank header is lost.
```julia
println(chr.genes[1])
println(@genes(chr, CDS))

open(GenBank.Writer, "updated.gbk") do w
    write(w, chr)
end
```
Owner
- Name: BioJulia
- Login: BioJulia
- Kind: organization
- Website: https://biojulia.dev
- Repositories: 79
- Profile: https://github.com/BioJulia
Bioinformatics and Computational Biology in Julia
GitHub Events
Total
- Create event: 5
- Commit comment event: 10
- Release event: 5
- Issues event: 13
- Watch event: 5
- Issue comment event: 13
- Push event: 28
Last Year
- Create event: 5
- Commit comment event: 10
- Release event: 5
- Issues event: 13
- Watch event: 5
- Issue comment event: 13
- Push event: 28
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Karl Dyrhage | k****e@g****m | 252 |
| Karl Dyrhage | | 6 |
| Xiangting Li | 3****n | 1 |
| Julia TagBot | 5****t | 1 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 13
- Total pull requests: 4
- Average time to close issues: 3 months
- Average time to close pull requests: 3 months
- Total issue authors: 11
- Total pull request authors: 3
- Average comments per issue: 3.15
- Average comments per pull request: 0.25
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 0
- Average time to close issues: about 16 hours
- Average time to close pull requests: N/A
- Issue authors: 3
- Pull request authors: 0
- Average comments per issue: 0.75
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ian-small (5)
- attobot (2)
- Xiao-Zhong (1)
- yang-dongxu (1)
- TorkelE (1)
- diegozea (1)
- adityanprasad (1)
- jakobnissen (1)
- jowch (1)
- JuliaTagBot (1)
- camilogarciabotero (1)
Pull Request Authors
- kdyrhage (2)
- hsianktin (1)
- JuliaTagBot (1)
Packages
- Total packages: 1
- Total downloads: 63 (julia)
- Total dependent packages: 2
- Total dependent repositories: 3
- Total versions: 25
juliahub.com: GenomicAnnotations
- Documentation: https://docs.juliahub.com/General/GenomicAnnotations/stable/
- License: MIT
- Latest release: 0.4.5 (published over 1 year ago)