https://github.com/brentp/hileup

horizontal pileup

Repository

Basic Info
  • Host: GitHub
  • Owner: brentp
  • License: apache-2.0
  • Language: C
  • Default Branch: master
  • Size: 72.3 KB
Statistics
  • Stars: 16
  • Watchers: 4
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Created about 7 years ago · Last pushed over 3 years ago

README.md

hileup is an early-stage version of a pileup engine.

It aims to provide an interface that is:

+ easy-to-use
+ fast from an interpreted language (like python).

It is currently aimed at querying targeted sites (e.g. < 100K sites), rather than sweeping across every site in the genome.

There is a version in nim, one in C, and a cython wrapper around the C version for use from python.

Python

The python version, which takes a pysam AlignmentFile object, looks like:

```Python
import pysam
import chileup

bam = pysam.AlignmentFile("tests/three.bam", "rb")

# setting track_xxx to False will speed up the hileup as less copying and data access is required.
config = chileup.Config(tags=[], track_read_names=True, track_reads=True,
        track_base_qualities=True, track_mapping_qualities=True,
        exclude_flags=pysam.FQCFAIL | pysam.FSECONDARY | pysam.FSUPPLEMENTARY | pysam.FDUP,
        min_base_quality=10, min_mapping_quality=10)

# We can ignore all reads with 'C' at this base (for example to get variant-only reads)
h = chileup.pileup(bam, "1", 1585270, config, 'C')

print(h.bases) # 'tt'
print(h.read_names) # [b'A00227:74:HCWC7DSXX:1:1269:13449:13855', b'A00227:74:HCWC7DSXX:1:2426:7157:15483']
print(h.bqs) # [37, 37] numpy array that is a view into underlying data.
print(h.mqs) # [60, 60] numpy view.
print(h.deletions) # [(0, 8) (1, 8)] numpy view

# NOTE: if you're needing this, it might be simpler to use pysam pileup.
reads = h.reads(bam.header) # list of two pysam reads
read_positions = [chileup.query_pos(r) for r in reads]
print(read_positions) # [69, 69]
print("query sequence:", [read.query_sequence[p] for (read, p) in zip(reads, read_positions)]) # 'TT' matches bases above.

# the insertions and deletions have a .index property that can be used
# to access the read-names, tags, etc that are associated with the indel event.
for ins in h.insertions: # copy of the data.
    print(h.read_names[ins.index], h.tags[ins.index], ins.sequence, ins.len)

print('tags:', h.tags) # copy.
```
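The flag mask passed to `chileup.Config` in the example is a bitwise OR of SAM flag bits. As a rough, standalone illustration of what such a mask means (this is not hileup's internal code; the numeric values are the ones defined by the SAM specification, and `is_excluded` is a hypothetical helper written only for this sketch):

```python
# SAM flag bits as defined in the SAM specification. pysam exposes the same
# values as pysam.FSECONDARY, pysam.FQCFAIL, pysam.FDUP and pysam.FSUPPLEMENTARY.
FSECONDARY = 0x100
FQCFAIL = 0x200
FDUP = 0x400
FSUPPLEMENTARY = 0x800

# Same combination as in the README example.
EXCLUDE_FLAGS = FQCFAIL | FSECONDARY | FSUPPLEMENTARY | FDUP


def is_excluded(read_flag, exclude_flags=EXCLUDE_FLAGS):
    """Hypothetical helper: True if the read carries any excluded flag bit."""
    return (read_flag & exclude_flags) != 0


print(is_excluded(99))    # 99 = paired, proper pair, mate reverse, first in pair
print(is_excluded(1123))  # 1123 = 99 + 1024 (marked as duplicate)
```

A read is dropped as soon as it shares one bit with the mask, so adding a flag to the mask can only remove reads from the pileup.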

To build: `python setup.py build_ext -i`

To install: `python setup.py install`

Because it minimizes operations in python, it is quite fast (for python).

NOTE that strand information is encoded by case for python (lower case == reverse strand).
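As a small illustration of the case convention, the sketch below splits a base string into per-strand counts. It works on a plain Python string rather than a real pileup object; `split_by_strand` and the example string are hypothetical and not part of chileup.

```python
from collections import Counter


def split_by_strand(bases):
    """Split a case-encoded base string into (forward, reverse) counts.

    Assumes upper case means forward strand and lower case means reverse
    strand. Characters that are neither (if any) are ignored.
    """
    forward = Counter(b for b in bases if b.isupper())
    reverse = Counter(b.upper() for b in bases if b.islower())
    return forward, reverse


fwd, rev = split_by_strand("TtCtT")
print(dict(fwd))  # forward-strand counts
print(dict(rev))  # reverse-strand counts, reported in upper case
```

Comparing the two counters is one simple way to spot an allele that is only supported by reads from a single strand.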

C

The C version should be transparent to anyone familiar with htslib. The signature is:

    hile *hileup(htsFile *htf, bam_hdr_t *hdr, hts_idx_t *idx, char *chrom, int position, hile_config_t *cfg);

where hile_config_t is a simple struct that sets the minimum mapping and base qualities and whether to track read-names, base-qualities, etc.

```C
htsFile *htf = hts_open("tests/three.bam", "rb");
int start = 1585270;
bam_hdr_t *hdr = sam_hdr_read(htf);
hts_idx_t *idx = sam_index_load(htf, "tests/three.bam");
hile_config_t cfg = hile_init_config();
cfg.track_base_qualities = true;
cfg.track_mapping_qualities = true;
cfg.track_read_names = true;
// track the cell-barcode so we can get per-cell pileup!!
cfg.tags[0] = 'C';
cfg.tags[1] = 'B';

hile* h = hileup(htf, hdr, idx, "1", start, &cfg);
fprintf(stderr, "%s:%d ", "1", start);
for(int i=0; i < h->n; i++){
    fprintf(stderr, "%c", (char)h->bases[i].base);
}
if(cfg.track_base_qualities) { // the loop below prints base qualities (h->bqs)
        fprintf(stderr, " ");
        for(int i=0; i < h->n; i++){
            fprintf(stderr, "%c", (char)(h->bqs[i] + 33));
        }
}
if(cfg.tags[0] != 0) {
        fprintf(stderr, " ");
        for(int i=0; i < h->n; i++){
            fprintf(stderr, "%d:%s ", i, h->tags[i]);
        }
}
fprintf(stderr, "\n");

hile_destroy(h);
bam_hdr_destroy(hdr);
hts_idx_destroy(idx);
hts_close(htf);

```
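The C example prints each base quality as `bqs[i] + 33`, which is the usual Phred+33 ASCII encoding, followed by an `index:tag` pair per read. A minimal Python sketch of the same formatting is shown below. It takes plain Python values rather than a real pileup object; `format_pileup_line` is a hypothetical helper and the barcodes are made up for illustration.

```python
def format_pileup_line(chrom, position, bases, base_quals, tags=None):
    """Build a one-line text summary of a pileup column.

    All arguments are plain Python values (stand-ins for whatever a pileup
    call returns): a base string, a list of integer base qualities, and an
    optional list of per-read tag strings such as cell barcodes.
    """
    fields = [f"{chrom}:{position}", bases]
    # Phred+33: add 33 to each quality to get a printable ASCII character.
    fields.append("".join(chr(q + 33) for q in base_quals))
    if tags:
        # Pair each tag with the index of the read it belongs to.
        fields.append(" ".join(f"{i}:{tag}" for i, tag in enumerate(tags)))
    return " ".join(fields)


line = format_pileup_line("1", 1585270, "tt", [37, 37], ["AAAC-1", "GGGT-1"])
print(line)
```

Because the tags are kept in the same order as the bases, the index in each `index:tag` pair can be used to group bases by cell barcode.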

Owner

  • Name: Brent Pedersen
  • Login: brentp
  • Kind: user
  • Location: Oregon, USA

Doing genomics
