project-run-hgsvc-hybrid-assemblies

Project repository for the HGSVC phase 3 paper

https://github.com/core-unit-bioinformatics/project-run-hgsvc-hybrid-assemblies

Repository

Basic Info
  • Host: GitHub
  • Owner: core-unit-bioinformatics
  • License: mit
  • Language: Shell
  • Default Branch: main
  • Size: 14.1 MB

README.md

HGSVC phase3 project repository (HHU)

Repository info

This repository contains custom code (Jupyter notebooks, small Snakemake workflows and Python scripts) implementing various analyses for the HGSVC phase 3 manuscript:

Logsdon, Ebert, Audano, Loftus et al., Complex genetic variation in nearly complete human genomes, bioRxiv, https://doi.org/10.1101/2024.09.24.614721

The code was written specifically for the purposes of this project and, thus, does not address generalized use cases.

The code in this repository covers, for example, internal data management and post-processing of workflow results (de novo assembly and evaluation; see the workflow info in pyproject.toml), mostly for the purpose of plotting and creating summary tables.

The Snakemake workflows implement small-scale processes such as extracting the MHC/HLA region or chromosome Y from each assembly.

Internal / WIP notes

Notes for revision

Following the update of the HG00514 Verkko assembly, only a subset of customized workflows was executed.

Updated:

```
update for SIG-MHC, extracted MHC region from new assembly
    workflow::modules::regions::hla::extract.smk

update of rDNA / ribotin runs, forwarded to Mir Henglin
    workflow::modules::rdna::ribotin.smk

update of alignment support for CHS trio (child HG00514)
    workflow::modules::asmcompare::trioalign.smk

update of HPRC gap evaluation
    workflow::modules::regions::gaps::annotate_gaps.smk
```

The integrative QC analysis ("assembly label QC") was not updated because the complete run requires the annotation of centromeres, which is not available for HG00514 v2.

development notes / outdated

Verkko (env) updates

Last commit before updating MBG and GraphAligner to get bug fixes for last set of samples:

commit: #dbd3f7c88d9e28b052164c650e6ed56b7ba837de

  • graphaligner=1.0.17
  • mbg=1.0.15

Updated to

commit: #2fe632d38ea615fb93d1af63c462078490e285f9

  • graphaligner=1.0.18
  • mbg=1.0.16

for samples:

  • YRI trio: NA19238, NA19239, NA19240
  • CHS trio: HG00512, HG00513, HG00514
  • HG00096
  • HG00732 / PUR mother

Removed

NA19320 - cell line does not grow, insufficient ONT

Confirmed contamination

NA18939 HiFi - resequencing

Potential contamination

HG04036 HiFi - v1.4.1+dirty assembly completed

NA21487 HiFi - v1.4.1+dirty assembly completed

Notes on Verkko

The current production version is v1.4+dirty [added commits #3119b39 and #4f6a54e]

Notes on nomenclature

  1. Trio kmer DBs
    • HXT - Illumina HiSeq X Ten
    • NVS - Illumina NovaSeq (6000)

Owner

  • Name: core-unit-bioinformatics
  • Login: core-unit-bioinformatics
  • Kind: organization

Citation (CITATION.md)

# Citing this repository

If you are using the content of this repository in whole or in part for your own work,
please credit the Core Unit Bioinformatics in an appropriate form.

In general, please add this statement to the acknowledgments of your publication:

    This work was supported by the Core Unit Bioinformatics
    of the Medical Faculty of the Heinrich Heine University Düsseldorf.

Additionally, please follow the instructions below to obtain a citable reference
for your publication.

## Identifying the right source link

Each repository of the Core Unit Bioinformatics is assigned a persistent
identifier (PID) at some point (usually after the prototype stage). Please use
this PID to link to the repository. You can always find the PID in the top-level
`pyproject.toml`. Depending on the type of repository (project, workflow, or
workflow template), the relevant PID is listed in the corresponding metadata
section:

```toml
# workflow repository
[cubi.workflow]
pid = "THE-PID"
```

```toml
# workflow template repository
[cubi.workflow.template]
pid = "THE-PID"
```

```toml
# project repository
[cubi.project]
pid = "THE-PID"
```

If a PID has not yet been assigned to the repository, please use the repository URL,
and, if time permits, contact the repository maintainer regarding assigning a PID
in the near future.

### 1. Release version

For release versions, please use the respective version string in addition to the source link,
i.e. ideally in combination with the PID, and integrate that information into your list
of references as appropriate.

Note that repositories of the type "project" may not contain a lot of code, and are thus
often not amenable to the usual "release cycle" following bug fixes, feature integrations and so on.
Hence, the "project" metadata do not contain a "version" key (as opposed to workflow and workflow
template repositories). See the next point if you encounter that situation.

### 2. Development (non-release) version

For development versions, please use the respective git commit hash in addition to the source link,
i.e. ideally in combination with the PID, and integrate that information into your list
of references as appropriate. It is strongly recommended to only use git commits from the two
central branches `main` and `dev`.

If a "project" repository is lacking an explicit release version, please use the same strategy
to obtain a citable reference of the repository.
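
As a rough sketch of the two cases above, the following Python snippet combines a source link with either a release version or a git commit hash. The function names and the layout of the resulting string are assumptions for illustration only; adapt the format to the reference style of your publication. The commit hash is read with the standard `git rev-parse HEAD` command.

```python
import subprocess


def current_commit(repo_dir="."):
    """Return the full hash of the checked-out commit via `git rev-parse HEAD`."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def format_reference(source_link, version=None, commit=None):
    """Combine the source link (ideally the PID) with a version or commit.

    The string layout is a hypothetical example, not a prescribed format.
    """
    if version:
        return f"{source_link} (version {version})"
    if commit:
        return f"{source_link} (git commit {commit})"
    raise ValueError("need a release version or a git commit hash")
```

For a project repository without a release version, a call such as `format_reference("THE-PID", commit=current_commit())`, run inside a checkout of `main` or `dev`, yields the development-version form.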


### 3. None of the above

Please get in touch and we'll find a solution for your case :-)
