project-run-hgsvc-hybrid-assemblies
Project repository for the HGSVC phase 3 paper
https://github.com/core-unit-bioinformatics/project-run-hgsvc-hybrid-assemblies
Repository
Basic Info
- Host: GitHub
- Owner: core-unit-bioinformatics
- License: mit
- Language: Shell
- Default Branch: main
- Size: 14.1 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
HGSVC phase3 project repository (HHU)
Repository info
This repository contains custom code (Jupyter notebooks, small Snakemake workflows, and Python scripts) implementing various analyses for the HGSVC phase 3 manuscript:
Logsdon, Ebert, Audano, Loftus et al.,
Complex genetic variation in nearly complete human genomes
https://doi.org/10.1101/2024.09.24.614721 bioRxiv
The code was written specifically for the purposes of this project and, thus, does not address generalized use cases.
The code covers, for example, internal data management and post-processing of workflow results (de novo assembly and evaluation; see the workflow info in pyproject.toml), mostly to create plots and summary tables.
The Snakemake workflows implement small-scale processes such as extracting the MHC/HLA region or chromosome Y from each assembly.
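As a rough illustration of what "extracting a region from an assembly" means at the sequence level, the following minimal sketch slices a coordinate interval out of a FASTA record. It is not the code used in this repository: the function names, the contig name, and the coordinates are hypothetical, and the sketch assumes the region coordinates are already known, which the actual workflows may determine differently.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping sequence name to sequence."""
    sequences = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as name.
            name = line[1:].split()[0]
            sequences[name] = []
        elif name is not None:
            sequences[name].append(line)
    return {seq_name: "".join(parts) for seq_name, parts in sequences.items()}


def extract_region(sequences, name, start, end):
    """Return the 0-based, half-open interval [start, end) of one sequence."""
    sequence = sequences[name]
    if not 0 <= start < end <= len(sequence):
        raise ValueError(f"invalid region {name}:{start}-{end}")
    return sequence[start:end]


# Hypothetical toy assembly with two contigs.
toy_fasta = ">contig1 sample\nACGTAC\nGTTT\n>contig2\nGGGG\n"
toy_sequences = read_fasta(toy_fasta.splitlines())
region = extract_region(toy_sequences, "contig1", 2, 8)
print(region)
```

The sketch loads all sequences into memory, which is fine for a toy input; for whole-genome assemblies an indexed FASTA reader would be the more practical choice.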
Internal / WIP notes
Notes for revision
Following the update of the HG00514 Verkko assembly, only a subset of customized workflows was executed.
Updated:
```
update for SIG-MHC, extracted MHC region from new assembly
    workflow::modules::regions::hla::extract.smk
update of rDNA / ribotin runs, forwarded to Mir Henglin
    workflow::modules::rdna::ribotin.smk
update of alignment support for CHS trio (child HG00514)
    workflow::modules::asmcompare::trioalign.smk
update of HPRC gap evaluation
    workflow::modules::regions::gaps::annotate_gaps.smk
```
The integrative QC analysis ("assembly label QC") was not updated because the complete run requires the annotation of centromeres, which is not available for HG00514 v2.
development notes / outdated
Verkko (env) updates
Last commit before updating MBG and GraphAligner to get bug fixes for last set of samples:
commit: #dbd3f7c88d9e28b052164c650e6ed56b7ba837de
- graphaligner=1.0.17
- mbg=1.0.15
Updated to
commit: #2fe632d38ea615fb93d1af63c462078490e285f9
- graphaligner=1.0.18
- mbg=1.0.16
for samples:
- YRI trio: NA19238, NA19239, NA19240
- CHS trio: HG00512, HG00513, HG00514
- HG00096
- HG00732 / PUR mother
Removed
NA19320 - cell line does not grow, insufficient ONT
Confirmed contamination
NA18939 HiFi - resequencing
Potential contamination
HG04036 HiFi - v1.4.1+dirty assembly completed
NA21487 HiFi - v1.4.1+dirty assembly completed
Notes on Verkko
The production version is currently v1.4+dirty [added commits #3119b39 and #4f6a54e]
Notes on nomenclature
- Trio kmer DBs
  - HXT - Illumina HiSeq X Ten
  - NVS - Illumina NovaSeq (6000)
Owner
- Name: core-unit-bioinformatics
- Login: core-unit-bioinformatics
- Kind: organization
- Repositories: 15
- Profile: https://github.com/core-unit-bioinformatics
Citation (CITATION.md)
# Citing this repository
If you are using the content of this repository in whole or in part for your own work,
please credit the Core Unit Bioinformatics in an appropriate form.
In general, please add this statement to the acknowledgments of your publication:
This work was supported by the Core Unit Bioinformatics
of the Medical Faculty of the Heinrich Heine University Düsseldorf.
Additionally, please follow the instructions below to obtain a citable reference
for your publication.
## Identifying the right source link
Each repository of the Core Unit Bioinformatics is assigned a persistent
identifier (PID) at some point (usually after the prototype stage). Please use
this PID to link to the repository. You always find the PID in the top-level
`pyproject.toml`. Depending on the type of repository (project, workflow, or
workflow template), the relevant PID is listed in the corresponding metadata
section:
```toml
# workflow repository
[cubi.workflow]
pid = "THE-PID"
```
```toml
# workflow template repository
[cubi.workflow.template]
pid = "THE-PID"
```
```toml
# project repository
[cubi.project]
pid = "THE-PID"
```
If a PID has not yet been assigned to the repository, please use the repository URL,
and, if time permits, contact the repository maintainer regarding assigning a PID
in the near future.
### 1. Release version
For release versions, please use the respective version string in addition to the source link,
i.e. ideally in combination with the PID, and integrate that information into your list
of references as appropriate.
Note that repositories of the type "project" may not contain a lot of code, and are thus
often not amenable to the usual "release cycle" following bug fixes, feature integrations and so on.
Hence, the "project" metadata do not contain a "version" key (as opposed to workflow and workflow
template repositories). See the next point if you encounter that situation.
### 2. Development (non-release) version
For development versions, please use the respective git commit hash in addition to the source link,
i.e. ideally in combination with the PID, and integrate that information into your list
of references as appropriate. It is strongly recommended to only use git commits from the two
central branches `main` and `dev`.
If a "project" repository is lacking an explicit release version, please use the same strategy
to obtain a citable reference of the repository.
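The two cases (release version vs. development commit) can be combined into one small helper. The sketch below is only an illustration under stated assumptions: the function names and the output format of the reference string are hypothetical and not prescribed by the Core Unit Bioinformatics, and `current_commit` assumes that `git` is installed and that the given directory is a checkout of the repository.

```python
import subprocess


def current_commit(repo_dir="."):
    """Return the full hash of the checked-out commit via `git rev-parse HEAD`."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def build_reference(source, version=None, commit=None):
    """Combine a source link (PID or repository URL) with a version or commit.

    A release version takes precedence; the commit hash is the fallback for
    development versions and for project repositories without a version.
    """
    if version:
        return f"{source} (version {version})"
    if commit:
        return f"{source} (commit {commit})"
    raise ValueError("need a release version or a git commit hash")


# Placeholder values; replace them with the actual PID and commit hash.
print(build_reference("THE-PID", commit="COMMIT-HASH"))
```

In a local checkout of `main` or `dev`, the commit hash could be filled in with `build_reference("THE-PID", commit=current_commit())`.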
### 3. None of the above
Please get in touch and we'll find a solution for your case :-)
GitHub Events
Total
- Release event: 1
- Push event: 14
- Create event: 1
Last Year
- Release event: 1
- Push event: 14
- Create event: 1