hicberg
Statistical profiling based program for contact (Hi-C) and paired-end genomic data reconstruction
Basic Info
- Host: GitHub
- Owner: sebgra
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://sebgra.github.io/hicberg/
- Size: 148 MB
HiC-BERG
Python package to reconstruct Hi-C contact maps.
Documentation
Documentation and tutorial: https://sebgra.github.io/hicberg/
Table of contents
- Environment and dependencies
- Docker
- Installation
- Usage/Examples
- Config files
- Snakemake usage
- Individual components
- Chaining pipeline steps
- Model evaluation
- Contributing
- License
- Authors
- Citation
Environment and dependencies
Environment
Create the environment using the following command:
```bash
mamba env create -n [ENV_NAME] -f hicberg.yaml
```
Dependencies
For HiC-BERG to work correctly, Bowtie2, Samtools, bedGraphToBigWig and BedTools have to be installed. They can be installed with:
```bash
mamba install bowtie2 -c bioconda
mamba install samtools -c bioconda
mamba install -c bioconda ucsc-bedgraphtobigwig
mamba install bedtools -c bioconda
```
Depending on your aligner preference, BWA and Minimap2 can be installed with:
```bash
mamba install bioconda::bwa
mamba install bioconda::minimap2
```
Installation
Install HiC-BERG with pip:
```bash
pip install hicberg
```
or in developer (editable) mode:
```bash
mamba activate [ENV_NAME]
pip install -e .
```
pip
Install HiC-BERG locally using:
```bash
pip install -e .
```
Conda / Mamba
We highly recommend installing HiC-BERG through Mamba (or Conda):
```bash
conda install -c bioconda hicberg
```
To set up Mambaforge and the dependencies from scratch:
```bash
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
mamba create -n hicberg python=3.11.4
mamba activate hicberg
mamba install bioconda::bowtie2
mamba install bioconda::samtools
mamba install bioconda::bedtools
mamba install bioconda::ucsc-bedgraphtobigwig
# For exhaustive aligner usage
mamba install bioconda::bwa
mamba install bioconda::minimap2
```
Docker
HiC-BERG can be used via Docker to limit dependency compatibility problems. More information about containers such as Docker can be found in the Docker documentation.
To be used through Docker, the HiC-BERG image has to be built using:
```bash
# Pick user and group variables
export HOSTUID=$(id -u);
export HOSTGID=$(id -g);
# Build container
sudo docker build --build-arg UID=$HOSTUID --build-arg GID=$HOSTGID -t hicberg:1.0.1 .
```
HiC-BERG can then be run with the usual commands detailed below using:
```bash
sudo docker run -v $PWD:$PWD -w <working directory> -u $(id -u):$(id -g) hicberg:1.0.1 <hicberg_command>
```
N.B.: the working directory (-w argument) and the output directory (-o argument) of the HiC-BERG command have to be the same.
For instance, to access the HiC-BERG help through Docker:
```bash
sudo docker run -v $PWD:$PWD -w <working directory> -u $(id -u):$(id -g) hicberg:1.0.1 hicberg --help
```
HiC-BERG can also be used through Docker in interactive mode using:
```bash
sudo docker run -v $PWD:$PWD -w <working directory> -u $(id -u):$(id -g) -it hicberg:1.0.1
```
The user is then placed directly in an interactive shell where any HiC-BERG command, such as those in the examples below, can be typed.
Usage/Examples
Full pipeline

All components of the pipeline can be run at once using the hicberg pipeline command, which generates a contact matrix and its reconstruction from reads in a single command. By default, the output is in COOL format. More detailed documentation can be found on the documentation website:
https://sebgra.github.io/hicberg/
```bash
hicberg pipeline [--enzyme=["DpnII", "HinfI"]] [--distance=1000]
[--rate=1.0] [--cpus=1] [--mode="full"] [--aligner="bowtie2"] [--read-type="sr"] [--max-alignments=None] [--sensitivity="very-sensitive"]
[--bins=2000] [--circular=""] [--mapq=35] [--kernel-size=11] [--deviation=0.5] [--start-stage="fastq"]
[--exit-stage=None] [--output=DIR] [--index=None] [--blacklist=STR] [--force] [--config=STR]
```
For example, to run the pipeline using 8 threads, performing alignments with bowtie2 in default mode, using the ARIMA Hi-C kit enzymes (DpnII & HinfI) without blacklisting, and generating a matrix and its reconstruction in the directory out/:
```bash
hicberg pipeline -e DpnII -e HinfI --cpus 8 -o out/ genome.fa reads_for.fq rev_reads.fq
```
Snakemake usage
Configuration
Several parameters and Hi-C libraries can be set in the file config.yaml. The parameters are the following:
```yaml
samples: "config/samples.csv"
basedir: Path to the base directory containing data
outdir: Path where to save results
refdir: Path to the folder containing genomes
fastqdir: Path to the folder containing fastq files

name : libraryname
ref: Path to the reference genome from refdir
R1: Path to the forward reads file from fastqdir
R2: Path to the reverse reads file from fastqdir
circular: Comma separated list of circular chromosomes
enzymes : Comma separated list of enzymes used for the experiment
sampling_rate: Sampling rate of the restriction map
res: Resolution of the contact matrix (in bp)
```
The samples.csv file can be used to set the parameters for each library. It is a semicolon-separated file with the following columns:
```csv
library;species;samplingrates;enzymes;kernelsizes;deviations;modes;resolutions;max_reports;circularity
libraryname1;species1;samplingrate1;enzymes1;kernelsizes1;deviations1;mode1;resolutions1;maxreports_1
libraryname2;species2;samplingrate2;enzymes2;kernelsizes2;deviations2;mode2;resolutions2;maxreports_2
```
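Because the samples table uses semicolons rather than commas as separators, it has to be read with an explicit delimiter. The Python sketch below only illustrates this with the standard csv module. It assumes the header shown above and is not necessarily how the Snakefile itself loads the file. The two-column excerpt and the helper name `read_samples` are hypothetical.

```python
import csv
import io


def read_samples(handle) -> list[dict[str, str]]:
    """Read a semicolon-delimited samples table into one dict per library."""
    reader = csv.DictReader(handle, delimiter=";")
    return [dict(row) for row in reader]


# Hypothetical two-column excerpt, for illustration only.
example = io.StringIO("library;species\nlib_A;yeast\nlib_B;human\n")
samples = read_samples(example)
for sample in samples:
    print(sample["library"], sample["species"])
```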
Run
Locally
The HiC-BERG pipeline can be run using Snakemake. The pipeline is defined in the file Snakefile. The pipeline can be run using the following command:
```bash
snakemake --cores [cpus]
```
Cluster
HiC-BERG can also be run on a cluster using Snakemake. The cluster configuration is defined in the file cluster_slurm.yaml. The pipeline can be run using the following command:
```bash
snakemake --cluster "sbatch --mem {cluster.mem} -p {cluster.partition} --qos {cluster.queue} -c {cluster.ncpus} -n {cluster.ntasks} -J {cluster.name} -o {cluster.output} -e {cluster.error}" \
  --cluster-config config/cluster_slurm.yaml -j 16 --rerun-incomplete
```
As for the local run, the libraries to process are defined in the file samples.csv. Computational resources can be set in the file cluster_slurm.yaml. In this file, the different parameters have to be specified by rules. For instance:
```yaml
hicberg_step_0:
  queue: normal
  partition: common
  ncpus: 16
  mem: 32G
  ntasks: 1
  name: hicberg.{rule}.{wildcards}
  output: logs/cluster/{rule}.{wildcards}.out
  error: logs/cluster/{rule}.{wildcards}.err
```
Jobs will be sent to the cluster as usual sbatch commands. The fields queue, partition, ncpus, mem, ntasks, name, output and error are mandatory.
Considering the previous example, the following command will be sent to the cluster:
```bash
sbatch --mem 32G -p common --qos normal -c 16 -n 1 -J hicberg.hicberg_step_0.{wildcards} -o logs/cluster/hicberg_step_0.{wildcards}.out -e logs/cluster/hicberg_step_0.{wildcards}.err
```
The same applies to each rule defined in the file cluster_slurm.yaml, for every library specified in samples.csv.
Log files will be saved in the folder logs/cluster. The output and error files will be named as the following: {rule}.{wildcards}.out and {rule}.{wildcards}.err. The wildcards are the parameters specified in the file samples.csv.
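The substitution performed on the --cluster template can be reproduced with Python's str.format. The sketch below fills the template with the hicberg_step_0 settings and yields the sbatch command shown above. It only approximates what Snakemake does internally: here {rule} is replaced by hand and {wildcards} is deliberately left untouched. The helper name `render_sbatch` is hypothetical.

```python
from types import SimpleNamespace

# Template copied from the --cluster argument shown above.
SBATCH_TEMPLATE = (
    "sbatch --mem {cluster.mem} -p {cluster.partition} --qos {cluster.queue} "
    "-c {cluster.ncpus} -n {cluster.ntasks} -J {cluster.name} "
    "-o {cluster.output} -e {cluster.error}"
)


def render_sbatch(rule: str, settings: dict) -> str:
    """Fill the sbatch template with one rule's settings from cluster_slurm.yaml."""
    # Resolve {rule} by hand; {wildcards} is left for Snakemake to fill in.
    resolved = {key: str(value).replace("{rule}", rule) for key, value in settings.items()}
    return SBATCH_TEMPLATE.format(cluster=SimpleNamespace(**resolved))


step_0 = {
    "queue": "normal", "partition": "common", "ncpus": 16, "mem": "32G", "ntasks": 1,
    "name": "hicberg.{rule}.{wildcards}",
    "output": "logs/cluster/{rule}.{wildcards}.out",
    "error": "logs/cluster/{rule}.{wildcards}.err",
}
command = render_sbatch("hicberg_step_0", step_0)
print(command)
```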
N.B.: The parameter --rerun-incomplete is used to re-run the jobs whose outputs were left incomplete, for instance when the pipeline has been interrupted.
N.B 2: The parameter -j is used to specify the number of jobs to run in parallel. It is recommended to set this parameter to the number of libraries to process.
N.B 3: The resources allocated to each job can be modified in the file cluster_slurm.yaml. The parameters mem and ncpus specify the memory and the number of CPUs allocated to each job. The parameter ntasks specifies the number of tasks to run in parallel. The parameter queue specifies the quality of service (--qos) to use, and partition specifies the partition to use. The parameters name, output and error specify the job name and the output and error files.
Configuration Files
HiC-BERG allows you to manage pipeline parameters efficiently using configuration files in the INI format. This is particularly useful for complex runs or for sharing reproducible settings across multiple analyses.
INI File Format
HiC-BERG uses the standard INI file format. Each file is organized into [sections] containing key = value pairs. Comments can be added using ; or #.
A template configuration file is provided in the templates folder.
Example config.ini:
```ini
; General pipeline settings
[General]
samplename = myexperimentrun
startstage = fastq
exitstage = None ; Set to 'None' for full pipeline, or a stage name
outputdirectory = ./results
verboselogging = True
forceoverwrite = False ; Set to True to overwrite existing output directories

; Input file paths
[InputFiles]
genomefasta = /path/to/mydata/genome.fasta
genomeindex = /path/to/mydata/genomeindexprefix ; Optional: if not provided, HiC-BERG will build it
forwardreads = /path/to/mydata/readsR1.fastq.gz
reversereads = /path/to/mydata/readsR2.fastq.gz
blacklist_regions = None ; Set to 'None' or leave blank if no blacklist file, otherwise provide either a BED file or a comma-separated list of coordinates in UCSC format

; Alignment parameters
[Alignment]
aligner = bowtie2 ; 'bowtie2', 'bwa' or 'minimap2'
cpus = 8
sensitivity = very-sensitive ; presets: very-fast, fast, sensitive, very-sensitive
maxalignment = None ; Max alignments to report per read (int) or 'None' for unlimited
mapq = 35
readtype = sr ; "sr", "map-pb", "map-hifi", "map-ont", "splice", "splice:hq", "asm5", "asm10", "ava-pb" or "ava-ont"

; Processing parameters
[Processing]
enzyme = DpnII,HinfI ; Comma-separated list for multiple enzymes (e.g., 'DpnII,HinfI')
circulargenome = ; e.g., 'chrM' if mitochondrial genome is circular, otherwise leave blank or 'None' (ChrM in our genome example)
rate = 1.0 ; Downsampling rate (float)
distance = 1000 ; Distance parameter (int) for omics mode
bins = 2000 ; Contact map resolution (int)
mode = standard ; 'full', 'standard', 'density', or 'omics'
kernelsize = 11 ; Kernel size for density calculation (int)
deviation = 0.5 ; Deviation for density calculation (float)
```
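Since the file follows the standard INI layout, it can be inspected with Python's configparser. The sketch below only illustrates the format and is not HiC-BERG's own loader. By default, configparser keeps inline `;` comments as part of the value, so the sketch sets inline_comment_prefixes explicitly. The shortened CONFIG_TEXT excerpt is hypothetical.

```python
import configparser

# A shortened, hypothetical excerpt of the config.ini above.
CONFIG_TEXT = """
[General]
startstage = fastq
exitstage = None ; Set to 'None' for full pipeline

[Alignment]
aligner = bowtie2 ; 'bowtie2', 'bwa' or 'minimap2'
cpus = 8
mapq = 35
"""

# Inline comments after ';' or '#' are only stripped when
# inline_comment_prefixes is set explicitly.
parser = configparser.ConfigParser(inline_comment_prefixes=(";", "#"))
parser.read_string(CONFIG_TEXT)

aligner = parser.get("Alignment", "aligner")
cpus = parser.getint("Alignment", "cpus")
exit_stage = parser.get("General", "exitstage")
# Map the literal string 'None' to Python's None.
exit_stage = None if exit_stage == "None" else exit_stage
print(aligner, cpus, exit_stage)
```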
Using a Configuration File
To run the pipeline using a configuration file, specify its path with the -C or --config-file option.
Important Note: The positional arguments <genome_fasta>, <R1_fastq>, and <R2_fastq> are always required, even when using a configuration file. These arguments specify the primary input files for your pipeline run.
```bash
hicberg pipeline -C myconfig.ini <path_to_genome.fasta> <path_to_R1.fastq.gz> <path_to_R2.fastq.gz>
```
Individual components
I/O
Create folder
Make sure to create a folder before running the pipeline. The folder can be created using the following command:
```bash
hicberg create-folder --output=DIR [--name="folder_name"] [--force]
```
For example, to create a folder named "test" on the desktop:
```bash
hicberg create-folder -o ~/Desktop/ -n test
```
The folder architecture will be the following:
```bash
output
    alignments
    contacts
    index
    plots
    statistics
```
Preprocessing
After creating a folder with the create-folder command described above, the genome can be processed to generate the fragment file fragmentfixedsizes.txt and the dictionary of chromosome sizes chromosome_sizes.npy using the following command:
```bash
hicberg get-tables --output=DIR --genome=FILE [--bins=2000]
```
For example, to generate these files in a folder named "test" previously created on the desktop, with a bin size of 2000 bp:
```bash
hicberg get-tables -o ~/Desktop/test/ --bins 2000 <genome>
```
The files fragmentfixedsizes.txt and chromosome_sizes.npy will be generated in the output folder.
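Fixed-size binning splits each chromosome into consecutive windows of --bins bp, the last one being truncated at the chromosome end. The sketch below illustrates this. The chromosome sizes are made up, and the (chrom, start, end) layout is an assumption: it does not reproduce the exact columns of the fragment file written by get-tables. `fixed_size_bins` is a hypothetical helper.

```python
def fixed_size_bins(chrom_sizes: dict[str, int], bin_size: int) -> list[tuple[str, int, int]]:
    """Split each chromosome into consecutive bins of bin_size bp.

    The last bin of a chromosome is truncated at the chromosome end.
    """
    bins = []
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, bin_size):
            bins.append((chrom, start, min(start + bin_size, size)))
    return bins


# Hypothetical chromosome sizes, for illustration only.
sizes = {"chr1": 5000, "chrM": 1500}
table = fixed_size_bins(sizes, 2000)
for row in table:
    print(*row)
```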
Alignment
After creating the folder and generating the fragment file fragmentfixedsizes.txt and the dictionary of chromosome sizes chromosome_sizes.npy, the reads can be aligned using the following command:
```bash
hicberg alignment --output=DIR [--cpus=1] [--aligner="bowtie2"] [--read-type="sr"] [--max-alignments=None] [--sensitivity="very-sensitive"] [--index=index]
[--verbosity] <genome> <forward> <reverse>
```
For example, to align reads in a folder named "test" previously created on the desktop with 8 threads:
```bash
hicberg alignment -o ~/Desktop/test/ --cpus 8 <genome.fa> <reads_for.fq> <rev_reads.fq>
```
If the user has already created the index, the following command can be used:
```bash
hicberg alignment -o ~/Desktop/test/ --cpus 8 --index index_prefix <genome.fa> <reads_for.fq> <rev_reads.fq>
```
The files XXX.bt2l, 1.sorted.bam and 2.sorted.bam will be created when bowtie2 is selected as aligner (--aligner or -a parameter).
If the aligner used is BWA, the files XXX.fa.amb, XXX.fa.ann, XXX.fa.bwt, XXX.fa.pac and XXX.fa.sa will be created.
Using Minimap2 for Alignment
When using Minimap2 (--aligner "minimap2"), it's crucial to specify the --read-type parameter to ensure optimal alignment for your specific sequencing data. Minimap2 uses different presets (-x option) that are highly optimized for various types of reads, affecting its performance and accuracy.
The hicberg alignment command supports the following read-type values, directly mapping to Minimap2's presets:
- sr: For standard short genomic reads (e.g., Illumina, BGI). This is the default setting and is typically suitable for most Hi-C experiments that use short-read sequencing.
- map-ont: For Oxford Nanopore Technologies (ONT) reads. These are long reads, often characterized by a higher error rate.
- map-pb: For PacBio CLR (Continuous Long Read) data. These are also long reads, but generally have different error profiles than ONT reads.
- map-hifi: For PacBio HiFi reads. These are highly accurate long reads (circular consensus sequencing).
- splice: For long-read RNA-seq data, performing spliced alignment that accounts for introns.
- splice:hq: Spliced alignment for high-quality (low error rate) long RNA reads.
- asm5/asm10: For assembly-to-reference alignment, for sequences with a divergence well below about 5% or 10%, respectively.
- ava-pb/ava-ont: For all-versus-all read overlapping with PacBio or ONT reads, primarily used in assembly workflows.
Example for Nanopore reads:
To align long reads sequenced with Oxford Nanopore Technologies, you would use:
```bash
hicberg alignment -o ~/Desktop/test/ --cpus 8 --aligner "minimap2" --read-type map-ont <genome.fa> <reads_for.fq> <rev_reads.fq>
```
For more detailed information on Minimap2's presets and their underlying parameters, please refer to the official Minimap2 documentation.
Classification
```bash
hicberg classify --output=DIR [--mapq=35]
```
Considering the previous example, to classify the reads in a folder named "test" previously created on the desktop:
```bash
hicberg classify -o ~/Desktop/test/
```
The files created are:
- group0.1.bam and group0.2.bam: BAM files containing the reads of group0, i.e. pairs where at least one read is unaligned.
- group1.1.bam and group1.2.bam: BAM files containing the reads of group1, i.e. pairs where both reads are aligned exactly once.
- group2.1.bam and group2.2.bam: BAM files containing the reads of group2, i.e. pairs where at least one read is aligned more than once.
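The grouping rules above can be summarized by the hypothetical Python function below. It works on per-mate alignment counts only. It ignores the --mapq threshold exposed by the classify command and does not read BAM files, so it is a restatement of the rules rather than HiC-BERG's implementation.

```python
def classify_pair(n_forward: int, n_reverse: int) -> int:
    """Assign a read pair to group 0, 1 or 2 from its alignment counts.

    n_forward / n_reverse: number of alignments reported for each mate
    (0 means the mate is unaligned).
    """
    if n_forward == 0 or n_reverse == 0:
        return 0  # at least one mate is unaligned
    if n_forward == 1 and n_reverse == 1:
        return 1  # both mates aligned exactly once
    return 2  # at least one mate aligned more than once


for counts in [(0, 3), (1, 1), (1, 4)]:
    print(counts, "-> group", classify_pair(*counts))
```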
Pairs and matrix building
Build pairs
After having aligned the reads, the pairs file group1.pairs can be built using the following command:
```bash
hicberg build-pairs --output=DIR [--recover]
```
If the --recover flag is used, the pairs file will be built from the last step of the analysis, i.e. after computing the statistics and reassigning the reads from the group2 BAM files.
Considering the previous example, to build the pairs file in a folder named "test" previously created on the desktop:
```bash
hicberg build-pairs -o ~/Desktop/test/
```
The file group1.pairs will be created.
If the pairs file has to be built after reassignment of the group2 reads, the following command can be used:
```bash
hicberg build-pairs -o ~/Desktop/test/ --recover
```
In this case, the pairs file will be all_group.pairs.
Build matrix
After having aligned the reads and built the pairs file group1.pairs, the cooler matrix unrescued_map.cool can be built using the following command:
```bash
hicberg build-matrix --output=DIR [--recover]
```
If the --recover flag is used, the matrix file will be built from the last step of the analysis, i.e. after computing the statistics and reassigning the reads from the group2 BAM files.
Considering the previous example, to build the matrix in a folder named "test" previously created on the desktop:
```bash
hicberg build-matrix -o ~/Desktop/test/
```
The file unrescued_map.cool will be created.
If the cooler file has to be built after reassignment of the group2 reads, the following command can be used:
```bash
hicberg build-matrix -o ~/Desktop/test/ --recover
```
In this case, the matrix file will be rescued_map.cool.
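Conceptually, building the matrix amounts to counting pairs per couple of genomic bins. The sketch below is a simplified stand-in for that step, not HiC-BERG's implementation, which writes a .cool file. It assumes the pairs file follows the 4DN .pairs column order (readID, chr1, pos1, chr2, pos2, ...). It keeps raw counts in a Counter, without symmetrizing or normalizing them. `bin_pairs` and the example records are hypothetical.

```python
from collections import Counter


def bin_pairs(lines, bin_size: int) -> Counter:
    """Count contacts between fixed-size bins from .pairs-style lines.

    Assumes the 4DN column order: readID chr1 pos1 chr2 pos2 ...
    Header lines starting with '#' are skipped.
    """
    contacts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        chrom1, pos1, chrom2, pos2 = fields[1], int(fields[2]), fields[3], int(fields[4])
        # Each end is mapped to (chromosome, bin index).
        key = ((chrom1, pos1 // bin_size), (chrom2, pos2 // bin_size))
        contacts[key] += 1
    return contacts


pairs = [
    "## pairs format v1.0",
    "r1 chr1 1500 chr1 4100 + -",
    "r2 chr1 1900 chr1 5200 + +",
    "r3 chr1 2500 chrM 100 - +",
]
matrix = bin_pairs(pairs, 2000)
for (bin1, bin2), count in sorted(matrix.items()):
    print(bin1, bin2, count)
```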
Statistics
After aligning the reads and building the pairs file group1.pairs and the cooler matrix unrescued_map.cool, the statistical laws used to reassign the reads from group2 can be learnt using the following command:
```bash
hicberg statistics --output=DIR [--bins=bins_number] [--enzyme=["DpnII", "HinfI"]] [--circular=""] [--rate=1.0] [--mode="standard"]
[--kernel-size=11] [--deviation=0.5] [--blacklist=STR] <genome>
```
Considering the previous example, to learn the statistical laws (with respect to the ARIMA kit enzymes) with default parameters for density estimation, without sub-sampling the restriction map, without blacklisting regions and considering "chrM" as circular, in a folder named "test" previously created on the desktop:
```bash
hicberg statistics -e DpnII -e HinfI -c "chrM" -o ~/Desktop/test/ <genome.fa>
```
The statistical laws are saved as:
- xs.npy: dictionary containing the log-binned genome, such as {chromosome: [log bins]}
- uncuts.npy, loops.npy, weirds.npy: dictionaries containing the distributions of uncuts, loops and weirds, such as {chromosome: [distribution]}
- pseudo_ps.pdf: plot of the pseudo P(s), i.e. the P(s) equivalent for trans-chromosomal cases, extracted from the reads of group1
- coverage.npy: dictionary containing the coverage of the genome, such as {chromosome: [coverage]}
- d1d2.npy: np.array containing the d1d2 law, such as [distribution]
- density_map.npy: dictionary containing the density map, such as {chromosome_pair: [density map]}
Blacklisting regions
The user can specify regions to blacklist in the analysis. They have to be provided either as a BED file or as a comma-separated list of genomic coordinates in UCSC format. The BED file has to be formatted as follows:
```bed
chr1 200000 220000
chr1 308000 314000
...
chr3 100000 120000
```
A list of blacklisted regions provided as a string can be specified as follows:
```bash
chr1:200000-220000,chr1:308000-314000,...,chr3:100000-120000
```
The statistical laws can then be learnt as previously described, with regions blacklisted for the P(s) computation, using the following commands:
- Using a BED file:
```bash
hicberg statistics -e DpnII -e HinfI -c "chrM" -o ~/Desktop/test/ -B <blacklist.bed> <genome>
```
- Using a string:
```bash
hicberg statistics -o ~/Desktop/test/ -B "chr1:200000-220000,chr1:308000-314000,chr3:100000-120000" <genome>
```
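A UCSC-style blacklist string maps directly onto BED-like (chromosome, start, end) triplets. The hypothetical parser below shows that correspondence and rejects malformed or inverted regions. It is an illustration of the format, not HiC-BERG's own parser.

```python
import re

REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_blacklist(spec: str) -> list[tuple[str, int, int]]:
    """Parse 'chr1:200000-220000,chr3:...' into (chrom, start, end) tuples."""
    regions = []
    for item in spec.split(","):
        match = REGION.match(item.strip())
        if match is None:
            raise ValueError(f"Not a UCSC-style region: {item!r}")
        start, end = int(match["start"]), int(match["end"])
        if start >= end:
            raise ValueError(f"Empty or inverted region: {item!r}")
        regions.append((match["chrom"], start, end))
    return regions


regions = parse_blacklist("chr1:200000-220000,chr1:308000-314000,chr3:100000-120000")
print(regions)
```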
Omics mode
The omics mode can be used to reconstruct any paired-end sequenced genomic data. In this mode, the model relies on the $P(s)$ and the coverage. After reconstruction, the data are located in the folder statistics/. The files generated are:
- coverage.bed: BED file containing the coverage of the genome
- coverage.bedgraph: bedGraph file containing the coverage of the genome
- signal.bw: bigWig file containing the reconstructed signal (paired-end data)
The omics mode can be run using the following command:
```bash
hicberg pipeline -o <out folder> -t <cpus> -m omics -s <alignment sensitivity> genome.fa reads_for.fq rev_reads.fq
```
Reconstruction
After learning the statistical laws (based on the reads of group1), the reads from group2 can be reassigned using the following command:
```bash
hicberg rescue --output=DIR [--enzyme=["DpnII", "HinfI"]] [--mode="full"] [--cpus=1] <genome>
```
Considering the previous example, to reassign the reads from group2 in a folder named "test" previously created on the desktop:
```bash
hicberg rescue -e DpnII -e HinfI -o ~/Desktop/test/ <genome>
```
The files group2.1.rescued.bam and group2.2.rescued.bam will be created.
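As a rough mental model of the standard mode (P(s) and coverage), the sketch below scores each candidate couple of an ambiguous pair. The score is the product of a distance term and the coverage at both ends, and the highest-scoring couple is kept. Several elements are assumptions for illustration and not HiC-BERG's exact model: the toy laws, the constant trans probability, and the choice of taking the maximum rather than, say, sampling proportionally to the scores. All function names are hypothetical.

```python
Candidate = tuple[str, int, str, int]  # (chrom1, pos1, chrom2, pos2)


def couple_score(couple, ps, coverage, trans_ps):
    """Score a candidate couple: distance term x coverage at both ends."""
    chrom1, pos1, chrom2, pos2 = couple
    # Intra-chromosomal couples use P(s); trans couples use a constant stand-in.
    distance_term = ps(abs(pos2 - pos1)) if chrom1 == chrom2 else trans_ps
    return distance_term * coverage(chrom1, pos1) * coverage(chrom2, pos2)


def pick_couple(candidates, ps, coverage, trans_ps=1e-6):
    """Keep the highest-scoring candidate couple (hypothetical helper)."""
    return max(candidates, key=lambda c: couple_score(c, ps, coverage, trans_ps))


def toy_ps(s: int) -> float:
    # Toy P(s) decaying as 1/s, for illustration only.
    return 1.0 / max(s, 1)


def toy_coverage(chrom: str, pos: int) -> float:
    # Flat toy coverage, for illustration only.
    return 1.0


candidates: list[Candidate] = [
    ("chr1", 69227, "chr1", 103994),
    ("chr1", 119227, "chr1", 103994),
]
best = pick_couple(candidates, toy_ps, toy_coverage)
print(best)
```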
Plot
To plot all the information about the analysis, the following command can be used:
```bash
hicberg plot --output=DIR [--bins=2000] <genome>
```
Considering all the previous analyses, to plot all the information with 2000 bp as bin size in a folder named "test" previously created on the desktop:
```bash
hicberg plot -o ~/Desktop/test/ -b 2000 <genome>
```
The plots created are:
- patterns_distribution_X.pdf: plot of the distribution of the different patterns extracted from the reads of group1.
- coverage_X.png: plot of the genome coverage extracted from the reads of group1.
- d1d2.pdf: plot of the d1d2 law extracted from the reads of group1.
- density_X-Y.pdf: plot of the density map extracted from the reads of group1.
- Couple_sizes_distribution.pdf: plot of the distribution of the plausible couple sizes extracted from the reads of group2.
- chr_X.pdf: plot of the original and reconstructed maps for each chromosome.
Tidy folder
To tidy the folder, the following command can be used:
```bash
hicberg tidy --output=DIR
```
Considering all the previous analyses, to tidy the folder named "test" previously created on the desktop:
```bash
hicberg tidy -o ~/Desktop/test/
```
After tidying, the folder architecture will be the following:
```bash
/home/sardine/Bureau/sample_name
alignments
group0.1.bam
group0.2.bam
group1.1.bam
group1.2.bam
group2.1.bam
group2.1.rescued.bam
group2.2.bam
group2.2.rescued.bam
chunks
chunk_for_X.bam
chunk_rev_X.bam
contacts
matrices
rescued_map.cool
unrescued_map.cool
pairs
all_group.pairs
group1.pairs
fragments_fixed_sizes.txt
chromosome_sizes.txt
index
index.1.bt2l (Bowtie2)
index.2.bt2l (Bowtie2)
index.3.bt2l (Bowtie2)
index.4.bt2l (Bowtie2)
index.rev.1.bt2l (Bowtie2)
index.rev.2.bt2l (Bowtie2)
index.fa.amb (BWA)
index.fa.ann (BWA)
index.fa.bwt (BWA)
index.fa.pac (BWA)
index.fa.sa (BWA)
plots
chr_X.pdf
Couple_sizes_distribution.pdf
coverage_X.pdf
patterns_distribution_X.pdf
pseudo_ps.pdf
density_X-Y.pdf
statistics
chromosome_sizes.npy
coverage.npy
d1d2.npy
density_map.npy
dist.frag.npy
loops.npy
restriction_map.npy
trans_ps.npy
uncuts.npy
weirds.npy
xs.npy
chromosome_sizes.bed(*)
coverage.bed(*)
coverage.bedgraph(*)
signal.bw(*)
```
(*) : files generated by hicberg in omics mode
Chaining pipeline steps
The different steps of the pipeline can be chained using the following commands:
```bash
# 0. Prepare analysis
hicberg pipeline --output=DIR [--cpus=1] [--enzyme=[STR, STR]] [--mode=STR] --name=NAME --start-stage fastq --exit-stage bam
# 1. Align reads
hicberg pipeline --output=DIR [--cpus=1] [--aligner=STR] [--read-type=STR] [--enzyme=[STR, STR]] [--mode=STR] --name=NAME --start-stage bam --exit-stage groups
# 2. Group reads
hicberg pipeline --output=DIR [--cpus=1] [--enzyme=[STR, STR]] [--mode=STR] --name=NAME --start-stage groups --exit-stage build
# 3. Build pairs & cool
hicberg pipeline --output=DIR [--cpus=1] [--enzyme=[STR, STR]] [--mode=STR] --name=NAME --start-stage build --exit-stage stats
# 4. Compute statistics
hicberg pipeline --output=DIR [--cpus=1] [--enzyme=[STR, STR]] [--mode=STR] --name=NAME --start-stage stats --exit-stage rescue
# 5. Reassign ambiguous reads, build pairs & cool then get results
hicberg pipeline --output=DIR [--cpus=1] [--enzyme=[STR, STR]] [--mode=STR] --name=NAME --start-stage rescue --exit-stage final
```
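Since each call starts at the stage where the previous one stopped, the sequence of commands can be generated programmatically. The Python sketch below only builds and prints the command strings; it does not run them. It leaves out the optional enzyme, aligner and mode flags as well as any positional input arguments, which would have to be added for a real run. `chained_commands` is a hypothetical helper.

```python
import shlex

# Stage boundaries in the order used by the chained commands above.
STAGES = ["fastq", "bam", "groups", "build", "stats", "rescue", "final"]


def chained_commands(output: str, name: str, cpus: int = 1) -> list[str]:
    """Build one 'hicberg pipeline' call per consecutive pair of stages."""
    commands = []
    for start, stop in zip(STAGES, STAGES[1:]):
        args = ["hicberg", "pipeline", f"--output={output}", f"--cpus={cpus}",
                f"--name={name}", "--start-stage", start, "--exit-stage", stop]
        commands.append(shlex.join(args))  # quotes arguments only if needed
    return commands


commands = chained_commands("out/", "sample_1", cpus=8)
for command in commands:
    print(command)
```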
Evaluating the model
General principle
HiC-BERG provides a method to evaluate the reconstructed maps. The evaluation first splits the original uniquely mapping reads into two sets:
- group1.X.out.bam: alignment files whose reads are complementary to those of group1.X.in.bam. These reads are uniquely mapped (as in the original alignment files) and are used to learn the statistics for read couple inference.
- group1.X.in.bam: alignment files whose selected reads are duplicated across all the genomic intervals defined by the user, which introduces ambiguity in their alignment.
The most plausible couple is then inferred for the artificially duplicated reads of group1.X.in.bam, and the corresponding Hi-C contact matrix is built and compared to the one built from the original reads in group1.X.bam (unrescued_map.cool). The comparison is restricted to the bins modified by duplication and uses the Pearson correlation coefficient, which reflects the quality of the reconstruction: the closer the coefficient is to 1, the better the reconstruction.
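The comparison step can be sketched as a Pearson correlation computed only over the matrix cells touched by the artificial duplication. The sketch below uses plain nested lists and toy 3x3 matrices. How HiC-BERG selects the modified bins, and whether the matrices are balanced first, is not shown and should be taken as an assumption. The helper names are hypothetical. The coefficient is undefined (division by zero) when either vector is constant.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def score_modified_bins(original, reconstructed, modified_bins):
    """Compare only the matrix cells touched by the artificial duplication."""
    x = [original[i][j] for i, j in modified_bins]
    y = [reconstructed[i][j] for i, j in modified_bins]
    return pearson(x, y)


# Toy matrices, for illustration only.
original = [[10, 4, 1], [4, 8, 3], [1, 3, 6]]
reconstructed = [[10, 4, 1], [4, 7, 3], [1, 3, 6]]
r = score_modified_bins(original, reconstructed, [(0, 0), (0, 1), (1, 1), (1, 2)])
print(round(r, 3))
```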
The evaluation pipeline can be illustrated as follows:

The genomic intervals used to duplicate the reads are defined by the user through source and target intervals. The source interval is set through the parameters --chromosome, --position and --bins. The target intervals are set through the parameter --strides and, optionally, --trans-chromosome with --trans-position.
In a basic example considering only one chromosome and two artificially duplicated sequences, a source interval has to be defined on the chromosome of interest, together with a target interval corresponding to the duplicated sequence. The source interval is defined by the chromosome name (--chromosome), the position (--position) and the width of the interval in number of bins (--bins).
The source interval is thus defined as $[\mathrm{chromosome}:\mathrm{position} ; \mathrm{chromosome}:\mathrm{position} + \mathrm{bins} \times \mathrm{bin\_size}]$ and the target interval as $[\mathrm{chromosome}:\mathrm{position} + \mathrm{stride} ; \mathrm{chromosome}:\mathrm{position} + \mathrm{bins} \times \mathrm{bin\_size} + \mathrm{stride}]$.
For example, with the source interval on chromosome 1 at position 68000, one bin, strides set to [0, 50000] and a bin size of 2000 bp, the source interval is chr1:68000-70000 and the target interval is chr1:118000-120000.
The files group1.1.in.bam, group1.2.in.bam, group1.1.out.bam and group1.2.out.bam will be created.
The duplicated aligned reads should look like this:
group1.1.in.bam:
```
NS500150:487:HNLLNBGXC:1:11101:1071:2862 0 chr1 69227 255 35M * 0 0 ATCTGTTGTGNNGAAGGATACTCCCAGAACTCGTT AAAAAEEEAE##EEEEEEEEEEEEEEEEEEEEEEE AS:i:-2 XN:i:0 XM:i:2 XO:i:0 NM:i:2 MD:Z:10G0A23 YT:Z:UU XG:i:230218
NS500150:487:HNLLNBGXC:1:11101:1071:2862 0 chr1 119227 255 35M * 0 0 ATCTGTTGTGNNGAAGGATACTCCCAGAACTCGTT AAAAAEEEAE##EEEEEEEEEEEEEEEEEEEEEEE AS:i:-2 XN:i:0 XM:i:2 XO:i:0 NM:i:2 MD:Z:10G0A23 YT:Z:UU XG:i:230218 XF:Z:Fake
NS500150:487:HNLLNBGXC:1:11101:3001:19423 16 chr1 118866 255 35M * 0 0 GAAAAAGGATTGGTCCAATAAGTGGGAAAAAAGAT EEAEEEAEE/EAE/EEEEEEEE/EEEEEE6AAAAA AS:i:0 XN:i:0 XM:i:0 XO:i:0 NM:i:0 MD:Z:35 YT:Z:UU XG:i:230218
NS500150:487:HNLLNBGXC:1:11101:3001:19423 16 chr1 68866 255 35M * 0 0 GAAAAAGGATTGGTCCAATAAGTGGGAAAAAAGAT EEAEEEAEE/EAE/EEEEEEEE/EEEEEE6AAAAA AS:i:0 XN:i:0 XM:i:0 XO:i:0 NM:i:0 MD:Z:35 YT:Z:UU XG:i:230218 XF:Z:Fake
NS500150:487:HNLLNBGXC:1:11101:4986:15168 16 chr1 69239 255 35M * 0 0 GAAGGATACTCCCAGAACTCGTTACTGTCTGGACT EEEEEEEEEEEEEEEEEEEEEEEAEEEAEEAAAAA AS:i:0 XN:i:0 XM:i:0 XO:i:0 NM:i:0 MD:Z:35 YT:Z:UU XG:i:230218
NS500150:487:HNLLNBGXC:1:11101:4986:15168 16 chr1 119239 255 35M * 0 0 GAAGGATACTCCCAGAACTCGTTACTGTCTGGACT EEEEEEEEEEEEEEEEEEEEEEEAEEEAEEAAAAA AS:i:0 XN:i:0 XM:i:0 XO:i:0 NM:i:0 MD:Z:35 YT:Z:UU XG:i:230218 XF:Z:Fake
...
```
group1.2.in.bam:
```
NS500150:487:HNLLNBGXC:1:11101:1071:2862 16 chr1 103994 255 35M * 0 0 TGCTTTTTTGGGATTGGGAATGATTTTTCCTCCTT EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAAAA AS:i:0 XN:i:0 XM:i:0 XO:i:0 NM:i:0 MD:Z:35 YT:Z:UU XG:i:230218
NS500150:487:HNLLNBGXC:1:11101:3001:19423 16 chr1 121776 255 35M * 0 0 GGTCAAGAAATGGTTTTCACAGGCGAAATCATTGG EEEEEEEEEEE<EEEE/EEEEEEEEEEAEEAAAAA AS:i:0 XN:i:0 XM:i:0 XO:i:0 NM:i:0 MD:Z:35 YT:Z:UU XG:i:230218
NS500150:487:HNLLNBGXC:1:11101:4986:15168 0 chr1 86626 255 35M * 0 0 GATCTAGGGGTACCTCCTCGGGAAACATCCAGCCC AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE AS:i:0 XN:i:0 XM:i:0 XO:i:0 NM:i:0 MD:Z:35 YT:Z:UU XG:i:230218
...
```
The XF:Z:Fake tag marks the duplicated (fake) reads.
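Original and fake records can be told apart by looking for the XF:Z:Fake tag among the optional SAM fields, which start at the 12th column. The Python sketch below does this on SAM text lines. The shortened records and the helper name `split_fake` are hypothetical.

```python
def split_fake(sam_lines):
    """Separate records tagged XF:Z:Fake from the original ones.

    Expects SAM text records; optional tags start at the 12th field.
    Returns two lists of (read name, chromosome, position) tuples.
    """
    original, fake = [], []
    for line in sam_lines:
        fields = line.split()
        record = (fields[0], fields[2], int(fields[3]))
        # Tags are only searched among the optional fields.
        if "XF:Z:Fake" in fields[11:]:
            fake.append(record)
        else:
            original.append(record)
    return original, fake


records = [
    "readA 0 chr1 69227 255 35M * 0 0 ACGT EEEE AS:i:-2",
    "readA 0 chr1 119227 255 35M * 0 0 ACGT EEEE AS:i:-2 XF:Z:Fake",
]
original, fake = split_fake(records)
print(original, fake)
```

With pysam, the equivalent check on an AlignedSegment would combine read.has_tag("XF") with read.get_tag("XF") == "Fake".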
In the case of trans-chromosomal duplications, the user has to specify the names of the trans chromosomes and the relative position for each selected trans chromosome, providing as many positions as chromosome names.
For example, if the source interval is chromosome 1, position 100000 and strides set as [0, 50000] with a bin size of 2000bp and the specified trans chromosomes and trans positions are respectively [chr2, chr8] and [70000, 130000], the source interval is defined as chr1:100000-102000 and the target intervals are defined as chr1:150000-152000, chr2:70000-72000 and chr8:130000-132000.
The stride is the distance (in bp) between the first bin of the source interval and the first bin of the target interval. It can be negative, positive or zero: if negative, the target interval is located before the source interval; if positive, after it; if zero, the target interval is the same as the source interval. The target interval can be located on the same chromosome as the source interval or on another chromosome, in which case the chromosome name and the position of the first bin of the target interval must be specified. All the parameters --position, --strides, --trans-chromosome and --trans-position should be provided as comma-separated lists.
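The interval arithmetic can be checked with the small sketch below, which reproduces both worked examples. Following those examples, it assumes that strides and positions are expressed in bp, that the interval width is bins × bin size, and that a stride of 0 yields the source interval. The function name `duplication_intervals` is hypothetical.

```python
def duplication_intervals(chromosome, position, bins, bin_size, strides,
                          trans_chromosomes=(), trans_positions=()):
    """Return the source interval followed by all target intervals.

    Intervals are (chrom, start, end) with end = start + bins * bin_size.
    A stride of 0 reproduces the source interval itself.
    """
    if len(trans_chromosomes) != len(trans_positions):
        raise ValueError("one trans position is needed per trans chromosome")
    width = bins * bin_size
    # Cis intervals: shift the source interval by each stride.
    intervals = [(chromosome, position + s, position + s + width) for s in strides]
    # Trans intervals: start at the given position on each trans chromosome.
    intervals += [(c, p, p + width) for c, p in zip(trans_chromosomes, trans_positions)]
    return intervals


cis_only = duplication_intervals("chr1", 68000, 1, 2000, [0, 50000])
with_trans = duplication_intervals("chr1", 100000, 1, 2000, [0, 50000],
                                   ["chr2", "chr8"], [70000, 130000])
print(cis_only)
print(with_trans)
```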
The benchmark can be performed considering several modes, defined by the parameter --modes as a comma-separated list of strings. The modes are the same as those used for the reconstruction:
- full
- random
- ps
- cov
- d1d2
- density
- standard (ps and cov)
- one_enzyme (ps, cov and d1d2)
- omics (ps, cov)
N.B.: depending on the modes selected for the benchmark, if one of the modes is not included in the list of modes selected for the reconstruction, the reconstruction will not be performed for this mode and the corresponding statistics will not be computed.
The evaluation can be run using the following command:
```bash
hicberg benchmark --output=DIR [--chromosome=STR] [--position=INT] [--trans-chromosome=STR] [--trans-position=INT] [--stride=INT] [--bins=INT] [--auto=INT] [--rounds=INT] [--magnitude=FLOAT] [--modes=STR] [--pattern=STR] [--threshold=FLOAT] [--jitter=INT] [--trend] [--top=INT] [--genome=STR] [--force]
```
Considering a benchmark with 4 artificially duplicated sequences set at chr1:100000-102000 (source), chr1:200000-202000 (target 1), chr4:50000-52000 (target 2) and chr7:300000-302000 (target 3), with 2000bp as bin size and considering full and ps_only modes to get the performance of the reconstructions considering a folder named "test" previously created on the desktop containing the original alignment files and the unreconstructed maps, the command line is the following :
```bash
hicberg benchmark -o ~/Desktop/test/ -c chr1 -p 100000 -s 0,100000 -C chr4,chr7 -P 50000,300000 -m full,ps_only -g <genome>
```
The source and target intervals can also be picked at random. In that case, empty bins are not considered in the evaluation. The random mode is activated by setting the --auto parameter to the number of desired artificially duplicated sequences.
For instance, for a benchmark with 100 artificially duplicated sequences, a bin size of 2000 bp and the full and ps_only modes, using the same "test" folder containing the original alignment files and the unreconstructed maps, the command line is:
```bash
hicberg benchmark -o ~/Desktop/test/ -a 100 -m full,ps_only -g <genome.fa>
```
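One way to honour "empty bins are not considered" when drawing random intervals is to sample only among bins with non-zero coverage. The sketch below is an assumption about the approach, not hicberg's implementation; `pick_random_bins` and its per-chromosome coverage input are hypothetical.

```python
import random


# Hypothetical sketch: sample distinct non-empty bins. `coverage` maps each
# chromosome to a list of per-bin read counts.
def pick_random_bins(coverage, n, seed=None):
    """Draw n distinct (chromosome, bin_index) pairs among non-empty bins."""
    candidates = [
        (chrom, idx)
        for chrom, counts in coverage.items()
        for idx, count in enumerate(counts)
        if count > 0  # empty bins are never selected
    ]
    if n > len(candidates):
        raise ValueError(f"requested {n} bins but only {len(candidates)} are non-empty")
    return random.Random(seed).sample(candidates, n)  # seed makes runs reproducible
```

Passing a seed keeps successive benchmark rounds reproducible, which is convenient when comparing modes on the same random intervals.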
Pattern based evaluation
A pattern-based strategy can also be used. It resembles the interval-based strategy described above, except that the user does not provide the genomic coordinates of the intervals used for read selection: only the pattern type has to be specified, through the --pattern parameter.
These patterns are detected on the original Hi-C map using Chromosight, and the genomic coordinates of the detected patterns are then used to select the reads for the evaluation. The number of duplicated intervals is controlled by three parameters: --chromosome restricts detection to a given chromosome, --threshold sets the minimum Pearson score for a pattern to be considered as detected, and --top keeps only the top k% of the remaining detected patterns. The genomic intervals are thus defined as the genomic coordinates of the selected patterns.
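The threshold-then-top-k% selection can be sketched as follows. This is a hypothetical illustration: patterns are simplified to (chrom, bin1, bin2, score) tuples rather than Chromosight's actual output, and rounding the kept count up with `math.ceil` is an assumption.

```python
import math


# Hypothetical sketch of --threshold / --top selection on simplified
# (chrom, bin1, bin2, score) tuples; rounding up the kept count is assumed.
def select_patterns(patterns, threshold, top_percent):
    """Keep patterns scoring >= threshold, then the best top_percent % of them."""
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    passing = [p for p in patterns if p[3] >= threshold]
    passing.sort(key=lambda p: p[3], reverse=True)  # best Pearson scores first
    keep = math.ceil(len(passing) * top_percent / 100)
    return passing[:keep]
```

With top_percent=100, every pattern above the threshold is kept, which corresponds to the "100% rate" used in the example below.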
Reads are then selected with the same strategy as in the interval-based benchmark. After reconstruction, the evaluation computes the Pearson correlation coefficient between the original and reconstructed bins selected from the original map. In addition, a second Chromosight pattern detection is run on the reconstructed map, and the precision, recall and F1 score are computed by comparing the retrieved patterns with the original ones, identifying false positives and false negatives.
N.B.: because Chromosight splits the Hi-C map stochastically during pattern detection, some patterns may be counted as false positives and false negatives merely because their coordinates after reconstruction are shifted from the original ones. We recommend using the --jitter parameter, which allows a pattern to be considered as retrieved after reconstruction if it is detected within j bins of its original coordinates.
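To illustrate how a jitter tolerance affects precision, recall and F1, here is a minimal sketch. It is not hicberg's scoring code: patterns are simplified to (chrom, bin1, bin2) tuples, and the greedy one-to-one matching is an assumption.

```python
# Hypothetical sketch: greedy one-to-one matching with a jitter tolerance,
# on simplified (chrom, bin1, bin2) pattern tuples.
def match_with_jitter(original, reconstructed, jitter=0):
    """Return (precision, recall, f1) of reconstructed calls vs original ones."""
    unmatched = list(original)
    true_pos = 0
    for chrom, b1, b2 in reconstructed:
        for i, (c, o1, o2) in enumerate(unmatched):
            # Same chromosome and both bins within `jitter` bins of the original.
            if c == chrom and abs(b1 - o1) <= jitter and abs(b2 - o2) <= jitter:
                true_pos += 1
                del unmatched[i]  # each original pattern is matched at most once
                break
    false_pos = len(reconstructed) - true_pos
    false_neg = len(unmatched)
    precision = true_pos / (true_pos + false_pos) if reconstructed else 0.0
    recall = true_pos / (true_pos + false_neg) if original else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A call shifted by one bin counts as both a false positive and a false negative with jitter 0, but as a true positive with jitter 1.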
For a benchmark based on loop patterns on chromosome 7, with a threshold of 0.5, keeping all patterns detected after thresholding (i.e. a 100% rate), a jitter of 0, detrending enabled and the full mode, the command line is:
```bash
hicberg benchmark -o ~/Desktop/test/ -c chr7 -p 100000 -S loops -t 0.5 -k 100 -j 0 -T -m full -g <genome.fa>
```
Python usage
All components of the hicberg program can be used as Python modules. See the documentation on readthedocs. The library expects contact maps as cool files, which are handled as NumPy arrays through Cooler. The various submodules of hicberg contain various utilities.
```python
import hicberg.io  # Functions for I/O and folder management.
import hicberg.align  # Functions for sequence alignment steps.
import hicberg.utils  # Functions for handling reads and alignments.
import hicberg.statistics  # Functions to extract and create statistical models.
import hicberg.omics  # Functions to treat non Hi-C data.
import hicberg.pipeline  # Functions to run end-to-end Hi-C map reconstruction.
```
Connecting the modules
All the steps described here are handled automatically when running the hicberg pipeline. But if you want to connect the different modules manually, the intermediate input and output files can be processed using some python scripting.
File formats
pair files: This format is used for all intermediate files in the pipeline and is also used by hicberg build_matrix. It is a tab-separated format holding information about Hi-C pairs. It has an official specification defined by the 4D Nucleome data coordination and integration center.
- readID: Read (pair) identifier.
- chr1: Chromosome identifier of the forward read of the pair.
- pos1: 0-based position of the forward mate, in base pairs.
- chr2: Chromosome identifier of the reverse read of the pair.
- pos2: 0-based position of the reverse mate, in base pairs.
- strand1: Orientation of the aligned forward mate.
- strand2: Orientation of the aligned reverse mate.
```
## pairs format v1.0
#columns: readID chr1 pos1 chr2 pos2 strand1 strand2
#chromsize: chr1 230218
#chromsize: chr2 813184
NS500150:487:HNLLNBGXC:1:11101:3066:7109 chr2 683994 chr2 684725 - -
NS500150:487:HNLLNBGXC:1:11101:6114:4800 chr2 795379 chr2 796279 + +
NS500150:487:HNLLNBGXC:1:11101:6488:14927 chr2 379433 chr2 379138 - +
...
```
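Since pairs files are plain tab-separated text, they can be streamed without extra dependencies. The sketch below is a hypothetical reader (not a hicberg function): it skips `#` header lines, honours a `#columns:` header when present, and otherwise falls back to the column order listed above.

```python
# Hypothetical pairs reader; the default column order follows the field list above.
DEFAULT_COLUMNS = ["readID", "chr1", "pos1", "chr2", "pos2", "strand1", "strand2"]


def read_pairs(lines):
    """Yield one dict per pair from an iterable of .pairs lines."""
    columns = DEFAULT_COLUMNS
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            if line.startswith("#columns:"):
                columns = line.split(":", 1)[1].split()  # header overrides order
            continue
        record = dict(zip(columns, line.split("\t")))
        for key in ("pos1", "pos2"):
            if key in record:
                record[key] = int(record[key])  # positions as integers
        yield record
```

Because it accepts any iterable of lines, it can be used directly on an open file handle, e.g. `with open("file.pairs") as handle: pairs = list(read_pairs(handle))`.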
cool files: This format is used to store genomic interaction data such as Hi-C contact matrices. These files can be handled using the cooler Python library.
npy files: This format is used to store dictionaries containing information about genomic coordinates, binning or statistical laws. Dictionaries are stored with chromosomes as keys and arrays as values. Such files can be handled using the numpy Python library.
- chromosome_sizes.npy: stores the size of each chromosome. Structure: {chromosome: size}
- xs.npy: stores the log-binned genome. Structure: {chromosome: [log bins]}, with log bins a list of integers.
- uncuts.npy: stores the distribution of uncuts. Structure: {chromosome: [distribution]}, with distribution a list of integers.
- loops.npy: stores the distribution of loops. Structure: {chromosome: [distribution]}, with distribution a list of integers.
- weirds.npy: stores the distribution of weirds. Structure: {chromosome: [distribution]}, with distribution a list of integers.
- pseudo_ps.npy: stores the distribution of pseudo ps. Structure: {(chrom1, chrom2): [map]}, with (chrom1, chrom2) a tuple of two different chromosomes and map a float value.
- coverage.npy: stores the coverage of the genome. Structure: {chromosome: [coverage]}, with coverage a list of integers.
- d1d2.npy: stores the d1d2 law. Structure: [distribution], with distribution a list of integers.
- density_map.npy: stores the density map. Structure: {(chrom1, chrom2): [density map]}, with (chrom1, chrom2) a tuple of chromosomes and density map a 2D numpy array.
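To illustrate the {chromosome: [log bins]} shape of xs.npy, the sketch below builds geometrically spaced integer distance edges per chromosome. How hicberg actually chooses its log bins is not described here, so the spacing rule, the `n_bins` default and both function names are hypothetical.

```python
import math


# Hypothetical sketch of a log-binned genome; the spacing rule is an assumption.
def log_bins(size: int, n_bins: int = 20) -> list:
    """Return sorted, unique integer edges spaced evenly in log10 from 1 to size."""
    if size < 2:
        raise ValueError("size must be at least 2")
    step = math.log10(size) / n_bins
    edges = {round(10 ** (i * step)) for i in range(n_bins + 1)}  # set drops duplicates
    return sorted(edges)


def log_binned_genome(chromosome_sizes: dict, n_bins: int = 20) -> dict:
    """Build a {chromosome: [log bins]} dictionary shaped like xs.npy."""
    return {chrom: log_bins(size, n_bins) for chrom, size in chromosome_sizes.items()}
```

If such dictionaries are written with `numpy.save`, NumPy pickles them inside a 0-d object array, so they are read back with `numpy.load(path, allow_pickle=True).item()`.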
bt2l files: This format is used to store genome indexes built with Bowtie2 (large-index variant).
bam files: Several hicberg functions build their analyses on this format. It is a compressed standard alignment format providing detailed information about the read alignments performed by Bowtie2. Such files can be handled with Samtools and its Python wrapper, pysam. More details about the SAM and BAM formats can be found in the SAM/BAM format specification.
bed files: This format is used to store genomic intervals. It is a tab-separated format holding information about genomic intervals. It is a standard format used by the UCSC genome browser.
```
chr4 150 200
chr6 300 400
chr4 800 900
chr2 680000 684000
...
```
- chromosome_sizes.bed: stores the size of each chromosome. Structure: chromosome start end
- coverage.bed: stores the coverage of the genome. Structure: chromosome start end coverage
- coverage.bedgraph: stores the coverage of the genome. Structure: chromosome start end coverage
- signal.bw: stores the coverage of the genome (bigWig). Structure: chromosome start end coverage

fragments_fixed_sizes.txt:
- chrom: Chromosome identifier. Order should be the same as in pairs files.
- start: 0-based start of fragment, in base pairs.
- end: 0-based end of fragment, in base pairs.
```
chrom start end
chr1 0 2000
chr1 2000 4000
chr1 4000 6000
...
chr1 14000 16000
...
chr2 0 2000
chr2 2000 4000
...
```
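A fixed-size fragment table like the one above can be derived from chromosome sizes by tiling each chromosome with bins. This is a minimal sketch, not hicberg's implementation: the function names are hypothetical, truncating the last bin at the chromosome end and writing tab-separated columns are assumptions.

```python
# Hypothetical sketch: tile each chromosome with fixed-size bins.
def fixed_size_fragments(chromosome_sizes, bin_size):
    """Yield (chrom, start, end) rows; the last bin is truncated at the chromosome end."""
    for chrom, size in chromosome_sizes.items():  # dict order = chromosome order
        for start in range(0, size, bin_size):
            yield chrom, start, min(start + bin_size, size)


def write_fragments(path, chromosome_sizes, bin_size):
    """Write the fragments as a tab-separated table with a chrom/start/end header."""
    with open(path, "w") as handle:
        handle.write("chrom\tstart\tend\n")
        for chrom, start, end in fixed_size_fragments(chromosome_sizes, bin_size):
            handle.write(f"{chrom}\t{start}\t{end}\n")
```

Iterating over the dictionary in insertion order keeps the chromosome order consistent with the pairs files, as required by the chrom column description.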
Contributing
All contributions are welcome, in the form of bug reports, suggestions, documentation or pull requests. We use the Numpy standard for docstrings when documenting functions.
The code formatting standard we use is black, with --line-length=79 to follow PEP8 recommendations. We use pytest with the pytest-doctest and pytest-pylint plugins as our testing framework. Ideally, new functions should have associated unit tests, placed in the tests folder. To test the code, you can run:
```bash
coverage run --source=hicberg -m pytest -v tests
coverage xml
```
Authors
Citation
Owner
- Login: sebgra
- Kind: user
- Location: France
- Repositories: 1
- Profile: https://github.com/sebgra
I'm a bioengineer and developer specialized in bioinformatics, biological image processing, machine & deep learning, drug development and biomechanics.
GitHub Events
Total
- Create event: 4
- Issues event: 11
- Release event: 9
- Watch event: 2
- Delete event: 1
- Public event: 1
- Push event: 77
- Fork event: 1
Last Year
- Create event: 4
- Issues event: 11
- Release event: 9
- Watch event: 2
- Delete event: 1
- Public event: 1
- Push event: 77
- Fork event: 1
Packages
- Total packages: 1
- Total downloads:
- pypi 19 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
- Total maintainers: 1
pypi.org: hicberg
Standalone command line tool to visualize coverage from a BAM file
- Homepage: https://github.com/sebgra/hicberg
- Documentation: https://hicberg.readthedocs.io/
- License: MIT
- Latest release: 1.0.1 (published 9 months ago)
Maintainers (1)
Dependencies
- actions/checkout v2 composite
- mamba-org/setup-micromamba v1 composite
- condaforge/mambaforge latest build
- bioframe *
- biopython *
- click *
- cooler *
- funcy *
- hicstuff *
- matplotlib *
- numpy *
- pandas *
- pysam *
- scikit-learn *
- scipy *
- statsmodels *
- JamesIves/github-pages-deploy-action 3.7.1 composite
- actions/checkout v2 composite
- actions/setup-python v1 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
- _libgcc_mutex 0.1
- _openmp_mutex 4.5
- aioeasywebdav 2.4.0
- aiohttp 3.9.1
- aiosignal 1.3.1
- alsa-lib 1.2.9
- amply 0.1.6
- appdirs 1.4.4
- asttokens 2.2.1
- attmap 0.13.2
- attr 2.5.1
- attrs 23.1.0
- backcall 0.2.0
- backports 1.0
- backports.functools_lru_cache 1.6.5
- bcrypt 4.1.1
- boto3 1.33.6
- botocore 1.33.6
- brotli 1.1.0
- brotli-bin 1.1.0
- brotli-python 1.1.0
- bzip2 1.0.8
- c-ares 1.23.0
- ca-certificates 2023.11.17
- cachetools 5.3.2
- cairo 1.16.0
- cffi 1.16.0
- chardet 4.0.0
- charset-normalizer 3.3.2
- coin-or-cbc 2.10.10
- coin-or-cgl 0.60.7
- coin-or-clp 1.17.8
- coin-or-osi 0.108.8
- coin-or-utils 2.11.9
- coincbc 2.10.10
- colorama 0.4.6
- comm 0.1.3
- configargparse 1.7
- connection_pool 0.0.3
- cryptography 41.0.5
- datrie 0.8.2
- dbus 1.13.6
- debugpy 1.6.7
- decorator 5.1.1
- defusedxml 0.7.1
- docutils 0.20.1
- dpath 2.1.6
- dropbox 11.36.2
- eido 0.2.2
- exceptiongroup 1.2.0
- executing 1.2.0
- expat 2.5.0
- filechunkio 1.8
- font-ttf-dejavu-sans-mono 2.37
- font-ttf-inconsolata 3.000
- font-ttf-source-code-pro 2.038
- font-ttf-ubuntu 0.83
- fontconfig 2.14.2
- fonts-conda-ecosystem 1
- fonts-conda-forge 1
- freetype 2.12.1
- frozenlist 1.4.0
- ftputil 5.0.4
- gettext 0.21.1
- gitdb 4.0.11
- gitpython 3.1.40
- glib 2.78.1
- glib-tools 2.78.1
- google-api-core 2.14.0
- google-api-python-client 2.109.0
- google-auth 2.24.0
- google-auth-httplib2 0.1.1
- google-cloud-core 2.3.3
- google-cloud-storage 2.13.0
- google-crc32c 1.1.2
- google-resumable-media 2.6.0
- googleapis-common-protos 1.61.0
- graphite2 1.3.13
- grpcio 1.59.3
- gst-plugins-base 1.22.5
- gstreamer 1.22.5
- harfbuzz 7.3.0
- htslib 1.17
- httplib2 0.22.0
- humanfriendly 10.0
- icu 72.1
- idna 3.6
- importlib-metadata 6.8.0
- importlib_metadata 6.8.0
- importlib_resources 6.1.1
- iniconfig 2.0.0
- ipykernel 6.24.0
- ipympl 0.9.3
- ipython 8.14.0
- ipython_genutils 0.2.0
- ipywidgets 8.1.1
- jedi 0.18.2
- jinja2 3.1.2
- jmespath 1.0.1
- jsonschema 4.20.0
- jsonschema-specifications 2023.11.2
- jupyter_client 8.3.0
- jupyter_core 5.3.1
- jupyterlab_widgets 3.0.9
- keyutils 1.6.1
- krb5 1.21.2
- lame 3.100
- lcms2 2.15
- ld_impl_linux-64 2.40
- lerc 4.0.0
- libabseil 20230802.1
- libblas 3.9.0
- libbrotlicommon 1.1.0
- libbrotlidec 1.1.0
- libbrotlienc 1.1.0
- libcap 2.69
- libcblas 3.9.0
- libclang 15.0.7
- libclang13 15.0.7
- libcrc32c 1.1.2
- libcups 2.3.3
- libcurl 8.4.0
- libdeflate 1.18
- libedit 3.1.20191231
- libev 4.33
- libevent 2.1.12
- libexpat 2.5.0
- libffi 3.4.2
- libflac 1.4.3
- libgcc-ng 13.1.0
- libgcrypt 1.10.3
- libgfortran-ng 13.2.0
- libgfortran5 13.2.0
- libglib 2.78.1
- libgomp 13.1.0
- libgpg-error 1.47
- libgrpc 1.59.3
- libiconv 1.17
- libjpeg-turbo 2.1.5.1
- liblapack 3.9.0
- liblapacke 3.9.0
- libllvm15 15.0.7
- libnghttp2 1.58.0
- libnsl 2.0.0
- libogg 1.3.4
- libopenblas 0.3.25
- libopus 1.3.1
- libpng 1.6.39
- libpq 15.4
- libprotobuf 4.24.4
- libre2-11 2023.06.02
- libsndfile 1.2.2
- libsodium 1.0.18
- libsqlite 3.42.0
- libssh2 1.11.0
- libstdcxx-ng 13.1.0
- libsystemd0 254
- libtiff 4.6.0
- libuuid 2.38.1
- libvorbis 1.3.7
- libwebp-base 1.3.2
- libxcb 1.15
- libxkbcommon 1.6.0
- libxml2 2.11.5
- libzlib 1.2.13
- logmuse 0.2.6
- lz4-c 1.9.4
- markdown-it-py 3.0.0
- markupsafe 2.1.3
- matplotlib-base 3.7.0
- matplotlib-inline 0.1.6
- mdurl 0.1.0
- mpg123 1.32.3
- multidict 6.0.4
- munkres 1.1.4
- mysql-common 8.0.33
- mysql-libs 8.0.33
- nbformat 5.9.2
- ncurses 6.4
- nest-asyncio 1.5.6
- nspr 4.35
- nss 3.92
- oauth2client 4.1.3
- openjpeg 2.5.0
- openssl 3.1.4
- packaging 23.1
- paramiko 3.3.1
- parso 0.8.3
- pcre2 10.42
- peppy 0.35.7
- pexpect 4.8.0
- pickleshare 0.7.5
- pillow 10.0.1
- pip 23.2
- pixman 0.42.2
- pkgutil-resolve-name 1.3.10
- plac 1.4.1
- platformdirs 3.9.1
- ply 3.11
- prettytable 3.9.0
- prompt-toolkit 3.0.39
- prompt_toolkit 3.0.39
- protobuf 4.24.4
- psutil 5.9.5
- pthread-stubs 0.4
- ptyprocess 0.7.0
- pulp 2.7.0
- pulseaudio-client 16.1
- pure_eval 0.2.2
- pyasn1 0.5.1
- pyasn1-modules 0.3.0
- pycparser 2.21
- pygments 2.15.1
- pynacl 1.5.0
- pyopenssl 23.3.0
- pyqt 5.15.9
- pyqt5-sip 12.12.2
- pysftp 0.2.9
- pysocks 1.7.1
- python 3.11.4
- python-dateutil 2.8.2
- python-fastjsonschema 2.19.0
- python-irodsclient 1.1.9
- python-tzdata 2023.3
- python_abi 3.11
- pyu2f 0.1.5
- pyyaml 6.0.1
- pyzmq 25.1.0
- qt-main 5.15.8
- re2 2023.06.02
- readline 8.2
- referencing 0.31.1
- requests 2.31.0
- reretry 0.11.8
- rich 13.7.0
- rpds-py 0.13.2
- rsa 4.9
- s3transfer 0.8.2
- samtools 1.17
- seqtk 1.4
- setuptools 68.0.0
- setuptools-scm 8.0.4
- sip 6.7.12
- six 1.16.0
- slacker 0.14.0
- smart_open 6.4.0
- smmap 5.0.0
- snakemake 7.32.3
- snakemake-minimal 7.32.3
- stack_data 0.6.2
- stone 3.3.1
- stopit 1.1.2
- tabulate 0.9.0
- throttler 1.2.2
- tk 8.6.13
- toml 0.10.2
- tomli 2.0.1
- toposort 1.10
- tornado 6.3.2
- traitlets 5.9.0
- typing-extensions 4.7.1
- typing_extensions 4.7.1
- tzdata 2023c
- ubiquerg 0.6.3
- uritemplate 4.1.1
- veracitools 0.1.3
- wcwidth 0.2.6
- wheel 0.40.0
- widgetsnbextension 4.0.9
- wrapt 1.16.0
- xcb-util 0.4.0
- xcb-util-image 0.4.0
- xcb-util-keysyms 0.4.0
- xcb-util-renderutil 0.3.9
- xcb-util-wm 0.4.1
- xkeyboard-config 2.40
- xorg-kbproto 1.0.7
- xorg-libice 1.1.1
- xorg-libsm 1.2.4
- xorg-libx11 1.8.7
- xorg-libxau 1.0.11
- xorg-libxdmcp 1.1.3
- xorg-libxext 1.3.4
- xorg-libxrender 0.9.11
- xorg-renderproto 0.11.1
- xorg-xextproto 7.3.0
- xorg-xf86vidmodeproto 2.3.1
- xorg-xproto 7.0.31
- xz 5.2.6
- yaml 0.2.5
- yarl 1.9.3
- yte 1.5.1
- zeromq 4.3.4
- zipp 3.16.2
- zlib 1.2.13
- zstd 1.5.5