delfies

delfies: a Python package for the detection of DNA breakpoints with neo-telomere addition - Published in JOSS (2025)

https://github.com/bricoletc/delfies

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software
Last synced: 6 months ago

Repository

Querying genomes for evidence of Programmed DNA Elimination

Basic Info
  • Host: GitHub
  • Owner: bricoletc
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 435 KB
Statistics
  • Stars: 2
  • Watchers: 3
  • Forks: 1
  • Open Issues: 3
  • Releases: 8
Created about 2 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Citation

README.md


delfies is a tool that identifies genomic locations where double-strand breaks have occurred and been followed by telomere addition. It was initially designed and validated for studying Programmed DNA Elimination in nematodes, but should also work for other clades and applications.

For details/to credit the tool, please see/cite the associated paper:

Letcher, B. and Delattre, M. (2025). delfies: a Python package for the detection of DNA breakpoints with neo-telomere addition. Journal of Open Source Software, 10(105), 7385, https://doi.org/10.21105/joss.07385

Getting started

delfies takes as input a genome FASTA (gzipped files are supported) and an indexed SAM/BAM of sequencing reads aligned to that genome.

```sh
delfies --help
samtools index <aligned_reads>.bam
delfies <genome>.fa.gz <aligned_reads>.bam <output_dir>
cat <output_dir>/breakpoint_locations.bed
```

To obtain a suitable SAM/BAM, see Input data; to download a real genome and BAMs for a test run of delfies, see Test run with real data.
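delfies accepts the genome FASTA either plain or gzipped. Purely as an illustration of how such transparent handling can work (this is not delfies' own code, and `open_maybe_gzipped` and `read_fasta_lengths` are hypothetical helpers), the sketch below sniffs the gzip magic bytes `1f 8b` and then reports the length of each sequence record:

```python
import gzip
from typing import Dict


def open_maybe_gzipped(path: str):
    """Hypothetical helper: open a text file, decompressing it if it starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta_lengths(path: str) -> Dict[str, int]:
    """Hypothetical helper: return {record_id: sequence_length} for a plain or gzipped FASTA."""
    lengths: Dict[str, int] = {}
    current = None
    with open_maybe_gzipped(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The record ID is the first whitespace-separated token of the header
                current = line[1:].split()[0]
                lengths[current] = 0
            elif current is not None:
                lengths[current] += len(line)
    return lengths
```

A quick check like this can confirm that contig names in the FASTA match those in the BAM header before running delfies.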


Installation

Using pip (or an equivalent such as uv):

```sh
# Install the latest release from PyPI
pip install delfies

# Or install a specific release from PyPI
pip install delfies==0.10.0

# Or clone and install the tip of main
git clone https://github.com/bricoletc/delfies/
pip install ./delfies
```

Input data

Sequencing technologies

delfies is designed to work with both Illumina short reads and ONT or PacBio long reads. Long reads are better for finding breakpoints in more repetitive regions of the genome. A high fraction of sequenced bases with a quality above Q20 is desirable (e.g. >70%). I found that delfies worked on recent data from all three sequencing technologies: see the test run below.
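The Q20 threshold refers to Phred quality scores, where Q = -10·log10(P_error), so Q20 corresponds to a 1% chance that a base call is wrong. As a minimal sketch (not part of delfies; the function names are hypothetical), the fraction of bases at or above Q20 can be estimated from FASTQ quality strings, assuming the standard Phred+33 ASCII encoding:

```python
from typing import Iterable


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def fraction_bases_at_or_above(quality_strings: Iterable[str], min_q: int = 20, offset: int = 33) -> float:
    """Fraction of bases whose Phred score is >= min_q, assuming Phred+33 encoding."""
    total = 0
    passing = 0
    for qual in quality_strings:
        for char in qual:
            total += 1
            if ord(char) - offset >= min_q:
                passing += 1
    return passing / total if total else 0.0
```

Applied to the quality lines of a FASTQ file, a result of e.g. 0.75 would satisfy the >70% guideline.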

Aligners

To produce a SAM/BAM in which you can find breakpoints, you need a read aligner that reports soft clips (the parts of a read that are not aligned to the reference). Both bowtie2 (in --local mode) and minimap2 (by default) do this. Use minimap2 for long reads (>300bp), with the appropriate preset (e.g. -x map-ont for Nanopore data).
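In SAM records, soft clips appear as `S` operations at either end of the CIGAR string, and the clipped bases remain in the SEQ field (hard clips, `H`, are removed from SEQ). The following minimal pure-Python sketch, which is independent of delfies and uses hypothetical helper names, extracts the leading and trailing soft-clipped sequence of a read from its CIGAR string and SEQ:

```python
import re
from typing import List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '5S10M3S' into [(5, 'S'), (10, 'M'), (3, 'S')]."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def soft_clips(cigar: str, seq: str) -> Tuple[str, str]:
    """Return (left_clip, right_clip) sequences of a read.

    Hard clips (H) are dropped first because their bases are absent from SEQ.
    """
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return seq[:left], seq[len(seq) - right:]
```

Reads aligned with bowtie2 in its default end-to-end mode contain no `S` operations, which is why --local mode is needed.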

Test run with real data

I provide a processed subset of publicly-available data here: https://doi.org/10.5281/zenodo.14101797.

The data consist of a 2kbp region of the assembled genome of Oscheius onirici and three alignment BAMs from sequencing data produced using Illumina, ONT and PacBio. The data were aligned to the 2kbp region using minimap2. See the Zenodo link for details on the sequencing data (read lengths, error rates) and public links to the raw data.

You can run delfies on the inputs in this archive to make sure it is properly installed and produces the expected outputs:

```sh
wget https://zenodo.org/records/14282333/files/delfies_zenodo_test_data.tar.gz
tar xf delfies_zenodo_test_data.tar.gz

# Run delfies; for example, having defined genome, bam and odirname variables:
delfies --threads 16 \
    --telo_forward_seq TTAGGC \
    --breakpoint_type all \
    --min_mapq 20 \
    --min_supporting_reads 6 \
    ${genome} ${bam} ${odirname}

# Compare with the expected outputs:
find delfies_zenodo_test_data -name "*breakpoint_locations.bed" | xargs cat
```

User Manual

CLI options

```sh
delfies --help
```

  • Do use the --threads option if you have multiple cores/CPUs available.
  • [Breakpoints]
    • There are two types of breakpoints: see detailed docs.
    • Nearby breakpoints can be clustered together to account for variability in breakpoint location (--clustering_threshold).
  • [Region selection]: You can select a specific region to focus on, specified as a string or as a BED file.
  • [Telomeres]
    • Specify the telomere sequence for your organism using --telo_forward_seq. If you're unsure, I recommend the tool telomeric-identifier for finding out.
    • By default, delfies discards breakpoints occurring inside telomere arrays, as these theoretically correspond to false positives (cutting + telomere addition at existing telomeres). You can keep these breakpoints with --keep_telomeric_breakpoints.
  • [Aligned reads]
    • To analyse confidently-aligned reads only, you can filter reads by MAPQ (--min_mapq) and by bitwise flag (--read_filter_flag).
    • You can tolerate more or fewer mutations in the assembly telomeres (and in the sequencing reads) using --telo_max_edit_distance and --telo_array_size.
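To illustrate what tolerant telomere matching can look like, here is a minimal sketch in the spirit of --telo_forward_seq, --telo_array_size and --telo_max_edit_distance. It is not delfies' implementation: the helper names are hypothetical, and treating the array size as a number of consecutive repeat copies, and the edit distance as a Levenshtein distance over the whole array, are assumptions. The sketch checks both strands and every phase (rotation) of the repeat:

```python
from typing import List

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def telomere_arrays(telo_forward_seq: str, array_size: int) -> List[str]:
    """All phases of the repeat, tiled array_size times, on both strands."""
    arrays = []
    for motif in (telo_forward_seq, reverse_complement(telo_forward_seq)):
        for shift in range(len(motif)):
            rotated = motif[shift:] + motif[:shift]
            arrays.append(rotated * array_size)
    return arrays


def starts_with_telomere(seq: str, telo_forward_seq: str, array_size: int = 3, max_edit_distance: int = 1) -> bool:
    """True if the start of seq matches a telomere array within max_edit_distance edits.

    The default parameter values are illustrative only, not delfies' defaults.
    """
    for array in telomere_arrays(telo_forward_seq, array_size):
        prefix = seq[: len(array)]
        if len(prefix) == len(array) and edit_distance(prefix, array) <= max_edit_distance:
            return True
    return False
```

For example, with the TTAGGC repeat used in the test run, a soft clip reading `TTAGGCTTAGACTTAGGC` (one substitution) still counts as telomeric under these settings.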

Outputs

The two main outputs of delfies are:

  • breakpoint_locations.bed: a BED-formatted file containing the location of identified elimination breakpoints.
  • breakpoint_sequences.fasta: a FASTA-formatted file containing the sequences of identified elimination breakpoints.
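BED files use 0-based, half-open coordinates, whereas genome browsers such as IGV display 1-based loci. A minimal sketch for loading breakpoint_locations.bed and converting each entry to a 1-based locus string; it assumes the six standard BED columns (chrom, start, end, name, score, strand) and deliberately leaves the name and score fields uninterpreted, since their delfies-specific meaning is described in the detailed docs. `BedRecord` and `parse_bed6` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BedRecord:
    chrom: str
    start: int  # 0-based, inclusive
    end: int  # 0-based, exclusive
    name: str
    score: str
    strand: str

    def igv_locus(self) -> str:
        """1-based, fully closed locus string such as 'chr1:101-110'."""
        return f"{self.chrom}:{self.start + 1}-{self.end}"


def parse_bed6(lines: Iterable[str]) -> List[BedRecord]:
    records = []
    for line in lines:
        # Skip blank lines, comments and UCSC track/browser headers
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, name, score, strand = fields[:6]
        records.append(BedRecord(chrom, int(start), int(end), name, score, strand))
    return records
```

Usage: `with open("breakpoint_locations.bed") as fh: records = parse_bed6(fh)`, then paste `records[0].igv_locus()` into the IGV search box.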

Validating breakpoints

I highly recommend visualising your results! E.g., load your input FASTA and BAM, together with delfies' output breakpoint_locations.bed, in IGV.

Confident/true breakpoints will typically have:

  • Good read support. Note that breakpoints are ordered by read support in the delfies output file breakpoint_locations.bed, and you can require a minimum number of supporting reads using the CLI option --min_supporting_reads.
  • A difference in read coverage before and after the breakpoint. The nature of this difference depends on the ratio between cells with and without the breakpoint. As an example, in organisms that eliminate parts of their genome in the soma, if most sequenced cells are from the soma, expect more reads before the breakpoint than after it ('before' and 'after' defined relative to the reported breakpoint strand).
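The coverage comparison can be quantified with a short script. A minimal sketch, assuming per-base read depth along a contig is available as a list (for instance from `samtools depth -a`, which also reports zero-depth positions) and that 'before' means upstream on the reported strand: lower coordinates for '+', higher coordinates for '-'. That strand convention is an assumption; check the detailed docs for how delfies defines breakpoint strand. `coverage_before_after` is a hypothetical name:

```python
from typing import Sequence, Tuple


def mean(values: Sequence[int]) -> float:
    return sum(values) / len(values) if values else 0.0


def coverage_before_after(depth: Sequence[int], breakpoint: int, strand: str, window: int = 100) -> Tuple[float, float]:
    """Mean depth (before, after) a breakpoint lying between 0-based positions breakpoint-1 and breakpoint.

    Assumes 'before' is upstream on the reported strand: positions < breakpoint for '+',
    positions >= breakpoint for '-'.
    """
    left = depth[max(0, breakpoint - window):breakpoint]
    right = depth[breakpoint:breakpoint + window]
    if strand == "+":
        return mean(left), mean(right)
    if strand == "-":
        return mean(right), mean(left)
    raise ValueError(f"unexpected strand: {strand!r}")
```

In the soma-dominated example above, a true breakpoint would be expected to give a before/after ratio well above 1.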

Ultimately though, only biological experiments can truly validate identified breakpoints.

Applications

  • The fasta output enables looking for sequence motifs that occur at breakpoints, e.g. using MEME.
  • The BED output enables classifying a genome into retained and eliminated regions. The 'strand' of breakpoints is especially useful for this: see detailed docs.
  • The BED output also enables assembling past somatic telomeres: for how to do this, see detailed docs.
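MEME performs proper statistical motif discovery; as a much cruder first look, one can simply count k-mers across the sequences in breakpoint_sequences.fasta. A minimal sketch with hypothetical function names, not part of delfies, that parses the FASTA and returns the most frequent k-mers while skipping k-mers that contain ambiguous `N` bases:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record_id: uppercase sequence} from FASTA lines."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif line and name is not None:
            records[name] += line.upper()
    return records


def top_kmers(sequences: Iterable[str], k: int = 6, n: int = 5) -> List[Tuple[str, int]]:
    """Count every k-mer in every sequence and return the n most frequent."""
    counts: Counter = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers spanning ambiguous bases
                counts[kmer] += 1
    return counts.most_common(n)
```

Over-represented k-mers found this way are only candidates; a tool such as MEME is needed to assess them against a background model.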

Detailed documentation

For more details on delfies, including outputs and applications, see detailed_docs.

Contributing

Contributions always welcome!

Please see CONTRIBUTING.md for how (reporting issues, requesting features, contributing code). This document includes instructions on how to run delfies' unit and functional tests.

Owner

  • Name: Brice Letcher
  • Login: bricoletc
  • Kind: user
  • Company: EMBL-EBI

Bioinformatician and early-career researcher - EMBL-EBI and CNRS ~~~~~~ Parsing my way through DNA sequence data

JOSS Publication

delfies: a Python package for the detection of DNA breakpoints with neo-telomere addition
Published
January 12, 2025
Volume 10, Issue 105, Page 7385
Authors
Brice Letcher ORCID
Laboratory of Biology and Modelling of the Cell, Ecole Normale Supérieure de Lyon, CNRS UMR 5239, Inserm U1293, University Claude Bernard Lyon 1, Lyon, France
Marie Delattre ORCID
Laboratory of Biology and Modelling of the Cell, Ecole Normale Supérieure de Lyon, CNRS UMR 5239, Inserm U1293, University Claude Bernard Lyon 1, Lyon, France
Editor
AHM Mahfuzur Rahman ORCID
Tags
Bioinformatics Genomics Programmed DNA Elimination Soma/germline differentiation

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Letcher
  given-names: Brice
  orcid: "https://orcid.org/0000-0002-8921-6005"
- family-names: Delattre
  given-names: Marie
  orcid: "https://orcid.org/0000-0003-1640-0300"
doi: 10.5281/zenodo.14526258
message: If you use this software in your own work, please cite the associated article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Letcher
    given-names: Brice
    orcid: "https://orcid.org/0000-0002-8921-6005"
  - family-names: Delattre
    given-names: Marie
    orcid: "https://orcid.org/0000-0003-1640-0300"
  date-published: 2025-01-12
  doi: 10.21105/joss.07385
  issn: 2475-9066
  issue: 105
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 7385
  title: "delfies: a Python package for the detection of DNA breakpoints
    with neo-telomere addition"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.07385"
  volume: 10
title: "delfies: a Python package for the detection of DNA breakpoints
  with neo-telomere addition"

GitHub Events

Total
  • Create event: 5
  • Release event: 3
  • Issues event: 10
  • Watch event: 3
  • Issue comment event: 7
  • Push event: 27
  • Pull request event: 1
Last Year
  • Create event: 5
  • Release event: 3
  • Issues event: 10
  • Watch event: 3
  • Issue comment event: 7
  • Push event: 27
  • Pull request event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 145
  • Total Committers: 2
  • Avg Commits per committer: 72.5
  • Development Distribution Score (DDS): 0.007
Past Year
  • Commits: 80
  • Committers: 2
  • Avg Commits per committer: 40.0
  • Development Distribution Score (DDS): 0.012
Top Committers
Name Email Commits
Brice Letcher b****r@e****r 144
andrewhsiao11 9****1 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 8
  • Total pull requests: 1
  • Average time to close issues: 12 days
  • Average time to close pull requests: 9 days
  • Total issue authors: 3
  • Total pull request authors: 1
  • Average comments per issue: 1.5
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 1
  • Average time to close issues: 12 days
  • Average time to close pull requests: 9 days
  • Issue authors: 3
  • Pull request authors: 1
  • Average comments per issue: 1.5
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • bricoletc (3)
  • natir (3)
  • andrewhsiao11 (2)
Pull Request Authors
  • andrewhsiao11 (2)
Top Labels
Issue Labels
enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 71 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
pypi.org: delfies

delfies is a tool for the detection of DNA Elimination breakpoints

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 71 Last month
Rankings
Dependent packages count: 10.4%
Average: 34.4%
Dependent repos count: 58.4%
Maintainers (1)
Last synced: 6 months ago

Dependencies

poetry.lock pypi
  • black 24.1.1 develop
  • flake8 7.0.0 develop
  • isort 5.13.2 develop
  • mccabe 0.7.0 develop
  • mypy-extensions 1.0.0 develop
  • packaging 23.2 develop
  • pathspec 0.12.1 develop
  • platformdirs 4.2.0 develop
  • pycodestyle 2.11.1 develop
  • pyflakes 3.2.0 develop
  • tomli 2.0.1 develop
  • typing-extensions 4.7.1 develop
  • click 8.1.7
  • colorama 0.4.6
  • numpy 1.21.1
  • pybedtools 0.9.1
  • pysam 0.22.0
  • six 1.16.0
pyproject.toml pypi
  • black ^24.1.1 develop
  • flake8 ^7.0.0 develop
  • isort ^5.13.2 develop
  • click ^8.1.7
  • pybedtools ^0.9.1
  • pysam ^0.22.0
  • python ^3.8.1
.github/workflows/draft-pdf.yml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v3 composite
  • openjournals/openjournals-draft-action master composite
.github/workflows/release_pypi.yml actions
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • actions/upload-artifact v4 composite
  • pypa/gh-action-pypi-publish release/v1 composite
  • snok/install-poetry v1 composite