carpepy

Package for visualising polarised genomic data

https://github.com/studenecivb/carpepy

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (6.1%) to scientific vocabulary

Last synced: 10 months ago

Repository

Package for visualising polarised genomic data

Basic Info

Host: GitHub
Owner: Studenecivb
License: mit
Language: Python
Default Branch: main
Size: 673 KB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 5

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

CarpePy

Welcome to the CarpePy documentation!

CarpePy is a toolset for visualising polarised genomic data. It depends on pre-processed data from the diem package: https://github.com/StuartJEBaird/diem

CarpePy depends on the pandas and numpy Python packages, and we recommend running it in a virtual environment.

Example of running CarpePy with Pneumocystis data

Pneumocystis data: Jan Petružela, Beate Nürnberger, Alexis Ribas, et al. Comparative genomic analysis of co-occurring hybrid zones of house mouse parasites Pneumocystis murina and Syphacia obvelata using genome polarisation. Authorea. January 31, 2025.

First, load the input data from diem as a pandas dataframe:

```python
import pandas as pd

# file_path is the path to the comma-separated output file from diem
HonzaPneumo_df = pd.read_csv(file_path, sep=',')
HonzaPneumo = HonzaPneumo_df.values.tolist()
```

HonzaPneumo_df should look approximately like this:

| Row | diemmarkpos | scaffold | refpos | V3 | ... | SK1151DM | SU4201DM | ... | isig | osig | admixturecategory | genic | cds | msg |
|-----|-------------|----------|----------------|-----|-----|----------|----------|-----|------|------|-------------------|-------|-----|-----|
| 0 | 0 | m1 | AFWA02000001.1 | 44 | ... | 0 | 2 | ... | 0 | 0 | barr | 0 | 0 | 0 |
| 1 | 1 | m2 | AFWA02000001.1 | 94 | ... | 0 | _ | ... | 0 | 0 | barr | 0 | 0 | 0 |
| 2 | 2 | m3 | AFWA02000001.1 | 314 | ... | 0 | 2 | ... | 0 | 0 | barr | 0 | 0 | 0 |

Next, take the third and fourth columns to get the BED information into a separate variable:

```python
third_column = [row[2] for row in HonzaPneumo]
fourth_column = [row[3] for row in HonzaPneumo]
HonzaPneumoBED = list(zip(third_column, fourth_column))
```

As the next step, extract the names of the individuals:

```python
first_row = HonzaPneumo_df.columns.tolist()
HonzaPneumoIndIDs = first_row[18:-6]
```

And then the markers and the selected input data:

```python
column_indices = list(range(18, 18 + len(HonzaPneumoIndIDs)))
HonzaPneumoSelected = [[row[i] for i in column_indices] for row in HonzaPneumo]
HonzaPneumoMarkers = ["".join(map(str, row)) for row in HonzaPneumoSelected]
HonzaPneumoPolariseNjoin = [list(row) + [marker] for row, marker in zip(HonzaPneumoBED, HonzaPneumoMarkers)]
```

Now we are finally ready to run the diem Plot Prepper:

```python
plot_theme = "Pneumocystis"

PneumoPlotPrep = DiemPlotPrep(
    plottheme='Pneumocystis',
    polariseddata=HonzaPneumoPolariseNjoin,
    indids=HonzaPneumoIndIDs,
    dithreshold="NO DI FILTER",
    dicolumn=5,
    physres=1,
    ticks='kb',
)
PneumoPlotPrep.formatbeddata()
```

The arguments are the plot theme, the polarised and processed data, the individual IDs, the diagnostic index threshold (if any filtering is wanted) and the column used for it, the resolution (in this case 1), and the tick units, either kb or mb depending on the data.
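The column-slicing steps that build the input for DiemPlotPrep can be tried on a toy table first. The following is a minimal sketch using plain Python lists instead of pandas. The column names, the values, and the offsets (4 leading and 2 trailing columns rather than the 18 and 6 used in the walkthrough) are assumptions made only to keep the example small.

```python
# Toy stand-in for the diem output table. The walkthrough slices with
# [18:-6]; this sketch assumes 4 leading and 2 trailing columns instead.
N_LEADING = 4   # hypothetical; 18 in the walkthrough
N_TRAILING = 2  # hypothetical; 6 in the walkthrough

# Hypothetical header and rows, not real Pneumocystis data.
header = ["Row", "marker", "scaffold", "refpos",
          "ind_A", "ind_B", "ind_C", "genic", "cds"]
rows = [
    [0, "m1", "scaf1", 44, 0, 2, 1, 0, 0],
    [1, "m2", "scaf1", 94, 0, "_", 2, 0, 0],
    [2, "m3", "scaf2", 314, 2, 2, 0, 0, 0],
]

# BED information: columns 2 and 3 of every row, as in the walkthrough.
bed = [(row[2], row[3]) for row in rows]

# Individual IDs: header entries between the leading and trailing blocks.
ind_ids = header[N_LEADING:-N_TRAILING]

# One string per marker, concatenating the states of all individuals.
column_indices = range(N_LEADING, N_LEADING + len(ind_ids))
markers = ["".join(str(row[i]) for i in column_indices) for row in rows]

# Final structure: [scaffold, position, state string] for each marker.
polarised = [list(b) + [m] for b, m in zip(bed, markers)]

for entry in polarised:
    print(entry)
```

Checking that `len(ind_ids)` matches the number of sampled individuals is a quick way to confirm that the slice offsets fit a given input file.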

Now we are ready to draw either the unit plots, each of which represents one unit of the genome (a scaffold or a chromosome), or the IrisPlot, which shows the whole genome.

```python
for i in range(len(PneumoPlotPrep.unit_plot_prep)):
    diemUnitPlot(
        PneumoPlotPrep.unit_plot_prep[i],
        bed_data=PneumoPlotPrep.DIfilteredBED_formatted[i],
        index=i+1,
        path='output_path',
        names_list=PneumoPlotPrep.IndIDs_ordered,
        ticks='kb',
    )
```

[Figure: output of the unit plot]
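The unit-plot loop pairs entry i of the prepared unit data with entry i of the formatted BED data and numbers the units from 1. As a small sketch of that pairing only, the same iteration can be written with `zip` and `enumerate`. Here `draw_unit` is a hypothetical stand-in that merely reports what it receives; it is not diemUnitPlot, and the toy lists are invented for illustration.

```python
def draw_unit(unit, bed_data, index):
    """Hypothetical stand-in for a plotting call; only reports its inputs."""
    return f"unit {index}: {len(unit)} rows, {len(bed_data)} BED entries"


# Toy per-unit data standing in for the prepared units and their BED data.
unit_plot_prep = [["0210", "0022"], ["2201"]]
bed_formatted = [[("scaf1", 44), ("scaf1", 94)], [("scaf2", 314)]]

# zip pairs each unit with its BED data; enumerate(start=1) gives the
# 1-based index that the loop computes as i + 1.
reports = [
    draw_unit(unit, bed, index)
    for index, (unit, bed) in enumerate(zip(unit_plot_prep, bed_formatted), start=1)
]

for line in reports:
    print(line)
```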

And now the IrisPlot:

```python
diemIrisPlot(
    PneumoPlotPrep.diemDITgenomesordered,
    names=PneumoPlotPrep.IndIDs_ordered,
    bedinfo=PneumoPlotPrep.irisplotprep,
    lengthofchromosomes=PneumoPlotPrep.lengthofchromosomes,
    heatmap=heatmap_map,
    path=outputpath,
    png='cute_iris',
    pdf='cute_iris',
)
```

The arguments include the genomes ordered according to the hybrid index (diemDITgenomesordered), the names of the individuals, the BED information, and the chromosome lengths. If you do not give a png or pdf name, the plot is only shown on screen; giving a png or pdf name saves it into the folder given by path.

The IrisPlot can also include a heatmap, which must be prepared before diemIrisPlot is called:

```python
import numpy as np

heatmap_pre_values = list(HonzaPneumo_df.iloc[:, -4])
rle_heatmap_values = np.array(RichRLE(heatmap_pre_values)).T
heatmap_map = np.delete(rle_heatmap_values, 1, axis=1)
```

[Figure: output of the IrisPlot]
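This README does not describe what RichRLE returns. Assuming from its name that it performs some form of run-length encoding, the sketch below shows a plain run-length encoding of a toy heatmap column using only the standard library. The function `run_length_encode` and its (value, start, length) layout are assumptions for illustration, not the CarpePy implementation or its output format.

```python
from itertools import groupby


def run_length_encode(values):
    """Return (value, start, length) for each run of equal consecutive values."""
    runs = []
    start = 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        runs.append((value, start, length))
        start += length
    return runs


# Toy heatmap column, invented for illustration.
heatmap_pre_values = [0, 0, 0, 1, 1, 0, 2, 2, 2, 2]
runs = run_length_encode(heatmap_pre_values)

# Dropping the middle entry of each triple is the list analogue of
# np.delete(array, 1, axis=1), which removes column 1 of a 2-D array.
value_and_length = [(value, length) for value, _start, length in runs]

print(runs)
print(value_and_length)
```

Encoding runs rather than individual markers keeps the heatmap compact when long stretches of the genome share the same value.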

If you have any questions, please contact us at: ninahaladova@gmail.com

Cite as: Baird, S. J. E., & Daley, N. (2025). CarpePy (Version 0.0.1) [Computer software]

Owner

Name: IVB_Studenec
Login: Studenecivb
Kind: organization

Repositories: 1
Profile: https://github.com/Studenecivb

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Baird
    given-names: Stuart J.E.
  - family-names: Daley
    given-names: Nina
title: "CarpePy"
version: 0.0.6
date-released: 2025-01-26

GitHub Events

Total

Release event: 5
Push event: 10
Create event: 8

Last Year

Release event: 5
Push event: 10
Create event: 8

Packages

Total packages: 1
Total downloads:
- pypi 22 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 4
Total maintainers: 1

pypi.org: carpepy

Module for visualising polarised genomes

Documentation: https://carpepy.readthedocs.io/
License: mit
Latest release: 0.0.5
published over 1 year ago

Versions: 4
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 22 Last month

Rankings

Dependent packages count: 9.7%

Average: 32.2%

Dependent repos count: 54.7%

Maintainers (1)

nina215

Last synced: 10 months ago

Dependencies

requirements.txt pypi

matplotlib ==3.6.2
numpy ==1.24.1
pandas ==1.5.2
scipy ==1.15.1
setuptools ==65.5.0
setuptools ==68.2.0

setup.py pypi

matplotlib *
numpy *
pandas *
