pyslim
Tools for dealing with tree sequences coming to and from SLiM.
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 8 of 15 committers (53.3%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: tskit-dev
- License: mit
- Language: Python
- Default Branch: main
- Size: 15.8 MB
Statistics
- Stars: 28
- Watchers: 6
- Forks: 23
- Open Issues: 30
- Releases: 7
Metadata Files
README.html
pySLiM
SLiM can now read and write tree sequences, which store genealogical data of populations. SLiM can use a tree sequence produced by the coalescent simulator `msprime` to initialize a simulation, but to do so we need to add the relevant metadata. SLiM can also write out history as a tree sequence, and in so doing it records extra information in metadata. This package makes it easy to add the relevant metadata to a tree sequence so that SLiM can use it, and to read the metadata in a SLiM-produced tree sequence.
The SLiM manual documents how the extra metadata is stored in the tree sequence files, and provides additional examples of how to use this package.
Installation
To install `pyslim`, do:
```sh
git clone https://github.com/tskit-dev/pyslim.git
cd pyslim
python setup.py install --user
```
You should also be able to install it with `pip install pyslim`. You'll also need an up-to-date `msprime` and SLiM, of course.
To run the tests to make sure everything is working, do:
```sh
cd tests/examples
for x in *.slim; do slim $x; done
cd -
python -m nose tests
```
Note: if you use `python3`, you may need to replace `python` with `python3` above.
Quickstart: coalescent simulation for SLiM
The `pyslim.annotate_defaults()` function adds default SLiM information to a tree sequence, allowing it to be read in by SLiM. The following simulates a tree sequence with `msprime`, adds SLiM information, and writes it out to a `.trees` file:
```python
import msprime
import pyslim

# simulate a tree sequence with 12 sample nodes
ts = msprime.simulate(12, mutation_rate=1.0, recombination_rate=1.0)
new_ts = pyslim.annotate_defaults(ts, model_type="nonWF", slim_generation=1)
new_ts.dump("slim_ts.trees")
```
Quickstart: reading SLiM metadata
To retrieve the extra information that SLiM stores in a tree sequence, use the `extract_X_metadata()` functions, where `X` is one of `mutation`, `population`, `node`, or `individual`. For instance, to see the age of each Individual produced by annotation in the previous example:
```python
for ind in pyslim.extract_individual_metadata(new_ts.tables):
    print(ind.age)
```
In this example, all the ages are 0 (the default).
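SLiM stores this metadata as packed binary records, which `pyslim` reads and writes with Python's `struct` module. As a rough, stdlib-only sketch of what an extraction function does under the hood, the code below decodes one individual's record, changes its age, and re-encodes it. The field layout (a little-endian 64-bit pedigree ID followed by 32-bit age, population, sex, and flags fields) and the names `IndividualRecord`, `decode_individual`, and `encode_individual` are assumptions for illustration only; the SLiM manual is the authoritative description of the real format.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout, assumed for illustration: little-endian int64 pedigree ID,
# then int32 age, int32 population, int32 sex, uint32 flags (24 bytes, no padding).
INDIVIDUAL_STRUCT = struct.Struct("<qiiiI")


@dataclass
class IndividualRecord:
    pedigree_id: int
    age: int
    population: int
    sex: int
    flags: int


def decode_individual(buff: bytes) -> IndividualRecord:
    """Unpack one binary metadata entry into a record object."""
    return IndividualRecord(*INDIVIDUAL_STRUCT.unpack(buff))


def encode_individual(rec: IndividualRecord) -> bytes:
    """Pack a record object back into its binary metadata entry."""
    return INDIVIDUAL_STRUCT.pack(
        rec.pedigree_id, rec.age, rec.population, rec.sex, rec.flags
    )


# Round trip: encode a record, modify one field, and encode it again.
raw = encode_individual(IndividualRecord(pedigree_id=7, age=0, population=0, sex=-1, flags=0))
rec = decode_individual(raw)
rec.age = 3
updated = encode_individual(rec)
print(len(raw), decode_individual(updated).age)  # 24 3
```

Because every field has a fixed width, each entry is a fixed-size byte string, so any other content stored in the same column would not decode under this layout.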
Quickstart: modifying SLiM metadata
To modify the metadata that `pyslim` has introduced into a coalescent simulation, or the metadata in a SLiM-produced tree sequence, use the `annotate_X_metadata()` functions. For instance, to set the ages of the individuals in the tree sequence to random integers from 1 to 4, and write out the resulting tree sequence:
```python
import random

tables = new_ts.dump_tables()
ind_md = list(pyslim.extract_individual_metadata(tables))
for ind in ind_md:
    ind.age = random.choice([1, 2, 3, 4])
pyslim.annotate_individual_metadata(tables, ind_md)
mod_ts = pyslim.load_tables(tables, slim_format=True)

for ind in pyslim.extract_individual_metadata(mod_ts.tables):
    print(ind.age)

mod_ts.dump("modified_ts.trees")
```
Documentation
Here we describe the technical details. Currently, the Python package `msprime` simulates tree sequences and provides tools to work with them. In the future, tools for working with tree sequences will be separated into a package called `tskit`. `pyslim` provides a thin interface between SLiM and `msprime`/`tskit`.
Metadata entries
SLiM records additional information in the metadata columns of the Population, Individual, Node, and Mutation tables. The information is recorded in a binary format, and is extracted and written by `pyslim` using the Python `struct` module. Nothing besides this binary information can be stored in the metadata of these tables for the tree sequence to be used by SLiM, so when `pyslim` annotates an existing tree sequence, anything already in those columns is overwritten. For more detailed documentation on the contents and format of the metadata, see the SLiM manual.
Time and SLiM Tree Sequences
The “time” in a SLiM simulation is the number of generations since the beginning of the simulation. However, since tree sequences naturally deal with history retrospectively, properties of tree sequences related to “time” are measured in generations ago, i.e., generations prior to a given point. To distinguish these two notions of time, we'll talk about “SLiM time” and “tskit time”. When SLiM records a tree sequence, it records tskit time in units of generations before the start of the simulation, so all the tskit times it records are negative: they measure how long before the start each event happened, but every recorded event happened after the start. This is terribly counterintuitive. The fact that there are two notions of time (one moving forwards, the other backwards) is unavoidable, but `pyslim` does one thing to make this easier to work with: when `pyslim` loads a tree sequence file, it checks what the SLiM time was at the end of the simulation, and shifts all times in the tree sequence so that tskit time is measured in generations before the end of the simulation.
The upshot is that:
- The `time` attribute of tree sequence nodes gives the number of generations before the end of the simulation at which those nodes were born.
- These numbers will not match the values in the `.trees` file, but you should not need to worry about that, as long as you always `load()` and `dump()` using `pyslim`.
- The conversion factor, the SLiM time at which the tree sequence file was written, is stored in an entry in the provenance table of the tree sequence; `pyslim` extracts it from there to set the `slim_generation` attribute of a `SlimTreeSequence`.
An example should help clarify things. Suppose that `my.trees` is a file that was saved by SLiM at the end of a simulation run for 100 generations, and that we want to find the list of nodes in the tree sequence that were born during the first 20 generations of the simulation. Since the birth time of a node is recorded, in tskit time, in the node's `.time` attribute, a node that was born in the first 20 generations of the simulation, i.e., at least 80 generations before the end of the simulation, will have a `.time` attribute of at least 80:
```python
import pyslim

ts = pyslim.load("my.trees", slim_format=True)
old_nodes = []
for n in ts.nodes():
    # born at least (slim_generation - 20) generations before the end
    if n.time >= ts.slim_generation - 20:
        old_nodes.append(n)
```
In the future, we may change this behavior, but if so, we will provide an upgrade path for old files.
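Every conversion described in this section is a single addition or subtraction of `slim_generation`. The stdlib-only sketch below spells that arithmetic out without `pyslim`. The helper names are hypothetical, and it assumes, for illustration, that a node born in SLiM generation g is stored in the file with raw time -g; that is one reading of "generations before the start", not necessarily SLiM's exact convention.

```python
def load_shift(raw_times, slim_generation):
    """Hypothetical helper: file times (generations before the start, negative)
    -> generations before the end, by adding slim_generation."""
    return [t + slim_generation for t in raw_times]


def dump_shift(times, slim_generation):
    """Hypothetical helper: undo load_shift before writing times back out."""
    return [t - slim_generation for t in times]


def slim_to_tskit(slim_time, slim_generation):
    """SLiM time (generations since the start) -> generations before the end."""
    return slim_generation - slim_time


def tskit_to_slim(tskit_time, slim_generation):
    """Generations before the end -> SLiM time."""
    return slim_generation - tskit_time


# A 100-generation run; assume nodes born in SLiM generations 1, 20, 21 and 100
# were stored in the file with raw times -1, -20, -21 and -100.
slim_generation = 100
raw = [-1.0, -20.0, -21.0, -100.0]
times = load_shift(raw, slim_generation)                  # [99.0, 80.0, 79.0, 0.0]
old = [t for t in times if t >= slim_generation - 20]     # born in generations 1..20
print(old)                                                # [99.0, 80.0]
print([tskit_to_slim(t, slim_generation) for t in old])   # [1.0, 20.0]
assert dump_shift(times, slim_generation) == raw          # round trip restores file times
```

With whole-number times the round trip is exact; with arbitrary floating-point times, adding and subtracting `slim_generation` can introduce small rounding differences.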
SLiM Tree Sequences
Because SLiM adds additional information to tree sequences, `pyslim` defines a subclass of `msprime.TreeSequence` to make it easy to access this information, and to make the time shift described above seamless. When you run `pyslim.load('my.trees', slim_format=True)`, you get a `SlimTreeSequence` object. This has all the same properties and methods as a plain `TreeSequence`, with the following differences:
- It has a `slim_generation` attribute.
- Its `.dump()` method shifts times by `slim_generation` before writing them out, so that `pyslim.load("my.trees", slim_format=True).dump("my2.trees")` writes out a tree sequence identical to the one that was read in (except for floating point error due to adding and subtracting this value from the times).
Mutation and node times
Both types of time, “SLiM time” and “tskit time”, appear in a SLiM tree sequence. The birth time of each individual is stored as a tskit time in the `.time` attribute of each of its nodes, while the `slim_time` attribute of mutation metadata is in SLiM time.
Here is a very small example. Suppose that there are three haploid individuals: node 0 is born in the first generation, node 1 is born from node 0 in the third generation, and node 2 is born from node 1 in the fifth generation. (This is not possible in SLiM for a number of reasons, but ignore this.) Furthermore, suppose that two mutations have appeared: mutation 0 in generation 2 and mutation 1 in generation 3. The simulation is run for 5 generations, then written to a tree sequence. Here is a depiction of this:
| slim time | tskit time | nodes | mutation |
|---|---|---|---|
| 1 | 4 | 0 | |
| 2 | 3 | | 0 |
| 3 | 2 | 1 | 1 |
| 4 | 1 | | |
| 5 | 0 | 2 | |
Here “tskit time” refers to the number of generations before the end of the simulation.
In this situation, the `time` attribute of each node would be the value in the “tskit time” column, while the `slim_time` attribute of mutation metadata would be the value in the “slim time” column. We could retrieve this information hypothetically as:
```python
>>> ts = pyslim.load("my.trees", slim_format=True)
>>> [n.time for n in ts.nodes()]
[4.0, 2.0, 0.0]
>>> [pyslim.decode_mutation(m.metadata).slim_time for m in ts.mutations()]
[2, 3]
```
We could then convert the node times to “slim time” as follows:
```python
>>> [ts.slim_generation - n.time for n in ts.nodes()]
[1.0, 3.0, 5.0]
```
And we could convert the mutation times to “tskit time” as follows:
```python
>>> [ts.slim_generation - m.slim_time for m in pyslim.extract_mutation_metadata(ts.tables)]
[3, 2]
```
Other important notes:
- `tskit` “nodes” correspond to SLiM “genomes”. Individuals in SLiM are diploid, so each individual has two nodes.
- The currently alive individuals will be those in the Individual table; each of them is associated with two nodes, and all other nodes will not have a corresponding individual.
- The “remembered nodes” will be the first nodes.
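The node/individual correspondence can be checked with plain Python. In a tskit node table, each node's `individual` column holds the ID of the individual the node belongs to, or -1 (`tskit.NULL`) if it has none; in current tskit this column is available as `ts.tables.nodes.individual`. The sketch below takes such a column as a plain list (the data and the helper name `nodes_by_individual` are made up for illustration), groups node IDs by individual, and checks that every individual has exactly two nodes, as expected for diploid SLiM individuals.

```python
from collections import defaultdict

NULL = -1  # tskit's convention for "no individual"


def nodes_by_individual(node_individual):
    """Group node IDs by the individual they belong to.

    `node_individual[i]` is the individual ID of node i, or NULL if none.
    Returns (dict of individual ID -> list of node IDs, list of unassigned nodes).
    """
    groups = defaultdict(list)
    orphans = []
    for node_id, ind in enumerate(node_individual):
        if ind == NULL:
            orphans.append(node_id)
        else:
            groups[ind].append(node_id)
    return dict(groups), orphans


# Made-up example: two diploid individuals plus three nodes with no individual.
node_individual = [0, 0, 1, 1, NULL, NULL, NULL]
groups, orphans = nodes_by_individual(node_individual)
assert all(len(nodes) == 2 for nodes in groups.values())  # diploid: two nodes each
print(groups)   # {0: [0, 1], 1: [2, 3]}
print(orphans)  # [4, 5, 6]
```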
Owner
- Name: Tskit developers
- Login: tskit-dev
- Kind: organization
- Email: admin@tskit.dev
- Website: https://tskit.dev
- Repositories: 26
- Profile: https://github.com/tskit-dev
Software for the creation and analysis of tree-sequences.
GitHub Events
Total
- Create event: 5
- Release event: 1
- Issues event: 27
- Delete event: 3
- Issue comment event: 42
- Push event: 16
- Pull request review event: 1
- Pull request event: 24
Last Year
- Create event: 5
- Release event: 1
- Issues event: 27
- Delete event: 3
- Issue comment event: 42
- Push event: 16
- Pull request review event: 1
- Pull request event: 24
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| peter | p****p@g****m | 215 |
| Yan Wong | y****g@b****k | 7 |
| Ben Jeffery | b****y@b****k | 6 |
| Jaime Ashander | j****r@u****u | 3 |
| Gilia Patterson | 3****n | 2 |
| Jerome Kelleher | jk@w****k | 2 |
| andrewkern | a****n@u****u | 2 |
| Murillo R | m****s@g****m | 2 |
| Silas Tittes | s****s@g****m | 1 |
| lclclclclclclc | e****n@h****u | 1 |
| chris smith | c****s@d****u | 1 |
| jgallowa07 | j****7@g****m | 1 |
| Ben Jeffery | b****y@w****k | 1 |
| Tatiana Bellagio | 5****o | 1 |
| Xin Huang | x****g | 1 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 106
- Total pull requests: 72
- Average time to close issues: 9 months
- Average time to close pull requests: 7 days
- Total issue authors: 21
- Total pull request authors: 9
- Average comments per issue: 1.95
- Average comments per pull request: 1.33
- Merged pull requests: 61
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 16
- Pull requests: 18
- Average time to close issues: 3 months
- Average time to close pull requests: 6 days
- Issue authors: 3
- Pull request authors: 2
- Average comments per issue: 2.13
- Average comments per pull request: 0.83
- Merged pull requests: 14
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- petrelharp (74)
- bhaller (7)
- ShyamieG (3)
- Hongjinwu (2)
- ChrystelleDelord (2)
- daikitag (2)
- hyanwong (2)
- isadoo (1)
- grahamgower (1)
- AudeCaizergues (1)
- bguo068 (1)
- mariharris (1)
- Luker121 (1)
- steinrue (1)
- benjeffery (1)
Pull Request Authors
- petrelharp (47)
- benjeffery (14)
- daikitag (4)
- andrewkern (4)
- jiseonmin (3)
- lkirk (2)
- mufernando (1)
- Tatianabellagio (1)
- silastittes (1)
Packages
- Total packages: 2
- Total downloads:
  - pypi: 28,477 last month
- Total dependent packages: 2 (may contain duplicates)
- Total dependent repositories: 31 (may contain duplicates)
- Total versions: 34
- Total maintainers: 1
pypi.org: pyslim
Manipulate tree sequences produced by SLiM.
- Homepage: https://github.com/tskit-dev/pyslim
- Documentation: https://pyslim.readthedocs.io/
- License: MIT
- Latest release: 1.1.0 (published 11 months ago)
Maintainers (1)
conda-forge.org: pyslim
pyslim is a python module to allow reading and writing of SLiM-produced tree sequences as a thin interface to tskit.
- Homepage: https://github.com/tskit-dev/pyslim
- License: MIT
- Latest release: 1.0.1 (published almost 4 years ago)