Recent Releases of teiphy
teiphy - New mean-idf, mi, and mean-mi --table options
This release adds three new tabular formats accessible with the --table option. The first is mean-idf, which calculates the mean inverse document frequency (IDF) of two witnesses over all variation units where both have non-zero reading support vectors (i.e., where neither is lacunose, assuming no --split-missing option has been specified), including variation units where those witnesses disagree. The second is mi, which calculates the total mutual information (MI) of two witnesses over all variation units where both have non-zero reading support vectors. At each such variation unit, this corresponds to the Kullback-Leibler divergence of the observed joint distribution of the witnesses' readings from the expected joint distribution of their readings under the assumption that the witnesses are independent. The third format is mean-mi, which calculates the average MI rather than the total MI. Because the mean-idf and mean-mi formats normalize by the number of shared variation units, they improve the values of relationships involving more fragmentary witnesses; but if extremely fragmentary witnesses are not filtered out with the --fragmentary-threshold option, then their values may be inflated.
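The Kullback-Leibler formulation of MI described above can be sketched in a few lines of Python. This is a minimal illustration, not teiphy's implementation: the function name `mutual_information_bits` and the input layout (a table of joint probabilities, with rows for the first witness's readings and columns for the second's) are assumptions, and how teiphy derives the joint distribution from reading support vectors is not shown here.

```python
from math import log2


def mutual_information_bits(joint):
    """KL divergence (in bits) of a joint distribution from the product of its marginals.

    `joint` is a hypothetical list of rows: joint[i][j] is the probability that
    the first witness has reading i and the second witness has reading j.
    """
    row_marginals = [sum(row) for row in joint]
    col_marginals = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # cells with zero joint probability contribute nothing
                mi += p * log2(p / (row_marginals[i] * col_marginals[j]))
    return mi
```

Summing such values over variation units gives a total in the spirit of mi, and dividing that total by the number of units gives an average in the spirit of mean-mi.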
Scientific Software - Peer-reviewed
- Python
Published by jjmccollum 11 months ago
teiphy - Multiple --split-missing options and support for IDF-weighted agreement tables
This release allows multiple values for the --split-missing option for tabular outputs. This option was previously a boolean flag. The uniform option replicates its old behavior: where a witness has missing data in a variation unit, its values for each of its potential readings in that unit are assigned equal proportions of the value 1. This corresponds to a uniform prior on which reading it originally had. The proportional option corresponds to a prior informed by the sample populations of the variant readings among the witnesses that are not missing data in that variation unit.
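The difference between the two priors can be sketched as follows. The helper `split_missing_support` and its input (a list of how many extant witnesses attest each reading in the unit) are hypothetical names chosen for illustration; this is not teiphy's own code.

```python
def split_missing_support(reading_counts, mode="uniform"):
    """Return a support vector summing to 1 for a witness with missing data.

    `reading_counts` is a hypothetical list giving, for each reading in the
    variation unit, the number of extant witnesses that attest it.
    """
    k = len(reading_counts)
    if mode == "uniform":
        # every reading gets an equal share of the value 1
        return [1 / k] * k
    if mode == "proportional":
        # each reading's share follows its frequency among extant witnesses
        total = sum(reading_counts)
        return [count / total for count in reading_counts]
    raise ValueError(f"unknown mode: {mode}")
```

For a unit with three readings attested by 6, 3, and 1 witnesses, the uniform mode assigns one third to each reading, while the proportional mode assigns 0.6, 0.3, and 0.1. The sketch does not handle a unit in which no witness is extant.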
In conjunction with this new feature, this release also supports a new idf argument for the --table option. With this argument, teiphy produces an agreement matrix with rows and columns for witnesses where the agreements are weighted by their inverse document frequency (IDF). Equivalently, the cell for two witnesses contains the total expected information content (in bits) of their agreements at all variation units, where the probability of agreement on a reading is defined as the sampling probability of that reading. Effectively, this weighting scheme treats more exclusive agreements as more informative. In this way, it can help us get behind certain layers of textual influence, such as assimilation to a popular text, to highlight relationships that general agreement counts or proportions cannot. If no --split-missing argument is specified, then variation units where either witness has missing data are ignored. If either --split-missing argument is provided, then the expected information content of an agreement is determined based on the corresponding priors about which reading one or both witnesses might have had. In practice, ignoring variation units where one or both witnesses are missing data results in fewer false positives with fragmentary witnesses.
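As a rough sketch of the weighting, the following Python function sums the information content of each agreement for the simple case where no --split-missing prior is applied. The function name, the data layout, and the estimate of a reading's sampling probability as its share of extant witnesses are all assumptions made for illustration; teiphy's actual computation may differ in detail.

```python
from math import log2


def idf_weighted_agreement(readings_a, readings_b, all_readings):
    """Sum -log2(p) over the variation units where two witnesses agree.

    Each argument lists one reading per variation unit, with None marking
    missing data; `all_readings` holds such a list for every witness
    (a hypothetical layout). p is the share of extant witnesses that attest
    the agreed reading in that unit.
    """
    total = 0.0
    for unit, (a, b) in enumerate(zip(readings_a, readings_b)):
        if a is None or b is None or a != b:
            continue  # skip lacunose units and disagreements
        extant = [w[unit] for w in all_readings if w[unit] is not None]
        p = extant.count(a) / len(extant)
        total += -log2(p)  # rarer readings contribute more bits
    return total
```

An agreement on a reading shared by every extant witness contributes 0 bits, while an agreement on a reading found in half of the extant witnesses contributes 1 bit.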
Published by jjmccollum 11 months ago
teiphy - Support for weighted variation units in stemma, updated example file, updated BEAST 2 template
The biggest contribution of this release is the addition of support for weighted variation units (specifically when generating inputs for the stemma program). Another significant contribution is a sweeping update to the example file to reflect the UBS collation of Ephesians more closely. Some unexpected behavior in the validation of witness dates was addressed, so that the check for whether the latest date is fixed in at least one witness now correctly gives preference to witnesses that have precise dates instead of date ranges. Finally, the template for BEAST 2.7 XML outputs now includes a tree initialization element and updated parameters and priors for the birth-death-sampling tree prior (Stadler, et al. 2013) to account more precisely for the use of dated tips (common for stemmata).
Published by jjmccollum about 1 year ago
teiphy - Support through Python 3.12, 62-state support for NEXUS outputs, and support for PHYLIP distance/similarity matrices
This release incorporates contributions from @catsmith to ensure compatibility with Python versions 3.9 through 3.12. (As a result of these changes, Python 3.8 is no longer supported.) To accommodate software like PAUP* and @edmondac's fork of MrBayes (https://github.com/edmondac/MrBayes), the symbol set for NEXUS outputs has been extended to 62 symbols (0-9, a-z, and A-Z). This release also adds support for the use of --table distance and --table similarity options (along with the --proportion and --show-ext flags) with outputs in PHYLIP (.phy and .ph) format to produce PHYLIP-formatted distance or similarity matrices.
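The 62-symbol set can be reproduced with the standard library. In this sketch, the helper `state_symbol` and the zero-based indexing of readings are assumptions for illustration rather than teiphy's own code.

```python
import string

# 0-9, a-z, and A-Z, in the order given in the release note
SYMBOLS = string.digits + string.ascii_lowercase + string.ascii_uppercase


def state_symbol(reading_index):
    """Map a zero-based reading index to a single-character state symbol."""
    if not 0 <= reading_index < len(SYMBOLS):
        raise ValueError(f"at most {len(SYMBOLS)} states can be encoded")
    return SYMBOLS[reading_index]
```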
Published by jjmccollum over 1 year ago
teiphy - Support for similarity matrices and common variation unit counts in distance/similarity matrices
This release introduces the --table similarity option, which produces a tabular output with counts of pairwise agreements between witnesses (or, if the --proportion flag is specified, proportions of agreements among variation units where both witnesses have non-ambiguous readings). It also introduces the --show-ext flag, which appends the number of variation units where both witnesses have non-ambiguous readings to each cell's value (e.g., 47/50 or 0.94/50). The --show-ext flag can also be used with distance matrices specified with --table distance.
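The cell values described above can be sketched as follows. The helper `similarity_cell` and its data layout are hypothetical and only mirror the behavior stated in this note; they are not part of teiphy's API.

```python
def similarity_cell(readings_a, readings_b, proportion=False, show_ext=False):
    """Format one cell of a pairwise agreement table (hypothetical helper).

    Readings are listed one per variation unit; None marks a unit where the
    witness has no unambiguous reading.
    """
    pairs = [
        (a, b)
        for a, b in zip(readings_a, readings_b)
        if a is not None and b is not None
    ]
    extant = len(pairs)  # units where both witnesses have a reading
    agreements = sum(1 for a, b in pairs if a == b)
    value = agreements / extant if proportion else agreements
    return f"{value}/{extant}" if show_ext else str(value)
```

With 47 agreements over 50 shared units, the sketch returns 47/50 when only show_ext is set and 0.94/50 when proportion is set as well. It does not guard against a pair of witnesses with no shared units.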
Published by jjmccollum over 1 year ago
teiphy - Support for exclusion of fragmentary witnesses
With this release, you can exclude fragmentary witnesses from your collation by specifying the --fragmentary-threshold command-line option, followed by a number between 0 and 1 indicating the proportion of variation units at which a witness must be extant (i.e., have a non-missing reading according to the reading type(s) specified with the -m option) to be included in the output. Thus, --fragmentary-threshold 0.7 will exclude all witnesses with more than 30 percent of their readings missing, while --fragmentary-threshold 1.0 will exclude witnesses with any missing readings. (Note that this check is performed after correctors' hands have been filled in, if you have supplied the --fill-correctors option.)
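The filtering rule can be sketched as follows. The helper `filter_fragmentary` and the dictionary layout are assumptions for illustration, not teiphy's own code.

```python
def filter_fragmentary(witness_readings, threshold):
    """Keep witnesses that are extant in at least `threshold` of the units.

    `witness_readings` is a hypothetical dict mapping witness IDs to lists of
    readings (one per variation unit), with None marking a missing reading.
    """
    kept = {}
    for wit_id, readings in witness_readings.items():
        extant = sum(1 for r in readings if r is not None)
        if extant / len(readings) >= threshold:
            kept[wit_id] = readings
    return kept
```

With a threshold of 0.7, a witness extant at exactly 7 of 10 variation units is kept, and one extant at 6 of 10 is dropped.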
Published by jjmccollum over 1 year ago
teiphy - Extended number of states for BEAST 2 outputs
In principle, any number of states should be permissible in BEAST 2.7 XML inputs, since the states are specified as sequences of probabilities rather than with one-character symbols. But even with sequences encoded in this way, BEAST 2 still requires code maps (for some reason), so we are limited by the space of allowable single-character symbols. Previously, teiphy restricted the set of BEAST state symbols to 0-9 and a-z. This release adds A-Z to the symbol set.
Published by jjmccollum over 1 year ago
teiphy - Support for variation unit identification through combination of "n", "from", and "to" attributes
Previously, teiphy assumed that each variation unit (i.e., an app element) would be uniquely identified by its xml:id attribute or its n attribute alone. While this assumption holds in the case of xml:id attributes (which, by definition, must be unique), it does not hold for n attributes. In practice, TEI XML collations assign app elements in the same larger passage of text (e.g., a verse) the same n value as that larger passage and then assign the app elements additional from and to attributes specifying word indices, so as to specify the unique location of the variation unit within that larger passage. To this end, the VariationUnit class of teiphy now checks for from and to attributes in addition to an n attribute and combines them to form a unique ID for the variation unit.
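A minimal sketch of this ID logic, using the standard library's ElementTree, might look like the following. The function name and the underscore separator are assumptions; the note above does not specify how teiphy's VariationUnit class joins the three attributes.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the reserved XML namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def variation_unit_id(app):
    """Build an identifier for an app element (the separator is an assumption)."""
    if XML_ID in app.attrib:
        return app.attrib[XML_ID]  # xml:id values are unique by definition
    parts = [app.get(name) for name in ("n", "from", "to")]
    return "_".join(part for part in parts if part is not None)
```

For an element such as `<app n="B10K1V1" from="2" to="4"/>`, the sketch yields B10K1V1_2_4, which stays distinct from other app elements that share the same n value.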
Published by jjmccollum over 1 year ago
teiphy - Support for supplying/updating witness date ranges through external CSV file
This release provides a new feature for the convenience of users who have derived their collation data and witness date ranges from different sources: a CSV file containing witness IDs and (potentially empty) minimum and maximum dates can be specified with the --dates-file command-line option. For witnesses in the CSV file, the specified date range will overwrite any existing date range in the TEI XML collation.
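The overwrite behavior can be sketched as follows. The three-column, header-less layout (witness ID, minimum date, maximum date) and the helper name `apply_dates` are assumptions for illustration; consult the teiphy documentation for the exact file format expected by --dates-file.

```python
import csv
import io


def apply_dates(date_ranges, csv_text):
    """Overwrite date ranges with rows of (witness ID, min date, max date).

    The column layout is an assumption. Empty cells become None, which stands
    for an open-ended bound.
    """
    updated = dict(date_ranges)
    for wit_id, min_date, max_date in csv.reader(io.StringIO(csv_text)):
        updated[wit_id] = (
            int(min_date) if min_date.strip() else None,
            int(max_date) if max_date.strip() else None,
        )
    return updated
```

Witnesses absent from the CSV text keep whatever date range they already had.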
Published by jjmccollum over 1 year ago
teiphy - Update to mirror current version of STEMMA; dependency updates
This release increases the number of states for STEMMA outputs from 22 to 62 in accordance with the latest updates to STEMMA. It also updates several dependencies to address vulnerabilities noted by Dependabot.
Published by jjmccollum over 1 year ago
teiphy - Minor fix for STEMMA outputs and updates to dependencies
This release corrects the previous release's fix for STEMMA outputs (so that they support 22 states rather than 24) and updates several dependencies to address vulnerabilities noted by Dependabot.
Published by jjmccollum over 1 year ago
teiphy - Fixes for BEAST loggers and STEMMA state encodings
This minor release adds some missing attributes to state/ancestral logger elements in BEAST outputs to ensure that root frequencies (corresponding to intrinsic probability judgments) are incorporated into probability calculations for state sampling. It also fixes a previous bug in mapping variant reading indices to state codes in STEMMA outputs, so that reading indices (up to a maximum of 24 per variation unit) are now mapped to single-character state codes.
Published by jjmccollum over 1 year ago
teiphy - Support for time-dependent transcriptional relations in BEAST 2.7 outputs
The main change introduced in this release is support for tagging of potential transcriptional explanations with notBefore and notAfter attributes. If these attributes are present in a variation unit's transcriptional relations list, teiphy will now map the transcriptional relations to an EpochSubstitutionModel with a different substitution model for different slices of time. This feature is only supported for BEAST 2.7 XML outputs. This means that BEAST users can now model time-dependent transcriptional changes (like assimilation to later popular texts, paleographic confusions possible only for earlier or later scripts, etc.) more accurately.
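To illustrate the idea of time slices, the sketch below cuts the timeline at every notBefore and notAfter boundary and lists which relations are available in each slice. The function name, the data layout, and the treatment of boundary years are assumptions; this does not reproduce how teiphy builds the EpochSubstitutionModel element.

```python
def epoch_slices(relations):
    """Split time into slices at every notBefore/notAfter boundary.

    `relations` is a hypothetical dict mapping a relation name to a
    (not_before, not_after) pair of years, with None for an open bound.
    Returns (start, end, active_relations) tuples in chronological order.
    """
    bounds = sorted(
        {year for pair in relations.values() for year in pair if year is not None}
    )
    edges = [None] + bounds + [None]  # None marks an unbounded edge
    slices = []
    for start, end in zip(edges, edges[1:]):
        active = sorted(
            name
            for name, (not_before, not_after) in relations.items()
            # the relation must begin before the slice ends...
            if (not_before is None or end is None or not_before < end)
            # ...and must not have ended by the time the slice starts
            and (not_after is None or start is None or not_after > start)
        )
        slices.append((start, end, active))
    return slices
```

Each slice would then receive its own substitution model, built only from the relations active in that slice.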
A related change is the addition of more comprehensive rules for updating witness date ranges based on the date range of the work's composition (and vice-versa). This change affects age/date calibrations for NEXUS and BEAST 2.7 XML formats (including the MrBayes NEXUS input format).
This release also fixes an error that prevented the --verbose flag from working correctly.
Published by jjmccollum about 2 years ago
teiphy - Fixed ancestral sequence logging in BEAST XML output
Starting with this release, teiphy correctly uses AncestralSequenceLogger elements (from the BEAST_CLASSIC package) instead of AncestralStateLogger elements (from the BEASTLabs package) in BEAST XML outputs.
Published by jjmccollum about 2 years ago
teiphy - Support for works with known origin dates
Previously, works with known dates of origin (specified as a date element with a when attribute) generated BEAST XML outputs with unnecessary prior distributions on the origin of the birth-death-skyline model. This release fixes the handling of such cases.
Published by jjmccollum about 2 years ago
teiphy - New options and better support for tabular outputs
In this release, teiphy's options for tabular outputs (in CSV, TSV, and Excel format) have been expanded and better organized. The type of tabular output desired can now be specified on the command line with the --table flag. Valid options are matrix (the default option, with rows for variant readings, columns for witnesses, and frequency values in cells), distance (a pairwise distance/dissimilarity matrix with rows and columns for witnesses and counts or proportions of disagreements at extant variation units in cells), nexus (with rows for witnesses, columns for variation units, and reading IDs in cells), and long (a series of rows with witness, variation unit ID, reading index, and reading text entries). Notably, --table long replaces the old --long-table command-line flag. A fix has been added to ensure that Unicode CSV and TSV files are loaded correctly in Excel. Finally, teiphy will now create directories in the output filepath if they do not already exist.
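The shape of the long format can be sketched as follows. The helper `long_table_rows`, its arguments, and the choice to skip missing readings are assumptions for illustration only, not teiphy's implementation.

```python
def long_table_rows(unit_ids, reading_texts, witness_readings):
    """Return (witness, variation unit ID, reading index, reading text) rows.

    `unit_ids` lists the variation unit IDs, `reading_texts` lists the reading
    texts for each unit, and `witness_readings` maps each witness ID to one
    reading index per unit (None for missing data). All three are hypothetical.
    """
    rows = []
    for wit_id, indices in witness_readings.items():
        for unit_id, texts, index in zip(unit_ids, reading_texts, indices):
            if index is None:
                continue  # assumption: missing readings produce no row
            rows.append((wit_id, unit_id, index, texts[index]))
    return rows
```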
Published by jjmccollum over 2 years ago
teiphy - Minor improvements to BEAST and STEMMA outputs
In this release, STEMMA outputs generated by teiphy are written with the corresponding chron file specified with a relative rather than absolute path for better portability, and BEAST 2.7 XML outputs now initialize transcriptional rate parameters with bounds between 0.0 and Infinity and gamma distribution priors. Rates that are not fixed in the TEI XML file will be assigned random values (according to a gamma distribution) in the BEAST 2.7 XML output to avoid singular matrix errors in BEAST's initial computation of site/variation unit likelihoods. A --seed command-line parameter has been added to teiphy to make these assignments replicable.
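The role of the seed can be illustrated with Python's standard random module. The helper name and the shape and scale values below are placeholders, not the parameters teiphy uses.

```python
import random


def initial_rates(rate_names, seed=None, shape=5.0, scale=2.0):
    """Draw a gamma-distributed starting value for each unfixed rate.

    The shape and scale defaults are placeholder values chosen for this
    sketch. Passing the same seed reproduces the same draws.
    """
    rng = random.Random(seed)  # a private generator, so global state is untouched
    return {name: rng.gammavariate(shape, scale) for name in rate_names}
```

Calling the function twice with the same seed returns identical values, which is the property that makes the generated output replicable.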
Published by jjmccollum over 2 years ago
teiphy - BEAST 2.7 XML Output and Support for Bayesian Models
With this release, teiphy can now convert TEI XML collation files to BEAST 2.7 XML input files directly. The conversion process for this format can accommodate judgments of intrinsic probabilities (odds ratios describing how much more likely one variant reading is than another to be authorial, which correspond to root frequencies in BEAST) and transcriptional change classes (describing one or more potential causes for specific transitions between readings, which correspond to rate parameters that can be fixed or estimated in substitution models).
In addition, teiphy now includes tree priors and clock models (with multiple options for clock models on the command-line interface) in BEAST 2.7 XML and MrBayes NEXUS outputs. For BEAST, a birth-death skyline prior is used, which can incorporate a date range for the start of the manuscript tradition if one is specified, and strict, uncorrelated relaxed, and local relaxed clock models are supported.
Published by jjmccollum about 3 years ago
teiphy - Options for Constant Sites
With this release, teiphy now includes constant sites (i.e., variation units with only one substantive reading, after readings of trivial types have been merged with their parent readings) by default for all output formats (except STEMMA), and constant sites can be excluded from these outputs with the --drop-constant option. (But note that if you are using the output with likelihood-based phylogenetic software, you will probably want to use an ascertainment bias correction setting!)
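The definition of a constant site can be sketched as follows. The helper `drop_constant_units` and its input layout are assumptions for illustration; they mirror the behavior described for --drop-constant without reproducing teiphy's code.

```python
def drop_constant_units(units):
    """Remove variation units that have only one distinct substantive reading.

    `units` is a hypothetical dict mapping variation unit IDs to the list of
    substantive readings left after trivial readings have been merged with
    their parent readings.
    """
    return {
        unit_id: readings
        for unit_id, readings in units.items()
        if len(set(readings)) > 1
    }
```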
Published by jjmccollum over 3 years ago
teiphy - Witness List Parsing
Added code to check for a listWit element containing an explicit list of witnesses (and throw an exception if none is found), as well as code to print out warnings for all base sigla that occur in the collation, but not in the listWit.
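A check of this kind can be sketched with the standard library's ElementTree. The function name, the choice of ValueError, and the use of either xml:id or n to identify a witness are assumptions; teiphy's own exception type and matching rules may differ.

```python
import warnings
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def check_witness_list(root, collation_sigla):
    """Require a listWit element and warn about sigla that it does not list.

    Returns the sorted sigla found in the collation but not in the listWit.
    """
    list_wit = root.find(".//tei:listWit", TEI_NS)
    if list_wit is None:
        raise ValueError("no listWit element found")  # exception type is an assumption
    listed = {
        witness.get(XML_ID) or witness.get("n")
        for witness in list_wit.findall("tei:witness", TEI_NS)
    }
    missing = sorted(set(collation_sigla) - listed)
    for siglum in missing:
        warnings.warn(f"siglum {siglum} occurs in the collation but not in the listWit")
    return missing
```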
Published by jjmccollum over 3 years ago
teiphy - Publication release
This release incorporates the changes recommended in the JOSS reviewers' generous feedback. These changes include the following features:
- support for date calibration blocks based on witness date ranges
- support for PHYLIP and FASTA outputs (for software such as RAxML)
- StatesFormat=StatesPresent is now the default setting for NEXUS output; if StatesFormat=Frequency is desired, the --frequency command-line option can be used to specify this.
- support for "long table" formatting for tabular output (NumPy, Pandas DataFrame, CSV, TSV, Excel)
This release is also the first release to be tracked by Zenodo.
Published by jjmccollum over 3 years ago
teiphy - JOSS paper submission release
The version of teiphy as submitted to the Journal of Open Source Software. Changes since the initial release include support for origDate elements containing chronological data under witness elements (currently only used for conversion to STEMMA format); the addition of a to_distance_matrix method that returns a NumPy matrix of distances between witnesses; system tests to ensure that outputs for IQTREE, MrBayes, and STEMMA are validated by their respective programs; and further revisions to the paper and documentation.
Published by jjmccollum over 3 years ago