primate_paralogs

Scripts for 'Using all gene families vastly expands data available for phylogenomic inference in primates'

https://github.com/meganlsmith/primate_paralogs

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.3%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: meganlsmith
  • Language: Python
  • Default Branch: main
  • Size: 29.3 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Citation

README.md

Primate_Paralogs

Scripts for 'Using all gene families vastly expands data available for phylogenomic inference in primates'

For all scripts, sequences must be named Speciesgenusotherinfo, because the information delimited by the first underscore is used to assign gene copies to species.
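The naming rule above can be illustrated with a minimal sketch. It assumes that the species label is the text before the first underscore; the function name `species_label` and the example sequence names are hypothetical and are not taken from the scripts in this repository.

```python
def species_label(sequence_name: str) -> str:
    """Return the species label of a sequence name.

    Assumption: the label is everything before the first underscore.
    """
    label, separator, _rest = sequence_name.partition("_")
    if not separator or not label:
        raise ValueError(f"no species label found in {sequence_name!r}")
    return label


# Hypothetical sequence names, grouped by their species label.
names = ["Homosapiens_copy1", "Homosapiens_copy2", "Pantroglodytes_copy1"]
by_species = {}
for name in names:
    by_species.setdefault(species_label(name), []).append(name)
```

With names like these, every gene copy that shares a prefix is assigned to the same species, which is why a consistent prefix matters for all of the filters listed below.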

singlecopyorthologs_v1a.py: Script to keep only single-copy orthologs (SCOs) that meet a minimum taxon sampling threshold.
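As a rough illustration of what a single-copy filter with a sampling threshold checks, the sketch below works on a plain list of tip names rather than on a tree file. It is not the code in singlecopyorthologs_v1a.py; the function names, the `min_species` parameter, and the example names are assumptions made for this example.

```python
from collections import Counter


def species_label(sequence_name: str) -> str:
    # Assumption: the species label is the text before the first underscore.
    return sequence_name.split("_", 1)[0]


def is_single_copy_family(tip_names, min_species):
    """True if no species has more than one copy and at least
    `min_species` distinct species are present."""
    copies_per_species = Counter(species_label(name) for name in tip_names)
    enough_species = len(copies_per_species) >= min_species
    all_single = all(count == 1 for count in copies_per_species.values())
    return enough_species and all_single


# Hypothetical gene families.
family_a = ["Homosapiens_g1", "Pantroglodytes_g1", "Gorillagorilla_g1"]
family_b = ["Homosapiens_g1", "Homosapiens_g2", "Pantroglodytes_g1"]
```

Here `family_a` would pass with `min_species=3`, while `family_b` would be rejected at any threshold because one species carries two copies.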

allparalogs_v1a.py: Script for filtering gene trees to keep all paralogs with some minimum taxon sampling threshold (ALL PARALOGS).

oneparalogs_v1a.py: Script to sample one paralog per species (ONE PARALOGS).
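Sampling one paralog per species can be sketched as follows. The sketch assumes that the copy is chosen uniformly at random within each species, which may differ from the rule used in oneparalogs_v1a.py; the function names, the `seed` parameter, and the example names are hypothetical.

```python
import random


def species_label(sequence_name: str) -> str:
    # Assumption: the species label is the text before the first underscore.
    return sequence_name.split("_", 1)[0]


def sample_one_per_species(tip_names, seed=None):
    """Pick one sequence per species from a list of tip names.

    Assumption: the pick is uniformly random within each species.
    """
    rng = random.Random(seed)  # local generator, so a seed makes runs repeatable
    copies_by_species = {}
    for name in tip_names:
        copies_by_species.setdefault(species_label(name), []).append(name)
    return [rng.choice(copies) for copies in copies_by_species.values()]


# Hypothetical gene family with two copies in one species.
family = ["Homosapiens_g1", "Homosapiens_g2", "Pantroglodytes_g1"]
picked = sample_one_per_species(family, seed=0)
```

The result contains exactly one name per species; the remaining tips would then be pruned from the gene tree before downstream inference.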

lineagespecificdups_v3.py: Script to perform Lineage Specific Duplicate filtering (LSDs).

twolineagedups_v3.py: Script to perform two-lineage and lineage-specific duplicate filtering (TSDs).

SEbranchcutting_v2a.py: Script to perform Subtree Extraction filtering (SE).

subset_MI.py: Script to subsample one ortholog per orthogroup for MI filtering.

collapsesn0nodes.py: Script to collapse gene tree nodes with no decisive sites.

resampling.py: Script to test for introgression using GCFs.

cliptsdsv1a.py: Script to prune LSDs and TSDs from gene trees, which can then be used in further filtering. DOES NOT FILTER FOR ONLY THOSE TREES WITH TSDs, LSDs, or SCOs (see twolineagedups_v3.py for this function).

Owner

  • Name: Megan Smith
  • Login: meganlsmith
  • Kind: user
  • Location: Bloomington, Indiana
  • Company: Indiana University

Citation (CITATION.cff)

cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Smith"
  given-names: "M.L."
- family-names: "Vanderpool"
  given-names: "D."
- family-names: "Hahn"
  given-names: "M.W."
title: "Scripts from Using all gene families vastly expands data available for phylogenomic inference"
version: 1.0.0
identifiers:
  - type: doi
    value: 10.5281/zenodo.12687979
date-released: 2022
