cluster_kmers
Get clusters of k-mers based solely on their sequence or in combination with enrichment in PEKA.
Repository
Basic Info
- Host: GitHub
- Owner: ulelab
- License: gpl-3.0
- Language: Python
- Default Branch: master
- Size: 1.29 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Installation
```
conda create -n clusterkmers
conda activate clusterkmers
# Python must be >=3.8 and <=3.11 to support SciPy 1.9
conda install python=3.9 pip
pip install git+https://github.com/ulelab/cluster_kmers.git@master
```
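The Python version constraint above can be checked before installing. The following is a minimal sketch; `python_version_supported` is a hypothetical helper that is not part of cluster_kmers, and the bounds are taken from the installation note.

```python
import sys


def python_version_supported(version=None) -> bool:
    """Return True if (major, minor) lies between 3.8 and 3.11 inclusive.

    Hypothetical helper: the bounds come from the installation note,
    not from the package metadata.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 8) <= major_minor <= (3, 11)


if __name__ == "__main__":
    if not python_version_supported():
        found = ".".join(map(str, sys.version_info[:2]))
        raise SystemExit(f"Python 3.8 to 3.11 is required, found {found}")
    print("Python version is within the supported range.")
```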
Usage
cluster_kmers [-h] -k KMERS [KMERS ...] -o RESULTS_FOLDER [-co {seq,seq_enrichment}] [-n N_CLUSTERS] [-peka PEKA_FILES [PEKA_FILES ...]] [-tl MIN_TOKEN_LENGTH] [-wl {True,False}] [-cons CONSENSUS_LENGTH]
Cluster k-mers based on sequence or on a combination of sequence and enrichment in CLIP data.
```
required arguments:
  -k KMERS [KMERS ...], --kmers KMERS [KMERS ...]
        A list of k-mers encoded in RNA alphabet separated by spaces: AAGG GGAG GCCU.
  -o RESULTS_FOLDER, --output_folder RESULTS_FOLDER
        A path to an existing output folder, for example "~/results"

optional arguments:
  -co {seq,seq_enrichment}, --cluster_on {seq,seq_enrichment}
        Inputs to clustering. Valid options are:
          seq - Cluster only based on sequence similarity.
          seq_enrichment - Cluster based on sequence similarity and based on
          enrichment of motifs in CLIP data (this option requires arguments
          passed to -peka)
  -n N_CLUSTERS, --n_clusters N_CLUSTERS
        Number of clusters to split k-mers into. Valid options are "auto" or
        integer. Default is "auto".
  -peka PEKA_FILES [PEKA_FILES ...], --peka_files PEKA_FILES [PEKA_FILES ...]
        A list of peka output files with extensions
        *mer_distribution_{region_name}.tsv, separated by spaces.
  -tl MIN_TOKEN_LENGTH, --min_token_length MIN_TOKEN_LENGTH
        Minimal length of a substrings used for clustering. For k-mers with
        lengths greater than 5, setting this value to be greater than 1 can
        improve the results of clustering.
  -wl {True,False}, --weblogos {True,False}
        Whether to plot weblogos for motif groups, True by default.
  -cons CONSENSUS_LENGTH, --consensus_length CONSENSUS_LENGTH
        Length of consensus sequence to name k-mer groups. Automatically this
        length is determined as k-mer length - 1. Valid choices are "auto" or
        integer.
```
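When calling the tool from a script, it can help to assemble and sanity-check the arguments first. The sketch below uses only the short flags listed in the usage line (`-k`, `-o`, `-co`, `-n`, `-peka`). `build_cluster_kmers_command` is a hypothetical helper, not part of the package, and the up-front checks (uppercase A/C/G/U only, `-peka` required for `seq_enrichment`) are assumptions drawn from the help text rather than the tool's own validation.

```python
import re
import shlex
from pathlib import Path
from typing import List, Optional, Sequence, Union

# Assumption: "RNA alphabet" means uppercase A, C, G and U only.
RNA_KMER = re.compile(r"^[ACGU]+$")


def build_cluster_kmers_command(
    kmers: Sequence[str],
    results_folder: Union[str, Path],
    cluster_on: str = "seq",
    n_clusters: Union[int, str] = "auto",
    peka_files: Optional[Sequence[Union[str, Path]]] = None,
) -> List[str]:
    """Assemble an argument list for the cluster_kmers CLI (hypothetical helper)."""
    invalid = [k for k in kmers if not RNA_KMER.match(k)]
    if not kmers or invalid:
        raise ValueError(f"k-mers must be non-empty RNA strings, got: {invalid or kmers}")
    if cluster_on not in ("seq", "seq_enrichment"):
        raise ValueError("cluster_on must be 'seq' or 'seq_enrichment'")
    if cluster_on == "seq_enrichment" and not peka_files:
        raise ValueError("seq_enrichment requires PEKA files passed via -peka")

    cmd = ["cluster_kmers", "-k", *kmers, "-o", str(results_folder),
           "-co", cluster_on, "-n", str(n_clusters)]
    if peka_files:
        cmd += ["-peka", *map(str, peka_files)]
    return cmd


if __name__ == "__main__":
    cmd = build_cluster_kmers_command(["AAGG", "GGAG", "GCCU"], "results")
    print(shlex.join(cmd))
    # To run it (needs cluster_kmers on PATH and an existing "results" folder):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting issues when file paths contain spaces.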
Owner
- Name: Ulelab
- Login: ulelab
- Kind: organization
- Location: London
- Repositories: 10
- Profile: https://github.com/ulelab
Citation (CITATION.cff)
cff-version: 0.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Kuret"
  given-names: "Klara"
  orcid: "https://orcid.org/0000-0002-8445-8080"
title: "cluster_kmers"
version: 0.0.0
doi: 10.5281/zenodo.8386584
date-released: 2023-02-22
url: "https://github.com/ulelab/cluster_kmers"