cluster_kmers

Get clusters of k-mers based solely on their sequence or in combination with enrichment in PEKA.

https://github.com/ulelab/cluster_kmers

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
✓ Academic publication links: links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (8.6%) to scientific vocabulary

Repository

Basic Info

Host: GitHub
Owner: ulelab
License: gpl-3.0
Language: Python
Default Branch: master
Size: 1.29 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 2

Created about 3 years ago · Last pushed over 2 years ago

Metadata Files

Readme, License, Citation

cluster_kmers

Get clusters of k-mers based solely on their sequence or in combination with enrichment in PEKA.

Installation

```
conda create -n clusterkmers
conda activate clusterkmers

# Python must be >=3.8 and <=3.11 to support SciPy 1.9
conda install python=3.9 pip
pip install git+https://github.com/ulelab/cluster_kmers.git@master
```
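The Python version constraint from the installation note can be checked before installing. The following is a minimal sketch, assuming the intended range is Python 3.8 through 3.11 inclusive; the helper name `python_supported` and the two constants are hypothetical and not part of cluster_kmers.

```python
import sys

# Assumed bounds, taken from the installation note about SciPy 1.9 support.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def python_supported(version=None):
    """Return True if (major, minor) lies within the assumed supported range."""
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:2]))
    status = "supported" if python_supported() else "outside the supported range"
    print(f"Python {current} is {status}")
```

Running this inside the activated conda environment before `pip install` gives an early hint if the interpreter is too old or too new.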

Usage

cluster_kmers [-h] -k KMERS [KMERS ...] -o RESULTS_FOLDER [-co {seq,seq_enrichment}] [-n N_CLUSTERS] [-peka PEKA_FILES [PEKA_FILES ...]] [-tl MIN_TOKEN_LENGTH] [-wl {True,False}] [-cons CONSENSUS_LENGTH]

Cluster k-mers based on sequence or on a combination of sequence and enrichment in CLIP data.

```
required arguments:
  -k KMERS [KMERS ...], --kmers KMERS [KMERS ...]
                        A list of k-mers encoded in RNA alphabet, separated by
                        spaces: AAGG GGAG GCCU.
  -o RESULTS_FOLDER, --output_folder RESULTS_FOLDER
                        A path to an existing output folder, for example
                        "~/results".

optional arguments:
  -co {seq,seq_enrichment}, --cluster_on {seq,seq_enrichment}
                        Inputs to clustering. Valid options are:
                        seq - Cluster only based on sequence similarity.
                        seq_enrichment - Cluster based on sequence similarity
                        and on enrichment of motifs in CLIP data (this option
                        requires arguments passed to -peka).
  -n N_CLUSTERS, --n_clusters N_CLUSTERS
                        Number of clusters to split k-mers into. Valid options
                        are "auto" or an integer. Default is "auto".
  -peka PEKA_FILES [PEKA_FILES ...], --peka_files PEKA_FILES [PEKA_FILES ...]
                        A list of PEKA output files with extensions
                        *mer_distribution_{region_name}.tsv, separated by
                        spaces.
  -tl MIN_TOKEN_LENGTH, --min_token_length MIN_TOKEN_LENGTH
                        Minimal length of substrings used for clustering. For
                        k-mers longer than 5, setting this value greater than
                        1 can improve the clustering results.
  -wl {True,False}, --weblogos {True,False}
                        Whether to plot weblogos for motif groups. True by
                        default.
  -cons CONSENSUS_LENGTH, --consensus_length CONSENSUS_LENGTH
                        Length of the consensus sequence used to name k-mer
                        groups. When determined automatically, this length is
                        k-mer length - 1. Valid choices are "auto" or an
                        integer.
```
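To illustrate how the arguments above fit together, here is a minimal Python sketch that assembles a `cluster_kmers` command line. It uses only the short flags listed in the usage (`-k`, `-o`, `-co`, `-peka`). The helper names `to_rna` and `build_command` are hypothetical, and converting DNA `T` to RNA `U` is a convenience assumed here, not documented behaviour of the tool. The sketch only builds the argument list; it does not run the tool.

```python
import re
import shlex

# k-mers are expected in the RNA alphabet (see the -k argument).
RNA_KMER = re.compile(r"[ACGU]+")


def to_rna(kmer):
    """Upper-case a k-mer and convert DNA 'T' to RNA 'U' (assumed convenience)."""
    return kmer.upper().replace("T", "U")


def build_command(kmers, output_folder, cluster_on="seq", peka_files=None):
    """Assemble an argument list for the cluster_kmers command-line tool."""
    rna_kmers = [to_rna(k) for k in kmers]
    bad = [k for k in rna_kmers if not RNA_KMER.fullmatch(k)]
    if bad:
        raise ValueError(f"not valid RNA k-mers: {bad}")
    # The help text states that seq_enrichment requires files passed to -peka.
    if cluster_on == "seq_enrichment" and not peka_files:
        raise ValueError("seq_enrichment requires PEKA files (-peka)")
    cmd = ["cluster_kmers", "-k", *rna_kmers, "-o", output_folder, "-co", cluster_on]
    if peka_files:
        cmd += ["-peka", *peka_files]
    return cmd


if __name__ == "__main__":
    cmd = build_command(["AAGG", "GGAG", "GCCT"], "results")
    print(shlex.join(cmd))
    # prints: cluster_kmers -k AAGG GGAG GCCU -o results -co seq
```

The resulting list can be passed to `subprocess.run` once the package is installed. Note that the output folder must already exist, as stated in the description of `-o`.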

Owner

Name: Ulelab
Login: ulelab
Kind: organization
Location: London

Repositories: 10
Profile: https://github.com/ulelab

Citation (CITATION.cff)

cff-version: 0.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Kuret"
  given-names: "Klara"
  orcid: "https://orcid.org/0000-0002-8445-8080"
title: "cluster_kmers"
version: 0.0.0
doi: 10.5281/zenodo.8386584
date-released: 2023-02-22
url: "https://github.com/ulelab/cluster_kmers"
