gclink

Gene-Cluster Discovery, Annotation and Visualization

https://github.com/liuyanglee/gclink


Repository

Basic Info
  • Host: GitHub
  • Owner: LiuyangLee
  • Language: R
  • Default Branch: main
  • Size: 1.61 MB
Statistics
  • Stars: 1
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 10 months ago
Metadata Files
Readme

README.md

gclink: Gene-Cluster Discovery, Annotation and Visualization

Overview

gclink performs end-to-end analysis of gene clusters (e.g., photosynthesis, carbon/nitrogen/sulfur cycling, carotenoid, antibiotic, or viral genes) from (meta)genomes. It provides:

  • Parsing of Basic Local Alignment Search Tool (BLAST) results in tab-delimited format produced by tools like NCBI BLAST+ and Diamond BLASTp
  • Contiguous cluster detection
  • Publication-ready visualization

Key Features

Adaptive Workflow

  • Works with or without coding sequences input
  • Skips plotting when functional grouping is absent
  • Supports custom gene lists for universal cluster detection

Cluster Detection

  • Density-based identification via AllGeneNum and MinConSeq parameters
  • Handles incomplete gene annotation coverage
  • Optional insertion of hypothetical ORFs at cluster boundaries
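
As a rough illustration of density-based detection, the sketch below keeps annotated ORFs on one contig when at least `min_con_seq` of them fall within a window of `all_gene_num` consecutive ORF positions. This is only one plausible reading of how `AllGeneNum` and `MinConSeq` interact, not gclink's actual implementation; the helper `find_dense_runs()` and its argument names are hypothetical.

```r
# Hypothetical helper, not part of gclink: one plausible reading of
# density-based detection on a single contig.
# positions:    ORF indices (1, 2, 3, ...) of annotated hits on the contig
# all_gene_num: maximum number of ORF positions a cluster window may cover
# min_con_seq:  minimum number of annotated hits required inside that window
find_dense_runs <- function(positions, all_gene_num = 50, min_con_seq = 25) {
  positions <- sort(unique(positions))
  n <- length(positions)
  in_cluster <- rep(FALSE, n)
  if (n < min_con_seq) return(positions[in_cluster])
  for (i in seq_len(n - min_con_seq + 1)) {
    j <- i + min_con_seq - 1
    # Hits i..j are dense enough if they cover at most all_gene_num positions
    if (positions[j] - positions[i] < all_gene_num) {
      in_cluster[i:j] <- TRUE
    }
  }
  positions[in_cluster]
}

# Toy example: a dense block of hits (ORFs 97-102) plus two isolated hits
hits <- c(5, 97, 98, 99, 100, 101, 102, 400)
dense_hits <- find_dense_runs(hits, all_gene_num = 10, min_con_seq = 4)
print(dense_hits)
```

Under this reading, raising `min_con_seq` or lowering `all_gene_num` makes detection stricter, because more hits must fit into a shorter stretch of the contig.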

Visualization

  • Publication-ready arrow plots built on gggenes, with customizable:
    • Color themes
    • Functional group levels
    • Genome subsets

Installation

```r
# Install from CRAN
install.packages("gclink")

# Install from GitHub
if (!require("devtools")) install.packages("devtools")
devtools::install_github("LiuyangLee/gclink")
```

Case 1: Using blastp result

```r
# Case 1: Using blastp result with Full pipeline (Find Cluster + Extract FASTA + Plot Cluster)
library(gclink)

# Load the example datasets shipped with the package
data(blastp_df)
data(seq_data)
data(photosynthesis_gene_list)
data(PGC_group)

gc_list <- gclink(in_blastp_df = blastp_df,
                  in_seq_data = seq_data,
                  in_gene_list = photosynthesis_gene_list,
                  in_GC_group = PGC_group,
                  AllGeneNum = 50,
                  MinConSeq = 25,
                  apply_length_filter = TRUE,
                  down_IQR = 10,
                  up_IQR = 10,
                  orf_before_first = 0,
                  orf_after_last = 0,
                  levels_gene_group = c('bch','puh','puf','crt','acsF','assembly','regulator',
                                        'hypothetical ORF'),
                  color_theme = c('#3BAA51','#6495ED','#DD2421','#EF9320','#F8EB00',
                                  '#FF0683','#956548','grey'),
                  genome_subset = NULL)

# Extract the three outputs from the returned list
gc_meta = gc_list[["GC_meta"]]
gc_seq = gc_list[["GC_seq"]]
gc_plot = gc_list[["GC_plot"]]

head(gc_meta)   # Cluster metadata
head(gc_seq)    # FASTA sequences
print(gc_plot)  # Visualization
```

1 Input Data Preview

1.1 A dataframe of Diamond BLASTp output (e.g., head(blastp_df))

| qaccver | saccver | pident | length | mismatch | gapopen | qstart | qend | sstart | send | evalue | bitscore |
|---------|---------|--------|--------|----------|---------|--------|------|--------|------|--------|----------|
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.167 | enzymerhodopsinXP002954798.1Volvoxcarteri | 26.6 | 576 | 343 | 15 | 157 | 666 | 332 | 893 | 8.18e-41 | 161 |
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.1113 | petBCandidatusMethylomirabilisoxyferaDAMO1671MOX | 76.6 | 248 | 58 | 0 | 14 | 261 | 9 | 256 | 5.43e-149 | 417 |
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.1114 | petCCandidatusNitronautalitoralisG3M7016785NLI | 50.8 | 177 | 73 | 2 | 8 | 184 | 27 | 189 | 3.83e-59 | 184 |
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.1523 | cruCHumisphaeraborealisIPV6918620HBS | 31.5 | 365 | 208 | 11 | 42 | 378 | 48 | 398 | 1.45e-41 | 151 |
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.1616 | rfpBKL6621921938 | 33.0 | 227 | 137 | 3 | 4 | 223 | 3 | 221 | 2.53e-32 | 124 |
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.1754 | bchIpMyxococcota--cWYAZ01--oWYAZ01--GCA016703535.1---JADJBV010000002.1754 | 100.0 | 343 | 0 | 0 | 1 | 343 | 1 | 343 | 4.73e-249 | 677 |
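
The twelve columns above match the default tabular output of BLAST+ (`-outfmt 6`), which has no header row. A minimal sketch of loading such a file into a data frame with these column names follows; the file contents and the identifiers `orf_1`, `geneA_ref`, and so on are made up so that the example runs on its own.

```r
# Column names of the default 12-column BLAST tabular output (-outfmt 6)
blast_cols <- c("qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Hypothetical input: write two toy hits to a temporary file so the sketch runs
blast_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "orf_1\tgeneA_ref\t76.6\t248\t58\t0\t14\t261\t9\t256\t5.43e-149\t417",
  "orf_2\tgeneB_ref\t50.8\t177\t73\t2\t8\t184\t27\t189\t3.83e-59\t184"
), blast_file)

# Read the headerless, tab-delimited table and attach the column names
blastp_df <- read.delim(blast_file, header = FALSE, col.names = blast_cols,
                        stringsAsFactors = FALSE)
str(blastp_df)
```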

1.2 (Optional) A dataframe with SeqName (ORF identifier, Prodigal format: `ORF_id # start # end # strand # ...`) and Sequence (e.g., head(seq_data))

| SeqName | Sequence | |---------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Houyibacteriaceae--LLY-WYZ-153---k1411028641 # 3 # 266 # 1 # ID=851;partial=10;starttype=Edge;rbsmotif=None;rbsspacer=None;gccont=0.807 | CCGGACGCGCCGCCCGCCCCGAAGGCCCCGCCGGCCGCCCCCACCTATCCGCTCGAAGGCGCGCTCGGTATCAGCCGCGTGCGCCTCGTGCGCGCCACGCCCTGCGGCCTCACCGGCCGCGAGCTCGGCGCCGGCGAGGAGGCCCTCCTCGTCCACTTCGACGACGGACGCCCGCCCCTCGCGGTCGCCCCCGACGCGCTCCCGACGCCCCCCGGCGACGGGACGCCCCCCACCGGCGCTCCGCCGGAAGGAGACCCCGCATGA | | Houyibacteriaceae--LLY-WYZ-153---k1411028642 # 263 # 490 # 1 # ID=852;partial=00;starttype=ATG;rbsmotif=AGGAG;rbsspacer=5-10bp;gccont=0.737 | ATGACCCGCCCCGAAGACGCCCCGCCCACCCACGAAGCCGCGGACCGCGCCGTGCGCTCCCTCTTCCAGATCGGTCGCCTCTGGGCCTCCCACGGCCTCGAGATGGGTCGCATGACCTTGCGGACCGCCGCCAAGACCCTCGAGAGCACCGCCGAGACCCTCGAGGACCTCTCCCAGCGCGTCGCCCCCGACGACGAGCGCCCCGCGGACGAACGCGCCGCCGACTGA | | Houyibacteriaceae--LLY-WYZ-153---k1411028643 # 667 # 2184 # -1 # ID=853;partial=00;starttype=ATG;rbsmotif=AGGAGG;rbsspacer=5-10bp;gccont=0.775 | 
ATGAGCGCGATCGAAGGGACCCGGCCTCGGGACGGCGAGGCCCGCATGCCCGTGGAGGCGACCCCCGTGGAGGCCATCGGGGGCCTCGTCGCCCGGGCGCGTGACGCCGGCTTCGACCACGCGGCCCGGCCCCTCGCCGAGCGCGCGGGGCTGCTGCGCGCGCTCGCGGACGCCATCCTCGCCGACGGGGAGGCCATCGTCGCGCTCCTCGAGGAGGAGACGGGCAAGCCGGCGGCGGAGGCGTGGCTCCACGAGGTCGTGCCGACGGCGGACCTCGGGAGCTGGTGGAGCAGCCAGGGGCCGGCGCACCTCGCGACGGAAGCCGTGCGCCTCGACCCGCTCGCCTACCCTGGCAAGCGCGCGCGCGTCGAGGTGGTCCCGCGTGGCGTCGTGGCGCTGATCACGCCTTGGAACTTCCCGGTGGCGATCCCGCTGCGGACGCTCTTCCCGGCGCTCCTCGCGGGCAACGGCGTCGTCTGGAAGCCGTCCGAGCACACGCCGCGGGTGGCGGCGCGCGTGCACGGGATCGTGCGCGAGGTCTTCGGGCCGGACCTGGTCGAGCTGGTGCAGGGCGCCGGCGCGCAGGGGGCGGCGCTGGTCGAGGCGGACGTGGACGCGGTGGTGTTCACGGGCAGCGTGGCGACCGGGCGGAAGGTCGGCGCGGCGGCGGGGCGGGCGCTCACGCCGGCGTCGCTCGAGCTCGGCGGCAAGGACGCGGCCGTGGTGCTCGACGACGCGGACCTGGAGCGCACGGCCCGGGGCCTGCTCTGGGCGGCGATGGCGAACGCGGGGCAGAACTGCGCCGGGCTCGAGCGCGTCTACGCGGTGGCGGAGGTCGCCGGCCCGCTGAAGGCGCGGCTCGGTGAGCTGGCCGGAGAGCTGGTGCCCGGGCGCGACGTGGGGCCGCTGGTGACCGAGGCGCAGCTCGCGACGGTGGAGCGGCACGTGCGCGAGGCGGTCGACGGGGGCGCGGAGGTGCTGGCCGGCGGCGAGCGGCTCGAGCGGGGCGGGCGCTGGTTCGCGCCGACCGTGCTGGCGGAGGTCGAGCCGTCTTCGGCGGCGCTCCGGGAGGAGACGTTCGGGCCGGTGGTCGTCGTGCAGACGGTGGCGGACGAGGCGGCGGCCGTGGCGGCGGCGAACGACTCGCGCTTCGGGCTGACGGCGAGCGTCTGGACGCGGGACGCGGCGCGCGGGGAGGCGGTCGCACGGCGGCTCCGGGCGGGCGTCGTGACGGTGAACAACCACGCCTTCACCGGGGCCATCCCGGCGCTGCCCTGGGGCGGCGTCGGCGAGACGGGCTTCGGGGTGACGAACTCGCCGCACGCGCTCCACGCATTGGTGCGGCCGCGGGCCGTGGTCGTGGACGGCAACGCGCGGCCGGAGCTCTACTGGCACCCCTACGACGAGGCGCTCGAGCGGCTCGGGAAGGGCATGGCGGCGCTCCGCGGCAAGGGCGGGCCGATCACGAAGGTGCGCGCCGTGGCCAGGCTGCTCGGGGCGCTCCGCCGGCGCTTCTGA |
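
To make the Prodigal header layout concrete, the sketch below splits `SeqName` strings on the ` # ` separator into ORF identifier, start, end, and strand. The helper `parse_prodigal_header()` is hypothetical and not part of gclink, and the two headers are made-up examples.

```r
# Hypothetical helper, not part of gclink: split Prodigal-style FASTA headers
# ("ORF_id # start # end # strand # attributes") into separate columns.
parse_prodigal_header <- function(seq_name) {
  parts <- strsplit(seq_name, " # ", fixed = TRUE)
  data.frame(
    orf_id = vapply(parts, `[`, character(1), 1),
    start  = as.integer(vapply(parts, `[`, character(1), 2)),
    end    = as.integer(vapply(parts, `[`, character(1), 3)),
    strand = as.integer(vapply(parts, `[`, character(1), 4)),
    stringsAsFactors = FALSE
  )
}

# Toy headers (made up for illustration)
headers <- c("contigA_1 # 3 # 266 # 1 # ID=1_1;partial=10",
             "contigA_2 # 667 # 2184 # -1 # ID=1_2;partial=00")
orf_table <- parse_prodigal_header(headers)
print(orf_table)
```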

1.3 (Optional) Gene group (e.g., head(PGC_group))

| gene | gene_group | gene_label |
|------|------------|------------|
| bciE | bci | E |
| bchB | bch | B |
| bchC | bch | C |
| bchD | bch | D |

1.4 (Optional) Candidate gene list (e.g., head(photosynthesis_gene_list))

bciE bchB bchC bchD bchE

2 Output Data Preview

2.1 Gene cluster information (GC_meta)

| gene | qaccver | saccver | pident | length | mismatch | gapopen | qstart | qend | sstart | send | evalue | bitscore | genome | orf | contig | genomecontig | orfposition | genecluster | GCorfposition | GCpresentlength | GCabsentlength | GClength | SeqName | Sequence | start | end | direction | genegroup | genelabel | Pgenome | Pstart | Pend | Pdirection | |------|---------|---------|--------|--------|----------|---------|--------|------|--------|------|--------|----------|--------|-----|--------|--------------|-------------|--------------|----------------|------------------|-----------------|----------|---------|----------|-------|-----|-----------|------------|------------|---------|--------|------|------------| | pufC | Houyibacteriaceae--LLY-WYZ-153---k14110286497 | pufCRhodospirillumcentenumRC12101RCE | 53.1 | 335 | 147 | 7 | 3 | 329 | 6 | 338 | 7.66E-112 | 333 | Houyibacteriaceae--LLY-WYZ-153 | k14110286497 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 97 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 1 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k14110286497 # 117640 # 118917 # -1 # ID=8597;partial=00;starttype=GTG;rbsmotif=GGAG/GAGG;rbsspacer=5-10bp;gccont=0.710 | 
GTGAAGAAGATCGCCATCGCCTTCGTGAGCACCTGGCTCCTCATCGGGGCCGTCTACGCCTACGAGCCGACCGAGACCTCGCAGATCGGCGCCGACGGCGTCGCCATGCAGGTCACGCAGACCGAGGACGAGCTCGCCGCGCGCGTGGAGGCGAACACCGTCCCGCCGGCCATCCCGATGCCCCAGAGCAGCGGCGTGCTGGCGGCCGAGGAGTACGAGAACGTGCAGGTCCTCGGCCACCTCAACACGGCCCAGTTCACCCGGCTGATGACCTCCATCACGCTCTGGGTCGCGCCGGAGCAGGGCTGCGCCTACTGCCACAACACGAACAACCTGGCCTCCGACGAGCTCTACACGAAGCGCGTGGCGCGTCGGATGATCCAGATGACCTGGCACATCAACGAGAACTGGCAGTCGCACGTCCAGGAGACCGGCGTGACCTGCTACACGTGCCACCGCGGCAACAACGTGCCCCAGCACATCTGGTTCGAGACGCCGCCCGACGACCACGGCATGGTGGGCTGGCGTGGCTCGCAGAACGCCCCGAACGACCGGACGGGGATCAGCTCCCTGCCGAACGACGTGTTCGAGGTGTTCCTCGAGGAGGACGCGAGCATCCGGGTCCAGTCGGCCGGGGAGGCCTTCCCGAACGAGAACCGCGCGTCCATCAAGCAGGCCGAGTGGACCTATGGGCTGATGATGCACTTCTCCGAGTCGCTCGGGGTGAACTGCACGGCTTGCCACAACTCGCGCTCCTGGAACGACTGGAGCCAGAGCCCGGCCCGCCGCGGGACGGCCTGGCACGGCATCCGGATGGCGCGAAACCTCAACAACCACTGGCTGACGCCGCTGCGCGATCAGTTCCCGCCGAACCGGCTCGGCGAGCTGGGTGACGCCCCGAAGGCCAACTGCGCGACGTGCCACCAGGGCGCGTACCGCCCCCTGCTCGGGCACCGCATGCTCGAGGACTTCCCGTCCCTCGTACGGGCGATGCCGCAGCCCGAGATCGAGCCGGAGCCGGAGCCGGAGCCCGAGCTGGAAGGCGAGGGCGAGGCCGGCGGGCAGCTCGAGCCGGAGGGGGAGGCGCCCGCCGCCGAAGCCCCCGAGGGCACGAACGCTGCGCCGACGGCGATGGCTGCGCCGGCGGCGATGGCCGCTCCGACGGGGATGGCCGCGCCGGCGGCGATGGCTGCGCCGGCGGCGATGGCTGCTCCGGCGGTGGCCGAGCCGACGCCCATGGCCGCGCCGGCGGCGATGGCGGCCCCGGCACCGAACTGA | 117640 | 118917 | -1 | puf | C | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 0 | 1277 | FALSE | | pufM | Houyibacteriaceae--LLY-WYZ-153---k14110286498 | pufMpMyxococcota--cPolyangia--oPolyangiales--ERR1726576bin.13---k1411027383 | 100 | 437 | 0 | 0 | 1 | 437 | 1 | 437 | 4.73E-308 | 834 | Houyibacteriaceae--LLY-WYZ-153 | k14110286498 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 98 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 2 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k14110286498 # 118914 # 120224 # -1 # ID=8598;partial=00;starttype=ATG;rbsmotif=GGAG/GAGG;rbsspacer=5-10bp;gccont=0.704 | 
ATGGCCCGCTACCAGAACATCTTCACGCAGATCCAAGTCGTCGGTCCGCCGGACACGCCGCCGCCGATCGACCCGGACTTCCGTACGAAGAAGACGCGCATGTCGCGGCTCCTCGGGTGGTTCGGCAACCCGCAGATCGGCCCCGTCTACCTGGGCTACACCGGCCTGGCGTCCGCGATCAGCTTCTTCATCGCTTTCGAGATCATCGGGCTCAACATGCTGGCCTCGGTGGACTGGGACGTCGTTCAGTTCATCCGCCAGCTCCCCTGGCTCGCGCTCGAACCGCCCCCGCCCTCTGCCGGGCTCTCCATCCCGACGCTTCAGGAGGGCGGCTGGTGGCTCATGGCCGGCTTCTTCCTCACGGCGTCGGTCATTCTCTGGTGGATTCGCACCTATCGGCGCGCACGCGCCCTGAAGATGGGCACGCACGTCGCGTGGGCCTTCGCCTCGGCGATCTGGCTCTACCTCGTCCTCGGCTTCATTCGCCCCTTGCTGATGGGGAGCTGGGGGGAGGCGGTGCCCTTCGGCATCTTCCCGCACCTCGACTGGACCGCCGCCTTCTCCGTTCGCTACGGCAACCTCTTCTACAACCCCTTCCACTGCCTCTCGATCGTCTTCCTCTACGGGTCGACGCTCCTCTTCGCCATGCACGGCGCGACGGTGCTCGCGCTCGGGCACGTGGGCGGTGAGCGTGAGGTGAGCCAGGTGGTCGACCGCGGCACGGCGGCCGAGCGCGGGGCGCTCTTCTGGCGCTGGACGATGGGCTTCAACGCGACCTTCGAGTCCATCCACCGCTGGGCCTGGTGGTTCGCGGTGCTCACGCCGCTCACCGGAGGCATCGGCATCCTCCTGACCGGCACCGCCGTCGACAACTGGTATCAGTGGGCCGTCGAGCACGACTTCGCGCCGGCCTATGAGGAGTCCTACGAGGTCGTCCCCGACCCGGTCGACGACCCGGCGAACGAGGACCTGCCCGGTATGCGCGGTGAGTCCACCGCGCAGTGGGAGCCGACCCCCTACGTGCCCGCCGAGGAGCCGGAGGCGCCCGAGGATGGTGCGGACGGCGCGGCCGCGGTCGAAGGCGTCGACGCCGAGGGCGGCGAGGATGCCGCCGCGGATCCCGCGAGCGAGGGCACGAGCGGCCAGCCGGAGACCGGCGCCGCGGCCCCGGAGAGCGAGCGCCTTCCGGACGAAGCGGCGGCGGCCGAGCCCGAAGGGGCTGCGCCGGAGCCCGAACCCCCCGCGCCGTCCGAGACGGCTGCCCCGAGCGAACCCGAGGCGCCCAGCGCGATGACCCCGGAGCAACCGTGA | 118914 | 120224 | -1 | puf | M | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 1274 | 2584 | FALSE | | pufL | Houyibacteriaceae--LLY-WYZ-153---k14110286499 | pufLpMyxococcota--cPolyangia--oPolyangiales--ERR1726567bin.15---k1411843592 | 100 | 275 | 0 | 0 | 1 | 275 | 1 | 275 | 2.63E-214 | 583 | Houyibacteriaceae--LLY-WYZ-153 | k14110286499 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 99 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 3 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k14110286499 # 120270 # 121094 # -1 # ID=8599;partial=00;starttype=ATG;rbsmotif=GGAG/GAGG;rbsspacer=5-10bp;gccont=0.648 | 
ATGGGCCTACTGAGCTTCGAGCGGCGATATCGAGTCCGAGGAGGCACGCTCCTCGGGGGCGACCTATTCGATTTCTGGGTCGGGCCCTTCTACGTGGGGCTCTTCGGCGTCACGACGATCTTCTTCACGATCGTCGGCACCGCGCTGATCCTCTGGGAGGCCTCCCGGGGTGACACCTGGAACCCCTGGCTGATCAACATCCAGCCGCCTCCAATCGAGTACGGGCTCGCCTTCGCGCCCCTCGATCAGGGGGGCATCTGGCAGCTGGTCACCATCTGCGCCATCGGCGCCTTCGGATCCTGGGCGCTCCGACAGGCGGAGATCAGCCGCAAGCTCGGCATGGGCTACCACGTGCCCATCGCCTACGGCGTCGCGGTCTTCGCCTACGTCACGCTCGTGGTGATTCGCCCGGTGATGCTGGGCGCCTGGGGCCACGGCTTCCCCTACGGCATCTTCAGCCACCTCGATTGGGTGTCGAACGTCGGGTACCAGTACCTGCACTTCCACTACAACCCGGCCCACATGATCGCGGTGAGCTTCTTCTTCACCACGACGCTCGCGCTCTCCCTCCACGGCGGTTTGATCCTCTCCGCCGTGAATCCGCCGAAGGGAGAGAAGGTGAAGACCGCCGAGTACGAGGACGGGTTCTTCCGTGACCACATCGGCTACTCGATCGGCGCCCTGGGCATTCATCGACTCGGCCTCTTCCTGGCGCTGAGCGCCGGGATCTGGAGCGCGATCTGCATTCTCATCAGCGGCCCGATGTGGACCAAGGGGTGGCCCGAGTGGTGGGACTGGTGGCTCAACCTCCCCGTGTGGAGCTGA | 120270 | 121094 | -1 | puf | L | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 2630 | 3454 | FALSE | | bchO | Houyibacteriaceae--LLY-WYZ-153---k141102864100 | bchOPararhodospirillumphotometricumRSPPHO00117RPM | 44.9 | 265 | 144 | 1 | 33 | 295 | 28 | 292 | 6.97E-60 | 194 | Houyibacteriaceae--LLY-WYZ-153 | k141102864100 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 100 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 4 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k141102864100 # 121191 # 122102 # -1 # ID=85100;partial=00;starttype=ATG;rbsmotif=GGAG/GAGG;rbsspacer=5-10bp;gccont=0.762 | 
ATGAGCTCGGCCGTCGAAGAGCAGCGCGTCGAGCACCCGCGGGTCGAGCAGCAGCCCATCGAGCAGCAGCGCGTCGAGCACCAGCGCGTCGAGCGTTCGGGCGTGCGGTGGAACGTCGCCCGCCGCGGCGCCGGACCCACGCTCCTGGCGCTCCACGGGACCGGCAGCTCGAGCCGCTCCTTCTGCGCCCTCGCGGCCACGCTCGGTGCTCGCTTCACCGTCGTGGCGCCCGATCTACCCGGCCACGCCGGGAGCCGGATCGATCGCCGCTTCCGCCTCTCGCTCCCCTCGATCGCCGCCGCCCTCGGCGAGCTCATCGAGGCGCTCGCCGTCCAGCCGGCGCTGGTCCTCGCTCACTCCGCGGGCGCGGCGGTGGCGGCGCGCGCCATGCTCGACGGGGCTCTCCGCCCGGCGCTCTTCGTCGGGCTCGGCGCGGCCCTGACGCCCCTCGAGGGGCTCGCCCGGCTCGGCGCGCGCCCGGCGGCCGCGATGCTCGCCCGCTCGCCCATCACGCGGCGGGTGGCGCGCCGGGCTGGAGGCGCCCTCGTCGGACCGATCCTGCGCAGCGTCGGATCCACCGTCGGCCCCGAGGCCACACAGCGCTATCGGGAGCTCGCCCGCGATCCCGCCCACGTCGGGGCGGTCTTCTCGATGCTCGCCCAGTGGGATCTCGACGGGCTCCACGCGGCGCTACCACGCCTGGACGTACCGACCCTGCTCCTCGGCGGCGCCCGCGACGGCGCCACCCCGATCGCCCAGCAGCGCGCCCTCGCACGTCGCCTCCCGGCCGCGCGCGCGCACGTCGTCCTCGGCGCCGGGCACCTGCTCCACGAGGAGCGACCCGCCGAGATCGCGCGCCTCGTCGAGGCCGAGTGGAACAGATTGGACGGCGGTCGTGTCAAAAATGCTTGA | 121191 | 122102 | -1 | bch | O | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 3551 | 4462 | FALSE | | bchD | Houyibacteriaceae--LLY-WYZ-153---k141102864101 | bchDpMyxococcota--cPolyangia--oPolyangiales--GCA002699025.1---PABA01000098.181 | 100 | 587 | 0 | 0 | 1 | 587 | 1 | 587 | 0 | 1064 | Houyibacteriaceae--LLY-WYZ-153 | k141102864101 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 101 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 5 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k141102864101 # 122099 # 123859 # -1 # ID=85101;partial=00;starttype=ATG;rbsmotif=None;rbsspacer=None;gccont=0.792 | 
ATGAGCGGCTGGCCCGACGTGGCGCGCGTCGCCGAGCTCCTGAGCGTCGACCCGGACGGCCTCGGAGGCGTGCGCCTGCGGGGTCGCCCGGGGCCGCACCGGCGCCGGGTGCTCGAGTGGGTGCGCGAGAGGCTGGCCCCGGAGGCGCCCTTCCGGCGCCTGCCCGCGCACGTGACCGAGGATCGGCTCCTCGGGGGCCTCGCGCTCGCGGAGACCTTGCGTTCGGGGCGGGCCGTCATGGAGCAGGGCGTGCTCGCGCGGAGCGACGGCGGCCTGCTCGTCGTGGCCATGGCCGAGCGGGCCGAGCGGGAGGTCGTGGCGCACCTCTGCGCGGCCCTCGACCGCGGCGCGATCACCGTCGAACGCGACGGCATGAGCGCCGAGGCGTCCTGCCGCGTGGGCCTCATCGCGCTCGACGAGGGCATCGACGAGGAGCACGTCGACCCGGCGCTCGCCGACCGGCTCGCCTTCGCGCTGGACCTCGACGCGCTCGATCCGCGGGGAGGGGCGGCGCCGGAACACGGACCCGAGGAGGTCGCGCGAGCCCGCGCCCGCCTCCCGCACGTGAGCCTCGGCGACGACATCATCGCGGCCCTCTCGGAGGCGGCCCAGGCCCTCGGCGTGGAGGCGCTCCGGCCGCTCCTGCTCGCGGCGAAGGCGGCCCGCGCGCACGCGGCGCTCCTCGGCCGGACCCGCGTCGAGGAGGAAGACGCCGGGATGGCGGCGCGCCTCGTCCTCGGCCCGAGGGCGACGCGAGCGCCGAGCGCCGAGCCCGAAGAGGCGGCCGAGCGCGAGGCCGAAGAGGGCGACCCCGACCCGGGAGGCGCCGGCGCGGCTGCAGCCGGCGAACGGGCGGACGGCGCCGACGAGGCCCCGCCGGGCGAGGTCCCGCTCGGCGATCTCGTCTTGGCGGCGGCCGAGAGCGGCATCCCGGCGGGGCTGCTCGACGCCCTCGACGTCGGGACCACCCGGCGGGCCGGCGCGACCGGTCGGAGCGGGGCGACGCGCATCGGCCCGAGCGGCGGCCGCCCGGCGGGGACGCGCGCCGCGCCGCCCACCCGAGGCCAGCGCCTGAACGTCGTCGAGACCCTCCGCGCCGCCGCGCCCTGGCAGCGGCTCCGCGGGGGCGGCTTCGGCGCGGGCGTGCGCGTCCGGCCGGAGGACTTCCGTGTCACCCGTCACCGGCAGCCGATCGAGAGCTGCGTGATCTTCGCCGTCGACGCGTCCGGCTCCGCCGCGCTTCGACGCCTGGCCGAGGCGAAGGGCGCCGTCGAGCGCGTGCTCGGCGACTGCTACGTGCGGCGCGACCACGTCGCCCTCGTCGCGTTCCGCCAGGACGGCGCCGAGCTGCTCCTGCCCCCGACGCGCTCCCTCGCCCGCGTGCGTCGCAGCCTGGCTGCCCTCGCCGGCGGCGGCGCGACCCCCCTCGCCGCGGGGATCGACGCCGCCCATCGGCTCGCCCTCGACGCCCGCGGGCGCGGCCGCGAGCCCATCGTGGTCGTCATGACCGACGGGCGGGCGAACGTGACCCGGGACGGCCGCCGGGACCCCGCGGTCGCCACCACGGACGCCCTCGAGAGCGCGCGCGGGCTCCAGCGAGCCGCCGTGCCGACCCTCTTCCTCGACACGGCCCCACGCCCCCGGCGCCGTGCCCGCGAGCTCGCCGAGGCCATGGACGCCCGCTACCTGCCGCTGCCCTACCTCGACGCGGCGGGGATCTCACGCCACGTCCAAGCGCTCGCCCGCGAGGGAGCCCGATGA | 122099 | 123859 | -1 | bch | D | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 4459 | 6219 | FALSE | | bchI | Houyibacteriaceae--LLY-WYZ-153---k141102864102 | 
bchIpMyxococcota--cPolyangia--oPolyangiales--GCA002699025.1---PABA01000098.182 | 100 | 339 | 0 | 0 | 1 | 339 | 1 | 339 | 1.97E-239 | 652 | Houyibacteriaceae--LLY-WYZ-153 | k141102864102 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 102 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 6 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k141102864102 # 123863 # 124879 # -1 # ID=85102;partial=00;starttype=ATG;rbsmotif=GGA/GAG/AGG;rbsspacer=5-10bp;gccont=0.745 | ATGACGCCCTATCCCTTCACCGCCATCGTCGCGCAGGACGAGCTCAAGCTCGCCCTGCAGATCGCCACCGTCGACCGCAGCATCGGCGGGGTCCTCGCCTTCGGCGACCGCGGCACCGGCAAGTCGACCACCATCCGCGCGCTCGCCCGGCTCCTGCCGCCGATGCGCGTCGTCGCCAGCTGCCCGTACCACTGTGATCCGGCCGACGCGCGCGCTCGCTGTCCGCACTGTGCCGAAGCCGCAGGGGAGCGGGAGGCGATCGAGACGCCCGTGCCGGTCGTGGACCTGCCCCTCGGCGCCACCGAGGATCGCGTCGTCGGCGCGCTCGATCTCGAGGCGGCCCTCACGCGCGGGGAGCGCCGCTTCTCACCGGGCCTGCTCGCCGCGGCGCATCGAGGCTTCCTCTACATCGACGAGGTCAACCTCCTCCCCGATCACCTCGTGGATCTGCTGCTCGACGTCGCGGCCTCGGGCGAGAACGTGGTCGAGCGCGAGGGCCTGAGCGTGCGCCACCCCGCGCGCTTCGTGCTGATCGGCAGCGGAAACCCGGAGGAGGGCGAGCTGCGCCCCCAGCTGCTCGATCGCTTCGGCCTCTCGCTCGAGGTCCGCACGCCGGACGAGGTCGCGACGCGCGTCGAGGTCGTCAAGCGGCGCATGCGCTACGATCAGGACCCGGAGGCCTTCGCGGCCGCCTGGGCGGAGGACGAGGCGGCCCTCATCGTTCGCCTCCGGGACGCGCGGGCGCGCTTGCCCGAGGTGGCCGTCAGCGACGCCGTGATCGAGCGCGCGAGCCGGCTCTGCCAGGCGCTCGGCACCGACGGGCTCCGGGGGGAGCTGACCTTGATCCGCGCCGCGCGCGCGGCCGCCAGCCTCGACGCGCAGCGGGAGGTCGCCGACGTGCACCTCGCCCAGGTCGCCCCCCTCGCGCTCCGCCACCGGCTGCGACGCGCCCCCCTGGACGACGTCGGCTCGGGCGCGCGCGTGCAGAAGGCCGTCGAGGACGTGCTCGGGGGCTGA | 123863 | 124879 | -1 | bch | I | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 6223 | 7239 | FALSE |
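
The `Pstart` and `Pend` values in this preview are consistent with each ORF's `start` and `end` shifted so that the cluster begins at 0 (for example, 118914 - 117640 = 1274). The sketch below reproduces that arithmetic for the first three rows. It assumes this shift is how the plotting coordinates are derived, which the preview suggests but does not state.

```r
# start/end values of the first three ORFs in the preview
cluster <- data.frame(
  gene  = c("pufC", "pufM", "pufL"),
  start = c(117640, 118914, 120270),
  end   = c(118917, 120224, 121094)
)

# Shift coordinates so the first ORF of the cluster starts at position 0
offset <- min(cluster$start)
cluster$Pstart <- cluster$start - offset
cluster$Pend   <- cluster$end - offset
print(cluster)
```

With several clusters in one table, the same shift would be applied within each cluster separately.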

2.2 Gene cluster sequence (GC_seq)

```

pufCHouyibacteriaceae--LLY-WYZ-153---k141102864---1 GTGAAGAAGATCGCCATCGCCTTCGTGAGCACCTGGCTCCTCATCGGGGCCGTCTACGCCTACGAGCCGACCGAGACCTCGCAGATCGGCGCCGACGGCGTCGCCATGCAGGTCACGCAGACCGAGGACGAGCTCGCCGCGCGCGTGGAGGCGAACACCGTCCCGCCGGCCATCCCGATGCCCCAGAGCAGCGGCGTGCTGGCGGCCGAGGAGTACGAGAACGTGCAGGTCCTCGGCCACCTCAACACGGCCCAGTTCACCCGGCTGATGACCTCCATCACGCTCTGGGTCGCGCCGGAGCAGGGCTGCGCCTACTGCCACAACACGAACAACCTGGCCTCCGACGAGCTCTACACGAAGCGCGTGGCGCGTCGGATGATCCAGATGACCTGGCACATCAACGAGAACTGGCAGTCGCACGTCCAGGAGACCGGCGTGACCTGCTACACGTGCCACCGCGGCAACAACGTGCCCCAGCACATCTGGTTCGAGACGCCGCCCGACGACCACGGCATGGTGGGCTGGCGTGGCTCGCAGAACGCCCCGAACGACCGGACGGGGATCAGCTCCCTGCCGAACGACGTGTTCGAGGTGTTCCTCGAGGAGGACGCGAGCATCCGGGTCCAGTCGGCCGGGGAGGCCTTCCCGAACGAGAACCGCGCGTCCATCAAGCAGGCCGAGTGGACCTATGGGCTGATGATGCACTTCTCCGAGTCGCTCGGGGTGAACTGCACGGCTTGCCACAACTCGCGCTCCTGGAACGACTGGAGCCAGAGCCCGGCCCGCCGCGGGACGGCCTGGCACGGCATCCGGATGGCGCGAAACCTCAACAACCACTGGCTGACGCCGCTGCGCGATCAGTTCCCGCCGAACCGGCTCGGCGAGCTGGGTGACGCCCCGAAGGCCAACTGCGCGACGTGCCACCAGGGCGCGTACCGCCCCCTGCTCGGGCACCGCATGCTCGAGGACTTCCCGTCCCTCGTACGGGCGATGCCGCAGCCCGAGATCGAGCCGGAGCCGGAGCCGGAGCCCGAGCTGGAAGGCGAGGGCGAGGCCGGCGGGCAGCTCGAGCCGGAGGGGGAGGCGCCCGCCGCCGAAGCCCCCGAGGGCACGAACGCTGCGCCGACGGCGATGGCTGCGCCGGCGGCGATGGCCGCTCCGACGGGGATGGCCGCGCCGGCGGCGATGGCTGCGCCGGCGGCGATGGCTGCTCCGGCGGTGGCCGAGCCGACGCCCATGGCCGCGCCGGCGGCGATGGCGGCCCCGGCACCGAACTGA pufMHouyibacteriaceae--LLY-WYZ-153---k141102864---1 
ATGGCCCGCTACCAGAACATCTTCACGCAGATCCAAGTCGTCGGTCCGCCGGACACGCCGCCGCCGATCGACCCGGACTTCCGTACGAAGAAGACGCGCATGTCGCGGCTCCTCGGGTGGTTCGGCAACCCGCAGATCGGCCCCGTCTACCTGGGCTACACCGGCCTGGCGTCCGCGATCAGCTTCTTCATCGCTTTCGAGATCATCGGGCTCAACATGCTGGCCTCGGTGGACTGGGACGTCGTTCAGTTCATCCGCCAGCTCCCCTGGCTCGCGCTCGAACCGCCCCCGCCCTCTGCCGGGCTCTCCATCCCGACGCTTCAGGAGGGCGGCTGGTGGCTCATGGCCGGCTTCTTCCTCACGGCGTCGGTCATTCTCTGGTGGATTCGCACCTATCGGCGCGCACGCGCCCTGAAGATGGGCACGCACGTCGCGTGGGCCTTCGCCTCGGCGATCTGGCTCTACCTCGTCCTCGGCTTCATTCGCCCCTTGCTGATGGGGAGCTGGGGGGAGGCGGTGCCCTTCGGCATCTTCCCGCACCTCGACTGGACCGCCGCCTTCTCCGTTCGCTACGGCAACCTCTTCTACAACCCCTTCCACTGCCTCTCGATCGTCTTCCTCTACGGGTCGACGCTCCTCTTCGCCATGCACGGCGCGACGGTGCTCGCGCTCGGGCACGTGGGCGGTGAGCGTGAGGTGAGCCAGGTGGTCGACCGCGGCACGGCGGCCGAGCGCGGGGCGCTCTTCTGGCGCTGGACGATGGGCTTCAACGCGACCTTCGAGTCCATCCACCGCTGGGCCTGGTGGTTCGCGGTGCTCACGCCGCTCACCGGAGGCATCGGCATCCTCCTGACCGGCACCGCCGTCGACAACTGGTATCAGTGGGCCGTCGAGCACGACTTCGCGCCGGCCTATGAGGAGTCCTACGAGGTCGTCCCCGACCCGGTCGACGACCCGGCGAACGAGGACCTGCCCGGTATGCGCGGTGAGTCCACCGCGCAGTGGGAGCCGACCCCCTACGTGCCCGCCGAGGAGCCGGAGGCGCCCGAGGATGGTGCGGACGGCGCGGCCGCGGTCGAAGGCGTCGACGCCGAGGGCGGCGAGGATGCCGCCGCGGATCCCGCGAGCGAGGGCACGAGCGGCCAGCCGGAGACCGGCGCCGCGGCCCCGGAGAGCGAGCGCCTTCCGGACGAAGCGGCGGCGGCCGAGCCCGAAGGGGCTGCGCCGGAGCCCGAACCCCCCGCGCCGTCCGAGACGGCTGCCCCGAGCGAACCCGAGGCGCCCAGCGCGATGACCCCGGAGCAACCGTGA pufLHouyibacteriaceae--LLY-WYZ-153---k141102864---1 
ATGGGCCTACTGAGCTTCGAGCGGCGATATCGAGTCCGAGGAGGCACGCTCCTCGGGGGCGACCTATTCGATTTCTGGGTCGGGCCCTTCTACGTGGGGCTCTTCGGCGTCACGACGATCTTCTTCACGATCGTCGGCACCGCGCTGATCCTCTGGGAGGCCTCCCGGGGTGACACCTGGAACCCCTGGCTGATCAACATCCAGCCGCCTCCAATCGAGTACGGGCTCGCCTTCGCGCCCCTCGATCAGGGGGGCATCTGGCAGCTGGTCACCATCTGCGCCATCGGCGCCTTCGGATCCTGGGCGCTCCGACAGGCGGAGATCAGCCGCAAGCTCGGCATGGGCTACCACGTGCCCATCGCCTACGGCGTCGCGGTCTTCGCCTACGTCACGCTCGTGGTGATTCGCCCGGTGATGCTGGGCGCCTGGGGCCACGGCTTCCCCTACGGCATCTTCAGCCACCTCGATTGGGTGTCGAACGTCGGGTACCAGTACCTGCACTTCCACTACAACCCGGCCCACATGATCGCGGTGAGCTTCTTCTTCACCACGACGCTCGCGCTCTCCCTCCACGGCGGTTTGATCCTCTCCGCCGTGAATCCGCCGAAGGGAGAGAAGGTGAAGACCGCCGAGTACGAGGACGGGTTCTTCCGTGACCACATCGGCTACTCGATCGGCGCCCTGGGCATTCATCGACTCGGCCTCTTCCTGGCGCTGAGCGCCGGGATCTGGAGCGCGATCTGCATTCTCATCAGCGGCCCGATGTGGACCAAGGGGTGGCCCGAGTGGTGGGACTGGTGGCTCAACCTCCCCGTGTGGAGCTGA bchOHouyibacteriaceae--LLY-WYZ-153---k141102864---1 ATGAGCTCGGCCGTCGAAGAGCAGCGCGTCGAGCACCCGCGGGTCGAGCAGCAGCCCATCGAGCAGCAGCGCGTCGAGCACCAGCGCGTCGAGCGTTCGGGCGTGCGGTGGAACGTCGCCCGCCGCGGCGCCGGACCCACGCTCCTGGCGCTCCACGGGACCGGCAGCTCGAGCCGCTCCTTCTGCGCCCTCGCGGCCACGCTCGGTGCTCGCTTCACCGTCGTGGCGCCCGATCTACCCGGCCACGCCGGGAGCCGGATCGATCGCCGCTTCCGCCTCTCGCTCCCCTCGATCGCCGCCGCCCTCGGCGAGCTCATCGAGGCGCTCGCCGTCCAGCCGGCGCTGGTCCTCGCTCACTCCGCGGGCGCGGCGGTGGCGGCGCGCGCCATGCTCGACGGGGCTCTCCGCCCGGCGCTCTTCGTCGGGCTCGGCGCGGCCCTGACGCCCCTCGAGGGGCTCGCCCGGCTCGGCGCGCGCCCGGCGGCCGCGATGCTCGCCCGCTCGCCCATCACGCGGCGGGTGGCGCGCCGGGCTGGAGGCGCCCTCGTCGGACCGATCCTGCGCAGCGTCGGATCCACCGTCGGCCCCGAGGCCACACAGCGCTATCGGGAGCTCGCCCGCGATCCCGCCCACGTCGGGGCGGTCTTCTCGATGCTCGCCCAGTGGGATCTCGACGGGCTCCACGCGGCGCTACCACGCCTGGACGTACCGACCCTGCTCCTCGGCGGCGCCCGCGACGGCGCCACCCCGATCGCCCAGCAGCGCGCCCTCGCACGTCGCCTCCCGGCCGCGCGCGCGCACGTCGTCCTCGGCGCCGGGCACCTGCTCCACGAGGAGCGACCCGCCGAGATCGCGCGCCTCGTCGAGGCCGAGTGGAACAGATTGGACGGCGGTCGTGTCAAAAATGCTTGA bchDHouyibacteriaceae--LLY-WYZ-153---k141102864---1 
ATGAGCGGCTGGCCCGACGTGGCGCGCGTCGCCGAGCTCCTGAGCGTCGACCCGGACGGCCTCGGAGGCGTGCGCCTGCGGGGTCGCCCGGGGCCGCACCGGCGCCGGGTGCTCGAGTGGGTGCGCGAGAGGCTGGCCCCGGAGGCGCCCTTCCGGCGCCTGCCCGCGCACGTGACCGAGGATCGGCTCCTCGGGGGCCTCGCGCTCGCGGAGACCTTGCGTTCGGGGCGGGCCGTCATGGAGCAGGGCGTGCTCGCGCGGAGCGACGGCGGCCTGCTCGTCGTGGCCATGGCCGAGCGGGCCGAGCGGGAGGTCGTGGCGCACCTCTGCGCGGCCCTCGACCGCGGCGCGATCACCGTCGAACGCGACGGCATGAGCGCCGAGGCGTCCTGCCGCGTGGGCCTCATCGCGCTCGACGAGGGCATCGACGAGGAGCACGTCGACCCGGCGCTCGCCGACCGGCTCGCCTTCGCGCTGGACCTCGACGCGCTCGATCCGCGGGGAGGGGCGGCGCCGGAACACGGACCCGAGGAGGTCGCGCGAGCCCGCGCCCGCCTCCCGCACGTGAGCCTCGGCGACGACATCATCGCGGCCCTCTCGGAGGCGGCCCAGGCCCTCGGCGTGGAGGCGCTCCGGCCGCTCCTGCTCGCGGCGAAGGCGGCCCGCGCGCACGCGGCGCTCCTCGGCCGGACCCGCGTCGAGGAGGAAGACGCCGGGATGGCGGCGCGCCTCGTCCTCGGCCCGAGGGCGACGCGAGCGCCGAGCGCCGAGCCCGAAGAGGCGGCCGAGCGCGAGGCCGAAGAGGGCGACCCCGACCCGGGAGGCGCCGGCGCGGCTGCAGCCGGCGAACGGGCGGACGGCGCCGACGAGGCCCCGCCGGGCGAGGTCCCGCTCGGCGATCTCGTCTTGGCGGCGGCCGAGAGCGGCATCCCGGCGGGGCTGCTCGACGCCCTCGACGTCGGGACCACCCGGCGGGCCGGCGCGACCGGTCGGAGCGGGGCGACGCGCATCGGCCCGAGCGGCGGCCGCCCGGCGGGGACGCGCGCCGCGCCGCCCACCCGAGGCCAGCGCCTGAACGTCGTCGAGACCCTCCGCGCCGCCGCGCCCTGGCAGCGGCTCCGCGGGGGCGGCTTCGGCGCGGGCGTGCGCGTCCGGCCGGAGGACTTCCGTGTCACCCGTCACCGGCAGCCGATCGAGAGCTGCGTGATCTTCGCCGTCGACGCGTCCGGCTCCGCCGCGCTTCGACGCCTGGCCGAGGCGAAGGGCGCCGTCGAGCGCGTGCTCGGCGACTGCTACGTGCGGCGCGACCACGTCGCCCTCGTCGCGTTCCGCCAGGACGGCGCCGAGCTGCTCCTGCCCCCGACGCGCTCCCTCGCCCGCGTGCGTCGCAGCCTGGCTGCCCTCGCCGGCGGCGGCGCGACCCCCCTCGCCGCGGGGATCGACGCCGCCCATCGGCTCGCCCTCGACGCCCGCGGGCGCGGCCGCGAGCCCATCGTGGTCGTCATGACCGACGGGCGGGCGAACGTGACCCGGGACGGCCGCCGGGACCCCGCGGTCGCCACCACGGACGCCCTCGAGAGCGCGCGCGGGCTCCAGCGAGCCGCCGTGCCGACCCTCTTCCTCGACACGGCCCCACGCCCCCGGCGCCGTGCCCGCGAGCTCGCCGAGGCCATGGACGCCCGCTACCTGCCGCTGCCCTACCTCGACGCGGCGGGGATCTCACGCCACGTCCAAGCGCTCGCCCGCGAGGGAGCCCGATGA bchIHouyibacteriaceae--LLY-WYZ-153---k141102864---1 
ATGACGCCCTATCCCTTCACCGCCATCGTCGCGCAGGACGAGCTCAAGCTCGCCCTGCAGATCGCCACCGTCGACCGCAGCATCGGCGGGGTCCTCGCCTTCGGCGACCGCGGCACCGGCAAGTCGACCACCATCCGCGCGCTCGCCCGGCTCCTGCCGCCGATGCGCGTCGTCGCCAGCTGCCCGTACCACTGTGATCCGGCCGACGCGCGCGCTCGCTGTCCGCACTGTGCCGAAGCCGCAGGGGAGCGGGAGGCGATCGAGACGCCCGTGCCGGTCGTGGACCTGCCCCTCGGCGCCACCGAGGATCGCGTCGTCGGCGCGCTCGATCTCGAGGCGGCCCTCACGCGCGGGGAGCGCCGCTTCTCACCGGGCCTGCTCGCCGCGGCGCATCGAGGCTTCCTCTACATCGACGAGGTCAACCTCCTCCCCGATCACCTCGTGGATCTGCTGCTCGACGTCGCGGCCTCGGGCGAGAACGTGGTCGAGCGCGAGGGCCTGAGCGTGCGCCACCCCGCGCGCTTCGTGCTGATCGGCAGCGGAAACCCGGAGGAGGGCGAGCTGCGCCCCCAGCTGCTCGATCGCTTCGGCCTCTCGCTCGAGGTCCGCACGCCGGACGAGGTCGCGACGCGCGTCGAGGTCGTCAAGCGGCGCATGCGCTACGATCAGGACCCGGAGGCCTTCGCGGCCGCCTGGGCGGAGGACGAGGCGGCCCTCATCGTTCGCCTCCGGGACGCGCGGGCGCGCTTGCCCGAGGTGGCCGTCAGCGACGCCGTGATCGAGCGCGCGAGCCGGCTCTGCCAGGCGCTCGGCACCGACGGGCTCCGGGGGGAGCTGACCTTGATCCGCGCCGCGCGCGCGGCCGCCAGCCTCGACGCGCAGCGGGAGGTCGCCGACGTGCACCTCGCCCAGGTCGCCCCCCTCGCGCTCCGCCACCGGCTGCGACGCGCCCCCCTGGACGACGTCGGCTCGGGCGCGCGCGTGCAGAAGGCCGTCGAGGACGTGCTCGGGGGCTGA ```
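
If the cluster sequences need to go to downstream tools, they can be written to a FASTA file. The sketch below assumes a two-column data frame with `SeqName` and `Sequence` columns, mirroring the input format described earlier. Whether `GC_seq` has exactly this layout is an assumption, and `write_fasta()` is a hypothetical helper rather than a gclink function.

```r
# Hypothetical helper, not part of gclink: write a two-column data frame
# (SeqName, Sequence) to a FASTA file, one header line per sequence.
write_fasta <- function(seq_df, path) {
  # Interleave ">name" header lines with their sequences
  fasta_lines <- as.vector(rbind(paste0(">", seq_df$SeqName), seq_df$Sequence))
  writeLines(fasta_lines, path)
  invisible(path)
}

# Toy sequences (made up for illustration)
toy_seq <- data.frame(
  SeqName  = c("pufC_cluster1", "pufM_cluster1"),
  Sequence = c("GTGAAGAAGATC", "ATGGCCCGCTAC"),
  stringsAsFactors = FALSE
)
fasta_path <- write_fasta(toy_seq, tempfile(fileext = ".fasta"))
fasta_out <- readLines(fasta_path)
print(fasta_out)
```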

2.3 Gene cluster plot (GC_plot)

gc_plot case1

Case 2: Using eggNOG (evolutionary gene genealogy Nonsupervised Orthologous Groups) format result

```r
# Case 2: Using eggNOG result with Full pipeline (Find Cluster + Extract FASTA + Plot Cluster)
library(gclink)

# Load the example datasets shipped with the package
data(eggnog_df)
data(seq_data)
data(KO_group)

# KEGG Orthology identifiers of the candidate genes
KOs = c("K02291","K09844","K20611","K13789",
        "K09846","K08926","K08927","K08928",
        "K08929","K13991","K04035","K04039",
        "K11337","K03404","K11336","K04040",
        "K03403","K03405","K04037","K03428",
        "K04038","K06049","K10960","K11333",
        "K11334","K11335","K08226","K08226",
        "K09773")
rename_KOs = paste0("ko:", KOs)

# Map eggNOG columns onto the BLAST-style column names gclink expects.
# `#query` is not a syntactic R name, so it must be quoted with backticks.
eggnog_df$qaccver = eggnog_df$`#query`
eggnog_df$saccver = eggnog_df$KEGG_ko
eggnog_df$evalue = eggnog_df$evalue
eggnog_df$bitscore = eggnog_df$score
eggnog_df$gene = eggnog_df$KEGG_ko

gc_list_2 = gclink(in_blastp_df = eggnog_df,
                   in_seq_data = seq_data,
                   in_gene_list = rename_KOs,
                   in_GC_group = KO_group,
                   AllGeneNum = 50,
                   MinConSeq = 25,
                   apply_evalue_filter = FALSE,
                   min_evalue = 1,
                   apply_score_filter = TRUE,
                   min_score = 10,
                   orf_before_first = 1,
                   orf_after_last = 1,
                   levels_gene_group = c('bch','puh','puf','crt',
                                         'acsF','assembly','hypothetical ORF'),
                   color_theme = c('#3BAA51','#6495ED','#DD2421','#EF9320',
                                   '#F8EB00','#FF0683','grey'))

# Extract the three outputs from the returned list
gc_meta_2 = gc_list_2[["GC_meta"]]
gc_seq_2 = gc_list_2[["GC_seq"]]
gc_plot_2 = gc_list_2[["GC_plot"]]

head(gc_meta_2)   # Cluster metadata
head(gc_seq_2)    # FASTA sequences
print(gc_plot_2)  # Visualization
```

1 Input Data Preview

1.1 A dataframe of Diamond BLASTp output from eggNOG (e.g., head(eggnog_df))

| #query | seed_ortholog | evalue | score | eggNOG_OGs | max_annot_lvl | COG_category | Description | Preferred_name | GOs | EC | KEGG_ko | KEGG_Pathway | KEGG_Module | KEGG_Reaction | KEGG_rclass | BRITE | KEGG_TC | CAZy | BiGG_Reaction | PFAMs |
|--------|---------------|--------|-------|------------|---------------|--------------|-------------|----------------|-----|----|---------|--------------|-------------|---------------|-------------|-------|---------|------|---------------|-------|
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.11 | 439375.Oant2732 | 1.57E-45 | 162 | COG3293@1\|root,COG3293@2\|Bacteria,1PVIT@1224\|Proteobacteria,2TURP@28211\|Alphaproteobacteria,1J3RT@118882\|Brucellaceae | 28211\|Alphaproteobacteria | L | Transposase DDE domain | - | - | - | ko:K07492 | - | - | - | - | ko00000 | - | - | - | DDETnp1,DDETnp12,DUF4096 |
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.12 | 1173264.KI913949gene2450 | 3.58E-17 | 83.6 | COG3335@1\|root,COG3415@1\|root,COG3335@2\|Bacteria,COG3415@2\|Bacteria,1G39S@1117\|Cyanobacteria,1HCKE@1150\|Oscillatoriales | 1117\|Cyanobacteria | L | COGs COG3415 Transposase and inactivated derivatives | - | - | - | ko:K07494 | - | - | - | - | ko00000 | - | - | - | DDE3,HTH32,HTHTnpIS630 |
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.13 | 794903.OPIT503400 | 3.03E-30 | 114 | COG3335@1\|root,COG3335@2\|Bacteria | 2\|Bacteria | L | DDE superfamily endonuclease | - | - | - | ko:K07494 | - | - | - | - | ko00000 | - | - | - | DDE3,HTHTnpIS630 |
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.15 | 502025.Hoch2790 | 2.78E-50 | 191 | 2AY84@1\|root,31QA9@2\|Bacteria,1QMYF@1224\|Proteobacteria,4374U@68525\|delta/epsilon subdivisions,2X20E@28221\|Deltaproteobacteria,2YWTZ@29\|Myxococcales | 28221\|Deltaproteobacteria | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.111 | 105420.BBPO01000003gene1121 | 2.00E-11 | 72.8 | COG2887@1\|root,COG2887@2\|Bacteria,2GJC5@201174\|Actinobacteria,2NGJC@228398\|Streptacidiphilus | 201174\|Actinobacteria | L | Protein of unknown function (DUF2800) | recB | - | - | ko:K07465 | - | - | - | - | ko00000 | - | - | - | PDDEXK1 |
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.112 | 1122915.AUGY01000071gene4398 | 2.13E-37 | 152 | COG1201@1\|root,COG1201@2\|Bacteria,1UHYQ@1239\|Firmicutes,4ISB0@91061\|Bacilli,277Q5@186822\|Paenibacillaceae | 91061\|Bacilli | L | helicase superfamily c-terminal domain | - | - | - | - | - | - | - | - | - | - | - | - | DUF1998,Helicase_C |
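
eggNOG-mapper annotation files typically wrap the table in `##` comment lines and begin the header row with `#query`, which trips up naive parsing. The sketch below shows one way to load such a file in base R. The file is a made-up, reduced example with only five columns, and the identifiers in it are hypothetical.

```r
# Hypothetical input: a tiny eggNOG-mapper-style annotation file with "##"
# comment lines and a header row that begins with "#query"
annot_file <- tempfile(fileext = ".annotations")
writeLines(c(
  "## emapper run metadata (comment line)",
  "#query\tseed_ortholog\tevalue\tscore\tKEGG_ko",
  "orf_1\t439375.X\t1.57e-45\t162\tko:K07492",
  "orf_2\t502025.Y\t2.78e-50\t191\t-",
  "## 2 queries scanned (comment line)"
), annot_file)

# Drop the "##" lines, then parse the remainder as a tab-delimited table.
# comment.char = "" keeps the "#query" header; check.names = FALSE keeps the
# column name "#query" unchanged instead of converting it to "X.query".
annot_lines <- readLines(annot_file)
annot_lines <- annot_lines[!startsWith(annot_lines, "##")]
eggnog_df <- read.delim(text = annot_lines, header = TRUE, comment.char = "",
                        check.names = FALSE, stringsAsFactors = FALSE,
                        quote = "")

# Non-syntactic column names need backticks
head(eggnog_df$`#query`)
```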

1.2 (Optional) A dataframe with SeqName (ORF identifier, Prodigal format: `ORF_id # start # end # strand # ...`) and Sequence (e.g., head(seq_data))

Same as in Case 1

1.3 (Optional) KO/gene group (e.g., head(KO_group))

| gene | gene_group | gene_label |
|------|------------|------------|
| ko:K04035 | acsF | acsF |
| ko:K08226 | assembly | bch2 |
| ko:K04039 | bch | B |
| ko:K11337 | bch | C |
| ko:K03404 | bch | D |
| ko:K11336 | bch | F |

1.4 (Optional) Candidate KO/gene list

ko:K04035 ko:K08226 ko:K04039 ko:K11337 ko:K03404 ko:K11336

2 Output Data Preview

2.1 Gene cluster information (GC_meta)

Similar to Case 1

2.2 Gene cluster sequence (GC_seq)

Similar to Case 1

2.3 Gene cluster plot (GC_plot)

gc_plot case2

Documentation

Full function reference: `?gclink::gclink`

Citation

If you use gclink in your research, please cite:

Li, L., Huang, D., Hu, Y., Rudling, N. M., Canniffe, D. P., Wang, F., & Wang, Y. "Globally distributed Myxococcota with photosynthesis gene clusters illuminate the origin and evolution of a potentially chimeric lifestyle." Nature Communications (2023), 14, 6450. https://doi.org/10.1038/s41467-023-42193-7

Dependencies

  • R (≥ 3.5)
  • dplyr (≥ 1.1.4)
  • gggenes (≥ 0.5.1)
  • ggplot2 (≥ 3.5.2)
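
A quick way to check whether an existing R library meets these minimums is sketched below. It only reports on the three packages listed here and installs nothing; the variable names are illustrative.

```r
# Minimum versions listed above
min_versions <- c(dplyr = "1.1.4", gggenes = "0.5.1", ggplot2 = "3.5.2")

# TRUE if the package is installed and at least the required version
meets_requirement <- vapply(names(min_versions), function(pkg) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_versions[[pkg]]
}, logical(1))
print(meets_requirement)
```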

License

GPL-3 © Liuyang Li

Contact

Owner

  • Name: Liuyang Li
  • Login: LiuyangLee
  • Kind: user
  • Company: Shanghai Jiaotong University

GitHub Events

Total
  • Watch event: 1
  • Push event: 3
  • Create event: 7
Last Year
  • Watch event: 1
  • Push event: 3
  • Create event: 7

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
cran.r-project.org: gclink

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 0 Last month
Rankings
Dependent packages count: 25.6%
Forks count: 29.0%
Dependent repos count: 31.5%
Stargazers count: 33.0%
Average: 40.9%
Downloads: 85.3%
Maintainers (1)
Last synced: 10 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5 depends
  • dplyr >= 1.1.4 imports
  • gggenes >= 0.5.1 imports
  • ggplot2 >= 3.5.2 imports
  • utils * imports