Repository
Gene-Cluster Discovery, Annotation and Visualization
Basic Info
- Host: GitHub
- Owner: LiuyangLee
- Language: R
- Default Branch: main
- Size: 1.61 MB
Statistics
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
gclink: Gene-Cluster Discovery, Annotation and Visualization
Overview
gclink performs end-to-end analysis of gene clusters (e.g., photosynthesis, carbon/nitrogen/sulfur cycling, carotenoid, antibiotic, or viral genes) from (meta)genomes. It provides:
- Parsing of Basic Local Alignment Search Tool (BLAST) results in tab-delimited format produced by tools like NCBI BLAST+ and Diamond BLASTp
- Contiguous cluster detection
- Publication-ready visualization
Key Features
Adaptive Workflow
- Works with or without coding sequences input
- Skips plotting when functional grouping is absent
- Supports custom gene lists for universal cluster detection
Cluster Detection
- Density-based identification via `AllGeneNum` and `MinConSeq` parameters
- Handles incomplete gene annotation coverage
- Optional insertion of hypothetical ORFs at cluster boundaries
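The density criterion can be pictured with a small sketch. This is only one plausible reading of `AllGeneNum` and `MinConSeq`, not the package's actual implementation: it assumes a cluster is a stretch of at most `AllGeneNum` ORF positions on one contig that contains at least `MinConSeq` annotated hits. The function name `find_dense_windows` and its arguments are hypothetical.

```r
# Hypothetical sketch: scan sorted ORF positions of annotated hits on one contig
# and report windows that are dense enough to count as a candidate cluster.
find_dense_windows <- function(orf_position, all_gene_num = 50, min_con_seq = 25) {
  pos <- sort(unique(orf_position))
  clusters <- list()
  i <- 1
  while (i <= length(pos)) {
    # Hits that fall within all_gene_num ORF positions of the current hit
    in_window <- pos[pos >= pos[i] & pos < pos[i] + all_gene_num]
    if (length(in_window) >= min_con_seq) {
      clusters[[length(clusters) + 1]] <- range(in_window)
      i <- i + length(in_window)  # continue after this window
    } else {
      i <- i + 1
    }
  }
  clusters
}

# Toy input: 34 hits spread over ORF positions 97..132, plus two isolated hits
hits <- c(setdiff(97:132, c(110, 120)), 500, 900)
dense <- find_dense_windows(hits)
```

With these toy positions only the dense stretch qualifies; the two isolated hits are ignored because a window around them holds fewer than `min_con_seq` hits.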
Visualization
- Publication-ready arrow plots based on `gggenes`, with customizable:
  - Color themes
  - Functional group levels
  - Genome subsets
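As a rough illustration of the kind of arrow plot that `gggenes` draws, the sketch below builds a toy data frame and maps it to `geom_gene_arrow()`. The data frame and its column names are made up for this example and are not the package's internal format; the plot is only built when `ggplot2` and `gggenes` are installed.

```r
# Toy gene coordinates (hypothetical values and column names)
genes <- data.frame(
  genome  = "genome_A",
  gene    = c("pufC", "pufM", "pufL", "bchO"),
  group   = c("puf", "puf", "puf", "bch"),
  start   = c(0, 1274, 2630, 3551),
  end     = c(1277, 2584, 3454, 4462),
  forward = c(FALSE, FALSE, FALSE, FALSE)  # FALSE draws the arrow pointing left
)

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("gggenes", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(genes, aes(xmin = start, xmax = end, y = genome,
                         fill = group, forward = forward)) +
    gggenes::geom_gene_arrow() +
    # One color per functional group, analogous to a color theme
    scale_fill_manual(values = c(bch = "#3BAA51", puf = "#DD2421")) +
    gggenes::theme_genes()
}
```

Calling `print(p)` renders the plot; restricting `genes` to selected values of `genome` before plotting is one simple way to show a subset.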
Installation
```r
# Install from CRAN
install.packages("gclink")

# Install from GitHub
if (!require("devtools")) install.packages("devtools")
devtools::install_github("LiuyangLee/gclink")
```
Case 1: Using blastp result
```r
# Case 1: Using blastp result with Full pipeline (Find Cluster + Extract FASTA + Plot Cluster)
library(gclink)
data(blastp_df)
data(seq_data)
data(photosynthesis_gene_list)
data(PGC_group)
gc_list <- gclink(in_blastp_df = blastp_df,
                  in_seq_data = seq_data,
                  in_gene_list = photosynthesis_gene_list,
                  in_GC_group = PGC_group,
                  AllGeneNum = 50,
                  MinConSeq = 25,
                  apply_length_filter = TRUE,
                  down_IQR = 10,
                  up_IQR = 10,
                  orf_before_first = 0,
                  orf_after_last = 0,
                  levels_gene_group = c('bch','puh','puf','crt','acsF','assembly','regulator',
                                        'hypothetical ORF'),
                  color_theme = c('#3BAA51','#6495ED','#DD2421','#EF9320','#F8EB00',
                                  '#FF0683','#956548','grey'),
                  genome_subset = NULL)
gc_meta = gc_list[["GC_meta"]]
gc_seq = gc_list[["GC_seq"]]
gc_plot = gc_list[["GC_plot"]]
head(gc_meta)  # Cluster metadata
head(gc_seq)   # FASTA sequences
print(gc_plot) # Visualization
```
1 Input Data Preview
1.1 A dataframe of Diamond BLASTp output (e.g., head(blastp_df))
| qaccver | saccver | pident | length | mismatch | gapopen | qstart | qend | sstart | send | evalue | bitscore |
|---------|---------|--------|--------|----------|---------|--------|------|--------|------|--------|----------|
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.167 | enzymerhodopsinXP002954798.1Volvoxcarteri | 26.6 | 576 | 343 | 15 | 157 | 666 | 332 | 893 | 8.18e-41 | 161 |
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.1113 | petBCandidatusMethylomirabilisoxyferaDAMO1671MOX | 76.6 | 248 | 58 | 0 | 14 | 261 | 9 | 256 | 5.43e-149 | 417 |
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.1114 | petCCandidatusNitronautalitoralisG3M7016785NLI | 50.8 | 177 | 73 | 2 | 8 | 184 | 27 | 189 | 3.83e-59 | 184 |
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.1523 | cruCHumisphaeraborealisIPV6918620HBS | 31.5 | 365 | 208 | 11 | 42 | 378 | 48 | 398 | 1.45e-41 | 151 |
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.1616 | rfpBKL6621921938 | 33.0 | 227 | 137 | 3 | 4 | 223 | 3 | 221 | 2.53e-32 | 124 |
| Kuafubacteriaceae--GCA016703535.1---JADJBV010000002.1754 | bchIpMyxococcota--cWYAZ01--oWYAZ01--GCA016703535.1---JADJBV010000002.1754 | 100.0 | 343 | 0 | 0 | 1 | 343 | 1 | 343 | 4.73e-249 | 677 |
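A tab-delimited BLAST result with these twelve default columns can be read into such a data frame with base R. The sketch below is a minimal example, not part of gclink; the helper name `read_blast_tab` and the toy file contents are hypothetical.

```r
# Default column names of tabular BLAST / Diamond output (outfmt 6)
blast_cols <- c("qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
                "qstart", "qend", "sstart", "send", "evalue", "bitscore")

# Hypothetical helper: read a headerless tab-delimited BLAST file
read_blast_tab <- function(path) {
  read.delim(path, header = FALSE, col.names = blast_cols,
             stringsAsFactors = FALSE)
}

# Toy file with a single hit, written to a temporary location
tmp <- tempfile(fileext = ".tsv")
writeLines(paste(c("orf_1", "refA", "26.6", "576", "343", "15",
                   "157", "666", "332", "893", "8.18e-41", "161"),
                 collapse = "\t"), tmp)
hits <- read_blast_tab(tmp)
```

If the search was run with a custom column order, `blast_cols` has to be adjusted to match it.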
1.2 (Optional) A dataframe with SeqName (ORF identifier, Prodigal format: ORF_id # start # end # strand # ...) and Sequence (e.g., head(`seq_data`))
| SeqName | Sequence | |---------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| | Houyibacteriaceae--LLY-WYZ-153---k1411028641 # 3 # 266 # 1 # ID=851;partial=10;starttype=Edge;rbsmotif=None;rbsspacer=None;gccont=0.807 | CCGGACGCGCCGCCCGCCCCGAAGGCCCCGCCGGCCGCCCCCACCTATCCGCTCGAAGGCGCGCTCGGTATCAGCCGCGTGCGCCTCGTGCGCGCCACGCCCTGCGGCCTCACCGGCCGCGAGCTCGGCGCCGGCGAGGAGGCCCTCCTCGTCCACTTCGACGACGGACGCCCGCCCCTCGCGGTCGCCCCCGACGCGCTCCCGACGCCCCCCGGCGACGGGACGCCCCCCACCGGCGCTCCGCCGGAAGGAGACCCCGCATGA | | Houyibacteriaceae--LLY-WYZ-153---k1411028642 # 263 # 490 # 1 # ID=852;partial=00;starttype=ATG;rbsmotif=AGGAG;rbsspacer=5-10bp;gccont=0.737 | ATGACCCGCCCCGAAGACGCCCCGCCCACCCACGAAGCCGCGGACCGCGCCGTGCGCTCCCTCTTCCAGATCGGTCGCCTCTGGGCCTCCCACGGCCTCGAGATGGGTCGCATGACCTTGCGGACCGCCGCCAAGACCCTCGAGAGCACCGCCGAGACCCTCGAGGACCTCTCCCAGCGCGTCGCCCCCGACGACGAGCGCCCCGCGGACGAACGCGCCGCCGACTGA | | Houyibacteriaceae--LLY-WYZ-153---k1411028643 # 667 # 2184 # -1 # ID=853;partial=00;starttype=ATG;rbsmotif=AGGAGG;rbsspacer=5-10bp;gccont=0.775 | 
ATGAGCGCGATCGAAGGGACCCGGCCTCGGGACGGCGAGGCCCGCATGCCCGTGGAGGCGACCCCCGTGGAGGCCATCGGGGGCCTCGTCGCCCGGGCGCGTGACGCCGGCTTCGACCACGCGGCCCGGCCCCTCGCCGAGCGCGCGGGGCTGCTGCGCGCGCTCGCGGACGCCATCCTCGCCGACGGGGAGGCCATCGTCGCGCTCCTCGAGGAGGAGACGGGCAAGCCGGCGGCGGAGGCGTGGCTCCACGAGGTCGTGCCGACGGCGGACCTCGGGAGCTGGTGGAGCAGCCAGGGGCCGGCGCACCTCGCGACGGAAGCCGTGCGCCTCGACCCGCTCGCCTACCCTGGCAAGCGCGCGCGCGTCGAGGTGGTCCCGCGTGGCGTCGTGGCGCTGATCACGCCTTGGAACTTCCCGGTGGCGATCCCGCTGCGGACGCTCTTCCCGGCGCTCCTCGCGGGCAACGGCGTCGTCTGGAAGCCGTCCGAGCACACGCCGCGGGTGGCGGCGCGCGTGCACGGGATCGTGCGCGAGGTCTTCGGGCCGGACCTGGTCGAGCTGGTGCAGGGCGCCGGCGCGCAGGGGGCGGCGCTGGTCGAGGCGGACGTGGACGCGGTGGTGTTCACGGGCAGCGTGGCGACCGGGCGGAAGGTCGGCGCGGCGGCGGGGCGGGCGCTCACGCCGGCGTCGCTCGAGCTCGGCGGCAAGGACGCGGCCGTGGTGCTCGACGACGCGGACCTGGAGCGCACGGCCCGGGGCCTGCTCTGGGCGGCGATGGCGAACGCGGGGCAGAACTGCGCCGGGCTCGAGCGCGTCTACGCGGTGGCGGAGGTCGCCGGCCCGCTGAAGGCGCGGCTCGGTGAGCTGGCCGGAGAGCTGGTGCCCGGGCGCGACGTGGGGCCGCTGGTGACCGAGGCGCAGCTCGCGACGGTGGAGCGGCACGTGCGCGAGGCGGTCGACGGGGGCGCGGAGGTGCTGGCCGGCGGCGAGCGGCTCGAGCGGGGCGGGCGCTGGTTCGCGCCGACCGTGCTGGCGGAGGTCGAGCCGTCTTCGGCGGCGCTCCGGGAGGAGACGTTCGGGCCGGTGGTCGTCGTGCAGACGGTGGCGGACGAGGCGGCGGCCGTGGCGGCGGCGAACGACTCGCGCTTCGGGCTGACGGCGAGCGTCTGGACGCGGGACGCGGCGCGCGGGGAGGCGGTCGCACGGCGGCTCCGGGCGGGCGTCGTGACGGTGAACAACCACGCCTTCACCGGGGCCATCCCGGCGCTGCCCTGGGGCGGCGTCGGCGAGACGGGCTTCGGGGTGACGAACTCGCCGCACGCGCTCCACGCATTGGTGCGGCCGCGGGCCGTGGTCGTGGACGGCAACGCGCGGCCGGAGCTCTACTGGCACCCCTACGACGAGGCGCTCGAGCGGCTCGGGAAGGGCATGGCGGCGCTCCGCGGCAAGGGCGGGCCGATCACGAAGGTGCGCGCCGTGGCCAGGCTGCTCGGGGCGCTCCGCCGGCGCTTCTGA |
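The Prodigal-style header packs the ORF identifier, coordinates and strand into one string separated by ` # `. The following sketch shows how those fields could be split out in base R; `parse_prodigal_header` is a hypothetical helper for illustration and is not a gclink function.

```r
# Hypothetical helper: split "ORF_id # start # end # strand # ..." into columns
parse_prodigal_header <- function(seq_name) {
  fields <- strsplit(seq_name, " # ", fixed = TRUE)
  pick <- function(i) vapply(fields, `[`, character(1), i)
  data.frame(
    orf    = pick(1),
    start  = as.integer(pick(2)),
    end    = as.integer(pick(3)),
    strand = as.integer(pick(4)),  # 1 = forward strand, -1 = reverse strand
    stringsAsFactors = FALSE
  )
}

parsed <- parse_prodigal_header(c(
  "Houyibacteriaceae--LLY-WYZ-153---k1411028641 # 3 # 266 # 1 # ID=851;partial=10",
  "Houyibacteriaceae--LLY-WYZ-153---k1411028643 # 667 # 2184 # -1 # ID=853;partial=00"
))
```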
1.3 (Optional) Gene group (e.g., head(PGC_group))
| gene | genegroup | genelabel |
|------|-----------|-----------|
| bciE | bci | E |
| bchB | bch | B |
| bchC | bch | C |
| bchD | bch | D |
1.4 (Optional) Candidate gene list (e.g., head(photosynthesis_gene_list))
bciE bchB bchC bchD bchE
2 Output Data Preview
2.1 Gene cluster information (GC_meta)
| gene | qaccver | saccver | pident | length | mismatch | gapopen | qstart | qend | sstart | send | evalue | bitscore | genome | orf | contig | genomecontig | orfposition | genecluster | GCorfposition | GCpresentlength | GCabsentlength | GClength | SeqName | Sequence | start | end | direction | genegroup | genelabel | Pgenome | Pstart | Pend | Pdirection | |------|---------|---------|--------|--------|----------|---------|--------|------|--------|------|--------|----------|--------|-----|--------|--------------|-------------|--------------|----------------|------------------|-----------------|----------|---------|----------|-------|-----|-----------|------------|------------|---------|--------|------|------------| | pufC | Houyibacteriaceae--LLY-WYZ-153---k14110286497 | pufCRhodospirillumcentenumRC12101RCE | 53.1 | 335 | 147 | 7 | 3 | 329 | 6 | 338 | 7.66E-112 | 333 | Houyibacteriaceae--LLY-WYZ-153 | k14110286497 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 97 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 1 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k14110286497 # 117640 # 118917 # -1 # ID=8597;partial=00;starttype=GTG;rbsmotif=GGAG/GAGG;rbsspacer=5-10bp;gccont=0.710 | 
GTGAAGAAGATCGCCATCGCCTTCGTGAGCACCTGGCTCCTCATCGGGGCCGTCTACGCCTACGAGCCGACCGAGACCTCGCAGATCGGCGCCGACGGCGTCGCCATGCAGGTCACGCAGACCGAGGACGAGCTCGCCGCGCGCGTGGAGGCGAACACCGTCCCGCCGGCCATCCCGATGCCCCAGAGCAGCGGCGTGCTGGCGGCCGAGGAGTACGAGAACGTGCAGGTCCTCGGCCACCTCAACACGGCCCAGTTCACCCGGCTGATGACCTCCATCACGCTCTGGGTCGCGCCGGAGCAGGGCTGCGCCTACTGCCACAACACGAACAACCTGGCCTCCGACGAGCTCTACACGAAGCGCGTGGCGCGTCGGATGATCCAGATGACCTGGCACATCAACGAGAACTGGCAGTCGCACGTCCAGGAGACCGGCGTGACCTGCTACACGTGCCACCGCGGCAACAACGTGCCCCAGCACATCTGGTTCGAGACGCCGCCCGACGACCACGGCATGGTGGGCTGGCGTGGCTCGCAGAACGCCCCGAACGACCGGACGGGGATCAGCTCCCTGCCGAACGACGTGTTCGAGGTGTTCCTCGAGGAGGACGCGAGCATCCGGGTCCAGTCGGCCGGGGAGGCCTTCCCGAACGAGAACCGCGCGTCCATCAAGCAGGCCGAGTGGACCTATGGGCTGATGATGCACTTCTCCGAGTCGCTCGGGGTGAACTGCACGGCTTGCCACAACTCGCGCTCCTGGAACGACTGGAGCCAGAGCCCGGCCCGCCGCGGGACGGCCTGGCACGGCATCCGGATGGCGCGAAACCTCAACAACCACTGGCTGACGCCGCTGCGCGATCAGTTCCCGCCGAACCGGCTCGGCGAGCTGGGTGACGCCCCGAAGGCCAACTGCGCGACGTGCCACCAGGGCGCGTACCGCCCCCTGCTCGGGCACCGCATGCTCGAGGACTTCCCGTCCCTCGTACGGGCGATGCCGCAGCCCGAGATCGAGCCGGAGCCGGAGCCGGAGCCCGAGCTGGAAGGCGAGGGCGAGGCCGGCGGGCAGCTCGAGCCGGAGGGGGAGGCGCCCGCCGCCGAAGCCCCCGAGGGCACGAACGCTGCGCCGACGGCGATGGCTGCGCCGGCGGCGATGGCCGCTCCGACGGGGATGGCCGCGCCGGCGGCGATGGCTGCGCCGGCGGCGATGGCTGCTCCGGCGGTGGCCGAGCCGACGCCCATGGCCGCGCCGGCGGCGATGGCGGCCCCGGCACCGAACTGA | 117640 | 118917 | -1 | puf | C | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 0 | 1277 | FALSE | | pufM | Houyibacteriaceae--LLY-WYZ-153---k14110286498 | pufMpMyxococcota--cPolyangia--oPolyangiales--ERR1726576bin.13---k1411027383 | 100 | 437 | 0 | 0 | 1 | 437 | 1 | 437 | 4.73E-308 | 834 | Houyibacteriaceae--LLY-WYZ-153 | k14110286498 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 98 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 2 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k14110286498 # 118914 # 120224 # -1 # ID=8598;partial=00;starttype=ATG;rbsmotif=GGAG/GAGG;rbsspacer=5-10bp;gccont=0.704 | 
ATGGCCCGCTACCAGAACATCTTCACGCAGATCCAAGTCGTCGGTCCGCCGGACACGCCGCCGCCGATCGACCCGGACTTCCGTACGAAGAAGACGCGCATGTCGCGGCTCCTCGGGTGGTTCGGCAACCCGCAGATCGGCCCCGTCTACCTGGGCTACACCGGCCTGGCGTCCGCGATCAGCTTCTTCATCGCTTTCGAGATCATCGGGCTCAACATGCTGGCCTCGGTGGACTGGGACGTCGTTCAGTTCATCCGCCAGCTCCCCTGGCTCGCGCTCGAACCGCCCCCGCCCTCTGCCGGGCTCTCCATCCCGACGCTTCAGGAGGGCGGCTGGTGGCTCATGGCCGGCTTCTTCCTCACGGCGTCGGTCATTCTCTGGTGGATTCGCACCTATCGGCGCGCACGCGCCCTGAAGATGGGCACGCACGTCGCGTGGGCCTTCGCCTCGGCGATCTGGCTCTACCTCGTCCTCGGCTTCATTCGCCCCTTGCTGATGGGGAGCTGGGGGGAGGCGGTGCCCTTCGGCATCTTCCCGCACCTCGACTGGACCGCCGCCTTCTCCGTTCGCTACGGCAACCTCTTCTACAACCCCTTCCACTGCCTCTCGATCGTCTTCCTCTACGGGTCGACGCTCCTCTTCGCCATGCACGGCGCGACGGTGCTCGCGCTCGGGCACGTGGGCGGTGAGCGTGAGGTGAGCCAGGTGGTCGACCGCGGCACGGCGGCCGAGCGCGGGGCGCTCTTCTGGCGCTGGACGATGGGCTTCAACGCGACCTTCGAGTCCATCCACCGCTGGGCCTGGTGGTTCGCGGTGCTCACGCCGCTCACCGGAGGCATCGGCATCCTCCTGACCGGCACCGCCGTCGACAACTGGTATCAGTGGGCCGTCGAGCACGACTTCGCGCCGGCCTATGAGGAGTCCTACGAGGTCGTCCCCGACCCGGTCGACGACCCGGCGAACGAGGACCTGCCCGGTATGCGCGGTGAGTCCACCGCGCAGTGGGAGCCGACCCCCTACGTGCCCGCCGAGGAGCCGGAGGCGCCCGAGGATGGTGCGGACGGCGCGGCCGCGGTCGAAGGCGTCGACGCCGAGGGCGGCGAGGATGCCGCCGCGGATCCCGCGAGCGAGGGCACGAGCGGCCAGCCGGAGACCGGCGCCGCGGCCCCGGAGAGCGAGCGCCTTCCGGACGAAGCGGCGGCGGCCGAGCCCGAAGGGGCTGCGCCGGAGCCCGAACCCCCCGCGCCGTCCGAGACGGCTGCCCCGAGCGAACCCGAGGCGCCCAGCGCGATGACCCCGGAGCAACCGTGA | 118914 | 120224 | -1 | puf | M | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 1274 | 2584 | FALSE | | pufL | Houyibacteriaceae--LLY-WYZ-153---k14110286499 | pufLpMyxococcota--cPolyangia--oPolyangiales--ERR1726567bin.15---k1411843592 | 100 | 275 | 0 | 0 | 1 | 275 | 1 | 275 | 2.63E-214 | 583 | Houyibacteriaceae--LLY-WYZ-153 | k14110286499 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 99 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 3 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k14110286499 # 120270 # 121094 # -1 # ID=8599;partial=00;starttype=ATG;rbsmotif=GGAG/GAGG;rbsspacer=5-10bp;gccont=0.648 | 
ATGGGCCTACTGAGCTTCGAGCGGCGATATCGAGTCCGAGGAGGCACGCTCCTCGGGGGCGACCTATTCGATTTCTGGGTCGGGCCCTTCTACGTGGGGCTCTTCGGCGTCACGACGATCTTCTTCACGATCGTCGGCACCGCGCTGATCCTCTGGGAGGCCTCCCGGGGTGACACCTGGAACCCCTGGCTGATCAACATCCAGCCGCCTCCAATCGAGTACGGGCTCGCCTTCGCGCCCCTCGATCAGGGGGGCATCTGGCAGCTGGTCACCATCTGCGCCATCGGCGCCTTCGGATCCTGGGCGCTCCGACAGGCGGAGATCAGCCGCAAGCTCGGCATGGGCTACCACGTGCCCATCGCCTACGGCGTCGCGGTCTTCGCCTACGTCACGCTCGTGGTGATTCGCCCGGTGATGCTGGGCGCCTGGGGCCACGGCTTCCCCTACGGCATCTTCAGCCACCTCGATTGGGTGTCGAACGTCGGGTACCAGTACCTGCACTTCCACTACAACCCGGCCCACATGATCGCGGTGAGCTTCTTCTTCACCACGACGCTCGCGCTCTCCCTCCACGGCGGTTTGATCCTCTCCGCCGTGAATCCGCCGAAGGGAGAGAAGGTGAAGACCGCCGAGTACGAGGACGGGTTCTTCCGTGACCACATCGGCTACTCGATCGGCGCCCTGGGCATTCATCGACTCGGCCTCTTCCTGGCGCTGAGCGCCGGGATCTGGAGCGCGATCTGCATTCTCATCAGCGGCCCGATGTGGACCAAGGGGTGGCCCGAGTGGTGGGACTGGTGGCTCAACCTCCCCGTGTGGAGCTGA | 120270 | 121094 | -1 | puf | L | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 2630 | 3454 | FALSE | | bchO | Houyibacteriaceae--LLY-WYZ-153---k141102864100 | bchOPararhodospirillumphotometricumRSPPHO00117RPM | 44.9 | 265 | 144 | 1 | 33 | 295 | 28 | 292 | 6.97E-60 | 194 | Houyibacteriaceae--LLY-WYZ-153 | k141102864100 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 100 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 4 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k141102864100 # 121191 # 122102 # -1 # ID=85100;partial=00;starttype=ATG;rbsmotif=GGAG/GAGG;rbsspacer=5-10bp;gccont=0.762 | 
ATGAGCTCGGCCGTCGAAGAGCAGCGCGTCGAGCACCCGCGGGTCGAGCAGCAGCCCATCGAGCAGCAGCGCGTCGAGCACCAGCGCGTCGAGCGTTCGGGCGTGCGGTGGAACGTCGCCCGCCGCGGCGCCGGACCCACGCTCCTGGCGCTCCACGGGACCGGCAGCTCGAGCCGCTCCTTCTGCGCCCTCGCGGCCACGCTCGGTGCTCGCTTCACCGTCGTGGCGCCCGATCTACCCGGCCACGCCGGGAGCCGGATCGATCGCCGCTTCCGCCTCTCGCTCCCCTCGATCGCCGCCGCCCTCGGCGAGCTCATCGAGGCGCTCGCCGTCCAGCCGGCGCTGGTCCTCGCTCACTCCGCGGGCGCGGCGGTGGCGGCGCGCGCCATGCTCGACGGGGCTCTCCGCCCGGCGCTCTTCGTCGGGCTCGGCGCGGCCCTGACGCCCCTCGAGGGGCTCGCCCGGCTCGGCGCGCGCCCGGCGGCCGCGATGCTCGCCCGCTCGCCCATCACGCGGCGGGTGGCGCGCCGGGCTGGAGGCGCCCTCGTCGGACCGATCCTGCGCAGCGTCGGATCCACCGTCGGCCCCGAGGCCACACAGCGCTATCGGGAGCTCGCCCGCGATCCCGCCCACGTCGGGGCGGTCTTCTCGATGCTCGCCCAGTGGGATCTCGACGGGCTCCACGCGGCGCTACCACGCCTGGACGTACCGACCCTGCTCCTCGGCGGCGCCCGCGACGGCGCCACCCCGATCGCCCAGCAGCGCGCCCTCGCACGTCGCCTCCCGGCCGCGCGCGCGCACGTCGTCCTCGGCGCCGGGCACCTGCTCCACGAGGAGCGACCCGCCGAGATCGCGCGCCTCGTCGAGGCCGAGTGGAACAGATTGGACGGCGGTCGTGTCAAAAATGCTTGA | 121191 | 122102 | -1 | bch | O | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 3551 | 4462 | FALSE | | bchD | Houyibacteriaceae--LLY-WYZ-153---k141102864101 | bchDpMyxococcota--cPolyangia--oPolyangiales--GCA002699025.1---PABA01000098.181 | 100 | 587 | 0 | 0 | 1 | 587 | 1 | 587 | 0 | 1064 | Houyibacteriaceae--LLY-WYZ-153 | k141102864101 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 101 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 5 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k141102864101 # 122099 # 123859 # -1 # ID=85101;partial=00;starttype=ATG;rbsmotif=None;rbsspacer=None;gccont=0.792 | 
ATGAGCGGCTGGCCCGACGTGGCGCGCGTCGCCGAGCTCCTGAGCGTCGACCCGGACGGCCTCGGAGGCGTGCGCCTGCGGGGTCGCCCGGGGCCGCACCGGCGCCGGGTGCTCGAGTGGGTGCGCGAGAGGCTGGCCCCGGAGGCGCCCTTCCGGCGCCTGCCCGCGCACGTGACCGAGGATCGGCTCCTCGGGGGCCTCGCGCTCGCGGAGACCTTGCGTTCGGGGCGGGCCGTCATGGAGCAGGGCGTGCTCGCGCGGAGCGACGGCGGCCTGCTCGTCGTGGCCATGGCCGAGCGGGCCGAGCGGGAGGTCGTGGCGCACCTCTGCGCGGCCCTCGACCGCGGCGCGATCACCGTCGAACGCGACGGCATGAGCGCCGAGGCGTCCTGCCGCGTGGGCCTCATCGCGCTCGACGAGGGCATCGACGAGGAGCACGTCGACCCGGCGCTCGCCGACCGGCTCGCCTTCGCGCTGGACCTCGACGCGCTCGATCCGCGGGGAGGGGCGGCGCCGGAACACGGACCCGAGGAGGTCGCGCGAGCCCGCGCCCGCCTCCCGCACGTGAGCCTCGGCGACGACATCATCGCGGCCCTCTCGGAGGCGGCCCAGGCCCTCGGCGTGGAGGCGCTCCGGCCGCTCCTGCTCGCGGCGAAGGCGGCCCGCGCGCACGCGGCGCTCCTCGGCCGGACCCGCGTCGAGGAGGAAGACGCCGGGATGGCGGCGCGCCTCGTCCTCGGCCCGAGGGCGACGCGAGCGCCGAGCGCCGAGCCCGAAGAGGCGGCCGAGCGCGAGGCCGAAGAGGGCGACCCCGACCCGGGAGGCGCCGGCGCGGCTGCAGCCGGCGAACGGGCGGACGGCGCCGACGAGGCCCCGCCGGGCGAGGTCCCGCTCGGCGATCTCGTCTTGGCGGCGGCCGAGAGCGGCATCCCGGCGGGGCTGCTCGACGCCCTCGACGTCGGGACCACCCGGCGGGCCGGCGCGACCGGTCGGAGCGGGGCGACGCGCATCGGCCCGAGCGGCGGCCGCCCGGCGGGGACGCGCGCCGCGCCGCCCACCCGAGGCCAGCGCCTGAACGTCGTCGAGACCCTCCGCGCCGCCGCGCCCTGGCAGCGGCTCCGCGGGGGCGGCTTCGGCGCGGGCGTGCGCGTCCGGCCGGAGGACTTCCGTGTCACCCGTCACCGGCAGCCGATCGAGAGCTGCGTGATCTTCGCCGTCGACGCGTCCGGCTCCGCCGCGCTTCGACGCCTGGCCGAGGCGAAGGGCGCCGTCGAGCGCGTGCTCGGCGACTGCTACGTGCGGCGCGACCACGTCGCCCTCGTCGCGTTCCGCCAGGACGGCGCCGAGCTGCTCCTGCCCCCGACGCGCTCCCTCGCCCGCGTGCGTCGCAGCCTGGCTGCCCTCGCCGGCGGCGGCGCGACCCCCCTCGCCGCGGGGATCGACGCCGCCCATCGGCTCGCCCTCGACGCCCGCGGGCGCGGCCGCGAGCCCATCGTGGTCGTCATGACCGACGGGCGGGCGAACGTGACCCGGGACGGCCGCCGGGACCCCGCGGTCGCCACCACGGACGCCCTCGAGAGCGCGCGCGGGCTCCAGCGAGCCGCCGTGCCGACCCTCTTCCTCGACACGGCCCCACGCCCCCGGCGCCGTGCCCGCGAGCTCGCCGAGGCCATGGACGCCCGCTACCTGCCGCTGCCCTACCTCGACGCGGCGGGGATCTCACGCCACGTCCAAGCGCTCGCCCGCGAGGGAGCCCGATGA | 122099 | 123859 | -1 | bch | D | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 4459 | 6219 | FALSE | | bchI | Houyibacteriaceae--LLY-WYZ-153---k141102864102 | 
bchIpMyxococcota--cPolyangia--oPolyangiales--GCA002699025.1---PABA01000098.182 | 100 | 339 | 0 | 0 | 1 | 339 | 1 | 339 | 1.97E-239 | 652 | Houyibacteriaceae--LLY-WYZ-153 | k141102864102 | k141102864 | Houyibacteriaceae--LLY-WYZ-153---k141102864 | 102 | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 6 | 34 | 2 | 36 | Houyibacteriaceae--LLY-WYZ-153---k141102864102 # 123863 # 124879 # -1 # ID=85102;partial=00;starttype=ATG;rbsmotif=GGA/GAG/AGG;rbsspacer=5-10bp;gccont=0.745 | ATGACGCCCTATCCCTTCACCGCCATCGTCGCGCAGGACGAGCTCAAGCTCGCCCTGCAGATCGCCACCGTCGACCGCAGCATCGGCGGGGTCCTCGCCTTCGGCGACCGCGGCACCGGCAAGTCGACCACCATCCGCGCGCTCGCCCGGCTCCTGCCGCCGATGCGCGTCGTCGCCAGCTGCCCGTACCACTGTGATCCGGCCGACGCGCGCGCTCGCTGTCCGCACTGTGCCGAAGCCGCAGGGGAGCGGGAGGCGATCGAGACGCCCGTGCCGGTCGTGGACCTGCCCCTCGGCGCCACCGAGGATCGCGTCGTCGGCGCGCTCGATCTCGAGGCGGCCCTCACGCGCGGGGAGCGCCGCTTCTCACCGGGCCTGCTCGCCGCGGCGCATCGAGGCTTCCTCTACATCGACGAGGTCAACCTCCTCCCCGATCACCTCGTGGATCTGCTGCTCGACGTCGCGGCCTCGGGCGAGAACGTGGTCGAGCGCGAGGGCCTGAGCGTGCGCCACCCCGCGCGCTTCGTGCTGATCGGCAGCGGAAACCCGGAGGAGGGCGAGCTGCGCCCCCAGCTGCTCGATCGCTTCGGCCTCTCGCTCGAGGTCCGCACGCCGGACGAGGTCGCGACGCGCGTCGAGGTCGTCAAGCGGCGCATGCGCTACGATCAGGACCCGGAGGCCTTCGCGGCCGCCTGGGCGGAGGACGAGGCGGCCCTCATCGTTCGCCTCCGGGACGCGCGGGCGCGCTTGCCCGAGGTGGCCGTCAGCGACGCCGTGATCGAGCGCGCGAGCCGGCTCTGCCAGGCGCTCGGCACCGACGGGCTCCGGGGGGAGCTGACCTTGATCCGCGCCGCGCGCGCGGCCGCCAGCCTCGACGCGCAGCGGGAGGTCGCCGACGTGCACCTCGCCCAGGTCGCCCCCCTCGCGCTCCGCCACCGGCTGCGACGCGCCCCCCTGGACGACGTCGGCTCGGGCGCGCGCGTGCAGAAGGCCGTCGAGGACGTGCTCGGGGGCTGA | 123863 | 124879 | -1 | bch | I | Houyibacteriaceae--LLY-WYZ-153---k141102864---1 | 6223 | 7239 | FALSE |
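In the rows shown, the `Pstart` and `Pend` values appear to be the contig coordinates shifted so that the cluster begins at 0 (for example, 118914 - 117640 = 1274). The sketch below reproduces that arithmetic for the first three rows; it is inferred from the preview values only, and `to_cluster_coords` is a hypothetical helper rather than the package's own code.

```r
# Hypothetical helper: shift contig coordinates so the cluster starts at 0
to_cluster_coords <- function(start, end) {
  offset <- min(start)
  data.frame(Pstart = start - offset, Pend = end - offset)
}

# start / end of pufC, pufM and pufL taken from the preview rows
coords <- to_cluster_coords(
  start = c(117640, 118914, 120270),
  end   = c(118917, 120224, 121094)
)
```

The resulting values match the `Pstart` and `Pend` columns of the same three rows in the preview.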
2.2 Gene cluster sequence (GC_seq)
```
pufCHouyibacteriaceae--LLY-WYZ-153---k141102864---1 GTGAAGAAGATCGCCATCGCCTTCGTGAGCACCTGGCTCCTCATCGGGGCCGTCTACGCCTACGAGCCGACCGAGACCTCGCAGATCGGCGCCGACGGCGTCGCCATGCAGGTCACGCAGACCGAGGACGAGCTCGCCGCGCGCGTGGAGGCGAACACCGTCCCGCCGGCCATCCCGATGCCCCAGAGCAGCGGCGTGCTGGCGGCCGAGGAGTACGAGAACGTGCAGGTCCTCGGCCACCTCAACACGGCCCAGTTCACCCGGCTGATGACCTCCATCACGCTCTGGGTCGCGCCGGAGCAGGGCTGCGCCTACTGCCACAACACGAACAACCTGGCCTCCGACGAGCTCTACACGAAGCGCGTGGCGCGTCGGATGATCCAGATGACCTGGCACATCAACGAGAACTGGCAGTCGCACGTCCAGGAGACCGGCGTGACCTGCTACACGTGCCACCGCGGCAACAACGTGCCCCAGCACATCTGGTTCGAGACGCCGCCCGACGACCACGGCATGGTGGGCTGGCGTGGCTCGCAGAACGCCCCGAACGACCGGACGGGGATCAGCTCCCTGCCGAACGACGTGTTCGAGGTGTTCCTCGAGGAGGACGCGAGCATCCGGGTCCAGTCGGCCGGGGAGGCCTTCCCGAACGAGAACCGCGCGTCCATCAAGCAGGCCGAGTGGACCTATGGGCTGATGATGCACTTCTCCGAGTCGCTCGGGGTGAACTGCACGGCTTGCCACAACTCGCGCTCCTGGAACGACTGGAGCCAGAGCCCGGCCCGCCGCGGGACGGCCTGGCACGGCATCCGGATGGCGCGAAACCTCAACAACCACTGGCTGACGCCGCTGCGCGATCAGTTCCCGCCGAACCGGCTCGGCGAGCTGGGTGACGCCCCGAAGGCCAACTGCGCGACGTGCCACCAGGGCGCGTACCGCCCCCTGCTCGGGCACCGCATGCTCGAGGACTTCCCGTCCCTCGTACGGGCGATGCCGCAGCCCGAGATCGAGCCGGAGCCGGAGCCGGAGCCCGAGCTGGAAGGCGAGGGCGAGGCCGGCGGGCAGCTCGAGCCGGAGGGGGAGGCGCCCGCCGCCGAAGCCCCCGAGGGCACGAACGCTGCGCCGACGGCGATGGCTGCGCCGGCGGCGATGGCCGCTCCGACGGGGATGGCCGCGCCGGCGGCGATGGCTGCGCCGGCGGCGATGGCTGCTCCGGCGGTGGCCGAGCCGACGCCCATGGCCGCGCCGGCGGCGATGGCGGCCCCGGCACCGAACTGA pufMHouyibacteriaceae--LLY-WYZ-153---k141102864---1 
ATGGCCCGCTACCAGAACATCTTCACGCAGATCCAAGTCGTCGGTCCGCCGGACACGCCGCCGCCGATCGACCCGGACTTCCGTACGAAGAAGACGCGCATGTCGCGGCTCCTCGGGTGGTTCGGCAACCCGCAGATCGGCCCCGTCTACCTGGGCTACACCGGCCTGGCGTCCGCGATCAGCTTCTTCATCGCTTTCGAGATCATCGGGCTCAACATGCTGGCCTCGGTGGACTGGGACGTCGTTCAGTTCATCCGCCAGCTCCCCTGGCTCGCGCTCGAACCGCCCCCGCCCTCTGCCGGGCTCTCCATCCCGACGCTTCAGGAGGGCGGCTGGTGGCTCATGGCCGGCTTCTTCCTCACGGCGTCGGTCATTCTCTGGTGGATTCGCACCTATCGGCGCGCACGCGCCCTGAAGATGGGCACGCACGTCGCGTGGGCCTTCGCCTCGGCGATCTGGCTCTACCTCGTCCTCGGCTTCATTCGCCCCTTGCTGATGGGGAGCTGGGGGGAGGCGGTGCCCTTCGGCATCTTCCCGCACCTCGACTGGACCGCCGCCTTCTCCGTTCGCTACGGCAACCTCTTCTACAACCCCTTCCACTGCCTCTCGATCGTCTTCCTCTACGGGTCGACGCTCCTCTTCGCCATGCACGGCGCGACGGTGCTCGCGCTCGGGCACGTGGGCGGTGAGCGTGAGGTGAGCCAGGTGGTCGACCGCGGCACGGCGGCCGAGCGCGGGGCGCTCTTCTGGCGCTGGACGATGGGCTTCAACGCGACCTTCGAGTCCATCCACCGCTGGGCCTGGTGGTTCGCGGTGCTCACGCCGCTCACCGGAGGCATCGGCATCCTCCTGACCGGCACCGCCGTCGACAACTGGTATCAGTGGGCCGTCGAGCACGACTTCGCGCCGGCCTATGAGGAGTCCTACGAGGTCGTCCCCGACCCGGTCGACGACCCGGCGAACGAGGACCTGCCCGGTATGCGCGGTGAGTCCACCGCGCAGTGGGAGCCGACCCCCTACGTGCCCGCCGAGGAGCCGGAGGCGCCCGAGGATGGTGCGGACGGCGCGGCCGCGGTCGAAGGCGTCGACGCCGAGGGCGGCGAGGATGCCGCCGCGGATCCCGCGAGCGAGGGCACGAGCGGCCAGCCGGAGACCGGCGCCGCGGCCCCGGAGAGCGAGCGCCTTCCGGACGAAGCGGCGGCGGCCGAGCCCGAAGGGGCTGCGCCGGAGCCCGAACCCCCCGCGCCGTCCGAGACGGCTGCCCCGAGCGAACCCGAGGCGCCCAGCGCGATGACCCCGGAGCAACCGTGA pufLHouyibacteriaceae--LLY-WYZ-153---k141102864---1 
ATGGGCCTACTGAGCTTCGAGCGGCGATATCGAGTCCGAGGAGGCACGCTCCTCGGGGGCGACCTATTCGATTTCTGGGTCGGGCCCTTCTACGTGGGGCTCTTCGGCGTCACGACGATCTTCTTCACGATCGTCGGCACCGCGCTGATCCTCTGGGAGGCCTCCCGGGGTGACACCTGGAACCCCTGGCTGATCAACATCCAGCCGCCTCCAATCGAGTACGGGCTCGCCTTCGCGCCCCTCGATCAGGGGGGCATCTGGCAGCTGGTCACCATCTGCGCCATCGGCGCCTTCGGATCCTGGGCGCTCCGACAGGCGGAGATCAGCCGCAAGCTCGGCATGGGCTACCACGTGCCCATCGCCTACGGCGTCGCGGTCTTCGCCTACGTCACGCTCGTGGTGATTCGCCCGGTGATGCTGGGCGCCTGGGGCCACGGCTTCCCCTACGGCATCTTCAGCCACCTCGATTGGGTGTCGAACGTCGGGTACCAGTACCTGCACTTCCACTACAACCCGGCCCACATGATCGCGGTGAGCTTCTTCTTCACCACGACGCTCGCGCTCTCCCTCCACGGCGGTTTGATCCTCTCCGCCGTGAATCCGCCGAAGGGAGAGAAGGTGAAGACCGCCGAGTACGAGGACGGGTTCTTCCGTGACCACATCGGCTACTCGATCGGCGCCCTGGGCATTCATCGACTCGGCCTCTTCCTGGCGCTGAGCGCCGGGATCTGGAGCGCGATCTGCATTCTCATCAGCGGCCCGATGTGGACCAAGGGGTGGCCCGAGTGGTGGGACTGGTGGCTCAACCTCCCCGTGTGGAGCTGA bchOHouyibacteriaceae--LLY-WYZ-153---k141102864---1 ATGAGCTCGGCCGTCGAAGAGCAGCGCGTCGAGCACCCGCGGGTCGAGCAGCAGCCCATCGAGCAGCAGCGCGTCGAGCACCAGCGCGTCGAGCGTTCGGGCGTGCGGTGGAACGTCGCCCGCCGCGGCGCCGGACCCACGCTCCTGGCGCTCCACGGGACCGGCAGCTCGAGCCGCTCCTTCTGCGCCCTCGCGGCCACGCTCGGTGCTCGCTTCACCGTCGTGGCGCCCGATCTACCCGGCCACGCCGGGAGCCGGATCGATCGCCGCTTCCGCCTCTCGCTCCCCTCGATCGCCGCCGCCCTCGGCGAGCTCATCGAGGCGCTCGCCGTCCAGCCGGCGCTGGTCCTCGCTCACTCCGCGGGCGCGGCGGTGGCGGCGCGCGCCATGCTCGACGGGGCTCTCCGCCCGGCGCTCTTCGTCGGGCTCGGCGCGGCCCTGACGCCCCTCGAGGGGCTCGCCCGGCTCGGCGCGCGCCCGGCGGCCGCGATGCTCGCCCGCTCGCCCATCACGCGGCGGGTGGCGCGCCGGGCTGGAGGCGCCCTCGTCGGACCGATCCTGCGCAGCGTCGGATCCACCGTCGGCCCCGAGGCCACACAGCGCTATCGGGAGCTCGCCCGCGATCCCGCCCACGTCGGGGCGGTCTTCTCGATGCTCGCCCAGTGGGATCTCGACGGGCTCCACGCGGCGCTACCACGCCTGGACGTACCGACCCTGCTCCTCGGCGGCGCCCGCGACGGCGCCACCCCGATCGCCCAGCAGCGCGCCCTCGCACGTCGCCTCCCGGCCGCGCGCGCGCACGTCGTCCTCGGCGCCGGGCACCTGCTCCACGAGGAGCGACCCGCCGAGATCGCGCGCCTCGTCGAGGCCGAGTGGAACAGATTGGACGGCGGTCGTGTCAAAAATGCTTGA bchDHouyibacteriaceae--LLY-WYZ-153---k141102864---1 
ATGAGCGGCTGGCCCGACGTGGCGCGCGTCGCCGAGCTCCTGAGCGTCGACCCGGACGGCCTCGGAGGCGTGCGCCTGCGGGGTCGCCCGGGGCCGCACCGGCGCCGGGTGCTCGAGTGGGTGCGCGAGAGGCTGGCCCCGGAGGCGCCCTTCCGGCGCCTGCCCGCGCACGTGACCGAGGATCGGCTCCTCGGGGGCCTCGCGCTCGCGGAGACCTTGCGTTCGGGGCGGGCCGTCATGGAGCAGGGCGTGCTCGCGCGGAGCGACGGCGGCCTGCTCGTCGTGGCCATGGCCGAGCGGGCCGAGCGGGAGGTCGTGGCGCACCTCTGCGCGGCCCTCGACCGCGGCGCGATCACCGTCGAACGCGACGGCATGAGCGCCGAGGCGTCCTGCCGCGTGGGCCTCATCGCGCTCGACGAGGGCATCGACGAGGAGCACGTCGACCCGGCGCTCGCCGACCGGCTCGCCTTCGCGCTGGACCTCGACGCGCTCGATCCGCGGGGAGGGGCGGCGCCGGAACACGGACCCGAGGAGGTCGCGCGAGCCCGCGCCCGCCTCCCGCACGTGAGCCTCGGCGACGACATCATCGCGGCCCTCTCGGAGGCGGCCCAGGCCCTCGGCGTGGAGGCGCTCCGGCCGCTCCTGCTCGCGGCGAAGGCGGCCCGCGCGCACGCGGCGCTCCTCGGCCGGACCCGCGTCGAGGAGGAAGACGCCGGGATGGCGGCGCGCCTCGTCCTCGGCCCGAGGGCGACGCGAGCGCCGAGCGCCGAGCCCGAAGAGGCGGCCGAGCGCGAGGCCGAAGAGGGCGACCCCGACCCGGGAGGCGCCGGCGCGGCTGCAGCCGGCGAACGGGCGGACGGCGCCGACGAGGCCCCGCCGGGCGAGGTCCCGCTCGGCGATCTCGTCTTGGCGGCGGCCGAGAGCGGCATCCCGGCGGGGCTGCTCGACGCCCTCGACGTCGGGACCACCCGGCGGGCCGGCGCGACCGGTCGGAGCGGGGCGACGCGCATCGGCCCGAGCGGCGGCCGCCCGGCGGGGACGCGCGCCGCGCCGCCCACCCGAGGCCAGCGCCTGAACGTCGTCGAGACCCTCCGCGCCGCCGCGCCCTGGCAGCGGCTCCGCGGGGGCGGCTTCGGCGCGGGCGTGCGCGTCCGGCCGGAGGACTTCCGTGTCACCCGTCACCGGCAGCCGATCGAGAGCTGCGTGATCTTCGCCGTCGACGCGTCCGGCTCCGCCGCGCTTCGACGCCTGGCCGAGGCGAAGGGCGCCGTCGAGCGCGTGCTCGGCGACTGCTACGTGCGGCGCGACCACGTCGCCCTCGTCGCGTTCCGCCAGGACGGCGCCGAGCTGCTCCTGCCCCCGACGCGCTCCCTCGCCCGCGTGCGTCGCAGCCTGGCTGCCCTCGCCGGCGGCGGCGCGACCCCCCTCGCCGCGGGGATCGACGCCGCCCATCGGCTCGCCCTCGACGCCCGCGGGCGCGGCCGCGAGCCCATCGTGGTCGTCATGACCGACGGGCGGGCGAACGTGACCCGGGACGGCCGCCGGGACCCCGCGGTCGCCACCACGGACGCCCTCGAGAGCGCGCGCGGGCTCCAGCGAGCCGCCGTGCCGACCCTCTTCCTCGACACGGCCCCACGCCCCCGGCGCCGTGCCCGCGAGCTCGCCGAGGCCATGGACGCCCGCTACCTGCCGCTGCCCTACCTCGACGCGGCGGGGATCTCACGCCACGTCCAAGCGCTCGCCCGCGAGGGAGCCCGATGA bchIHouyibacteriaceae--LLY-WYZ-153---k141102864---1 
ATGACGCCCTATCCCTTCACCGCCATCGTCGCGCAGGACGAGCTCAAGCTCGCCCTGCAGATCGCCACCGTCGACCGCAGCATCGGCGGGGTCCTCGCCTTCGGCGACCGCGGCACCGGCAAGTCGACCACCATCCGCGCGCTCGCCCGGCTCCTGCCGCCGATGCGCGTCGTCGCCAGCTGCCCGTACCACTGTGATCCGGCCGACGCGCGCGCTCGCTGTCCGCACTGTGCCGAAGCCGCAGGGGAGCGGGAGGCGATCGAGACGCCCGTGCCGGTCGTGGACCTGCCCCTCGGCGCCACCGAGGATCGCGTCGTCGGCGCGCTCGATCTCGAGGCGGCCCTCACGCGCGGGGAGCGCCGCTTCTCACCGGGCCTGCTCGCCGCGGCGCATCGAGGCTTCCTCTACATCGACGAGGTCAACCTCCTCCCCGATCACCTCGTGGATCTGCTGCTCGACGTCGCGGCCTCGGGCGAGAACGTGGTCGAGCGCGAGGGCCTGAGCGTGCGCCACCCCGCGCGCTTCGTGCTGATCGGCAGCGGAAACCCGGAGGAGGGCGAGCTGCGCCCCCAGCTGCTCGATCGCTTCGGCCTCTCGCTCGAGGTCCGCACGCCGGACGAGGTCGCGACGCGCGTCGAGGTCGTCAAGCGGCGCATGCGCTACGATCAGGACCCGGAGGCCTTCGCGGCCGCCTGGGCGGAGGACGAGGCGGCCCTCATCGTTCGCCTCCGGGACGCGCGGGCGCGCTTGCCCGAGGTGGCCGTCAGCGACGCCGTGATCGAGCGCGCGAGCCGGCTCTGCCAGGCGCTCGGCACCGACGGGCTCCGGGGGGAGCTGACCTTGATCCGCGCCGCGCGCGCGGCCGCCAGCCTCGACGCGCAGCGGGAGGTCGCCGACGTGCACCTCGCCCAGGTCGCCCCCCTCGCGCTCCGCCACCGGCTGCGACGCGCCCCCCTGGACGACGTCGGCTCGGGCGCGCGCGTGCAGAAGGCCGTCGAGGACGTGCTCGGGGGCTGA ```
2.3 Gene cluster plot (GC_plot)
Case 2: Using eggNOG (evolutionary gene genealogy Nonsupervised Orthologous Groups) format result
```r
# Case 2: Using eggNOG result with Full pipeline (Find Cluster + Extract FASTA + Plot Cluster)
library(gclink)
data(eggnog_df)
data(seq_data)
data(KO_group)
KOs = c("K02291","K09844","K20611","K13789",
        "K09846","K08926","K08927","K08928",
        "K08929","K13991","K04035","K04039",
        "K11337","K03404","K11336","K04040",
        "K03403","K03405","K04037","K03428",
        "K04038","K06049","K10960","K11333",
        "K11334","K11335","K08226","K08226",
        "K09773")
rename_KOs = paste0("ko:", KOs)
# Map the eggNOG columns onto the BLAST-style column names used in Case 1
eggnog_df$qaccver = eggnog_df$`#query`
eggnog_df$saccver = eggnog_df$KEGG_ko
eggnog_df$evalue = eggnog_df$evalue
eggnog_df$bitscore = eggnog_df$score
eggnog_df$gene = eggnog_df$KEGG_ko
gc_list_2 = gclink(in_blastp_df = eggnog_df,
                   in_seq_data = seq_data,
                   in_gene_list = rename_KOs,
                   in_GC_group = KO_group,
                   AllGeneNum = 50,
                   MinConSeq = 25,
                   apply_evalue_filter = FALSE,
                   min_evalue = 1,
                   apply_score_filter = TRUE,
                   min_score = 10,
                   orf_before_first = 1,
                   orf_after_last = 1,
                   levels_gene_group = c('bch','puh','puf','crt',
                                         'acsF','assembly','hypothetical ORF'),
                   color_theme = c('#3BAA51','#6495ED','#DD2421','#EF9320',
                                   '#F8EB00','#FF0683','grey'))
gc_meta_2 = gc_list_2[["GC_meta"]]
gc_seq_2 = gc_list_2[["GC_seq"]]
gc_plot_2 = gc_list_2[["GC_plot"]]
head(gc_meta_2)  # Cluster metadata
head(gc_seq_2)   # FASTA sequences
print(gc_plot_2) # Visualization
```
1 Input Data Preview
1.1 A dataframe of Diamond BLASTp output from eggNOG (e.g., head(eggnog_df))
| #query | seed_ortholog | evalue | score | eggNOG_OGs | max_annot_lvl | COG_category | Description | Preferred_name | GOs | EC | KEGG_ko | KEGG_Pathway | KEGG_Module | KEGG_Reaction | KEGG_rclass | BRITE | KEGG_TC | CAZy | BiGG_Reaction | PFAMs |
|--------|---------------|--------|-------|------------|---------------|--------------|-------------|----------------|-----|----|---------|--------------|-------------|---------------|-------------|-------|---------|------|---------------|-------|
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.11 | 439375.Oant2732 | 1.57E-45 | 162 | COG3293@1\|root,COG3293@2\|Bacteria,1PVIT@1224\|Proteobacteria,2TURP@28211\|Alphaproteobacteria,1J3RT@118882\|Brucellaceae | 28211\|Alphaproteobacteria | L | Transposase DDE domain | - | - | - | ko:K07492 | - | - | - | - | ko00000 | - | - | - | DDETnp1,DDETnp12,DUF4096 |
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.12 | 1173264.KI913949gene2450 | 3.58E-17 | 83.6 | COG3335@1\|root,COG3415@1\|root,COG3335@2\|Bacteria,COG3415@2\|Bacteria,1G39S@1117\|Cyanobacteria,1HCKE@1150\|Oscillatoriales | 1117\|Cyanobacteria | L | COGs COG3415 Transposase and inactivated derivatives | - | - | - | ko:K07494 | - | - | - | - | ko00000 | - | - | - | DDE3,HTH32,HTHTnpIS630 |
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.13 | 794903.OPIT503400 | 3.03E-30 | 114 | COG3335@1\|root,COG3335@2\|Bacteria | 2\|Bacteria | L | DDE superfamily endonuclease | - | - | - | ko:K07494 | - | - | - | - | ko00000 | - | - | - | DDE3,HTHTnpIS630 |
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.15 | 502025.Hoch2790 | 2.78E-50 | 191 | 2AY84@1\|root,31QA9@2\|Bacteria,1QMYF@1224\|Proteobacteria,4374U@68525\|delta/epsilon subdivisions,2X20E@28221\|Deltaproteobacteria,2YWTZ@29\|Myxococcales | 28221\|Deltaproteobacteria | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.111 | 105420.BBPO01000003gene1121 | 2.00E-11 | 72.8 | COG2887@1\|root,COG2887@2\|Bacteria,2GJC5@201174\|Actinobacteria,2NGJC@228398\|Streptacidiphilus | 201174\|Actinobacteria | L | Protein of unknown function (DUF2800) | recB | - | - | ko:K07465 | - | - | - | - | ko00000 | - | - | - | PDDEXK1 |
| Kuafuiibacteriaceae--GCA016703535.1---JADJBV010000001.112 | 1122915.AUGY01000071gene4398 | 2.13E-37 | 152 | COG1201@1\|root,COG1201@2\|Bacteria,1UHYQ@1239\|Firmicutes,4ISB0@91061\|Bacilli,277Q5@186822\|Paenibacillaceae | 91061\|Bacilli | L | helicase superfamily c-terminal domain | - | - | - | - | - | - | - | - | - | - | - | - | DUF1998,Helicase_C |
1.2 (Optional) A dataframe with SeqName (ORF identifier, Prodigal format: ORF_id # start # end # strand # ...) and Sequence (e.g., head(`seq_data`))
Same as Case 1
1.3 (Optional) KO/gene group (e.g., head(KO_group))
| gene | genegroup | genelabel |
|------|-----------|-----------|
| ko:K04035 | acsF | acsF |
| ko:K08226 | assembly | bch2 |
| ko:K04039 | bch | B |
| ko:K11337 | bch | C |
| ko:K03404 | bch | D |
| ko:K11336 | bch | F |
1.4 (Optional) Candidate KO/gene list
ko:K04035 ko:K08226 ko:K04039 ko:K11337 ko:K03404 ko:K11336
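A candidate list in this form can be built from bare KO numbers by adding the `ko:` prefix. The sketch below also drops repeated entries with `unique()`, since the KO vector in the Case 2 example lists `K08226` twice; whether duplicates matter to gclink is not stated in this README, so the de-duplication is an optional tidy-up and not a requirement. The variable names are illustrative.

```r
# Bare KO numbers, including one repeated entry
KOs <- c("K04035", "K08226", "K08226", "K04039")

# Add the "ko:" prefix and keep each KO once, preserving order
candidate_KOs <- unique(paste0("ko:", KOs))
```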
2 Output Data Preview
2.1 Gene cluster information (GC_meta)
Similar to Case 1
2.2 Gene cluster sequence (GC_seq)
Similar to Case 1
2.3 Gene cluster plot (GC_plot)
Documentation
Full function reference:
```r
?gclink::gclink
```
Citation
If you use gclink in your research, please cite:
Li, L., Huang, D., Hu, Y., Rudling, N. M., Canniffe, D. P., Wang, F., & Wang, Y. "Globally distributed Myxococcota with photosynthesis gene clusters illuminate the origin and evolution of a potentially chimeric lifestyle." Nature Communications (2023), 14, 6450. https://doi.org/10.1038/s41467-023-42193-7
Dependencies
- R (≥ 3.5)
- dplyr (≥ 1.1.4)
- gggenes (≥ 0.5.1)
- ggplot2 (≥ 3.5.2)
License
GPL-3 © Liuyang Li
Contact
- Maintainer: Liuyang Li cyanobacteria@yeah.net
- Bug reports: https://github.com/LiuyangLee/gclink/issues
Owner
- Name: Liuyang Li
- Login: LiuyangLee
- Kind: user
- Company: Shanghai Jiaotong University
- Repositories: 1
- Profile: https://github.com/LiuyangLee
GitHub Events
Total
- Watch event: 1
- Push event: 3
- Create event: 7
Last Year
- Watch event: 1
- Push event: 3
- Create event: 7
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
- Total maintainers: 1
cran.r-project.org: gclink
Gene-Cluster Discovery, Annotation and Visualization
- Homepage: https://github.com/LiuyangLee/gclink
- Documentation: http://cran.r-project.org/web/packages/gclink/gclink.pdf
- License: GPL-3
- Latest release: 1.1 (published 10 months ago)
Dependencies
- R >= 3.5 depends
- dplyr >= 1.1.4 imports
- gggenes >= 0.5.1 imports
- ggplot2 >= 3.5.2 imports
- utils * imports