Recent Releases of proteinfamilies
proteinfamilies - Nickel Spider
Changed
- #104 - Pulling `params` from local subworkflows into main workflow.
- #103 - Parallelized execution for the `EXTRACT_FAMILY_REPS` local module and changed its input from `full_msa` to `fasta`.
- #100 - `CAT_CAT` module replaced with `FIND_CONCATENATE` to avoid large scale `Argument list too long` errors.
- #98 - nf-core tools template update to 3.3.2.
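The `Argument list too long` error mentioned in #100 typically appears when thousands of file paths are expanded onto a single command line. As a rough illustration only (this is not the `FIND_CONCATENATE` module, which may work differently), the Python sketch below concatenates files by iterating over a directory and streaming each file into the output, so no long argument list is ever built. The function name `concatenate_dir` and the `*.fasta` pattern are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def concatenate_dir(in_dir: Path, out_file: Path, pattern: str = "*.fasta") -> int:
    """Stream every file matching `pattern` in `in_dir` into `out_file`.

    Returns the number of files concatenated. Paths are handled inside
    Python, so nothing is passed through a command line's argument list.
    """
    count = 0
    with out_file.open("wb") as out:
        # Sort for a deterministic output order.
        for path in sorted(in_dir.glob(pattern)):
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)
            count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        # Hypothetical input chunks, created only for this demo.
        for i in range(3):
            (tmp_dir / f"chunk_{i}.fasta").write_text(f">seq{i}\nMKV\n")
        merged = tmp_dir / "merged.out"
        print(concatenate_dir(tmp_dir, merged), "files merged")
```

The output file here uses a name that does not match the glob pattern, so it is never picked up as one of its own inputs.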
Added
- #105 - `CHECK_QUALITY` subworkflow added at the start of the pipeline. It utilizes the `seqkit/stats` nf-core module to generate a `MultiQC`-ready report with statistics for the input amino acid sequences. The metro-map has been updated to reflect this change.
Published by vagkaratzas 7 months ago
proteinfamilies - Aluminium Frog
Added
- #93
  - Added nf-test and `meta.yml` file for local subworkflow `GENERATE_FAMILIES`.
  - Added nf-test and `meta.yml` file for local subworkflow `REMOVE_REDUNDANCY`.
  - Added nf-test and `meta.yml` file for local subworkflow `UPDATE_FAMILIES`.
- #88
  - Added nf-test and `meta.yml` file for local module `BRANCH_HITS_FASTA`.
  - Added nf-test and `meta.yml` file for local module `FILTER_NON_REDUNDANT_FAMS`.
  - Added nf-test and `meta.yml` file for local module `IDENTIFY_REDUNDANT_FAMS`.
  - Added nf-test and `meta.yml` file for local module `EXTRACT_FAMILY_REPS`.
  - Added the default pipeline end-to-end nf-test.
Changed
- #81 - nf-core tools template update to 3.3.1.
Fixed
- #80 - Fixed a bug where, due to a missing check for equal family sizes, non-redundant families were erroneously marked as redundant through transitive relationships and were removed.
Published by vagkaratzas 9 months ago
proteinfamilies - Lead Sparrow
Changed
- #77 - Default branch changed from `master` to `main`.
- #73 - Changed the fasta parsing library of the `CHUNK_CLUSTERS` local module, from `pyfastx` back to the latest version of `biopython`, and parallelized its writing mechanism, achieving decreased execution time.
Dependencies
| Tool      | Previous version | New version |
| --------- | ---------------- | ----------- |
| biopython | 1.84             | 1.85        |
| pyfastx   | 2.2.0            |             |
Removed
- #73 - Deprecated `pyfastx` module version of `CHUNK_CLUSTERS`, since it was struggling performance-wise with larger datasets.
Published by vagkaratzas 9 months ago
proteinfamilies - Nickel Elephant
Added
- #69 - Added the `hhsuite/reformat` nf-core module to reformat `.sto` alignments to `.fas` when in-family sequence redundancy is not removed. Also added the option to save intermediate and final family fasta files throughout the workflow with various `save` parameters.
- #58 - Added nf-test and `meta.yml` file for local module `REMOVE_REDUNDANCY_SEQS` (Hackathon 2025)
- #56 - Added nf-test and `meta.yml` file for local module `FILTER_RECRUITED` (Hackathon 2025)
- #55 - Added nf-test and `meta.yml` file for local module `CHUNK_CLUSTERS` (Hackathon 2025)
- #54 - Added nf-test for local subworkflow `ALIGN_SEQUENCES` (Hackathon 2025)
- #53 - Added nf-test for local subworkflow `EXECUTE_CLUSTERING` (Hackathon 2025)
- #51 - Added nf-test and `meta.yml` file for local module `CALCULATE_CLUSTER_DISTRIBUTION` (Hackathon 2025)
- #34 - Added the `EXTRACT_UNIQUE_CLUSTER_REPS` module, that calculates initial `MMseqs` clustering metadata, for each sample, to print with `MultiQC` (`Id`, `Cluster Size`, `Number of Clusters`)
Fixed
- #69 - Fixed a bug where redundant family alignments were not published properly, if intra-family redundancy removal mechanism was switched off #68
- #65 - Fixed a bug in `CHUNK_CLUSTERS`, where pipeline would crash if the module filtered out all clusters, due to a high membership threshold #64
- #35 - Fixed a bug in `remove_redundant_fams.py`, where comparison was between strings instead of integers to keep larger family
- #33 - Fixed an always-true condition at the `filter_non_redundant_hmms.py` script, by adding missing parentheses
- #29 - Fixed `hmmalign` empty input crash error, by preventing the `FILTER_RECRUITED` module from creating an empty output `.fasta.gz` file, when there are no remaining sequences after filtering the `hmmsearch` results #28
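Two of the fixes above (#35 and #33) belong to well-known classes of Python pitfalls. The sketch below is not the pipeline's code: the function names and sample values are hypothetical, and the source does not say which exact form the original bugs took. It shows that comparing sizes as strings orders them lexicographically, and that one way a missing pair of parentheses can yield an always-true condition is testing a method object instead of calling it.

```python
def larger_family_buggy(size_a: str, size_b: str) -> str:
    """Compare sizes as strings: lexicographic, so "9" sorts after "10"."""
    return "A" if size_a > size_b else "B"


def larger_family_fixed(size_a: str, size_b: str) -> str:
    """Convert to int before comparing, so 10 is larger than 9."""
    return "A" if int(size_a) > int(size_b) else "B"


def is_numeric_buggy(token: str) -> bool:
    """Missing call parentheses: the bound method object is always truthy."""
    return bool(token.isdigit)


def is_numeric_fixed(token: str) -> bool:
    """Calling the method returns the actual test result."""
    return token.isdigit()


if __name__ == "__main__":
    # Family A has 9 members, family B has 10 (sizes read as text).
    print(larger_family_buggy("9", "10"))  # string comparison
    print(larger_family_fixed("9", "10"))  # integer comparison
    print(is_numeric_buggy("abc"), is_numeric_fixed("abc"))
```

Values parsed from text files arrive as strings, so converting them explicitly at the point of parsing tends to prevent the first kind of bug.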
Changed
- #69 - Changed the publish directory architecture for HMMs, seed MSAs, full MSAs and family FASTA files, to make it more intuitive. `REMOVE_REDUNDANT_FAMS` local module converted to `IDENTIFY_REDUNDANT_FAMS` to extract redundant family ids which will then be used downstream. `FILTER_NON_REDUNDANT_HMMS` local module converted to `FILTER_NON_REDUNDANT_FAMS` and reused four times (HMM, seed MSA, full MSA, FASTA). Changed the output format of the `EXTRACT_FAMILY_REPS` and `REMOVE_REDUNDANT_SEQS` local modules from `.fa` to `.faa`. Metro map updated with new `hhsuite/reformat` module.
- #57 - Slight improvements of `nextflow_schema.json` (Hackathon 2025)
- #57 - Slight improvements of `assets/schema_input.json` (Hackathon 2025)
- #34 - Swapped the `SeqIO` python library with `pyfastx` for the `CHUNK_CLUSTERS` module, quartering its duration
- #32 - Updated `ClipKIT` 2.4.0 -> 2.4.1, that now also allows ends-only trimming, to completely replace the custom `CLIP_ENDS` module. Users can now also define its output format by setting the `--clipkit_out_format` parameter (default: `clipkit`)
Dependencies
| Tool    | Previous version | New version |
| ------- | ---------------- | ----------- |
| ClipKIT | 2.4.0            | 2.4.1       |
| pyfastx |                  | 2.2.0       |
| hhsuite |                  | 3.3.0       |
| multiqc | 1.27             | 1.28        |
Deprecated
- #32 - Deprecated `CLIP_ENDS` module and `--clipping_tool` parameter. The only option now is `ClipKIT`, covering both previous modes, via setting `--trim_ends_only`
Published by vagkaratzas 10 months ago
proteinfamilies - Iron Rhinoceros
Initial release of nf-core/proteinfamilies, created with the nf-core template.
Added
- Amino acid sequence clustering (mmseqs)
- Multiple sequence alignment (famsa, mafft, clipkit)
- Hidden Markov Model generation (hmmer)
- Between families redundancy removal (hmmer)
- In-family sequence redundancy removal (mmseqs)
- Family updating (hmmer, seqkit, mmseqs, famsa, mafft, clipkit)
- Family statistics presentation (multiqc)
By @vagkaratzas and @mberacochea.
Published by vagkaratzas about 1 year ago