Recent Releases of mosca
mosca - Merging of paired reads when no assembly is performed
Previously, when no assembly was performed, MOSCA called genes directly from the preprocessed reads.
Now, it merges paired-end reads first, and then calls the genes on the merged reads.
During gene calling, MOSCA still treats the data as reads (-complete=0), not as complete genomes (-complete=1).
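As a rough illustration of the gene calling step described above, the sketch below builds a FragGeneScan command line for merged reads. Only the -complete flag comes from these notes. The helper name, the file names and the training model are assumptions for illustration, and the remaining flags follow FragGeneScan's usual usage and should be checked against the installed version. This is not MOSCA's actual code.

```python
def fraggenescan_command(reads_fasta, out_prefix, threads=1, train="illumina_10"):
    """Build (but do not run) a FragGeneScan command that treats input as reads.

    Hypothetical helper: file names and the training model are placeholders.
    """
    return [
        "run_FragGeneScan.pl",
        f"-genome={reads_fasta}",
        f"-out={out_prefix}",
        "-complete=0",  # reads, not complete genomes (-complete=1)
        f"-train={train}",
        f"-thread={threads}",
    ]


print(" ".join(fraggenescan_command("merged_reads.fasta", "fgs_output", threads=4)))
```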
Update on sortmerna functions
SortMeRNA databases have been updated, and are now provided as a tar file containing multiple database files. Each of these databases can be used separately for a specific type of search. MOSCA now provides the sortmerna_database parameter, which selects the database to use:
* if fast, MOSCA will use the smr_v4.3_fast_db.fasta database.
* if default, MOSCA will use the smr_v4.3_default_db.fasta database.
* if sensitive, MOSCA will use the smr_v4.3_sensitive_db.fasta database.
* if sensitive_with_rfam, MOSCA will use the smr_v4.3_sensitive_db_rfam_seeds.fasta database.
Only one database file can be used at a time.
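The mapping above can be sketched as a simple lookup. The option values and file names are taken from the list. The function name and the error handling are hypothetical and may not match how MOSCA resolves the parameter internally.

```python
# Option values and file names as listed in the release notes.
SORTMERNA_DATABASES = {
    "fast": "smr_v4.3_fast_db.fasta",
    "default": "smr_v4.3_default_db.fasta",
    "sensitive": "smr_v4.3_sensitive_db.fasta",
    "sensitive_with_rfam": "smr_v4.3_sensitive_db_rfam_seeds.fasta",
}


def sortmerna_database_file(option):
    """Return the single database file for a sortmerna_database value."""
    try:
        return SORTMERNA_DATABASES[option]
    except KeyError:
        valid = ", ".join(sorted(SORTMERNA_DATABASES))
        raise ValueError(f"unknown sortmerna_database '{option}' (valid: {valid})") from None


print(sortmerna_database_file("sensitive"))
```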
minimum_read_length parameter split for MG and MT
Now, minimum length of reads for further analysis can be set with the minimum_mg_read_length and minimum_mt_read_length parameters.
Added minimum_envs folder and contents
It holds commands and resources to update the environments when needed.
Also, some fixes
- Converting readcounts (for MG and MT) to int was turning them all to zeros (because they are normalized). MOSCA now keeps them as float.
- Blocked the print of MOSCA's TXT logo. Don't know why it doesn't work on the tests.
- Fix on Summary Report, now rows have information for both "Name" and "Sample" levels (before, there were rows for "Name" and rows for "Sample").
- Another fix on Summary Report, counting annotated genes was not done properly.
- When not performing assembly, General Report was not importing correctly the readcounts. Now, it does.
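The readcount fix in the list above comes down to integer truncation. The small sketch below uses made-up values and assumes the normalized readcounts fall between 0 and 1:

```python
# Made-up normalized readcounts, assumed to lie between 0 and 1.
normalized = [0.12, 0.87, 0.05]

as_int = [int(value) for value in normalized]      # truncates toward zero
as_float = [float(value) for value in normalized]  # keeps the information

print(as_int)
print(as_float)
```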
- Python
Published by iquasere about 2 years ago
mosca - Added default parameters JSON
I hadn't updated MOSCA's recipe in Bioconda to include the new default_config.json file. This release has no code updates, but serves to include the file in MOSCA's recipe.
- Python
Published by iquasere about 2 years ago
mosca - Default parameters, input sanitization and final reports updates
MOSCA now has default parameters
These default parameters are set by the default_config.json file.
Input quality checking
Implemented checking of invalid names in experiments - names can't start with a number, not even a float (e.g., 5AA or .5Name).
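A check like the one described above could look like the following sketch. The regular expression and the function name are assumptions based only on the two examples given (5AA and .5Name). MOSCA's real check may be stricter.

```python
import re

# Matches a leading digit, or a leading dot followed by a digit (".5Name").
_STARTS_WITH_NUMBER = re.compile(r"^\.?\d")


def is_valid_name(name):
    """Return True if the experiment name does not start with a number."""
    return _STARTS_WITH_NUMBER.match(name) is None


for candidate in ("5AA", ".5Name", "Sample1"):
    print(candidate, is_valid_name(candidate))
```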
Updates on final reports
Renamed Protein report to General report.
New report - Expression. This report includes only genes expressed.
Technical report was renamed to Versions. It is also exported as EXCEL now, because it brings information on every environment.
Implemented minimum value imputation
It was implemented for MP analysis, but it cannot be selected as an option yet. For now, it is a feature in preparation.
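The notes do not say how the imputation is computed. As a minimal sketch, assuming that each missing value is replaced by the smallest value observed in the same sample, it could look like this (the function name is hypothetical):

```python
def impute_minimum(values):
    """Replace missing values (None) with the smallest observed value.

    Assumption: imputation is done per sample (one list per sample).
    """
    observed = [value for value in values if value is not None]
    if not observed:
        return list(values)  # nothing to impute from
    minimum = min(observed)
    return [minimum if value is None else value for value in values]


print(impute_minimum([3.0, None, 1.5, None]))
```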
No more build_deps in Dockerfile
It's no longer needed, conda handles it all.
Dependencies update
- Fixed snakemake version to <8 - some of its new functionality is incompatible with MOSCA implementation.
- Added pandas as dependency - mosca.py now has functions that require it.
- Updated to newest versions of UPIMAPI, reCOGnizer and KEGGCharter - allowed to remove the parameters related to database download.
Blocked MGMT test
Because GitHub actions doesn't provide enough disk space for it.
Also, several fixes
- Fix on DE handling multiple samples
- Fix on KEGGCharter handling multiple samples
- Fix on multi_sheet excel handling multiple samples and numbering
- Fix on converting RAW spectra to MGF outside a container environment
- MOSCA now prints snakemake command properly
- Fix on adding normalized matrices to entry report
- Several fixes on summary report
- Necessary reparations on EC numbers and KEGG IDs, as those come from UPIMAPI in non-compatible format for KEGGCharter
- Fix on inputting mods to generate_parameters_file function
- Python
Published by iquasere about 2 years ago
mosca - Reintroduction of MOSCA into Bioconda
Reintroduction of MOSCA into Bioconda
Since MOSCA 1.3.6, the list of dependencies of the pipeline has become too complex for conda to manage.
This release makes use of snakemake environments to simplify the minimal environment required to install MOSCA. MOSCA now only requires snakemake.
Now MOSCA uses snakemake's rules
All the rules have been moved to corresponding .smk files. This has greatly simplified the main script. Script files can no longer be run through the command line, however. The interface is now with snakemake directly. This is a first step towards producing a web service.
Added schema for validating config.json
config.schema.yaml checks that all the required information is present, and in the correct format, in the input config file.
New parameter
metaproteomics_add_reference_proteomes: New option for not searching for reference proteomes for organisms identified. Helps save a lot of time during Peptide-to-Spectrum Matching.
Tests have been reformatted
The complete MGMP test has been reintroduced. However, it still fails because of excessive disk usage. It'll be a problem for another time.
Several fixes and improvements
params.method was not being correctly read on de_analysis.R.
config.json is now explicitly required.
tmp directory when handling SortMeRNA is created inside SortMeRNA output directory.
Removed pandas warnings concerning reading files without low_memory=False.
Memory allocated in metaproteomics now in G instead of M.
Removed UPIMAPI apt dependencies - are no longer needed.
Fix on reading method for normalization.
Fix on parsing conditions in de_analysis.R.
- Python
Published by iquasere almost 3 years ago
mosca - Metaproteogenomics - a new level of omics analyses
New workflow of metaproteomics analyses, based on metagenomics (MG) results.
This new layer of analysis allows spectra - both in raw and standard formats - to be inputted to MOSCA for metaproteomics (MP) analysis.
MOSCA's MP workflow is as follows:
1. Database construction
A database is built from MG results, aiming to include all sequences that can possibly be in the datasets. This includes:
* the genes identified by FragGeneScan on the MG gene calling step
* reference proteomes retrieved from UniProt of the taxa identified in the annotation step with UPIMAPI
* the cRAP database
* the protease sequence - only automatically available sequence is Trypsin for now, all others must be inputted manually
This database will then be submitted for a first round of Peptide-to-Spectrum matching with SearchCLI and PeptideShaker. All proteins with at least one Peptide-to-Spectrum match (PSM) are collected for the final database - the metaproteogenomics database.
2. Peptide-to-Spectrum matching
SearchCLI is used for obtaining PSMs from inputted spectra, using as reference the database constructed in the previous step. SearchCLI is used with three search engines - X!Tandem, MyriMatch and MS-GF+. More engines might be added in the future.
3. Protein inference
PeptideShaker is used for protein inference and quantification, based on spectracounts. PSMs are selected at a 5 % local False Discovery Rate. Only peptides with two or more PSMs, and only proteins with two or more identified peptides, are selected for further analysis.
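The selection rule described above (two or more PSMs per peptide, two or more peptides per protein) can be sketched as follows. This is an illustration only: in MOSCA the filtering is done by PeptideShaker. The input is assumed to be already filtered at the 5 % local FDR, and the function name and data layout are hypothetical.

```python
from collections import defaultdict


def filter_identifications(psms, min_psms=2, min_peptides=2):
    """Keep peptides with enough PSMs, then proteins with enough peptides.

    psms: iterable of (protein, peptide) pairs, one pair per PSM.
    """
    psm_counts = defaultdict(int)
    for protein, peptide in psms:
        psm_counts[(protein, peptide)] += 1

    peptides_per_protein = defaultdict(set)
    for (protein, peptide), count in psm_counts.items():
        if count >= min_psms:
            peptides_per_protein[protein].add(peptide)

    return {
        protein: sorted(peptides)
        for protein, peptides in peptides_per_protein.items()
        if len(peptides) >= min_peptides
    }


example = [
    ("P1", "PEPA"), ("P1", "PEPA"), ("P1", "PEPB"), ("P1", "PEPB"),
    ("P2", "PEPC"), ("P2", "PEPC"), ("P2", "PEPD"),
]
print(filter_identifications(example))
```

In this example, P2 is dropped because only one of its peptides has two PSMs.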
4. Normalization, imputation and differential protein expression analysis
Spectracounts are normalized with Variance Stabilizing Normalization. Missing values are imputed using Local Least Squares Imputation.
Normalized and imputed spectracounts are then submitted for differential protein expression analysis with Reproducibility-Optimized Test Statistics. Log2foldchange and p-values are retrieved for reporting.
5. Metabolic pathway representation and final reportings
All following steps are performed as close as possible to metatranscriptomics (gene expression) analysis.
Metabolic maps are built with KEGGCharter, showing protein expression levels from MP and genomic potential from MG.
Final reports include all results from MG, and report on differential expression analysis of proteins.
Other updates
MOSCA's workflow has grown by around 40 %.
MOSCA is now compatible with the six months old updates of UniProt, through UPIMAPI. It includes the parsing of taxonomic columns, to continue representing taxonomic kronas.
Snakemake conda environments are now used, instead of one single environment. This has made it possible again to build MOSCA's environments, and may signal the return of MOSCA to Bioconda.
- Python
Published by iquasere about 3 years ago
mosca - Re-added KEGGCharter to workflow
KEGGCharter is again run from "MOSCAEntryReport". Changed its output filename in the rule because the tool now only outputs in TSV.
Also some fixes in environment.yml
- fixed perl version
- added subversion
- Python
Published by iquasere over 3 years ago
mosca - Stand-alone metatranscriptomics workflow implemented
Metatranscriptomics can be used as reference without metagenomics
- If MG is not inputted, MT will be used for the MG part of MOSCA's workflow - assembly, binning, gene calling and annotation.
- Trinity and RNAspades now available as assembler options
- rule join_reads now considers possibility of MT as reference
Changes in config.json
- experiments.tsv integrated into config.json as a parameter (list of dictionaries)
- adapted config.json column names to MOSGUITO
- New parameter - "suffix"
  - This parameter specifies a suffix that follows the _R1/_R2 special characters in file names. MOSCA will consider that those characters are followed by the "suffix" (e.g., _L001 would serve for the files mg_R1_L001.fq and mg_R2_L001.fq)
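The "suffix" behaviour can be illustrated with a small sketch that builds the expected paired file names. The function name and the default extension are assumptions for illustration. Only the _R1/_R2 convention and the _L001 example come from the notes.

```python
def paired_file_names(prefix, suffix="", extension=".fq"):
    """Return the forward and reverse file names for a paired-end dataset."""
    return tuple(f"{prefix}_R{mate}{suffix}{extension}" for mate in (1, 2))


print(paired_file_names("mg", suffix="_L001"))
```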
Adaptations for new versions of tools
- SortMeRNA 4 fully implemented
- Always gzips SortMeRNA output
- UPIMAPI used directly instead of DIAMOND
- MOSCA now accepts UPIMAPI's three options for database: "taxids", "uniprot" or "swissprot"
- Small adjustment on CI to allow running reCOGnizer with mini cdd.tar.gz
- Fixed krona version (to 2.5) for compatibility with MaxBin2 - MaxBin2 dependencies are presenting problems for higher versions, and krona's more recent versions would force to install those damaged dependencies
Added technical files, removed old scripts
- added .gitignore
- join_information.py deprecated, replaced by mosca_tools functions and rules in Snakefile
Changes in environment and CI files
- install.bash no longer installs mamba
- added gmcloser to environment.yml
- added simplified cdd.tar.gz for CI
- added test for complete workflow of MOSCA
- new default for max-ref-number with metaquast - is now 0 to allow running CI
Miscellaneous fixes
- fix on snakefile - checks if "Name" in "experiments" is ""
- bins and DE results go to the folders of their respective "samples"
- several fixes on reporting
- fix on alignment functions in mosca_tools.py
- fix on de_analysis.R
- fix on obtaining directories for Illumina adapters and rRNA databases on preprocessing step
- Python
Published by iquasere over 3 years ago
mosca - Fixed high quality bins evaluation
MOSCA was evaluating the high quality bins incorrectly.
Best probability threshold is now written at the end of iterative binning.
Assigned minus 1 thread in Snakefile for quantification rule.
* Allows upimapi to run simultaneously.
metaSPAdes upped to version 3.15 to not run out of memory.
Fixed some bugs in name assignment.
- Python
Published by iquasere over 4 years ago
mosca - Iterative binning for best binning
do_iterative_binning option now available!
* Iterative binning cycles between MaxBin and CheckM - MaxBin obtains the bins, CheckM checks their quality
* Iterative binning cycles by many probability thresholds to determine the value for the best binning
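The cycle described above can be sketched as a loop over probability thresholds. The two callables stand in for MaxBin (binning) and CheckM (quality assessment). Their names, signatures and the single numeric score are assumptions made for the sketch, not MOSCA's actual implementation.

```python
def best_probability_threshold(thresholds, run_binning, score_bins):
    """Try each threshold and return (threshold, score) of the best binning.

    run_binning(threshold) -> bins   (placeholder for a MaxBin run)
    score_bins(bins) -> number       (placeholder for a CheckM-based score)
    """
    best = None
    for threshold in thresholds:
        bins = run_binning(threshold)
        score = score_bins(bins)
        if best is None or score > best[1]:
            best = (threshold, score)
    return best


# Toy stand-ins: the "bins" are just the threshold, scored from a lookup table.
toy_scores = {0.5: 3, 0.7: 5, 0.9: 4}
result = best_probability_threshold(
    [0.5, 0.7, 0.9],
    run_binning=lambda threshold: threshold,
    score_bins=lambda bins: toy_scores[bins],
)
print(result)
```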
New option for differential expression - minimum_fold_change!
* Determine padj for up or down expression, instead of just 0 difference
- Python
Published by iquasere over 4 years ago
mosca - Can now be installed from source code
Automatic setup from source code is now functional, and suggested installation method is through the bash script.
- Python
Published by iquasere over 4 years ago
mosca - Continuous Integration is set
But at what cost?
Install with mamba seems to be required now. No problem on that, just sad for conda
MetaQUAST does not download references for now - in the future, it will become argument, so CI can happen on assembly
- Python
Published by iquasere almost 5 years ago
mosca - Can now run all parts without replicates (except DE analysis)
MOSCA cannot be run without replicates, but now the partial running of its functionalities is possible in all steps except for differential expression
* added option --no-differential-expression to reporter script, to be set if differential expression analysis has not been performed
Memory is now read in Gb
Several fixes in reporter columns
- Python
Published by iquasere almost 5 years ago
mosca - Option for downloading CDD resources
New option added for configuration: download_cdd.
* to be used if MOSCA has already been run once
* removes the --download-resources parameter from reCOGnizer's command
* also impacts the downloading of the other resources of reCOGnizer
- Python
Published by iquasere almost 5 years ago
mosca - Fixed mismanagement of read fixing
Reads fixed after rRNA removal were being trimmed in the last 2 characters of read name. This is now fixed.
Also, MOSCA now removes reads with less than 20 nucleotides after adapter removal, to avoid the spam from SortMeRNA.
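A length filter like the one described above can be sketched in a few lines. It assumes well-formed four-line FASTQ records held in memory. The function name is hypothetical, and MOSCA itself may delegate this step to a trimming tool.

```python
def drop_short_reads(fastq_lines, min_length=20):
    """Keep only four-line FASTQ records whose sequence has >= min_length bases."""
    kept = []
    for start in range(0, len(fastq_lines) - 3, 4):
        record = fastq_lines[start:start + 4]
        if len(record[1].strip()) >= min_length:
            kept.extend(record)
    return kept


reads = [
    "@read1", "A" * 25, "+", "I" * 25,
    "@read2", "ACGT", "+", "IIII",
]
print(drop_short_reads(reads))
```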
- Python
Published by iquasere about 5 years ago
mosca - Reimplemented fixing of reads after rRNA removal of paired-end
After unmerging reads, sometimes they will come messed up in a variety of ways:
* preprocessing removes reads messed up (not four lines long)
* also removes orphans - reads with their pair missing
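The two clean-up steps above can be sketched as follows. This is a simplified illustration, not MOSCA's code: it assumes read names end in /1 and /2, treats a record as well formed when it has a header, a "+" separator and sequence and quality strings of equal length, and uses hypothetical function names.

```python
def well_formed_records(lines):
    """Return {read_id: [4 lines]} for records that look like valid FASTQ."""
    records = {}
    i = 0
    while i + 3 < len(lines):
        header, seq, plus, qual = lines[i:i + 4]
        if (header.startswith("@") and header[1:].strip()
                and plus.startswith("+") and len(seq) == len(qual)):
            read_id = header[1:].split()[0]
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]  # drop the mate marker
            records[read_id] = [header, seq, plus, qual]
            i += 4
        else:
            i += 1  # skip a stray line and try to resynchronize
    return records


def drop_orphans(forward_lines, reverse_lines):
    """Keep only reads that are well formed and present in both files."""
    forward = well_formed_records(forward_lines)
    reverse = well_formed_records(reverse_lines)
    shared = [read_id for read_id in forward if read_id in reverse]
    return (
        [line for read_id in shared for line in forward[read_id]],
        [line for read_id in shared for line in reverse[read_id]],
    )


fwd = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "ACGT", "+", "@r3/1", "AAAA", "+", "IIII"]
rev = ["@r1/2", "TTTT", "+", "IIII", "@r2/2", "CCCC", "+", "IIII"]
print(drop_orphans(fwd, rev))
```

In the example, r2 is dropped because its forward record lacks a quality line, and r3 is dropped because it has no mate.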
- Python
Published by iquasere about 5 years ago
mosca - Minor fix in adapter removal
Adapter removal now considers the "Adapter content" module of FastQC
- Python
Published by iquasere about 5 years ago
mosca - Run MT without MG
Run MT without MG
* MOSCA is now capable of running MT without MG
* Combined with approach to not use assembly
* In the future might implement the use of assembly to analyze MT alone
Also added rejected and accepted files from rRNA removal to list of files to delete at preprocessing
- Python
Published by iquasere about 5 years ago
mosca - Names can be customized
Names can be customized
* also several fixes on preprocessing of SE reads
* several methods for metaproteomics with Compomics software, still not reachable
* MINLEN and AVGQUAL now directly set from parameters
* shell commands now printed
Also several adaptations to reCOGnizer version 1.4
- Python
Published by iquasere about 5 years ago
mosca - Several major fixes concerning snakemake integration
- scripts now called from the Snakefile directory
- added taxonomic and proteomics analysis tools
- added unlock option to mosca's main script
- adapted several methods for MEGAHIT use
- added bash script for downloading adapters and rRNA databases
- now accepts experiments files in EXCEL format
- config files from MOSGUITO now work integrally
- Python
Published by iquasere about 5 years ago
mosca - Snakemake's integration into MOSCA
- MOSCA's main script is now wrapper to Snakefile
- Snakefile reads from config file, specified through the main script
- Experiments file specifies all information from --files, --samples and --conditions
- Removed all legacy scripts (tools' specific classes) except for rnaseqsimer and reab (to be removed in the future)
- mosca_tools no longer in classes - all functions
- Python
Published by iquasere over 5 years ago
mosca - Several improvements in reporting
- Moved the lists of columns of the final quality report to external files
- No more report on % of reads/contigs/ORFs, only quantifications
- Reporting now reaches Binning and Gene expression analysis
- Python
Published by iquasere over 5 years ago
mosca - New tools and integration into bioconda
New tools, same functionalities
* reCOGnizer now a standalone tool for domain-based functional annotation
* UPIMAPI now a standalone tool for retrieving information from UniProt using its API
* KEGGCharter now a standalone tool for mapping genomic potential and gene expression information into KEGG metabolic pathways
* Allowed to easily solve hella lot of bugs (and publish a little more :D)
Integration into Bioconda
* Some adaptations in the main scripts
* meta.yaml hopefully well configured
* MOSCA will now have the new tools as dependencies
- Python
Published by iquasere over 5 years ago
mosca - MOSCA 1.0
Dockerfile was fixed, and MOSCA's image is now fully functional again!
- Python
Published by iquasere about 6 years ago
mosca - MOSCA 1.0
Same tool, new features
Meta-Omics Software for Community Analysis was previously published as a pipeline for integrated metagenomics (MG) and metatranscriptomics (MT) data analysis. This new version includes the integration of a metaproteomics (MP) workflow for raw data analysis, binning of the contigs resulting from assembly, and improvements on the assembly and annotation steps.
Integration of metaproteomics: Two workflows have been developed for the handling of MP spectra, both for label-free quantification and analysis. While this feature is not complete, it already allows for identification and quantification of proteins, using the results from the MG workflow as a reference database.
Integration of binning: contigs resulting from MG assembly can now be grouped into bins. In the next few weeks, CheckM will be integrated for quality check of bins.
Improvement of assembly: assembly was previously performed with one MG sample at a time. This made sense in the scenario where each MG sample came from a different microbial community, but if some samples come from the same community, it is advisable to perform assembly with all data together. It is now possible to perform 3 different assembly strategies in MOSCA: "unique" for performing assembly as previously, one assembly per MG sample; "all" for performing assembly with all MG data together, to be used only if the samples come from very similar communities; "samples", which requires the "--mg-samples" parameter to specify which samples go together for each assembly/annotation.
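The three strategies can be pictured as different ways of grouping MG datasets before assembly. In the sketch below, the function name and the encoding of "--mg-samples" as one label per MG dataset are assumptions for illustration. The notes only name the three strategies.

```python
def assembly_groups(mg_names, strategy="unique", mg_samples=None):
    """Map each assembly to the MG datasets that are assembled together."""
    if strategy == "unique":
        return {name: [name] for name in mg_names}
    if strategy == "all":
        return {"all": list(mg_names)}
    if strategy == "samples":
        if mg_samples is None or len(mg_samples) != len(mg_names):
            raise ValueError("'samples' needs one sample label per MG dataset")
        groups = {}
        for name, sample in zip(mg_names, mg_samples):
            groups.setdefault(sample, []).append(name)
        return groups
    raise ValueError(f"unknown assembly strategy: {strategy}")


datasets = ["mg1", "mg2", "mg3"]
print(assembly_groups(datasets, "unique"))
print(assembly_groups(datasets, "all"))
print(assembly_groups(datasets, "samples", mg_samples=["A", "A", "B"]))
```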
Improvement of annotation: domain annotation with PSI-BLAST and COG database is now possible within MOSCA. This type of annotation is focused on functional classification by identifying domains specific to metabolic and/or structural functions.
- Python
Published by iquasere about 6 years ago