Releases | Open Source Science

mosca - Merging of paired reads when no assembly is performed

Previously, MOSCA called genes directly from the preprocessed reads. Now, it merges paired-end reads first, and then calls genes on the merged reads. During gene calling, MOSCA still treats the data as reads (-complete=0), not as complete genomes (-complete=1).

Update on `sortmerna` functions

SortMeRNA databases have been updated, and are now provided as a tar file containing multiple database files. Each of these databases can be used separately for a specific type of search. MOSCA now provides the sortmerna_database parameter, which sets which database will be used:
* if fast, MOSCA will use the smr_v4.3_fast_db.fasta database.
* if default, MOSCA will use the smr_v4.3_default_db.fasta database.
* if sensitive, MOSCA will use the smr_v4.3_sensitive_db.fasta database.
* if sensitive_with_rfam, MOSCA will use the smr_v4.3_sensitive_db_rfam_seeds.fasta database.

Only one database file can be used at a time.
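
As a rough illustration of how the sortmerna_database value maps to a single file, here is a minimal sketch. The file names come from the list above; the dictionary and the function name `select_sortmerna_database` are hypothetical and are not taken from MOSCA's code.

```python
# Option-to-file mapping as described in this release note.
# The names SORTMERNA_DATABASES and select_sortmerna_database are hypothetical.
SORTMERNA_DATABASES = {
    "fast": "smr_v4.3_fast_db.fasta",
    "default": "smr_v4.3_default_db.fasta",
    "sensitive": "smr_v4.3_sensitive_db.fasta",
    "sensitive_with_rfam": "smr_v4.3_sensitive_db_rfam_seeds.fasta",
}


def select_sortmerna_database(option: str) -> str:
    """Return the single database file that corresponds to the given option."""
    try:
        return SORTMERNA_DATABASES[option]
    except KeyError:
        valid = ", ".join(SORTMERNA_DATABASES)
        raise ValueError(
            f"Unknown sortmerna_database '{option}'; expected one of: {valid}"
        ) from None
```

Rejecting unknown values up front keeps a typo in the config from silently falling back to another database.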

`minimum_read_length` parameter split for MG and MT

The minimum length of reads kept for further analysis can now be set separately for MG and MT, with the minimum_mg_read_length and minimum_mt_read_length parameters.

Added minimum_envs folder and contents

It holds the commands and resources for updating the envs when needed.

Also, some fixes

Converting readcounts (for MG and MT) to int was turning them all to zeros (because they are normalized). MOSCA now keeps them as float.
Blocked the print of MOSCA's TXT logo. Don't know why it doesn't work on the tests.
Fix on Summary Report, now rows have information for both "Name" and "Sample" levels (before, there were rows for "Name" and rows for "Sample").
Another fix on Summary Report, counting annotated genes was not done properly.
When not performing assembly, General Report was not importing the readcounts correctly. Now, it does.
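
The readcount fix in the list above comes down to how integer conversion truncates. The sketch below uses made-up values, assuming the normalized readcounts are below 1; it is not MOSCA output.

```python
# Illustrative normalized readcounts (assumed to be fractions below 1).
normalized = [0.12, 0.87, 0.05]

# int() truncates towards zero, so every value collapses to 0.
as_int = [int(value) for value in normalized]

# Keeping the values as float preserves the normalized counts.
as_float = [float(value) for value in normalized]
```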

- Python
Published by iquasere about 2 years ago

mosca - Added default parameters JSON

I hadn't updated MOSCA's recipe in Bioconda to include the new default_config.json file. This release has no code updates, but serves to include the file in MOSCA's recipe.

- Python
Published by iquasere about 2 years ago

mosca - Default parameters, input sanitization and final reports updates

MOSCA now has default parameters

These default parameters are set by the default_config.json file.

Input quality checking

Implemented checking of invalid names in experiments - names can't start with a number, even a float (e.g., 5AA or .5Name).
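
A check along these lines could look like the sketch below. The regular expression and the function name `is_valid_experiment_name` are assumptions made for illustration, and MOSCA's actual validation may differ.

```python
import re

# Hypothetical rule: reject names that begin with a digit, or with a dot
# followed by a digit (the start of a float such as ".5").
_STARTS_WITH_NUMBER = re.compile(r"^(\d|\.\d)")


def is_valid_experiment_name(name: str) -> bool:
    """Return False when the name starts with a number, True otherwise."""
    return _STARTS_WITH_NUMBER.match(name) is None
```

With this rule, both examples from the note (5AA and .5Name) are rejected, while a name such as Sample1 is accepted.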

Updates on final reports

Renamed Protein report to General report.
New report - Expression. This report includes only the expressed genes.
Technical report was renamed to Versions. It is now also exported as Excel, because it carries information on every environment.

Implemented minimum value imputation

For MP analysis, but it can't be selected as an option yet. For now, it is a feature in preparation.

No more build_deps in Dockerfile

It's no longer needed, conda handles it all.

Dependencies update

Fixed snakemake version to <8 - some of its new functionality is incompatible with MOSCA implementation.
Added pandas as dependency - mosca.py now has functions that require it.
Updated to newest versions of UPIMAPI, reCOGnizer and KEGGCharter - this allowed removing the parameters related to database download.

Blocked MGMT test

Because GitHub Actions doesn't provide enough disk space for it.

Also, several fixes

Fix on DE handling multiple samples
Fix on KEGGCharter handling multiple samples
Fix on multi_sheet excel handling multiple samples and numbering
Fix on converting RAW spectra to MGF outside a container environment
MOSCA now prints snakemake command properly
Fix on adding normalized matrices to entry report
Several fixes on summary report
Necessary repairs on EC numbers and KEGG IDs, as those come from UPIMAPI in a format that is not compatible with KEGGCharter
Fix on inputting mods to generate_parameters_file function

- Python
Published by iquasere about 2 years ago

mosca - Reintroduction of MOSCA into Bioconda

Reintroduction of MOSCA into Bioconda

Since MOSCA 1.3.6, the list of dependencies of the pipeline has become too complex for conda to manage. This release makes use of snakemake environments to simplify the minimal environment required to install MOSCA. MOSCA now only requires snakemake.

Now MOSCA uses snakemake's rules

All the rules have been moved to corresponding .smk files. This has greatly simplified the main script. Script files can no longer be run through the command line, however. The interface is now with snakemake directly. This is a first step towards producing a web service.

Added schema for validating config.json

config.schema.yaml checks that all the needed information is present, and in the correct format, in the input config file.

New parameter

metaproteomics_add_reference_proteomes: New option for not searching for reference proteomes for organisms identified. Helps save a lot of time during Peptide-to-Spectrum Matching.

Tests have been reformatted

The complete MGMP test has been reintroduced. However, it still fails because of too much disk usage. It'll be a problem for another time.

Several fixes and improvements

params.method was not being correctly read in de_analysis.R
config.json is now explicitly required
tmp directory when handling SortMeRNA is now created inside the SortMeRNA output directory
Removed pandas warnings concerning reading files without low_memory=False
Memory allocated in metaproteomics is now in G instead of M
Removed UPIMAPI apt dependencies, which are no longer needed
Fix on reading method for normalization
Fix on parsing conditions in de_analysis.R

- Python
Published by iquasere almost 3 years ago

mosca - Metaproteogenomics - a new level of omics analyses

New workflow of metaproteomics analyses, based on metagenomics (MG) results.

This new layer of analysis allows spectra, in both raw and standard formats, to be input to MOSCA for metaproteomics (MP) analysis.

MOSCA's MP workflow is as follows:

1. Database construction

A database is built from MG results, aiming to include all sequences that can possibly be in the datasets. This includes:
* the genes identified by FragGeneScan in the MG gene calling step
* reference proteomes, retrieved from UniProt, of the taxa identified in the annotation step with UPIMAPI
* the cRAP database
* the protease sequence - the only sequence automatically available for now is trypsin, all others must be input manually

This database will then be submitted for a first round of Peptide-to-Spectrum matching with SearchCLI and PeptideShaker. All proteins with at least one Peptide-to-Spectrum match (PSM) are collected for the final database - the metaproteogenomics database.

2. Peptide-to-Spectrum matching

SearchCLI is used for obtaining PSMs from inputted spectra, using as reference the database constructed in the previous step. SearchCLI is used with three search engines - X!Tandem, MyriMatch and MS-GF+. More engines might be added in the future.

3. Protein inference

PeptideShaker is used for protein inference and quantification, based on spectracounts. PSMs are selected at a 5 % local False Discovery Rate. Only peptides with two or more PSMs, and only proteins with two or more identified peptides, are selected for further analysis.
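
To make the two thresholds concrete, the sketch below applies them to a plain list of PSMs. It only illustrates the selection logic and is not how PeptideShaker implements it. The input layout (one `(protein, peptide)` pair per PSM that already passed the FDR cut) and the function name `filter_identifications` are assumptions.

```python
from collections import Counter, defaultdict


def filter_identifications(psms, min_psms=2, min_peptides=2):
    """Keep peptides with enough PSMs, then proteins with enough kept peptides.

    psms: iterable of (protein, peptide) pairs, one pair per PSM
          (assumed to have already passed the FDR threshold).
    Returns a dict mapping each retained protein to its set of retained peptides.
    """
    # Count how many PSMs support each (protein, peptide) pair.
    psm_counts = Counter(psms)

    # Collect, per protein, the peptides supported by at least min_psms PSMs.
    peptides_by_protein = defaultdict(set)
    for (protein, peptide), count in psm_counts.items():
        if count >= min_psms:
            peptides_by_protein[protein].add(peptide)

    # Keep only proteins with at least min_peptides retained peptides.
    return {
        protein: peptides
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    }


# Made-up example: P1 has two peptides with two PSMs each, P2 has only one.
example_psms = [
    ("P1", "AAK"), ("P1", "AAK"), ("P1", "CCR"), ("P1", "CCR"),
    ("P2", "DDK"), ("P2", "DDK"), ("P2", "EEK"),
]
selected = filter_identifications(example_psms)
```

In this example only P1 is kept, since P2 has a single peptide with two or more PSMs.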

4. Normalization, imputation and differential protein expression analysis

Spectracounts are normalized with Variance Stabilizing Normalization. Missing values are imputed using Local Least Squares Imputation.

Normalized and imputed spectracounts are then submitted for differential protein expression analysis with Reproducibility-Optimized Test Statistics. Log2foldchange and p-values are retrieved for reporting.

5. Metabolic pathway representation and final reports

All the following steps are performed as closely as possible to the metatranscriptomics (gene expression) analysis.

Metabolic maps are built with KEGGCharter, showing protein expression levels from MP and genomic potential from MG.

Final reports include all results from MG, and report on differential expression analysis of proteins.

Other updates

MOSCA's workflow has grown by around 40 %.

MOSCA is now compatible with the six-month-old updates of UniProt, through UPIMAPI. This includes the parsing of taxonomic columns, so that taxonomic kronas can still be represented.

Snakemake conda environments are now used, instead of one single environment. This has made it possible again to build MOSCA's environments, and may signal the return of MOSCA to Bioconda.

- Python
Published by iquasere about 3 years ago

mosca - Re-added KEGGCharter to workflow

KEGGCharter is again run from "MOSCAEntryReport". Changed its output filename in the rule because the tool now only outputs in TSV.

Also some fixes in environment.yml

fixed perl version
added subversion

- Python
Published by iquasere over 3 years ago

mosca - Stand-alone metatranscriptomics workflow implemented

Metatranscriptomics can be used as reference without metagenomics

If MG is not inputted, MT will be used for the MG part of MOSCA's workflow - assembly, binning, gene calling and annotation.
Trinity and RNAspades now available as assembler options
rule join_reads now considers possibility of MT as reference

Changes in config.json

experiments.tsv integrated into config.json as a parameter (list of dictionaries)
adapted config.json column names to MOSGUITO
New parameter - "suffix"
- This parameter specifies a suffix that follows the _R1/_R2 markers in file names. MOSCA will consider that those markers are followed by the "suffix" (e.g., _L001 would serve for the files mg_R1_L001.fq and mg_R2_L001.fq)
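
The effect of "suffix" on file names can be sketched as follows. The helper `paired_read_files` and its `extension` argument are hypothetical and only reproduce the naming pattern from the example above.

```python
def paired_read_files(prefix: str, suffix: str = "", extension: str = ".fq"):
    """Build the forward and reverse file names for a paired-end dataset.

    The suffix is placed right after the _R1/_R2 marker, before the extension.
    """
    return tuple(f"{prefix}_R{mate}{suffix}{extension}" for mate in (1, 2))


# Reproduces the example from the note: mg_R1_L001.fq and mg_R2_L001.fq.
forward, reverse = paired_read_files("mg", suffix="_L001")
```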

Adaptations for new versions of tools

SortMeRNA 4 fully implemented
Always gzips SortMeRNA output
UPIMAPI used directly instead of DIAMOND
- MOSCA now accepts UPIMAPI's three options for database: "taxids", "uniprot" or "swissprot"
Small adjustment on CI to allow running reCOGnizer with mini cdd.tar.gz
Fixed krona version (to 2.5) for compatibility with MaxBin2 - MaxBin2 dependencies are presenting problems for higher versions, and krona's more recent versions would force the installation of those broken dependencies

Added technical files, removed old scripts

added .gitignore
join_information.py deprecated, replaced by mosca_tools functions and rules in Snakefile

Changes in environment and CI files

install.bash no longer installs mamba
added gmcloser to environment.yml
added simplified cdd.tar.gz for CI
added test for complete workflow of MOSCA
new default for max-ref-number with metaquast - is now 0 to allow running CI

Miscellaneous fixes

fix on snakefile - checks if "Name" in "experiments" is ""
bins and DE results go to the folders of their respective "samples"
several fixes on reporting
fix on alignment functions in mosca_tools.py
fix on de_analysis.R
fix on obtaining directories for Illumina adapters and rRNA databases on preprocessing step

- Python
Published by iquasere over 3 years ago

mosca - Fixed high quality bins evaluation

MOSCA was evaluating the high quality bins incorrectly.

Best probability threshold is now written at the end of iterative binning.

Assigned one thread fewer to the quantification rule in the Snakefile.
* Allows UPIMAPI to run simultaneously.

metaSPAdes upped to version 3.15 to not run out of memory.

Fixed some bugs in name assignment.

- Python
Published by iquasere over 4 years ago

mosca - Iterative binning for best binning

do_iterative_binning option now available!
* Iterative binning cycles between MaxBin and CheckM - MaxBin obtains the bins, CheckM checks their quality
* Iterative binning cycles through many probability thresholds to determine the value for the best binning
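
The cycle described above can be sketched as a simple search over thresholds. The callables `bin_with` and `assess` are hypothetical stand-ins for running MaxBin and CheckM, and using a single numeric score (for instance, the number of high quality bins) is an assumption about how "best binning" is decided.

```python
def best_probability_threshold(thresholds, bin_with, assess):
    """Try each probability threshold and return the best (threshold, score).

    bin_with(threshold) -> bins   (stand-in for a MaxBin run)
    assess(bins)        -> score  (stand-in for a CheckM evaluation,
                                   higher is assumed to be better)
    Returns None if no thresholds are given.
    """
    best = None
    for threshold in thresholds:
        bins = bin_with(threshold)
        score = assess(bins)
        # Keep the first threshold that reaches the highest score.
        if best is None or score > best[1]:
            best = (threshold, score)
    return best


# Toy usage with fake tools: the "bins" are just the threshold itself,
# and the scores are made-up numbers.
fake_scores = {0.1: 3, 0.5: 7, 0.9: 5}
result = best_probability_threshold(
    [0.1, 0.5, 0.9],
    bin_with=lambda threshold: threshold,
    assess=lambda bins: fake_scores[bins],
)
```

Passing the two tools in as callables keeps the search logic testable without running either program.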

New option for differential expression - minimum_fold_change!
* Determine padj for up or down expression, instead of just 0 difference

- Python
Published by iquasere over 4 years ago

mosca - Can now be installed from source code

Automatic setup from source code is now functional, and suggested installation method is through the bash script.

- Python
Published by iquasere over 4 years ago

mosca - Continuous Integration is set

But at what cost?

Install with mamba seems to be required now. No problem with that, just sad for conda.
MetaQUAST does not download references for now - in the future, this will become an argument, so CI can happen on assembly.

- Python
Published by iquasere almost 5 years ago

mosca - Can now run all parts without replicates (except DE analysis)

The complete MOSCA workflow still cannot be run without replicates, but all of its steps except differential expression can now be run without them.
* added option --no-differential-expression to the reporter script, to be set if differential expression analysis was not performed

Memory is now read in Gb

Several fixes in reporter columns

- Python
Published by iquasere almost 5 years ago

mosca - Option for downloading CDD resources

New option added for configuration: download_cdd.
* to be used if MOSCA has already been run once
* removes the --download-resources parameter from reCOGnizer's command
* also impacts the downloading of the other resources of reCOGnizer

- Python
Published by iquasere almost 5 years ago

mosca - Fixed mismanagement of read fixing

Reads fixed after rRNA removal were losing the last 2 characters of their read names. This is now fixed.

Also, MOSCA now removes reads with less than 20 nucleotides after adapter removal, to avoid the spam from SortMeRNA.
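
The length filter can be pictured with the minimal sketch below. It works on in-memory `(name, sequence)` pairs and uses a hypothetical function name, whereas MOSCA applies the cutoff on read files during preprocessing.

```python
def drop_short_reads(reads, min_length=20):
    """Keep only reads with at least min_length nucleotides.

    reads: iterable of (name, sequence) pairs.
    """
    return [(name, seq) for name, seq in reads if len(seq) >= min_length]


# Made-up reads: one with 25 nucleotides, one with 19.
example_reads = [("read1", "A" * 25), ("read2", "C" * 19)]
kept = drop_short_reads(example_reads)
```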