https://github.com/aureme/aucome

Automatic Comparison of Metabolism

Keywords from Contributors

metabolic-network function-estimation protein-sequences proteomes taxon uniprot

Last synced: 7 months ago

Repository

Basic Info

Host: GitHub
Owner: AuReMe
License: gpl-3.0
Language: Python
Default Branch: master
Homepage:
Size: 11.2 MB

Statistics

Stars: 6
Watchers: 1
Forks: 0
Open Issues: 2
Releases: 3

Created almost 7 years ago · Last pushed over 1 year ago

README.rst

.. image:: https://img.shields.io/pypi/v/aucome.svg
	:target: https://pypi.python.org/pypi/aucome
.. image:: https://img.shields.io/github/license/AuReMe/aucome.svg
	:target: https://github.com/AuReMe/aucome/blob/master/LICENSE
.. image:: https://img.shields.io/badge/doi-10.1101/gr.277056.122-blueviolet.svg
	:target: https://doi.org/10.1101/gr.277056.122

AuCoMe: Automatic Comparison of Metabolism
==========================================

**WORK IN PROGRESS** Workflow to reconstruct multiple metabolic networks in order to compare them.

.. contents:: Table of contents
   :backlinks: top
   :local:

License
--------
This workflow is licensed under the GNU GPL-3.0-or-later; see the `LICENSE <https://github.com/AuReMe/aucome/blob/master/LICENSE>`__ file for details.

Installation
------------

Dependencies
~~~~~~~~~~~~

These tools are needed:

	- Exonerate

	- OrthoFinder (which needs Diamond, FastME, and MMseqs2)

	- Pathway Tools (which needs Blast)

	- R

And some Python packages:

	- matplotlib

	- mpwt

	- padmet

	- rpy2

	- seaborn

	- supervenn

	- tzlocal

Installation of Pathway Tools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To run the annotation-based reconstruction, you need to install Pathway Tools, which is
available from the Pathway Tools website. A command in the package installs the tool:

.. code:: sh

	aucome --installPWT=path/to/pathway/tools/installer
	source ~/.bashrc

You can also pass the ``--ptools=ptools_path`` option to this command. It lets you choose
the path where the ptools-local folder will be installed. PGDBs created by Pathway Tools are
stored in this folder.


Getting the MetaCyc PADMET file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You should also create the metacyc_XX.X.padmet file (where XX.X is the MetaCyc version
number), and then update the config.txt file of each study accordingly. To create this file:
first, download the MetaCyc flat files in DAT format from the
https://biocyc.org/download.shtml webpage. Second, put all the downloaded DAT files in a
directory (named FLAT_DIR here). Third, run this command:

.. code:: sh

	padmet pgdb_to_padmet --pgdb=FLAT_DIR --output=metacyc_XX.X.padmet --version=XX.X --db=Metacyc -v
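
The call above can also be assembled from a script, which keeps the version number identical in ``--output`` and ``--version``. The following is a minimal Python sketch that only reuses the options shown above; the helper name ``pgdb_to_padmet_command`` is hypothetical and is not part of padmet.

.. code:: python

    import shlex


    def pgdb_to_padmet_command(flat_dir, version):
        """Return the argument list of the padmet call documented above."""
        return [
            "padmet",
            "pgdb_to_padmet",
            f"--pgdb={flat_dir}",
            f"--output=metacyc_{version}.padmet",
            f"--version={version}",
            "--db=Metacyc",
            "-v",
        ]


    # Placeholders are kept as in the text; replace them with real values.
    command = pgdb_to_padmet_command("FLAT_DIR", "XX.X")
    print(shlex.join(command))

The list form can be passed directly to ``subprocess.run(command, check=True)`` once padmet is installed.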


Docker
~~~~~~

1. From the git repository, download the Dockerfile in recipes/.

2. Install docker and docker.io if this is not done yet.

3. Build the AuCoMe Docker image, like this:

   .. code:: sh

      docker build -t aucome .

4. Start a container from the AuCoMe Docker image and enter it.

5. Install Pathway Tools and metacyc_XX.X.padmet.

6. Run the AuCoMe commands.


Singularity
~~~~~~~~~~~

A Pathway Tools installer must be located in the same folder as the Singularity recipe.

From the git repository:

.. code:: sh

	sudo singularity build aucome.sif Singularity

If you encounter this error:

.. code:: sh

	FATAL:   While performing build: while creating squashfs: create command failed: exit status 1: Write failed because No space left on device
	FATAL ERROR: Failed to write to output filesystem

This happens because Singularity does not have enough space in its temporary folder, due to
the size of the tools needed by aucome. You can change this path manually with the
``SINGULARITY_TMPDIR`` variable (the temporary folder must exist), for example:

.. code:: sh

	sudo SINGULARITY_TMPDIR=/home/user/tmp_folder singularity build  aucome.sif Singularity

Then you can run the container with a command such as:

.. code:: sh

	singularity run  aucome.sif aucome workflow --run data  --filtering --cpu 10

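However, these commands alone can produce errors because of the isolation between the
Singularity container and the host. It is better to use the ``-c`` option to avoid sharing
the filesystem with the host. The ``-B`` option shares a folder between the host and the
Singularity container, so that Singularity can also access the data on the host. The example
below instead uses ``-H``, which takes the same ``host_path:container_path`` form and mounts
the host folder as the home directory of the container.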

.. code:: sh

	singularity run -c -H /path/outside/singularity/to/shared:/path/in/singularity/container aucome.sif aucome workflow --run /path/in/singularity/container/data  --filtering --cpu 10


pip
~~~

If you have installed all the dependencies, you can install aucome with:

.. code:: sh

	pip install aucome

Usage
-----

Initialization
~~~~~~~~~~~~~~

You have to create the working folder for AuCoMe with the ``--init`` argument:

.. code:: sh

    aucome --init=run_ID [-v]

This command will create a folder named "run_ID" inside the working folder. In this "run_ID"
folder, the command will create all the folders used during the analysis.

.. code-block:: text

	run_ID
	├── analysis
		├── group_template.tsv
		├──
	├── annotation_based
		├── PADMETs
			├──
		├── PGDBs
			├──
		├── SBMLs
			├──
	├── config.txt
	├── logs
		├──
	├── networks
		├── PADMETs
			├──
		├── SBMLs
			├──
	├── orthology_based
		├── 0_Orthofinder_WD
			├── OrthoFinder
		├── 1_sbml_orthology
		├── 2_padmet_orthology
		├── 3_padmet_filtered
	├── structural_check
		├── 0_specifics_reactions
		├── 1_blast_results
			├── analysis
			├── tmp
		├── 2_reactions_to_add
		├── 3_PADMETs
	├── studied_organisms
		├──

**analysis** will store the various analyses of the PADMET files which are in the networks
folder.

**annotation_based** includes three subfolders. The PGDBs folder will contain all the results
from Pathway Tools (in DAT format). These results will also be stored as PADMET and SBML
files inside the PADMETs and SBMLs folders.

**config.txt** contains numerous paths used by the script: paths to programs, directories and
databases. It also includes the Pathway Tools and MetaCyc versions.

**networks** will contain one metabolic network per studied organism, created by AuCoMe,
in PADMET and SBML formats, stored in two directories (PADMETs and SBMLs). It also includes
the panmetabolism of all the studied organisms in PADMET and SBML formats.

**orthology_based** contains four subfolders. First, the 0_Orthofinder_WD folder will include
all the OrthoFinder runs. Second, the 1_sbml_orthology folder will contain one subdirectory
per studied organism, and each subdirectory includes SBML files with the orthogroups of other
species that OrthoFinder found. Third, the 2_padmet_orthology directory will contain the
PADMET files created by the orthology step. Fourth, the 3_padmet_filtered folder will contain
PADMET files created by the orthology step in which only the robust reactions are kept.

**structural_check** relies on searching the genomes for missing Gene-Protein-Reaction
associations. All the metabolic networks previously created are compared pairwise. If one
metabolic network has a Gene-Protein-Reaction association that another one lacks, a genomic
search is performed between the two genomes corresponding to these metabolic networks: the
Gene-Protein-Reaction association from the first metabolic network is used to search for a
match in the genome sequence corresponding to the second metabolic network.
This folder contains four subdirectories. First, the 0_specifics_reactions folder will include
numerous TSV files listing the Gene-Protein-Reaction associations that are present in one
metabolic network and absent from another. Second, the 1_blast_results directory will contain
the results of the searches between the genomes of the studied organisms and the genes
selected in the previous TSV files. Other TSV files, in another format, will also be created
here; they include the results of the genomic search programs. BlastP, TblastN, and Exonerate
are used as genomic search programs. Third, the 2_reactions_to_add folder will contain a
PADMET form with the reactions to add for each studied organism. Fourth, the 3_PADMETs folder
will include the PADMET files created by the structural step.

**studied_organisms**: put all the species that you want to study in this folder. For each
species, create a folder and put the GenBank file of this species in it. Each file must have
the same name as its folder, and the GenBank file must end with '.gbk'.

.. code-block:: text

	├── studied_organisms
		├── species_1
			├── species_1.gbk
		├── species_2
			├── species_2.gbk
		├── species_3
			├── species_3.gbk
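
Before running the check command, the naming rule above (one folder per species, each holding a same-named '.gbk' file) can be verified with a few lines of Python. This is only an illustrative sketch: ``find_layout_problems`` is a hypothetical helper, not part of AuCoMe, and it does not replace the checks performed by ``aucome check``.

.. code:: python

    import tempfile
    from pathlib import Path


    def find_layout_problems(studied_organisms):
        """Return one message per entry that breaks the layout described above."""
        problems = []
        for entry in sorted(Path(studied_organisms).iterdir()):
            if not entry.is_dir():
                problems.append(f"{entry.name}: not a folder")
                continue
            expected = entry / f"{entry.name}.gbk"
            if not expected.is_file():
                problems.append(f"{entry.name}: missing {expected.name}")
        return problems


    # Demonstration on a throwaway folder: species_2 holds a misnamed file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "studied_organisms"
        (root / "species_1").mkdir(parents=True)
        (root / "species_1" / "species_1.gbk").touch()
        (root / "species_2").mkdir()
        (root / "species_2" / "genome.gbk").touch()
        problems = find_layout_problems(root)

    print(problems)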

.. warning:: Remember to check the versions of Pathway Tools and MetaCyc before running the check command.

Once you have put your species in the studied_organisms folder, the data must be checked with the check command.

Check command
~~~~~~~~~~~~~

.. code:: sh

    aucome check --run=ID [--cpu=INT] [-v] [--vv]

This command checks that the data contain no character that will cause trouble. It also
creates the proteome FASTA file from the GenBank file. In addition, it fills the 'all' row of
analysis/group_template.tsv with all the species from the studied_organisms folder. Finally,
if the annotation_based/PGDBs folder contains PGDB folders, it creates the corresponding
PADMET and SBML files in the PADMETs and SBMLs folders.

Reconstruction command
~~~~~~~~~~~~~~~~~~~~~~~

A run of Pathway Tools can be launched using the command:

.. code:: sh

    aucome reconstruction --run=ID [--cpu=INT] [-v] [--vv]

.. code-block:: text

	├── annotation_based
		├── PADMETs
			├── output_pathwaytools_species_1.padmet
			├── output_pathwaytools_species_2.padmet
			├── output_pathwaytools_species_3.padmet
		├── PGDBs
			├── species_1
				├── PGDB dat files
				├── ...
			├── species_2
				├── PGDB dat files
				├── ...
			├── species_3
				├── PGDB dat files
				├── ...
		├── SBMLs
			├── output_pathwaytools_species_1.sbml
			├── output_pathwaytools_species_2.sbml
			├── output_pathwaytools_species_3.sbml
	├── logs
		├── log_error.txt
		├── resume_inference.tsv

Using the mpwt package, this command creates the input files for Pathway Tools inside the
studied_organisms/ directory. Then, for each species that has run correctly in Pathway Tools,
a species/ directory is created inside annotation_based/PGDBs/, which contains all the DAT
files of the draft metabolic network. Two other files are also written:
output_pathwaytools_species.padmet (in annotation_based/PADMETs/) and
output_pathwaytools_species.sbml (in annotation_based/SBMLs/).
At the end of the reconstruction step, the resume_inference.tsv file is generated as well.
This file is useful to detect which species did not run correctly with Pathway Tools.

Orthology command
~~~~~~~~~~~~~~~~~

Orthofinder can be launched using:

.. code:: sh

	aucome orthology --run=ID [-S=STR] [--orthogroups] [--cpu=INT] [-v] [--vv] [--filtering] [--threshold=FLOAT]

.. code-block:: text

	├── orthology_based
		├── 0_Orthofinder_WD
			├── species_1.faa
			├── species_2.faa
			├── species_3.faa
			├── OrthoFinder
				├── Results_MonthDay
					├── Orthogroups
					├── Orthologues
					├── ..
		├── 1_sbml_orthology
			├── species_1
				├── output_orthofinder_from_species_2.sbml
				├── output_orthofinder_from_species_3.sbml
			├── species_2
				├── output_orthofinder_from_species_1.sbml
				├── output_orthofinder_from_species_3.sbml
			├── species_3
				├── output_orthofinder_from_species_1.sbml
				├── output_orthofinder_from_species_2.sbml
		├── 2_padmet_orthology
			├── species_1.padmet
			├── species_2.padmet
			├── species_3.padmet
		├── 3_padmet_filtered
			├── propagation_to_remove.tsv
			├── reactions_to_remove.tsv
			├── species_1.padmet
			├── species_2.padmet
			├── species_3.padmet

The proteomes from the studied organisms and from the models are moved to the 0_Orthofinder_WD folder, and OrthoFinder is launched on them. The OrthoFinder results are stored in this folder, and the orthology_based folder will contain all the metabolic networks reconstructed from orthology.
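
The 1_sbml_orthology part of the tree above follows a regular pattern: each species folder holds one SBML file per other species. The Python sketch below lists the file paths expected for a given set of species, under the assumption that the naming shown in the tree holds for every pair; ``expected_orthology_sbmls`` is a hypothetical helper, not an AuCoMe function.

.. code:: python

    from pathlib import PurePosixPath


    def expected_orthology_sbmls(species):
        """List the SBML paths expected under orthology_based/1_sbml_orthology."""
        base = PurePosixPath("orthology_based/1_sbml_orthology")
        return [
            str(base / target / f"output_orthofinder_from_{source}.sbml")
            for target in species
            for source in species
            if source != target
        ]


    paths = expected_orthology_sbmls(["species_1", "species_2", "species_3"])
    for path in paths:
        print(path)

Comparing this list with the files actually present is a quick way to spot a species for which the orthology step produced no output.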

Structural command
~~~~~~~~~~~~~~~~~~

To ensure that no reactions are missing because of missing gene structures, a genomic search is performed for all reactions appearing in one organism but not in another.

.. code:: sh

    aucome structural --run=ID [--keep-tmp] [--cpu=INT] [-v]

.. code-block:: text

	├── structural_check
		├── 0_specifics_reactions
			├── species_1_VS_species_2.tsv
			├── species_1_VS_species_3.tsv
			├── species_2_VS_species_1.tsv
			├── species_2_VS_species_3.tsv
		├── 1_blast_results
			├── analysis
				├── species_1_VS_species_2.tsv
				├── species_1_VS_species_3.tsv
				├── species_2_VS_species_1.tsv
				├── species_2_VS_species_3.tsv
			├── tmp
		├── 2_reactions_to_add
			├── species_1.tsv
			├── species_2.tsv
			├── species_3.tsv
		├── 3_PADMETs
			├── species_1.padmet
			├── species_2.padmet
			├── species_3.padmet


Spontaneous command
~~~~~~~~~~~~~~~~~~~

In this command, spontaneous reactions will be added to each metabolic network if they complete at least one MetaCyc pathway. You can run this step on all the metabolic networks with:

.. code:: sh

    aucome spontaneous --run=ID [--cpu=INT] [-v] [--vv]

.. code-block:: text

	├── networks
		├── PADMETs
			├── species_1.padmet
			├── species_2.padmet
			├── species_3.padmet
		├── panmetabolism.padmet
		├── panmetabolism.sbml
		├── SBMLs
			├── species_1.sbml
			├── species_2.sbml
			├── species_3.sbml

This will output the result inside the networks folder.

Workflow command
~~~~~~~~~~~~~~~~

You can launch the whole workflow with the command:

.. code:: sh

    aucome workflow --run=ID [-S=STR] [--orthogroups] [--keep-tmp] [--cpu=INT] [-v] [--vv] [--filtering] [--threshold=FLOAT]
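
If you prefer to run the steps one by one from a script, the command lines can be generated as in the Python sketch below. It assumes that the steps are run in the order in which they are documented in this README (check, reconstruction, orthology, structural, spontaneous); this ordering and the helper names ``step_commands`` and ``run_steps`` are assumptions made for illustration, not a description of how ``aucome workflow`` is implemented.

.. code:: python

    import subprocess

    # Assumed order: the order in which the commands are documented above.
    STEPS = ["check", "reconstruction", "orthology", "structural", "spontaneous"]


    def step_commands(run_id, cpu=1):
        """Build one 'aucome <step>' argument list per step, without running it."""
        return [["aucome", step, f"--run={run_id}", f"--cpu={cpu}"] for step in STEPS]


    def run_steps(run_id, cpu=1):
        """Run each step in turn and stop at the first failure."""
        for command in step_commands(run_id, cpu):
            subprocess.run(command, check=True)


    # Only print the commands here; call run_steps("run_ID") to execute them.
    for command in step_commands("run_ID", cpu=10):
        print(" ".join(command))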

Analysis command
~~~~~~~~~~~~~~~~

You can launch group analysis with the command:

.. code:: sh

    aucome analysis --run=ID [--cpu=INT] [--pvclust] [-v]

You must write the groups of species that you want to analyze in the analysis/group_template.tsv file.
The first line of the file contains 'all' (it will launch the analysis on all the species).

When you create the working folder with ``--init``, the file only contains the 'all' row:

+--------------+------------+-------------+--------------+--------------+
|   all        |            |             |              |              |
+--------------+------------+-------------+--------------+--------------+

After the check (with the check or workflow command), all the species that you have in your studied_organisms folder are added to this row:

+--------------+------------+-------------+--------------+--------------+
|   all        | species_1  | species_2   | species_3    | species_4    |
+--------------+------------+-------------+--------------+--------------+

Then you can create a new row to add another group. The name of the group goes in the first column; then add one column per species with the species name.
Each group must contain at least 2 species.

Example:

+--------------+------------+-------------+--------------+--------------+
|   all        |species_1   | species_2   | species_3    | species_4    |
+--------------+------------+-------------+--------------+--------------+
|   group_1    | species_1  | species_2   |              |              |
+--------------+------------+-------------+--------------+--------------+
|   group_2    | species_1  | species_2   | species_4    |              |
+--------------+------------+-------------+--------------+--------------+

This script will create one folder for each group:

.. code-block:: text

	├── analysis
		├── group_template.tsv
		├── all
			├──
		├── group_1
			├──
		├── group_2
			├──
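
The rules given above for group_template.tsv (group name in the first column, one species per following column, at least 2 species per group) can be checked before launching the analysis. The following Python sketch assumes that the file is tab-separated, as its extension suggests, and that unused cells are empty; ``read_groups`` is a hypothetical helper, not part of AuCoMe.

.. code:: python

    import csv
    import io


    def read_groups(tsv_text):
        """Map each group name to its species and enforce the 2-species minimum."""
        groups = {}
        for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
            cells = [cell.strip() for cell in row]
            if not cells or not cells[0]:
                continue  # skip blank lines
            name = cells[0]
            species = [cell for cell in cells[1:] if cell]
            if len(species) < 2:
                raise ValueError(f"group {name!r} needs at least 2 species")
            groups[name] = species
        return groups


    # Same content as the example table above.
    example = (
        "all\tspecies_1\tspecies_2\tspecies_3\tspecies_4\n"
        "group_1\tspecies_1\tspecies_2\t\t\n"
        "group_2\tspecies_1\tspecies_2\tspecies_4\t\n"
    )
    groups = read_groups(example)
    print(groups)

A freshly initialized file, whose 'all' row has no species yet, is rejected by this sketch, so it is meant to be used after the check command has filled that row.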

Compare command
~~~~~~~~~~~~~~~~

You can launch the comparison of groups with the command:

.. code:: sh

    aucome compare --run=ID [--cpu=INT] [-v]

This script will read the group_template.tsv file and create a folder containing an upset graph comparing the groups that you selected:

.. code-block:: text

	├── analysis
		├── group_template.tsv
		├── upgset_graph
			├── genes.csv
			├── Intervene_upset.R
			├── Intervene_upset.svg
			├── Intervene_upset_combinations.txt
			├── metabolites.csv
			├── pathways.csv
			├── reactions.csv
			├── tmp_data
				├──

Owner

Name: AuReMe
Login: AuReMe
Kind: organization

Website: http://aureme.genouest.org/
Repositories: 7
Profile: https://github.com/AuReMe

AUtomated REconstruction of MEtabolic models

GitHub Events

Total

Issues event: 1
Watch event: 2
Push event: 4

Last Year

Issues event: 1
Watch event: 2
Push event: 4

Committers

Last synced: about 3 years ago

All Time

Total Commits: 238
Total Committers: 6
Avg Commits per committer: 39.667
Development Distribution Score (DDS): 0.374

Top Committers

Name	Email	Commits
Arnaud Belcour	a**r@i**r	149
AITE Meziane	m**e@i**r	45
Arnaud Belcour	a**r@g**m	26
Jeanne GOT	j**t@i**r	16
Arnaud Belcour	1**r@u**m	1
Jeanne GOT	4**t@u**m	1

Committer Domains (Top 20 + Academic)

irisa.fr: 2 inria.fr: 1

Issues and Pull Requests

Last synced: 8 months ago

All Time

Total issues: 2
Total pull requests: 3
Average time to close issues: over 1 year
Average time to close pull requests: about 1 month
Total issue authors: 2
Total pull request authors: 3
Average comments per issue: 3.5
Average comments per pull request: 0.33
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

NeolithEra (1)
glucksfall (1)
WANGchuang715 (1)

Pull Request Authors

jeannegot (1)
dependabot[bot] (1)
mezianeAITE (1)

Top Labels

Issue Labels

bug (1)

Pull Request Labels

dependencies (1)

Packages

Total packages: 1
Total downloads:
- pypi 45 last-month

Total dependent packages: 0
Total dependent repositories: 1
Total versions: 3
Total maintainers: 3

pypi.org: aucome

Automatic Comparison of Metabolism

Homepage: https://github.com/aureme/aucome
Documentation: https://aucome.readthedocs.io/
License: GPLv3+
Latest release: 0.5.1
published about 4 years ago

Versions: 3
Dependent Packages: 0
Dependent Repositories: 1
Downloads: 45 Last month

Rankings

Dependent packages count: 10.1%

Dependent repos count: 21.5%

Stargazers count: 27.8%

Average: 29.5%

Forks count: 29.8%

Downloads: 58.2%

Maintainers (3)

ARNb jgot meziane.a

Last synced: 8 months ago
