https://github.com/dalmolingroup/leadnpc

Repository

Basic Info

Host: GitHub
Owner: dalmolingroup
License: gpl-3.0
Language: R
Default Branch: master
Size: 51.3 MB

Statistics

Stars: 0
Watchers: 6
Forks: 0
Open Issues: 0
Releases: 0

Created almost 6 years ago · Last pushed almost 6 years ago

https://github.com/dalmolingroup/LeadNPC/blob/master/

# LeadNPC
## Systems Biology-Based Analysis Indicates Global Transcriptional Impairment in Lead-Treated Human Neural Progenitor Cells

This repository contains the files necessary to reproduce the results reported in our analysis of lead-treated human neural progenitor cells. The *bin* folder contains the code used to generate the analyses and figures. The code is written in R (version 3.5.1) and uses the libraries biomaRt, affy, data.table, dplyr, edgeR, factoextra, FactoMineR, ggplot2, ggrepel, grid, gridExtra, hgu95av2.db, igraph, limma, purrr, RColorBrewer, RCy3, readxl, RedeR, scales, stats, stringr, tibble, tidyr, topGO, and transcriptogramer (version 1.3.4).

The *Data* folder contains intermediate data files generated by the pipeline, so you can either run the pipeline [from scratch](#scratch), downloading the raw data files from the experiment, or [run any phase](#any) using these files. Be sure to correct the data paths in the R scripts.
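One way to locate the paths that need correcting is to search the scripts for likely candidates. The sketch below is only a heuristic: the function name `find_paths` is hypothetical, and the pattern (calls to `setwd()` or double-quoted strings containing a slash) is an assumption about how the scripts reference files, not something taken from the repository.

```shell
# List lines in the R scripts that probably contain a hard-coded path:
# calls to setwd() or double-quoted strings that include a slash.
# The pattern is a heuristic and may report false positives (e.g. URLs).
find_paths() {
  local dir="${1:-bin}"
  # /dev/null forces grep to print the file name even for a single script.
  grep -nE 'setwd\(|"[^"]*/[^"]*"' "$dir"/*.R /dev/null
}

# Usage, from the repository root:
#   find_paths bin
```

Each reported line shows the script name and line number, so you can edit the path by hand.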

### Running the entire pipeline from scratch
To run this pipeline from scratch, you will need to download the raw sequence reads from the Sequence Read Archive (accession SRP079342; Gene Expression Omnibus [accession GSE84712](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE84712 "accession GSE84712")), get the [Ensembl GRCh38 Human genome reference](ftp://ftp.ensembl.org/pub/release-91/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz) and [annotation](ftp://ftp.ensembl.org/pub/release-91/gtf/homo_sapiens/Homo_sapiens.GRCh38.91.gtf.gz) (release 91), and have the software [Hisat2](http://ccb.jhu.edu/software/hisat2/dl/hisat2-2.1.0-Linux_x86_64.zip) and [FeatureCounts](https://sourceforge.net/projects/subread/files/subread-1.6.3/subread-1.6.3-Linux-x86_64.tar.gz) installed. The alignment step also pipes its output through samtools. Note that the commands below use annotation file names with release 94 (`Homo_sapiens.GRCh38.94.gtf`); adjust them to match the release you downloaded.
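The commands that follow rely on the shell variables `genRefDir`, `genIndexDir`, `sampleDir` and `resultDir`. A minimal sketch of how they could be set, together with a check that the required tools are on your `PATH`, is shown here. The directory values and the helper `missing_tools` are hypothetical; use your own layout. Keep in mind that `hisat2-build` treats its second argument as the base name of the index files, so `genIndexDir` acts as a file prefix.

```shell
# Hypothetical locations; existing values are kept if already set.
genRefDir="${genRefDir:-$PWD/reference}"      # genome FASTA and GTF annotation
genIndexDir="${genIndexDir:-$PWD/index/GRCh38}" # base name for the hisat2 index
sampleDir="${sampleDir:-$PWD/fastq}"          # paired *_1/_2.fastq.gz files
resultDir="${resultDir:-$PWD/aligned}"        # output BAM files

# Print the name of every requested tool that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage:
#   missing_tools hisat2 hisat2-build extract_splice_sites.py samtools featureCounts
```

If the usage line prints nothing, all the tools were found.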

##### Create hisat2 index
> hisat2-build -p 8 ${genRefDir}"/Homo_sapiens.GRCh38.dna.primary_assembly.fa" ${genIndexDir}

##### Extract the splice sites
> extract_splice_sites.py ${genRefDir}"/Homo_sapiens.GRCh38.94.gtf" > ${genRefDir}"/Homo_sapiens.GRCh38.94.txt"

##### Generate the alignments
For each sample, run:
> hisat2 -x ${genIndexDir} --known-splicesite-infile ${genRefDir}"/Homo_sapiens.GRCh38.94.txt" -p 8 -1 ${sampleDir}/${file}_1".fastq.gz" -2 ${sampleDir}/${file}_2".fastq.gz" | samtools view -bS - > ${resultDir}/${file}".hisat.bam"
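The per-sample step can be wrapped in a loop. The sketch below assumes the paired files are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`, as in the command above, and that `genIndexDir`, `genRefDir`, `sampleDir` and `resultDir` are already set. The function names `list_samples` and `align_sample` are hypothetical.

```shell
# Print one sample prefix per line for every *_1.fastq.gz file in a directory.
list_samples() {
  local dir="$1" f
  for f in "$dir"/*_1.fastq.gz; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    basename "$f" _1.fastq.gz
  done
}

# Align one sample with the same options as the command above.
align_sample() {
  local file="$1"
  hisat2 -x "${genIndexDir}" \
    --known-splicesite-infile "${genRefDir}/Homo_sapiens.GRCh38.94.txt" -p 8 \
    -1 "${sampleDir}/${file}_1.fastq.gz" -2 "${sampleDir}/${file}_2.fastq.gz" |
    samtools view -bS - > "${resultDir}/${file}.hisat.bam"
}

# Usage:
#   list_samples "$sampleDir" | while IFS= read -r file; do
#     align_sample "$file"
#   done
```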

##### Count aligned reads
First, list the BAM files on a single line:
> ls *.hisat.bam | tr '\n' ' ' > bamList.txt

Then generate the counts file:
> featureCounts -T 32 -t gene -g gene_id -a ${genRefDir}"/Homo_sapiens.GRCh38.94.gtf" -o ./allCountsHisat.txt $(cat bamList.txt)
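Before moving on to R, it can be worth checking that the counts table has one column per BAM file. In featureCounts output, the first line is a comment starting with `#`, and the header row that follows begins with six annotation fields (Geneid, Chr, Start, End, Strand, Length) before the per-sample columns. A minimal sketch, with the hypothetical helper `count_sample_columns`:

```shell
# Print the number of sample (BAM) columns in a featureCounts table.
# The first non-comment line is the header; its first six fields are
# Geneid, Chr, Start, End, Strand and Length.
count_sample_columns() {
  awk -F'\t' '!/^#/ { print NF - 6; exit }' "$1"
}

# Usage:
#   count_sample_columns allCountsHisat.txt
#   wc -w < bamList.txt     # number of BAM files passed to featureCounts
```

The two numbers printed by the usage lines should match.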

Now you can use the R scripts below.

### Or take a shortcut, and start here...
We recommend running [00base.R](./bin/00base.R) and [02analisePCA.R](./bin/02analisePCA.R) before trying to run any other script. These scripts set up the R environment and create files that the later steps need.
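If you prefer the command line to an interactive R session, the scripts can be run in order with `Rscript`. This is only a sketch: it assumes the scripts run non-interactively once their data paths are corrected, which this README does not guarantee. The helper `run_scripts` and the `RSCRIPT` override variable are hypothetical.

```shell
# Run the given R scripts in order, stopping at the first failure.
# RSCRIPT can be overridden (e.g. RSCRIPT=echo) to preview the calls.
run_scripts() {
  local script
  for script in "$@"; do
    "${RSCRIPT:-Rscript}" "$script" || return 1
  done
}

# Usage, from the repository root:
#   run_scripts bin/00base.R bin/02analisePCA.R
```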

#### Set up the environment and install the R packages
> [00base.R](./bin/00base.R)


#### Create logCPM file
> [01ProcessCounts.R](./bin/01ProcessCounts.R)

#### PCA analysis and create transcriptogramer objects
Objects generated here are needed by later steps. If you use a version of transcriptogramer newer than 1.3.4, the results may differ slightly, because newer versions use data from STRINGdb release 11.
> [02analisePCA.R](./bin/02analisePCA.R)

#### Plot transcriptogramer graphics
> [03plotTrancript.R](./bin/03plotTrancript.R)

#### Plot circlize graphics
> [04circosPlot.R](./bin/04circosPlot.R)


#### Cluster superposition analysis
> [05intersecClusters.R](./bin/05intersecClusters.R)

#### Create cluster graphs
This is not a completely automatic process. You will need to use Cytoscape manually.
> [06graphTemposManual.R](./bin/06graphTemposManual.R)

#### Generate Figure 2 components
This is not a completely automatic process. The complete figure was composed by hand using Inkscape.

##### Nodes
> [07BarNodes.R](./bin/07BarNodes.R)

##### Connectivity
> [08BarConect.R](./bin/08BarConect.R)

#### Generate Figures 3 and 4 graphs
This is not a completely automatic process. You will need to use RedeR and [Cytoscape](https://cytoscape.org/download.html) manually. For more information on how to connect Cytoscape and R, see [Cytoscape](https://cytoscape.org/) and [RCy3](https://bioconductor.org/packages/release/bioc/html/RCy3.html) documentation.
> [09GODendo.R](./bin/09GODendo.R)

#### Generate Figures 3 and 4 dendrograms
This is not a completely automatic process. You will need to use RedeR and [Cytoscape](https://cytoscape.org/download.html) manually. For more information on how to connect Cytoscape and R, see [Cytoscape](https://cytoscape.org/) and [RCy3](https://bioconductor.org/packages/release/bioc/html/RCy3.html) documentation.
> [10graphoClustersRCy3.R](./bin/09graphoClustersRCy3.R)

#### Generate Markers figures
> [11Marquers.R](./bin/11Marquers.R)

#### Other analyses
Additional auxiliary analyses were performed using other scripts in the *bin* folder.

Owner

Name: Dalmolin Systems Biology Group
Login: dalmolingroup
Kind: organization
Location: Natal, RN - Brazil

Website: dalmolingroup.imd.ufrn.br
Repositories: 5
Profile: https://github.com/dalmolingroup

Research group in Systems Biology at UFRN
