asap_crn_data_analysis_scripts

Scripts used in data analysis for ASAP CRN projects

https://github.com/mhammell-laboratory/asap_crn_data_analysis_scripts


Repository

Basic Info
  • Host: GitHub
  • Owner: mhammell-laboratory
  • License: mit
  • Language: Shell
  • Default Branch: main
  • Size: 41 KB
Statistics
  • Stars: 1
  • Watchers: 5
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed almost 2 years ago
Metadata Files
  • Readme
  • License
  • Citation

README.md

ASAP_CRN_data_analysis_scripts

Overview

These are the scripts used in data analysis for the ASAP CRN dataset, "Single nuclei sequencing of brain regions from healthy and Parkinson's Disease individuals", from Team Jakobsson. The repository comprises two sets of information:

  1) A shell script to generate a Cell Ranger compatible reference database containing transposable element annotations
  2) A YAML file containing the software, database and parameters used for the Cell Ranger count runs to generate the snRNA count matrices

System Requirements & Dependencies

  • Cell Ranger software (tested on version 5.0.1)
  • wget (tested on version 1.19.5)
  • Standard Linux tools (e.g. awk, grep, zcat)

Since the pipeline requires the 10x Genomics Cell Ranger software, you will need to fulfil their system requirements:

Cell Ranger pipelines run on Linux systems that meet these minimum requirements:

  • 8-core Intel or AMD processor (16 cores recommended), with support for instruction sets including at least SSE4.2. This includes Intel CPUs released since 2008 (Core i5/i7 or newer) and any AMD CPU since 2011. Future versions of Cell Ranger will require CPUs supporting AVX instructions; Intel and AMD CPUs have supported these since 2011 (Intel Xeon E3/E5 or newer).
  • 64GB RAM (128GB recommended).
  • 1TB free disk space.
  • 64-bit CentOS/RedHat 7.0 or Ubuntu 14.04
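The core-count, memory and SSE4.2 requirements above can be checked before installing anything. The following is a minimal sketch, not part of the repository: the function names `meets_minimum` and `report` are hypothetical, and the probes assume a Linux system with `nproc`, `/proc/meminfo` and `/proc/cpuinfo`. The kernel usually reports slightly less memory than is physically installed, so a machine with exactly 64GB may show a lower figure; treat the result as advisory.

```shell
#!/bin/sh
# Hypothetical pre-flight check; thresholds are the minimum requirements listed above.
MIN_CORES=8
MIN_MEM_GB=64

# meets_minimum VALUE MINIMUM: succeeds when VALUE >= MINIMUM (integers only).
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# report LABEL VALUE MINIMUM: print a one-line verdict for a single requirement.
report() {
    if meets_minimum "$2" "$3"; then
        echo "OK   $1: $2 (minimum $3)"
    else
        echo "FAIL $1: $2 (minimum $3)"
    fi
}

# Linux-specific probes; fall back to 0 when a tool or file is unavailable.
cores=$(nproc 2>/dev/null || echo 0)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null || true)
mem_gb=$(( ${mem_kb:-0} / 1024 / 1024 ))

report "CPU cores" "$cores" "$MIN_CORES"
report "RAM (GB)" "$mem_gb" "$MIN_MEM_GB"

# SSE4.2 appears as the flag "sse4_2" in /proc/cpuinfo on x86 Linux.
if grep -qw sse4_2 /proc/cpuinfo 2>/dev/null; then
    echo "OK   SSE4.2 supported"
else
    echo "FAIL SSE4.2 not detected"
fi
```

Free disk space and operating system version are not checked here; `df -h` and `/etc/os-release` are the usual places to look.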

Installation

The script to generate the custom Cell Ranger reference database can be downloaded and run without additional installation steps. The YAML file contains the command line parameters used to run the analysis using Cell Ranger, and requires no installation steps.

Usage

Shell script

    $ sh generate_custom_cell_ranger_database.sh

This should generate a folder, GRCh38_GCv35_TE, containing the Cell Ranger custom reference database.

    $ ls GRCh38_GCv35_TE/
    GRCh38_GCv35_TE.gtf
    GRCh38.primary_assembly.genome.fa.gz
    GRCh38_GENCODE_rmsk_TE_filtered.gtf
    GRCh38_GENCODE_rmsk_TE.gtf.gz
    gencode.v35.primary_assembly.annotation_CRfiltered.gtf
    gencode.v35.primary_assembly.annotation.allowed
    gencode.v35.primary_assembly.annotation.modified
    gencode.v35.primary_assembly.annotation.gtf.gz
    generate_custom_cell_ranger_database.sh
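To confirm that the run completed, the output folder can be compared against the listing above. This is a small sketch under stated assumptions: `missing_reference_files` is a hypothetical helper, not something shipped with the repository, and the file names are copied from the listing (the generator script itself is left out of the check).

```shell
#!/bin/sh
# Hypothetical helper: print the expected files that are absent from a directory,
# one name per line. Prints nothing when every file is present.
missing_reference_files() {
    dir=$1
    for f in \
        GRCh38_GCv35_TE.gtf \
        GRCh38.primary_assembly.genome.fa.gz \
        GRCh38_GENCODE_rmsk_TE_filtered.gtf \
        GRCh38_GENCODE_rmsk_TE.gtf.gz \
        gencode.v35.primary_assembly.annotation_CRfiltered.gtf \
        gencode.v35.primary_assembly.annotation.allowed \
        gencode.v35.primary_assembly.annotation.modified \
        gencode.v35.primary_assembly.annotation.gtf.gz
    do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

# Check the folder named in the README and summarise the result.
missing=$(missing_reference_files GRCh38_GCv35_TE)
if [ -n "$missing" ]; then
    echo "Missing from GRCh38_GCv35_TE:"
    echo "$missing"
else
    echo "All expected files found in GRCh38_GCv35_TE"
fi
```

A file that exists but is truncated would still pass this check, so it only catches a run that stopped before all outputs were written.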

YAML

The YAML file contains the command to be used for Cell Ranger to generate the count matrices. For example:

    $ cellranger count --id=ASAP1_SN \
        --jobmode=local \
        --localcores=16 \
        --localmem=128 \
        --transcriptome=GRCh38_GCv35_TE \
        --fastqs=fastqs \
        --sample=ASAP1_PD_NP16-162_SN \
        --include-introns
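When several samples share the same settings, the command can be generated per sample rather than typed by hand. The sketch below is illustrative only: `build_count_command` is a hypothetical helper, the flags and values are copied from the example above, and the only sample shown is the one named in this README. It prints each command for review instead of running Cell Ranger.

```shell
#!/bin/sh
# build_count_command ID SAMPLE: print the cellranger count command for one sample.
# Resource settings and paths mirror the README example; only ID and SAMPLE vary.
build_count_command() {
    printf 'cellranger count --id=%s --jobmode=local --localcores=16 --localmem=128 --transcriptome=GRCh38_GCv35_TE --fastqs=fastqs --sample=%s --include-introns\n' "$1" "$2"
}

# Dry run: read "ID SAMPLE" pairs and print the matching commands.
# Add further pairs to the list below, one per line.
while read -r id sample; do
    [ -n "$id" ] || continue    # skip blank lines
    build_count_command "$id" "$sample"
done <<'EOF'
ASAP1_SN ASAP1_PD_NP16-162_SN
EOF
```

Printing first makes it easy to inspect the commands before piping them to `sh` or a job scheduler.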

Limitation

The custom database generation script is specific to GENCODE annotation version 35. If you require another version, please modify the script to download the corresponding annotation.
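One way to approach such a modification is to derive the annotation URL from a version variable. This is a hedged sketch, not the repository's code: `gencode_gtf_url` and `GENCODE_VERSION` are hypothetical names, and the URL layout assumes the GENCODE human release directories on the EBI server, which may not be the location the script actually downloads from.

```shell
#!/bin/sh
# gencode_gtf_url VERSION: print the URL of the GENCODE human primary-assembly GTF.
# ASSUMPTION: EBI directory layout; the repository script may use another mirror.
gencode_gtf_url() {
    printf 'https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_%s/gencode.v%s.primary_assembly.annotation.gtf.gz\n' "$1" "$1"
}

# Default to the version the README describes; override via the environment.
GENCODE_VERSION=${GENCODE_VERSION:-35}
echo "Would download: $(gencode_gtf_url "$GENCODE_VERSION")"

# To fetch the file for real, something like:
#   wget "$(gencode_gtf_url "$GENCODE_VERSION")"
```

Other file names in the script (for example the `gencode.v35...` intermediates and the `GRCh38_GCv35_TE` folder) also embed the version, so they would need the same substitution.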

Questions and issues

Please feel free to use the Issues page to post any questions or issues.

You can also contact mghcompbio@gmail.com if you have not received a response more than a week after posting on the Issues page.

Citation

Please cite this software using the citation file format (CITATION.cff) file included with the repository. Further citations will be added upon publication of the associated project.

Licence

This software is distributed under the MIT licence per the ASAP Open Access (OA) policy, which facilitates the rapid and free exchange of scientific ideas and ensures that ASAP-funded research can be leveraged for future discoveries.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

A copy of this licence is included along with the software.

Acknowledgment

  • Contributors: Anita Adami, Talitha Forcier, Raquel Garza, Annelies Quagebeur, Yogita Sharma, Oliver Tam, Cole Wunderlich, Roger Barker, Molly Gale Hammell, Agnete Kirkeby and Johan Jakobsson

This research was funded in whole by Aligning Science Across Parkinson’s (ASAP-000520) through the Michael J. Fox Foundation for Parkinson’s Research (MJFF).

Owner

  • Name: MHammell Laboratory
  • Login: mhammell-laboratory
  • Kind: organization
  • Email: tam@cshl.edu
  • Location: Cold Spring Harbor Lab

The Hammell Lab uses algorithms to integrate high throughput sequencing data to model regulatory re-wiring events in human diseases.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: ASAP_CRN_data_analysis_scripts
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Oliver
    family-names: Tam
  - given-names: Molly
    family-names: Gale Hammell
    email: mghcompbio@gmail.com
repository-code: >-
  https://github.com/mhammell-laboratory/ASAP_CRN_data_analysis_scripts
abstract: >
  These are the scripts used in data analysis for ASAP CRN
  dataset, "Single nuclei sequencing of brain regions from
  healthy and Parkinson's Disease individuals", from Team
  Jakobsson. It comprises two sets of information:


  1) Shell script to generate a Cell Ranger compatible
  reference database containing transposable element
  annotations

  2) YAML file containing software, database and parameters
  used for the Cell Ranger count runs to generate the snRNA
  count matrices
license: MIT
