https://github.com/alexslemonade/medulloblastoma-classifier
Science Score: 39.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: AlexsLemonade
- License: bsd-3-clause
- Language: HTML
- Default Branch: main
- Size: 35.2 MB
Statistics
- Stars: 0
- Watchers: 6
- Forks: 0
- Open Issues: 19
- Releases: 0
Metadata Files
README.md
medulloblastoma-classifier
Input data and models
The following table summarizes how different data types are handled by default. If a method applies additional transformations, that is captured in the next table.
| Dataset type | Baseline normalization/processing |
|--------------|-----------------------------------|
| Microarray | refine.bio processed with Single Channel Array Normalization, quantile normalization is skipped |
| Bulk RNA-seq | TPM |
| Smart-seq2 scRNA-seq | TPM |
| 10X scRNA-seq | Counts |
This table summarizes the models used in this work, the packages from which they originate, and any transformations applied to the input gene expression measures.
| Model | Package | Additional transformations (if applicable) |
|-------|---------|-------|
| k-Top Scoring Pairs (kTSP) | multiclassPairs | N/A |
| Random Forest (RF) | multiclassPairs | N/A |
| MM2S (Gendoo and Haibe-Kains. 2016.) | MM2S | N/A |
| medulloPackage (Rathi et al. 2020.) | medulloPackage | All RNA-seq data is log2-transformed |
| LASSO Logistic Regression | glmnet | Each sample is scaled to sum to 1 |
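To make the "each sample is scaled to sum to 1" transformation in the table above concrete, here is a minimal sketch. It is an illustration only, not the repository's implementation (which is R code that is not shown here). The helper name `scale_samples` and the input layout (tab-separated text, a header row of sample names, gene IDs in the first column, one column per sample) are assumptions.

```sh
# scale_samples is a hypothetical helper for illustration only.
# Assumed input on stdin: tab-separated, header row first,
# gene IDs in column 1, one sample per remaining column.
scale_samples() {
  awk -F '\t' -v OFS='\t' '
    NR == 1 { print; next }          # pass the header through unchanged
    {
      genes[NR] = $1
      for (i = 2; i <= NF; i++) {
        value[NR, i] = $i
        total[i] += $i               # running sum for each sample column
      }
      ncol = NF
      last = NR
    }
    END {
      for (r = 2; r <= last; r++) {
        line = genes[r]
        for (i = 2; i <= ncol; i++) {
          # divide each value by its sample total; emit 0 if the total is 0
          line = line OFS (total[i] > 0 ? value[r, i] / total[i] : 0)
        }
        print line
      }
    }
  '
}
```

After this step, the values in each sample column sum to 1, so samples with different total signal are placed on a common scale.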
Internal development guidelines
These guidelines are intended to be used by Data Lab members and collaborators.
Dependency management
Docker
We expect development to primarily occur within the project Docker container.
We use renv and conda as part of the build process, so please make use of those approaches when updating the Dockerfile (see sections below).
A GitHub Actions workflow builds and pushes the Docker image to the GitHub Container Registry any time the relevant environment files or Dockerfile are updated.
On pull requests that alter those files, it also checks that the image can be built.
To pull the most recent copy of the Docker image, use the following command:
```sh
docker pull ghcr.io/alexslemonade/medulloblastoma-classifier:latest
```
To run the container, use the following command from the root of this repository:
```sh
docker run \
  --mount type=bind,target=/home/rstudio/medulloblastoma-classifier,source=$PWD \
  -e PASSWORD={PASSWORD} \
  -p 8787:8787 \
  ghcr.io/alexslemonade/medulloblastoma-classifier:latest
```
Be sure to replace {PASSWORD}, including the curly braces, with a password of your choice.
You can then access RStudio at http://localhost:8787 using the username rstudio and the password you just set.
For Apple Silicon users, include the --platform linux/amd64 flag in the docker pull and docker run commands.
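As a sketch of how the Apple Silicon note could be applied automatically, the helper below prints the extra flag only on arm64 hosts. `platform_flag` is a hypothetical name, not part of this repository, and the sketch assumes that `uname -m` reports `arm64` (or `aarch64`) on Apple Silicon machines.

```sh
# platform_flag is a hypothetical helper, not part of this repository.
# It prints the extra Docker flag on arm64 hosts and prints nothing otherwise.
platform_flag() {
  # Use the first argument as the architecture if given; otherwise detect it.
  arch="${1:-$(uname -m)}"
  case "$arch" in
    arm64 | aarch64) printf '%s\n' "--platform linux/amd64" ;;
    *) ;;
  esac
}

# Example usage (the substitution is left unquoted on purpose so that
# the flag and its value are passed as two separate words):
# docker pull $(platform_flag) ghcr.io/alexslemonade/medulloblastoma-classifier:latest
```

The same `$(platform_flag)` substitution can be placed directly after `docker run` in the command above.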
Managing R packages with renv
We manage R package dependencies using renv.
When you install additional packages, please update the lockfile with the following command:
```r
renv::snapshot()
```
When prompted, respond y to save the new packages in your renv.lock file.
Commit the changes to the renv.lock file.
To pin any packages that are not automatically captured in the lockfile, add code that loads them to the dependencies.R file in the root of the repository.
Managing command-line tools and Python packages with Conda
We use Conda to manage command-line tools and Python packages.
To create and activate the environment, run the following from the root of the repository (requires conda-lock to be installed):
```sh
conda-lock install --name medulloblastoma-classifier conda-lock.yml
conda activate medulloblastoma-classifier
```
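Because the commands above require conda-lock, it can help to check for it first. The following is a minimal sketch; `require_tool` is a hypothetical helper name and is not part of this repository.

```sh
# require_tool is a hypothetical helper, not part of this repository.
# It succeeds if the named command is on PATH; otherwise it prints a
# message to stderr and returns a non-zero status.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  printf 'Missing required tool: %s\n' "$1" >&2
  return 1
}

# Example usage (commented out so that sourcing this file has no side effects):
# require_tool conda-lock && conda-lock install --name medulloblastoma-classifier conda-lock.yml
```

The same check works for other tools mentioned in these guidelines, such as pre-commit.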
To add new packages to the Conda environment, add them to environment.yml, and then update the conda-lock.yml file:
```sh
conda-lock --file environment.yml
```
Pre-commit
We use pre-commit to make sure large files or secrets are not committed to the repository. The Conda environment contains pre-commit.
To set up the pre-commit hooks for this project, run the following from the root of the repository:
```sh
pre-commit install
```
Additional hooks for local development
If you would like to add additional hooks to use locally (e.g., to style and lint R files), you can do so by creating and using a .pre-commit-local.yaml file like so:
```sh
# make and activate a local pre-commit configuration
cp .pre-commit-config.yaml .pre-commit-local.yaml
pre-commit install --config .pre-commit-local.yaml
```
.pre-commit-local.yaml is ignored by Git, so you can modify that file without affecting other contributors.
Data and model management
We use an S3 bucket (s3://data-lab-mb-ssp) with versioning enabled to manage the files in the following directories:
- data
- models
- processed_data
- plots/data
- results

All of these directories are listed in the .gitignore file.
To push files to S3, use the following command from the root of the repository:
```sh
aws s3 sync {directory} s3://data-lab-mb-ssp/{directory}
```
Where {directory} should be one of: data, models, processed_data, plots/data, or results.
To pull files locally, use the following command from the root of the repository:
```sh
aws s3 sync s3://data-lab-mb-ssp/{directory} {directory}
```
A non-exhaustive list of aws s3 sync flags that may be useful:
- --delete: Delete files that exist in the destination that are not in the source.
- --dryrun: Performs a dry run without running the command.
- --profile: A profile from your credential file.
- --exclude: Exclude objects or files that match this pattern.
- --include: Don't exclude objects or files that match this pattern.
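The push and pull commands above can be combined with --dryrun to preview a transfer before running it. The sketch below only prints the command it would run, so nothing is transferred. The helper names `is_managed_dir` and `s3_sync_cmd` are hypothetical and not part of this repository; the bucket name and directory list are taken from this section.

```sh
BUCKET="s3://data-lab-mb-ssp"

# is_managed_dir (hypothetical helper): succeed only for the
# directories that this section says are managed in S3.
is_managed_dir() {
  case "$1" in
    data | models | processed_data | plots/data | results) return 0 ;;
    *) return 1 ;;
  esac
}

# s3_sync_cmd (hypothetical helper): print, but do not run, the dry-run
# sync command for a direction ("push" or "pull") and a directory.
s3_sync_cmd() {
  direction="$1"
  dir="$2"
  if ! is_managed_dir "$dir"; then
    printf 'Unknown directory: %s\n' "$dir" >&2
    return 1
  fi
  case "$direction" in
    push) printf '%s\n' "aws s3 sync $dir $BUCKET/$dir --dryrun" ;;
    pull) printf '%s\n' "aws s3 sync $BUCKET/$dir $dir --dryrun" ;;
    *)
      printf 'Direction must be push or pull\n' >&2
      return 1
      ;;
  esac
}
```

For example, `s3_sync_cmd push models` prints the dry-run form of the push command for the models directory. Once the preview looks right, run the same command without --dryrun from the root of the repository.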
Owner
- Name: Alex's Lemonade Stand Foundation
- Login: AlexsLemonade
- Kind: organization
- Website: https://www.alexslemonade.org
- Repositories: 70
- Profile: https://github.com/AlexsLemonade
Childhood Cancer Data Lab of ALSF
GitHub Events
Total
- Issues event: 46
- Delete event: 32
- Issue comment event: 35
- Push event: 219
- Pull request event: 61
- Pull request review event: 85
- Pull request review comment event: 67
- Create event: 27
Last Year
- Issues event: 46
- Delete event: 32
- Issue comment event: 35
- Push event: 219
- Pull request event: 61
- Pull request review event: 85
- Pull request review comment event: 67
- Create event: 27
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 25
- Total pull requests: 40
- Average time to close issues: about 1 month
- Average time to close pull requests: 11 days
- Total issue authors: 2
- Total pull request authors: 3
- Average comments per issue: 0.2
- Average comments per pull request: 0.15
- Merged pull requests: 26
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 25
- Pull requests: 40
- Average time to close issues: about 1 month
- Average time to close pull requests: 11 days
- Issue authors: 2
- Pull request authors: 3
- Average comments per issue: 0.2
- Average comments per pull request: 0.15
- Merged pull requests: 26
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jaclyn-taroni (20)
- envest (5)
Pull Request Authors
- jaclyn-taroni (31)
- envest (8)
- jashapiro (1)