dataloading-curation

Data curation files for IPA (iReceptor Public Archive) - includes sample metadata template file

https://github.com/sfu-ireceptor/dataloading-curation

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
✓ Academic publication links: zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (12.9%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: sfu-ireceptor
License: lgpl-3.0
Default Branch: production-v4
Homepage:
Size: 78.2 MB

Statistics

Stars: 2
Watchers: 5
Forks: 1
Open Issues: 0
Releases: 1

Created over 8 years ago · Last pushed almost 2 years ago

Metadata Files

Readme, Contributing, License, Code of conduct, Citation

iReceptor Data Curation

This Git repository contains example files and documentation for loading data into iReceptor repositories. It provides examples of metadata files as well as rearrangement files for a number of widely used annotation tools. The README file in each subfolder contains further documentation. The Zenodo link for this release is here:

For more information on Repertoire metadata curation, please refer to:

* The iReceptor Metadata documentation
* The iReceptor metadata example
* The AIRR repertoire metadata example

For more details on Rearrangement data curation, please refer to:

* The test data set documentation
* The AIRR Rearrangement format (including igblast) example
* The MiXCR Rearrangement format example
* The IMGT V-QUEST Rearrangement example

For more details on Clone data curation, please refer to:

* The AIRR Clone format example
* The MiXCR Clone format example

For more details on Cell and Expression (GEX) data curation, please refer to:

* The AIRR Cell format example

The iReceptor Data Curation process

The iReceptor team follows a relatively strict data curation process. This process is documented on the iReceptor Curation page. We do not discuss this process in detail here, but instead suggest simple processes that can make data curation easier to manage.

The iReceptor curation process centers on the curation of data for a single study. We therefore recommend that all data being curated for a specific study be stored in a single directory. As an example, we will use one of the IMGT example data sets.

All files relevant to curating a single study should therefore sit in that one directory: the Repertoire Metadata file for the study as well as all of the Rearrangement, Clone, Cell, and Expression files for each Repertoire. In the IMGT example, there is a single metadata file (PRJNA248411Palanichamy2018-12-18.csv). We tend to build the metadata file name from the study's Study ID from NCBI, the principal (or contact) author, and the date the file was last modified. In addition, each of the 8 sample repertoires in the study has a single IMGT annotation file. Again, we include the NCBI accession number in the file name to help manage the data. Note that a single repertoire can have more than one file: both the Repertoire Metadata file and the iReceptor Data Loader support multiple files per repertoire sample.
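
The naming convention and directory layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the iReceptor tooling: the function names `metadata_filename` and `files_by_accession` are hypothetical, the `sep` parameter is an assumption (the default reproduces the example file name exactly as it is written above), and accessions are matched by simple substring search.

```python
from datetime import date
from pathlib import Path


def metadata_filename(study_id: str, author: str, modified: date, sep: str = "") -> str:
    """Build a metadata file name from the study ID, the principal (or
    contact) author, and the date the file was last modified.

    `sep` is an assumption; pass e.g. "_" if your files use a separator.
    """
    return sep.join([study_id, author, modified.isoformat()]) + ".csv"


def files_by_accession(study_dir: Path, accessions: list[str]) -> dict[str, list[str]]:
    """Map each accession to the sorted names of the files in `study_dir`
    whose name contains that accession.

    A repertoire may have zero, one, or several files. Matching is a plain
    substring test, so one accession that is a prefix of another would
    match both sets of files.
    """
    names = sorted(p.name for p in study_dir.iterdir() if p.is_file())
    return {acc: [name for name in names if acc in name] for acc in accessions}


if __name__ == "__main__":
    # Values taken from the IMGT example metadata file named above.
    print(metadata_filename("PRJNA248411", "Palanichamy", date(2018, 12, 18)))
```

A mapping like the one returned by `files_by_accession` makes it easy to spot a repertoire with no annotation file, or one with several, before attempting to load the study.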

With the study's files organized this way, loading the AIRR-seq data with the iReceptor Data Loading code is straightforward. Please refer to the iReceptor Turnkey Documentation for examples of how to load these data sets.
