data_steward_assignment
Repository accompanying the task given for the Data Steward position interview
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.2%) to scientific vocabulary
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Data Steward Assignment
Repository accompanying the task given for the Data Steward position interview
Overview
The scripts are located in the 'scripts' folder and can be run individually or through the Assignment_wrapper.sh bash script.
These scripts read PLINK .map and .ped files, modify the .ped file, and produce transposed .tped and .tfam files.
manipulate_ped_prefix_and_columns.py modifies the .ped file in place. It adds a prefix to the values of a column of choice (by default column 0, the family ID, with the prefix "INCH_"). It also recodes the phenotype status in column 5 (0-based, i.e. the phenotype column of a PLINK .ped file) from not recorded (-9) to affected (2) for even family IDs and to unaffected (1) for odd family IDs. This phenotype mapping is currently hard-coded and cannot be configured.
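The in-place edit described above can be sketched as follows. This is a minimal illustration, not the code in manipulate_ped_prefix_and_columns.py: the function names edit_ped_lines and edit_ped_in_place are hypothetical, and the sketch assumes whitespace-separated rows whose family ID (column 0) is a plain integer before any prefix is added. The special case that ignores ID 27 is not modeled.

```python
from typing import List


def edit_ped_lines(lines: List[str], column: int = 0, prefix: str = "INCH_") -> List[str]:
    """Hypothetical sketch: prefix one column and recode missing phenotypes.

    Assumes whitespace-separated PLINK .ped rows whose family ID (column 0)
    is an integer before any prefix is added.
    """
    edited = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # drop blank lines
        # Decide the phenotype from the original, unprefixed family ID.
        family_id = int(fields[0])
        if fields[5] == "-9":  # column 5 (0-based) is the phenotype
            fields[5] = "2" if family_id % 2 == 0 else "1"
        fields[column] = prefix + fields[column]
        edited.append(" ".join(fields))
    return edited


def edit_ped_in_place(path: str) -> None:
    """Overwrite the .ped file at `path` with the edited rows (no new file)."""
    with open(path) as handle:
        lines = handle.read().splitlines()
    with open(path, "w") as handle:
        handle.write("".join(row + "\n" for row in edit_ped_lines(lines)))
```

For example, edit_ped_lines(["4 ind1 0 0 1 -9 A G", "3 ind2 0 0 2 -9 C C"]) returns ["INCH_4 ind1 0 0 1 2 A G", "INCH_3 ind2 0 0 2 1 C C"]. The parity is read before the prefix is applied, because "INCH_4" can no longer be parsed as an integer.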
ped_map_to_tped_tfam_transpose.py transposes the .ped and .map files into a .tped and a .tfam file. Note that PLINK can also produce a transposed fileset itself with --recode transpose; PLINK 1.9 writes a matching .tfam file alongside the .tped.
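The transpose can be sketched from the PLINK text formats: a .tfam row is the first six .ped columns, and each .tped row is one .map row (chromosome, variant ID, genetic distance, base-pair position) followed by both alleles of every individual for that variant. This is a minimal sketch, not the code in ped_map_to_tped_tfam_transpose.py; the function name is hypothetical, and it assumes whitespace-separated files with exactly two allele columns per variant, in .map order.

```python
from typing import List, Tuple


def ped_map_to_tped_tfam(ped_lines: List[str], map_lines: List[str]) -> Tuple[List[str], List[str]]:
    """Hypothetical sketch of a .ped/.map -> .tped/.tfam transpose.

    Assumes whitespace-separated files, six leading .ped columns, and two
    allele columns per variant in the same order as the .map rows.
    """
    ped_rows = [line.split() for line in ped_lines if line.strip()]
    map_rows = [line.split() for line in map_lines if line.strip()]

    # Every .ped row must carry six fixed columns plus two alleles per variant.
    expected = 6 + 2 * len(map_rows)
    for row in ped_rows:
        if len(row) != expected:
            raise ValueError(f"{' '.join(row[:2])}: expected {expected} columns, found {len(row)}")

    # .tfam: family ID, individual ID, father, mother, sex, phenotype.
    tfam = [" ".join(row[:6]) for row in ped_rows]

    tped = []
    for i, variant in enumerate(map_rows):
        # Alleles for variant i sit at columns 6 + 2i and 7 + 2i of each .ped row.
        alleles: List[str] = []
        for row in ped_rows:
            alleles.extend(row[6 + 2 * i: 8 + 2 * i])
        # .tped row: the four .map columns followed by every individual's alleles.
        tped.append(" ".join(variant[:4] + alleles))
    return tped, tfam
```

With two individuals ("1 ind1 0 0 1 2 A G C C" and "2 ind2 0 0 2 1 G G C T") and two variants ("1 rs1 0 1000" and "1 rs2 0 2000"), the sketch yields the .tped rows "1 rs1 0 1000 A G G G" and "1 rs2 0 2000 C C C T". Validating the column count up front turns a malformed .ped into a clear error instead of silently truncated genotypes.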
Features
- adds a prefix to the values of a column of choice in the .ped file. NOTE: this is done in place; no new .ped file is generated
- recodes column 5 of the .ped file to an altered phenotype status: 2 (affected) for even family IDs, 1 (unaffected) for odd family IDs. Entries with an ID of 27 are ignored. NOTE: this is done in place; no new .ped file is generated
- generates a .tped file from a .map and .ped file
- generates a .tfam file from a .ped file and a .map file
Folder structure
All scripts are found in 'scripts'. The .ped and .map files are found in the 'Data_steward_interview_task' folder. You can see example output .tped and .tfam files in 'Data_steward_interview_task/example_output'.
Usage
This is a work in progress. Currently, it assumes that you run the scripts from the base folder of the repository and that they are located in 'scripts'. It makes no assumptions about where the .map and .ped files are; the wrapper takes their paths via -m and -p.
Clone the repository:
git clone https://github.com/ThWacker/Data_Steward_Assignment.git
Change into the repository folder:
cd ~/your_path_to_the_repo/Data_Steward_Assignment
Run the wrapper script, for example:
./scripts/Assignment_wrapper.sh -m ./Data_steward_interview_task/My_SNPS.map -p ./Data_steward_interview_task/My_SNPS.ped
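After the wrapper has run, a quick sanity check can confirm that the outputs match the inputs: the .tped should contain one row per .map variant and the .tfam one row per .ped individual. This is a minimal sketch with hypothetical function names; because the README does not state where the wrapper writes its output, all four paths are passed in explicitly.

```python
from pathlib import Path


def count_rows(path: str) -> int:
    """Count non-empty lines in a whitespace-separated PLINK text file."""
    return sum(1 for line in Path(path).read_text().splitlines() if line.strip())


def check_transpose(map_path: str, ped_path: str, tped_path: str, tfam_path: str) -> bool:
    """Hypothetical check: one .tped row per variant, one .tfam row per individual."""
    return (count_rows(tped_path) == count_rows(map_path)
            and count_rows(tfam_path) == count_rows(ped_path))
```

Called with the My_SNPS.map and My_SNPS.ped paths from the example above and the paths of the generated .tped and .tfam files, check_transpose should return True; a False result points to a dropped or duplicated row.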
THANK YOU FOR CONSIDERING ME AS AN APPLICANT!
Owner
- Name: Theresa Wacker
- Login: ThWacker
- Kind: user
- Location: Exeter
- Company: MRC Center for Medical Mycology
- Twitter: theresa_wacker
- Repositories: 1
- Profile: https://github.com/ThWacker
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: 'Data Steward Assignment '
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Theresa
    family-names: Wacker
    email: t.wacker2@exeter.ac.uk
    affiliation: 'University of Exeter '
    orcid: 'https://orcid.org/0000-0002-1789-2346'
license: MIT
version: v.1.0.0
date-released: '2025-02-13'
CodeMeta (codemeta.json)
{
  "@context": "https://w3id.org/codemeta/3.0",
  "type": "SoftwareSourceCode",
  "author": [
    {
      "id": "https://orcid.org/0000-0002-1789-2346",
      "type": "Person",
      "affiliation": {
        "type": "Organization",
        "name": "Bioscience, University of Exeter"
      },
      "email": "t.wacker2@exeter.ac.uk",
      "familyName": "Wacker",
      "givenName": "Theresa"
    }
  ],
  "codeRepository": "https://github.com/ThWacker/Data_Steward_Assignment",
  "dateCreated": "2025-02-13",
  "dateModified": "2025-02-13",
  "datePublished": "2025-02-13",
  "description": "This is the assignment for the Data Steward position at the University of Exeter. Please clone this repo and run the wrapper script.",
  "license": "https://spdx.org/licenses/MIT",
  "name": "Data Steward Assignment",
  "operatingSystem": "macOS and Linux",
  "programmingLanguage": "Python 3",
  "softwareRequirements": "Python 3.9 or above",
  "version": "0.1.0",
  "developmentStatus": "active",
  "issueTracker": "https://github.com/ThWacker/Data_Steward_Assignment/issues"
}
GitHub Events
Total
- Push event: 11
Last Year
- Push event: 11