data_steward_assignment

Repository accompanying the task given for the Data Steward position interview

https://github.com/thwacker/data_steward_assignment

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.2%) to scientific vocabulary

Keywords

assignment data fair-principles plink transpose
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ThWacker
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 127 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
assignment data fair-principles plink transpose
Created 11 months ago · Last pushed 11 months ago
Metadata Files
Readme License Citation Codemeta

README.md

Data Steward Assignment

Repository accompanying the task given for the Data Steward position interview

Overview

Scripts are found in the scripts folder. They can be run individually or through the Assignment_wrapper.sh bash script. The scripts take PLINK .map and .ped files, modify the .ped file, and produce transposed .tped and .tfam files.

manipulate_ped_prefix_and_columns.py modifies the .ped file in place. First, it adds a prefix to the values of a chosen column (by default column 0, with the prefix "INCH_"). Second, it changes the phenotype status in column 5 from not recorded (-9) to affected (2) for even family IDs, or to unaffected (1) for odd family IDs. This recoding is currently hard-coded and cannot be configured.
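
A minimal Python sketch of the per-line logic described above. It makes three assumptions: the .ped columns are separated by whitespace, the family IDs in column 0 are integers, and only the missing value -9 is recoded. The function name recode_ped_line is hypothetical and is not taken from manipulate_ped_prefix_and_columns.py.

```python
def recode_ped_line(line: str, column: int = 0, prefix: str = "INCH_") -> str:
    """Return one .ped line with a prefixed column and a recoded phenotype.

    Hypothetical helper; assumes integer family IDs in column 0.
    """
    fields = line.split()
    # Take parity from the family ID before any prefix is added,
    # otherwise "INCH_2" could no longer be parsed as an integer.
    family_id = int(fields[0])
    fields[column] = prefix + fields[column]
    # Column 5 holds the phenotype; only "not recorded" (-9) is recoded:
    # 2 (affected) for even family IDs, 1 (unaffected) for odd ones.
    if fields[5] == "-9":
        fields[5] = "2" if family_id % 2 == 0 else "1"
    return " ".join(fields)
```

For example, recode_ped_line("2 5 0 0 2 -9 A C") returns "INCH_2 5 0 0 2 2 A C". A phenotype that is already recorded (1 or 2) is left unchanged.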

ped_map_to_tped_tfam_transpose.py transposes the .ped and .map files into a .tped and a .tfam file. Note that PLINK itself can also generate a .tped file with the --recode transpose option; this does not generate a .tfam file, though.
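
The transposition can be sketched as follows. The sketch assumes standard whitespace-separated PLINK text files:

  • a four-column .map: chromosome, SNP ID, genetic distance, base-pair position
  • a .ped whose first six columns are family ID, individual ID, paternal ID, maternal ID, sex and phenotype, followed by two allele columns per SNP

The function ped_map_to_tped_tfam is a hypothetical illustration, not the repository's implementation.

```python
from typing import Iterable, List, Tuple


def ped_map_to_tped_tfam(
    ped_lines: Iterable[str], map_lines: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (tped_lines, tfam_lines) built from .ped and .map text lines."""
    ped_rows = [line.split() for line in ped_lines if line.strip()]
    map_rows = [line.split() for line in map_lines if line.strip()]
    # The .tfam file is simply the first six .ped columns, one row per individual.
    tfam = [" ".join(row[:6]) for row in ped_rows]
    tped = []
    for i, snp in enumerate(map_rows):
        # SNP i occupies genotype columns 6 + 2*i and 7 + 2*i in every .ped row.
        alleles: List[str] = []
        for row in ped_rows:
            alleles.extend(row[6 + 2 * i : 8 + 2 * i])
        # A .tped row is the .map row followed by both alleles of every individual.
        tped.append(" ".join(snp + alleles))
    return tped, tfam
```

Each .tped row lists the individuals in the same order as the .tfam rows. The .tfam file is what lets the allele pairs in a .tped row be matched back to individuals.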

Features

  • adds a prefix to the values of a chosen column in the .ped file. NOTE: the file is modified in place; no new .ped file is generated
  • sets the phenotype status in column 5 of the .ped file to 2 (affected) for even family IDs and 1 (unaffected) for odd family IDs. Any ID equal to 27 is ignored. NOTE: the file is modified in place; no new .ped file is generated
  • generates a .tped file from a .map and .ped file
  • generates a .tfam file from a .ped file and a .map file
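
Both .ped modifications happen in place. One cautious way to implement an in-place rewrite is to write the result to a temporary file in the same directory and then swap it in with os.replace. If something fails midway, the original file is left untouched. The helper rewrite_in_place below is hypothetical and not code from the repository. It assumes a line-oriented transform, such as the per-line recoding described earlier, and it drops blank lines.

```python
import os
import shutil
import tempfile
from typing import Callable


def rewrite_in_place(path: str, transform: Callable[[str], str]) -> None:
    """Apply `transform` to every non-empty line of `path` and replace the file."""
    directory = os.path.dirname(os.path.abspath(path))
    # A temporary file in the same directory keeps os.replace on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # Open the temporary file first so its descriptor is always closed.
        with os.fdopen(fd, "w") as dst, open(path) as src:
            for line in src:
                if line.strip():
                    dst.write(transform(line.rstrip("\n")) + "\n")
        # mkstemp creates the file with mode 0600; restore the original permissions.
        shutil.copymode(path, tmp_path)
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the partial output and keep the original file.
        os.remove(tmp_path)
        raise
```

On POSIX systems os.replace within one filesystem is atomic. Readers of the .ped file therefore see either the old or the new content, never a half-written file.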

Folder structure

All scripts are found in 'scripts'. The .ped and .map files are found in the 'Data_steward_interview_task' folder. You can see example output .tped and .tfam files in 'Data_steward_interview_task/example_output'.

Usage

This is a work in progress. The scripts currently assume that you run them from the repository root (the base folder) and that they are located in the scripts folder. They make no assumptions about where the .map and .ped files are located.

Clone the repository:

git clone https://github.com/ThWacker/Data_Steward_Assignment.git

Change into the repository folder:

cd ~/your_path_to_the_repo/Data_Steward_Assignment

Run the wrapper script like so (example):

./scripts/Assignment_wrapper.sh -m ./Data_steward_interview_task/My_SNPS.map -p ./Data_steward_interview_task/My_SNPS.ped
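
The wrapper accepts arbitrary .map and .ped paths, so a quick sanity check before transposing can be useful. Every .ped row should have six pedigree columns plus two allele columns per .map entry. The helper mismatched_ped_lines below is hypothetical and not part of the repository. It assumes one allele per whitespace-separated column.

```python
from typing import Iterable, List


def mismatched_ped_lines(ped_lines: Iterable[str], map_lines: Iterable[str]) -> List[int]:
    """Return 1-based .ped line numbers whose column count does not match the .map."""
    n_snps = sum(1 for line in map_lines if line.strip())
    # Six pedigree columns plus two alleles per SNP listed in the .map file.
    expected = 6 + 2 * n_snps
    bad = []
    for number, line in enumerate(ped_lines, start=1):
        if line.strip() and len(line.split()) != expected:
            bad.append(number)
    return bad
```

Open the two files from the example command and pass the file objects directly. An empty list means that the .ped and .map files agree in size.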

THANK YOU FOR CONSIDERING ME AS AN APPLICANT!

Owner

  • Name: Theresa Wacker
  • Login: ThWacker
  • Kind: user
  • Location: Exeter
  • Company: MRC Center for Medical Mycology

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: 'Data Steward Assignment'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Theresa
    family-names: Wacker
    email: t.wacker2@exeter.ac.uk
    affiliation: 'University of Exeter'
    orcid: 'https://orcid.org/0000-0002-1789-2346'
license: MIT
version: v.1.0.0
date-released: '2025-02-13'

CodeMeta (codemeta.json)

{
  "@context": "https://w3id.org/codemeta/3.0",
  "type": "SoftwareSourceCode",
  "author": [
    {
      "id": "https://orcid.org/0000-0002-1789-2346",
      "type": "Person",
      "affiliation": {
        "type": "Organization",
        "name": "Bioscience, University of Exeter"
      },
      "email": "t.wacker2@exeter.ac.uk",
      "familyName": "Wacker",
      "givenName": "Theresa"
    }
  ],
  "codeRepository": "https://github.com/ThWacker/Data_Steward_Assignment",
  "dateCreated": "2025-02-13",
  "dateModified": "2025-02-13",
  "datePublished": "2025-02-13",
  "description": "This is the assignment for the Data Steward position at the University of Exeter. Please clone this repo and run the wrapper script.",
  "license": "https://spdx.org/licenses/MIT",
  "name": "Data Steward Assignment",
  "operatingSystem": "macOS and Linux",
  "programmingLanguage": "Python 3",
  "softwareRequirements": "Python 3.9 or above",
  "version": "0.1.0",
  "developmentStatus": "active",
  "issueTracker": "https://github.com/ThWacker/Data_Steward_Assignment/issues"
}

GitHub Events

Total
  • Push event: 11
Last Year
  • Push event: 11