cd2-survey

Code and processed data from the Connecting Data in Child Development (CD2) Vocabularies survey

https://github.com/dorienhuijser/cd2-survey

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.2%) to scientific vocabulary

Keywords

controlled metadata survey vocabularies

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: DorienHuijser
License: mit
Language: R
Default Branch: master
Homepage: https://www.dorienhuijser.com/cd2-survey
Size: 3.11 MB

Statistics

Stars: 0
Watchers: 1
Forks: 2
Open Issues: 0
Releases: 1

Topics

controlled metadata survey vocabularies

Created over 4 years ago · Last pushed almost 3 years ago

Metadata Files

Readme License Citation

Vocabularies Survey - Connecting Data in Child Development (CD2)

This repository contains the code to process raw data from the CD2 vocabularies survey for each participating cohort study into a much more readable and useful format. It also contains the results, but for privacy reasons (email addresses, IP addresses and demographic information), the raw survey data is not published here.

Below you will find what this survey is about and how the data are processed.

You can read the script in HTML form here. The script itself is written in R Markdown (Rmd) and can be found here.

Table of contents

* About the CD2 project
* About the CD2 vocabularies survey
* Structure of the survey
* Structure of the datafile
* Functionality of this code
* Dependencies
* License
* Contributing and contact

About the CD2 project

Connecting Data in Child Development (CD2) is an infrastructure project funded by the Platform Digitale Infrastructuur - Social Sciences and Humanities (PDI-SSH). The project aims to harmonize the metadata from 6 Dutch developmental child cohort studies within the Consortium on Individual Development (CID). The end result of this harmonization is an online portal where one can find, among other things, what data were collected in these cohort studies and how to get access to them (of note, the portal does not contain the actual data). Additionally, the metadata underlying this web portal should be findable not only through the web portal itself, but also through existing infrastructures such as ODISSEI and HEALTH-RI. You can read the full project description here.

About the CD2 vocabularies survey

As part of making the CID metadata findable, the individual measures are labelled with keywords and categories. Because there is no fully suitable controlled vocabulary available that fits the wealth of data in CID, existing vocabularies are complemented with our own. To create such a vocabulary, input from the entire CID community was needed to help us determine the relevant keywords and categories that researchers would use to search for all the different types of data within CID. The CD2 vocabularies survey therefore asked respondents (researchers) to 1) provide keywords and 2) choose relevant categories for a subset of experiments within their own cohort.

Structure of the survey

The full survey can be found in CD2_vocabularies_survey_questions.pdf in the assets folder of this repository. Importantly, respondents answered 2 questions, which were repeated for each of 25 measures from their own cohort (e.g., experiments, questionnaires, etc.) using Qualtrics's Loop and Merge functionality:

Which keywords would you assign to this measure? Please separate your keywords with a comma (open text question)
Choose one or multiple categories that you think fit best and rank them according to their relevance using numbers (1 = most relevant, 2 = second most relevant, etc.). (ranking question with 3 additional text fields in which custom categories could be provided)

Structure of the datafile

The datafile resulting from the Qualtrics survey is extremely wide, because each of the 6 cohorts had around 80-200 measures (depending on the cohort) that could be shown. Although the survey for an individual respondent would only show a random selection of 25 of these, the resulting datafile contains them all and is more than 13,000 columns wide. Not exactly readable!

The datafile contains 3 main types of variables:

Keywords question: [instrument_number]_[cohort]_Keywords: Keywords string response separated by commas.
Category rankings: [instrument_number]_[cohort]_Cat_[category-number]: The response is a number indicating the priority given to the category.
Custom categories: [instrument_number]_[cohort]_Cat_1[2/3/4]_TEXT: A string indicating the custom category that was provided.
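To make the naming scheme above more concrete, here is a minimal sketch in base R that sorts column names into the three variable types. It is not taken from the repository's script. The column names, cohort labels and the regular expression are illustrative assumptions, including the assumption that cohort labels contain no underscores and that instrument numbers are purely numeric.

```r
# Toy column names following the three patterns described above.
# Instrument numbers and cohort labels are invented for illustration.
cols <- c("12_CohortA_Keywords", "12_CohortA_Cat_3",
          "12_CohortA_Cat_12_TEXT", "7_CohortB_Keywords", "ResponseId")

# Assumed pattern: instrument number, cohort label, then either
# "Keywords" or "Cat_<number>" with an optional "_TEXT" suffix.
pattern <- "^([0-9]+)_([^_]+)_(Keywords|Cat_([0-9]+)(_TEXT)?)$"

# Keep only the columns that belong to the repeated survey questions.
matched <- cols[grepl(pattern, cols)]

parsed <- data.frame(
  column     = matched,
  instrument = as.integer(sub(pattern, "\\1", matched)),
  cohort     = sub(pattern, "\\2", matched),
  type       = ifelse(grepl("_Keywords$", matched), "keywords",
               ifelse(grepl("_TEXT$", matched), "custom_category",
                      "category_rank")),
  stringsAsFactors = FALSE
)

print(parsed)
```

Parsing the names once like this gives a lookup table that can be used to select all columns of one cohort or one question type without typing out thousands of column names.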

Functionality of this code

The code can be found in the src folder and does the following:

Set the parameters (e.g., filename, additional instrument number files, cohort names, etc.).
Read in the data.
Create mappings: lists of numbers and instrument names.
For each cohort, put the data from the Keywords question in a flat, usable format.
For each cohort, put the data from the Category ranking question in a flat, usable format.
For each cohort, combine the processed data from the Keywords and Category ranking questions into one processed datafile. These can be found in data/processed.
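
The flattening steps above can be sketched roughly as follows. This is a simplified base R illustration on invented toy data, not the code from the src folder: the respondent IDs, instrument names, cohort label and the long output layout are all assumptions. The final combining step is left out because the layout of the processed files is not described here.

```r
# Toy wide data: one row per respondent. All names and values are invented.
raw <- data.frame(
  ResponseId = c("R1", "R2"),
  `1_CohortA_Keywords` = c("attention, memory", NA),
  `2_CohortA_Keywords` = c(NA, "parenting,stress , sleep"),
  `1_CohortA_Cat_1` = c(2, NA),
  `1_CohortA_Cat_2` = c(1, NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical mapping from instrument numbers to instrument names.
instruments <- c(`1` = "Flanker task", `2` = "Parenting questionnaire")

# Keywords: one row per respondent, instrument and keyword.
kw_cols <- grep("_CohortA_Keywords$", names(raw), value = TRUE)
keywords_long <- do.call(rbind, lapply(kw_cols, function(col) {
  answered <- !is.na(raw[[col]])
  if (!any(answered)) return(NULL)
  split_kw <- strsplit(raw[[col]][answered], ",", fixed = TRUE)
  data.frame(
    ResponseId = rep(raw$ResponseId[answered], lengths(split_kw)),
    instrument = unname(instruments[sub("_.*$", "", col)]),
    keyword    = trimws(unlist(split_kw)),
    stringsAsFactors = FALSE
  )
}))

# Category rankings: one row per respondent, instrument and ranked category.
cat_cols <- grep("_CohortA_Cat_[0-9]+$", names(raw), value = TRUE)
ranks_long <- do.call(rbind, lapply(cat_cols, function(col) {
  answered <- !is.na(raw[[col]])
  if (!any(answered)) return(NULL)
  data.frame(
    ResponseId = raw$ResponseId[answered],
    instrument = unname(instruments[sub("_.*$", "", col)]),
    category   = as.integer(sub("^.*_Cat_", "", col)),
    rank       = raw[[col]][answered],
    stringsAsFactors = FALSE
  )
}))

# Sort so that the most relevant category (rank 1) comes first.
ranks_long <- ranks_long[order(ranks_long$ResponseId, ranks_long$rank), ]

print(keywords_long)
print(ranks_long)
```

The long layout keeps only the cells a respondent actually filled in, which is what makes the result readable compared with the original wide file.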

Dependencies

The code is in the CD2-vocabularies-survey-v1.3.Rmd file in the src folder. You need the following to run the code:

The raw data (not included in this repository)
The instrument number files as used during the survey (located in assets/instrumentnrs)
R, RStudio and git
The R packages rmarkdown, data.table, tidyverse, wordcloud2, webshot, and htmlwidgets

All relevant files are read in by the code. Because the code is R Markdown, you can find a lot of explanation about how the code works in there as well.
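
As a convenience, the package list above can be checked before running the script. The snippet below is a small sketch and not part of the repository. It only reports which of the listed packages are missing, and the install and render lines are left commented out because rendering requires the raw data, which is not published.

```r
# Packages named in the dependency list above.
required <- c("rmarkdown", "data.table", "tidyverse",
              "wordcloud2", "webshot", "htmlwidgets")

# requireNamespace() returns FALSE instead of failing when a package is absent.
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}

# With the raw data in place, the script could then be rendered with:
# rmarkdown::render("src/CD2-vocabularies-survey-v1.3.Rmd")
```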

Feel free to reuse this code by cloning the repository. Warning: you will most likely have to adapt the code substantially, since it is currently tailored to this specific use case.

License

This project is licensed under the terms of the MIT License.

Contributing and contact

This repository is not actively maintained at the moment. However, if you see a bug, feel free to open an issue or a pull request in this repository. Alternatively, feel free to email me for comments or questions.

Owner

Name: Dorien Huijser
Login: DorienHuijser
Kind: user
Location: The Netherlands
Company: @UtrechtUniversity

Website: www.dorienhuijser.com
Twitter: DorienHuijser
Repositories: 4
Profile: https://github.com/DorienHuijser

Git beginner, creating mostly documentation | Research Data Management | Open science | The Netherlands

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: CD2 vocabularies survey
message: 'When reusing this work, please cite as follows.'
type: software
authors:
  - given-names: Dorien
    name-particle: ~
    family-names: Huijser
    email: d.c.huijser@uu.nl
    affiliation: Utrecht University
    orcid: 'https://orcid.org/0000-0003-3282-8083'
url: https://github.com/DorienHuijser/cd2-survey
abstract: >-
  This repository contains the code to process raw
  data from the Connecting Data in Child Development
  (CD2) project's vocabularies survey for each
  participating cohort study into a much more
  readable and useful format. The goal of this survey
  is to develop a harmonized vocabulary that can be
  used in the to be built CD2 metadata portal, where
  the data from 6 developmental child cohorts will be
  represented.
keywords:
  - vocabularies
  - metadata
  - survey
license: MIT

Committers

Last synced: about 1 year ago

All Time

Total Commits: 66
Total Committers: 2
Avg Commits per committer: 33.0
Development Distribution Score (DDS): 0.045

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
DorienHuijser	d**r@o**m	63
Huijser	d**r@u**l	3

Committer Domains (Top 20 + Academic)

uu.nl: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 1
Total pull requests: 3
Average time to close issues: over 1 year
Average time to close pull requests: 1 minute
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 1.0
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
