intro-to-research-data-management-carpentries
Intro to Research Data Management
https://github.com/katiebuntic/intro-to-research-data-management-carpentries
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.8%) to scientific vocabulary
Repository
Intro to Research Data Management
Basic Info
- Host: GitHub
- Owner: katiebuntic
- License: other
- Default Branch: main
- Homepage: https://katiebuntic.github.io/intro-to-research-data-management-carpentries/
- Size: 7.34 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 5
- Releases: 0
Metadata Files
README.md
Introduction to Research Data Management Lesson
🚧 This lesson is under development.
Visit the lesson site built from this repository: https://katiebuntic.github.io/intro-to-research-data-management-carpentries/
Lesson Description
This lesson aims to teach those just starting to undertake research how to manage their data and files.
Target Audience
- Master's, PhD, and postdoctoral researchers at the beginning of their projects.
- Basic digital skills required (e.g., file management, Excel, some version control exposure).
- No programming experience necessary.
Prerequisites
- Basic Excel use (open/save tables)
- File/folder management on a computer
- A research project or dataset in progress
Learning Objectives
After completing this course, learners should be able to:
- Define research data and distinguish between different data types.
- Structure research materials using clear file naming conventions and a logical folder hierarchy.
- Describe methods of data collection that make data cleaner and easier to analyse.
- Detect inconsistencies and errors in a tabular dataset ("dirty data").
- Use a set of basic techniques to remove or correct errors and inconsistencies in tabular data ("cleaning data").
- Use version control to track different versions of files and switch between them.
Maintainer(s)
The current maintainers of this lesson are:
- Victoria Yorke-Edwards (@vyorkeedwards)
- Kimberly Meechan (@K-Meech)
- Katie Buntic (@katiebuntic)
Dataset & Narrative
Dataset: a subset of the MET (Metropolitan Museum of Art) Open Access dataset.
- Link to original dataset: https://github.com/metmuseum/openaccess
- Size:
- Types: CSV file including string and numerical data
- Requires noise/messiness injection for teaching
- Licensing: CC0 1.0 Universal
Episodes
🚧 This needs some work.
1. What is Research Data?
- Data types
- Sources of data
- What is research data management (collection, storage, organisation, sharing, etc)?
Need to write objectives
2. Structuring research materials
- Naming conventions
- Folder structures
- Version control
- Introduction to version control software such as Git/GitHub
Objectives
After following this episode, learners will be able to:
- Organise their research data into a standard folder structure
- Name files with a consistent naming convention
- Understand why version control is important and how to incorporate it into their naming conventions
- Explain why version control software such as Git/GitHub can be useful for certain types of data.
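The folder structure and naming convention objectives above can be illustrated with a short script. This is only a sketch in Python, which the lesson itself does not require. The folder names, the `create_project` and `make_filename` helpers, and the `date_description_vNN.ext` pattern are all assumptions chosen for illustration, not conventions prescribed by the lesson.

```python
from datetime import date
from pathlib import Path
import re

# Hypothetical folder layout; the lesson does not prescribe these names.
STANDARD_FOLDERS = ["data/raw", "data/processed", "docs", "results"]


def create_project(root: Path) -> list[Path]:
    """Create the illustrative folder hierarchy under root and return the paths."""
    created = []
    for name in STANDARD_FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def make_filename(description: str, day: date, version: int, ext: str) -> str:
    """Build a name such as 2024-03-01_survey-responses_v02.csv."""
    # Lowercase the description and replace runs of other characters with hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"{day.isoformat()}_{slug}_v{version:02d}.{ext.lstrip('.')}"
```

For example, `make_filename("Survey Responses", date(2024, 3, 1), 2, "csv")` returns `2024-03-01_survey-responses_v02.csv`. Putting an ISO date first means files sort chronologically, and the zero-padded version number keeps `v02` ahead of `v10`.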
3. Tabular data collection
- Have a look at a 'dirty' data set
- Is there a standard set of responses?
- Is it free text?
- How do you control what data is being collected?
- Asking the right questions
- Data dictionaries
Objectives
After following this episode, learners will be able to:
- List variable types and formats
- Identify inconsistencies in data that can cause problems during analysis
- Describe methods that can be used during data collection and data entry that can prevent inconsistencies
- Write guidance for how to collect and enter data
- Create a data dictionary describing a dataset
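As a rough illustration of the data dictionary objective above, the following Python sketch drafts one entry per column of a small CSV. The sample rows, the column names, and the `infer_type` and `draft_data_dictionary` helpers are invented for this example and are not taken from the lesson or its dataset. The guessed types are only a starting point, and the descriptions still have to be written by a person.

```python
import csv
import io

# Made-up sample rows for illustration; not taken from the lesson's dataset.
SAMPLE_CSV = """object_id,title,year,height_cm
1,Vase,1850,30.5
2,Portrait,,120
3,Bowl,1790,12.25
"""


def infer_type(values: list[str]) -> str:
    """Guess a simple type label from the non-empty values in a column."""
    present = [v for v in values if v.strip()]
    if not present:
        return "empty"
    for label, caster in (("integer", int), ("decimal", float)):
        try:
            for v in present:
                caster(v)
            return label
        except ValueError:
            continue
    return "text"


def draft_data_dictionary(csv_text: str) -> list[dict]:
    """Return one entry per column: name, guessed type, missing count, example."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    entries = []
    for column in reader.fieldnames or []:
        values = [row[column] for row in rows]
        present = [v for v in values if v.strip()]
        entries.append({
            "variable": column,
            "type": infer_type(values),
            "missing": len(values) - len(present),
            "example": present[0] if present else "",
            "description": "",  # left blank on purpose: to be written by a person
        })
    return entries
```

Calling `draft_data_dictionary(SAMPLE_CSV)` yields four entries, one per column, which could then be saved alongside the dataset and completed by hand.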
4. How to clean a tabular dataset (using Excel)
- Finding inconsistencies
- Missing data
- Capitalisation
- Spelling mistakes
- Pros and cons of Excel
Objectives
After following this episode, learners will be able to:
- Describe what data cleaning is and why it is important
- Find and resolve inconsistencies within a tabular dataset programmatically (e.g., datetime formats, numeric precision)
- Identify missing values within a tabular dataset using filters
- Correct spelling mistakes using spell check tools and find and replace
- Standardise text formats using spreadsheet functions
- Describe the pros and cons of using spreadsheets for data collection and cleaning
- [Note: update for using R?]
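The cleaning steps listed in this episode (missing data, capitalisation, spelling mistakes) are taught with Excel, but the same ideas can be sketched programmatically. The Python below is a minimal illustration under stated assumptions: the set of missing-value markers, the `CORRECTIONS` lookup, and the `clean_text` and `clean_column` helpers are hypothetical and would need adapting to a real dataset.

```python
from typing import Optional

# Placeholder strings treated as missing here; adjust to the dataset at hand.
MISSING_MARKERS = {"", "na", "n/a", "unknown", "-"}

# Hypothetical find-and-replace table for known misspellings.
CORRECTIONS = {"potrait": "portrait"}


def clean_text(value: str) -> Optional[str]:
    """Trim and collapse whitespace, lowercase, fix known typos; None if missing."""
    text = " ".join(value.split()).lower()
    if text in MISSING_MARKERS:
        return None
    return CORRECTIONS.get(text, text)


def clean_column(values: list[str]) -> tuple[list[Optional[str]], list[int]]:
    """Return the cleaned values plus the row positions that were missing."""
    cleaned = [clean_text(v) for v in values]
    missing_rows = [i for i, v in enumerate(cleaned) if v is None]
    return cleaned, missing_rows
```

For instance, `clean_column(["  Portrait ", "POTRAIT", "N/A", "", "bowl"])` returns `(["portrait", "portrait", None, None, "bowl"], [2, 3])`. Keeping the positions of missing rows separate makes it possible to review them before deciding whether to fill or drop them.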
5. Introduction to R
Need to write objectives
Contributing
Please see the CONTRIBUTING.md for contributing guidelines and details on how to get involved with this project.
Also see the current list of issues for ideas for contributing. Some issues are tagged as suitable for new contributors: these do not require in-depth knowledge of the project and lesson infrastructure, and are a good opportunity for a new contributor to get involved.
A separate tag marks issues that we would particularly appreciate contributions to fix.
To learn more about how this lesson site is built and how you can edit the pages, see the Introduction to The Carpentries Workbench.
Citation
See CITATION.cff for citation information, including a list of authors.
License
Lesson content is published with a CC-BY license.
Contact
Please get in touch with any of the maintainers above with any questions about this lesson.
Owner
- Name: Katie Buntic
- Login: katiebuntic
- Kind: user
- Location: Brighton
- Repositories: 1
- Profile: https://github.com/katiebuntic
Citation (CITATION.cff)
# This template CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to replace its contents
# with information about your lesson.
# Remember to update this file periodically,
# ensuring that the author list and other fields remain accurate.
cff-version: 1.2.0
title: FIXME
message: >-
  Please cite this lesson using the information in this file
  when you refer to it in publications, and/or if you
  re-use, adapt, or expand on the content in your own
  training material.
type: dataset
authors:
  - given-names: Kimberly
    family-names: Meechan
  - given-names: Katie
    family-names: Buntic
  - given-names: Victoria
    family-names: Yorke-Edwards
abstract: >-
  This lesson aims to teach those just starting to undertake research how to manage their data and files. The target audience is Master's, PhD, and postdoctoral researchers at the beginning of their projects. Basic digital skills are required (e.g., file management, Excel, some version control exposure). No programming experience is necessary.
license: CC-BY-4.0
GitHub Events
Total
- Issues event: 14
- Delete event: 7
- Issue comment event: 8
- Member event: 2
- Push event: 129
- Pull request review comment event: 7
- Pull request review event: 8
- Pull request event: 7
- Create event: 14
Last Year
- Issues event: 14
- Delete event: 7
- Issue comment event: 8
- Member event: 2
- Push event: 129
- Pull request review comment event: 7
- Pull request review event: 8
- Pull request event: 7
- Create event: 14
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Katie Buntic | 9****c | 43 |
| Victoria Yorke-Edwards | 3****s | 15 |
| Kimberly Meechan | 2****h | 11 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 10
- Total pull requests: 3
- Average time to close issues: 14 days
- Average time to close pull requests: about 1 hour
- Total issue authors: 3
- Total pull request authors: 2
- Average comments per issue: 0.2
- Average comments per pull request: 0.33
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 10
- Pull requests: 3
- Average time to close issues: 14 days
- Average time to close pull requests: about 1 hour
- Issue authors: 3
- Pull request authors: 2
- Average comments per issue: 0.2
- Average comments per pull request: 0.33
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- katiebuntic (8)
- K-Meech (1)
- tobyhodges (1)
Pull Request Authors
- katiebuntic (2)
- K-Meech (1)