ai_bio_project

This is the Carpentries repository for our funded project "How to Build FAIR Domain-Specific Datasets for fine tuning/training NLP models".

https://github.com/sara-morsy/ai_bio_project

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.8%) to scientific vocabulary

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 8 months ago · Last pushed 7 months ago
Metadata Files
  • Readme
  • Contributing
  • License
  • Code of conduct
  • Citation

README.md

How to Build FAIR Domain-Specific Datasets for fine tuning/training NLP models?

Domain-specific natural language processing (NLP) models extract data from unstructured text with high accuracy by identifying specialized vocabularies, and they can be used in a range of applications. These models are developed using domain-specific data. Using specialized datasets, researchers have fine-tuned pretrained models for protein-protein relationships in the STRING database (1), accelerated drug development by identifying chemical-gene and drug-drug interactions and by predicting peptide toxicity (2-4), and extracted brain connectivity data for neurological disorders (5). Despite these applications, fields such as veterinary medicine and agricultural biology lack NLP-based applications. Barriers include the absence of high-quality domain-specific datasets, small and unbalanced datasets, and insufficient expertise in building datasets (6, 7). Manual annotation, a common necessity in these fields, is time-consuming and prone to bias, which affects model performance. To address this, we propose a training course focused on building FAIR (Findable, Accessible, Interoperable, Reusable) domain-specific datasets (6).

Our target audience is:

• Researchers looking to adopt NLP solutions for analyzing domain-specific text, including those who lack expertise in AI but have domain knowledge, which is crucial for building and annotating data.
• Computational biology or AI researchers who build domain-specific NLP applications and need to overcome dataset scarcity and quality challenges.

Owner

  • Login: Sara-Morsy
  • Kind: user

Citation (CITATION.cff)

# This template CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to replace its contents
# with information about your lesson.
# Remember to update this file periodically, 
# ensuring that the author list and other fields remain accurate.

cff-version: 1.2.0
title: FIXME
message: >-
  Please cite this lesson using the information in this file
  when you refer to it in publications, and/or if you
  re-use, adapt, or expand on the content in your own
  training material.
type: dataset
authors:
  - given-names: FIXME
    family-names: FIXME
abstract: >-
  FIXME Replace this with a short abstract describing the
  lesson, e.g. its target audience and main intended
  learning objectives.
license: CC-BY-4.0
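
The CITATION.cff above still carries the template's FIXME placeholders for the title, author names, and abstract. As a minimal sketch of how such leftovers could be detected before a release, the snippet below scans the file's text line by line and reports non-comment lines that still contain the marker. The helper name `find_placeholders` and the embedded `SAMPLE` text are illustrative assumptions, not part of the repository, and the check is plain string matching rather than real YAML parsing.

```python
# Hypothetical helper (not part of the repository): list lines in a
# CITATION.cff text that still contain a template placeholder.
def find_placeholders(cff_text: str, marker: str = "FIXME") -> list[tuple[int, str]]:
    hits = []
    for lineno, line in enumerate(cff_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip YAML comment lines; they are template instructions, not fields.
        if stripped.startswith("#"):
            continue
        if marker in stripped:
            hits.append((lineno, stripped))
    return hits


# Abridged sample for illustration, modeled on the file shown above.
SAMPLE = """\
cff-version: 1.2.0
title: FIXME
type: dataset
authors:
  - given-names: FIXME
    family-names: FIXME
license: CC-BY-4.0
"""

if __name__ == "__main__":
    for lineno, text in find_placeholders(SAMPLE):
        print(f"line {lineno}: {text}")
```

In practice the text would be read from the repository's own CITATION.cff instead of the sample string; a non-empty result suggests the citation metadata has not been filled in yet.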

GitHub Events

Total
  • Push event: 6
Last Year
  • Push event: 6

Dependencies

.github/workflows/pr-close-signal.yaml actions
  • actions/upload-artifact v4 composite
.github/workflows/pr-comment.yaml actions
  • actions/checkout v4 composite
  • carpentries/actions/check-valid-pr main composite
  • carpentries/actions/comment-diff main composite
  • carpentries/actions/download-workflow-artifact main composite
.github/workflows/pr-post-remove-branch.yaml actions
  • carpentries/actions/download-workflow-artifact main composite
  • carpentries/actions/remove-branch main composite
.github/workflows/pr-preflight.yaml actions
  • carpentries/actions/check-valid-pr main composite
  • carpentries/actions/comment-diff main composite
.github/workflows/pr-receive.yaml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v4 composite
  • carpentries/actions/check-valid-pr main composite
  • carpentries/actions/setup-lesson-deps main composite
  • carpentries/actions/setup-sandpaper main composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
.github/workflows/sandpaper-main.yaml actions
  • actions/checkout v4 composite
  • carpentries/actions/setup-lesson-deps main composite
  • carpentries/actions/setup-sandpaper main composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
.github/workflows/update-cache.yaml actions
  • actions/checkout v4 composite
  • carpentries/actions/check-valid-credentials main composite
  • carpentries/actions/update-lockfile main composite
  • carpentries/create-pull-request main composite
  • r-lib/actions/setup-r v2 composite
.github/workflows/update-workflows.yaml actions
  • actions/checkout v4 composite
  • carpentries/actions/check-valid-credentials main composite
  • carpentries/actions/update-workflows main composite
  • carpentries/create-pull-request main composite