koza

Data transformation framework for LinkML data models

https://github.com/monarch-initiative/koza

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

etl knowledge-graph koza linkml monarchinitiative obofoundry ontology

Keywords from Contributors

data-dictionaries data-modeling json-ld-context json-schema linked-data linkml-schema owl rdf semantic-web shacl
Last synced: 6 months ago

Repository

Data transformation framework for LinkML data models

Statistics
  • Stars: 55
  • Watchers: 22
  • Forks: 5
  • Open Issues: 43
  • Releases: 27
Topics
etl knowledge-graph koza linkml monarchinitiative obofoundry ontology
Created about 5 years ago · Last pushed 6 months ago
Metadata Files
Readme, Contributing, License, Citation

README.md

Koza - a data transformation framework

Documentation

Disclaimer: Koza is in beta - we are looking for testers!

Overview

  • Transform csv, json, yaml, jsonl, and xml files into a target csv, json, or jsonl format based on your dataclass model
  • Koza can also output data in the KGX format
  • Write data transforms in semi-declarative Python
  • Configure source files, expected columns/json properties and path filters, field filters, and metadata in yaml
  • Create or import mapping files to be used in ingests (e.g. id mappings, type mappings)
  • Create and use translation tables to map between source and target vocabularies
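
As a rough illustration of the ideas in the list above (delimited input, a dataclass target model, a translation table, jsonl output), here is a minimal standard-library sketch. It does not use Koza's API: the `Association` class, the `transform` function, the column names, and the `EXAMPLE:` identifiers are all hypothetical and exist only for this example.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class Association:
    """Hypothetical target model; a real ingest would use its own dataclasses."""
    subject: str
    predicate: str
    object: str


def transform(tsv_text: str, table: dict) -> list:
    """Turn tab-delimited rows into JSON Lines strings, translating one column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        record = Association(
            subject=row["gene"],
            # Look up the source term; a KeyError flags an unmapped term.
            predicate=table[row["relation"]],
            object=row["phenotype"],
        )
        lines.append(json.dumps(asdict(record)))
    return lines


# Made-up sample data and a made-up translation table (source term -> target term).
SAMPLE = "gene\trelation\tphenotype\nG1\tcauses\tP1\nG2\tlinked to\tP2\n"
TABLE = {"causes": "EXAMPLE:0001", "linked to": "EXAMPLE:0002"}

for line in transform(SAMPLE, TABLE):
    print(line)
```

Raising on an unmapped term is one possible design choice; a real ingest might prefer to log and skip such rows.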

Installation

Koza is available on PyPI and can be installed via pip or pipx: `[pip|pipx] install koza`

Usage

NOTE: As of version 0.2.0, there is a new method for getting your ingest's KozaApp instance. Please see the updated documentation for details.

See the Koza documentation for usage information

Try the Examples

Validate

Give Koza a local or remote csv file, and get some basic information (headers, number of rows)

```bash
koza validate \
  --file https://raw.githubusercontent.com/monarch-initiative/koza/main/examples/data/string.tsv \
  --delimiter ' '
```
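
To make "basic information (headers, number of rows)" concrete, the following sketch computes the same kind of summary with Python's `csv` module. It is not Koza's implementation; the `summarize` function and the sample text are assumptions made for illustration.

```python
import csv
import io


def summarize(text: str, delimiter: str = ",") -> dict:
    """Report the header row and the number of data rows in delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    headers = next(reader, [])          # first line is treated as the header
    row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return {"headers": headers, "rows": row_count}


# Made-up, space-delimited sample (the command above also passes ' ' as delimiter).
SAMPLE = "protein1 protein2 combined_score\nA B 900\nC D 150\n"
print(summarize(SAMPLE, delimiter=" "))
# → {'headers': ['protein1', 'protein2', 'combined_score'], 'rows': 2}
```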

Sending a json or jsonl formatted file will confirm if the file is valid json or jsonl

```bash
koza validate \
  --file ./examples/data/ZFIN_PHENOTYPE_0.jsonl.gz \
  --format jsonl
```

```bash
koza validate \
  --file ./examples/data/ddpheno.json.gz \
  --format json
```
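
The difference between the two formats can be sketched in plain Python: a json file must parse as one document, while a jsonl file must parse line by line. The `is_valid` helper below is a hypothetical illustration, not Koza's own check. It also handles gzip-compressed input, since the example files above end in `.gz`.

```python
import gzip
import json


def is_valid(data: bytes, fmt: str) -> bool:
    """Return True if data parses as JSON ('json') or JSON Lines ('jsonl')."""
    if fmt not in ("json", "jsonl"):
        raise ValueError(f"unsupported format: {fmt}")
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        data = gzip.decompress(data)
    text = data.decode("utf-8")
    try:
        if fmt == "json":
            json.loads(text)  # the whole file is one JSON document
        else:
            for line in text.splitlines():
                if line.strip():  # ignore blank lines
                    json.loads(line)  # each line is its own JSON document
    except json.JSONDecodeError:
        return False
    return True


print(is_valid(b'{"a": 1}', "json"))                             # → True
print(is_valid(gzip.compress(b'{"a": 1}\n{"b": 2}\n'), "jsonl"))  # → True
print(is_valid(b'{"a": 1}\n{"b": 2}\n', "json"))                  # → False
```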

Transform

Run the example ingest, "string/protein-links-detailed":

```bash
koza transform \
  --source examples/string/protein-links-detailed.yaml \
  --global-table examples/translation_table.yaml

koza transform \
  --source examples/string-declarative/protein-links-detailed.yaml \
  --global-table examples/translation_table.yaml
```

Note: Koza expects a directory structure as described in the above example, with the source config file and transform code in the same directory:

```
.
├── ...
│   ├── your_source
│   │   ├── your_ingest.yaml
│   │   └── your_ingest.py
│   └── some_translation_table.yaml
└── ...
```
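
A layout like this can be checked before running a transform. The sketch below assumes, as the tree suggests, that the transform code shares the config file's base name. The `find_transform_code` helper is hypothetical and not part of Koza.

```python
import tempfile
from pathlib import Path


def find_transform_code(source_config: Path) -> Path:
    """Return the .py file expected beside a source config (same directory, same stem)."""
    candidate = source_config.with_suffix(".py")
    if not candidate.is_file():
        raise FileNotFoundError(f"expected transform code at {candidate}")
    return candidate


# Build a throwaway copy of the layout and check it.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp) / "your_source"
    source_dir.mkdir()
    config = source_dir / "your_ingest.yaml"
    config.write_text("# placeholder config\n")
    (source_dir / "your_ingest.py").write_text("# placeholder transform code\n")
    print(find_transform_code(config).name)  # → your_ingest.py
```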

Owner

  • Name: Monarch Initiative
  • Login: monarch-initiative
  • Kind: organization
  • Location: Globally-distributed team (see https://monarchinitiative.org/page/team)

Cross-species disease discovery and diagnosis

Citation (CITATION.cff)

cff-version: '1.1.0'
message: 'Please cite the following works when using this software.'
abstract: 'Data transformation framework for LinkML data models'
authors:
  - family-names: 'Schaper'
    given-names: 'Kevin'
  - family-names: 'Ships'
    given-names: 'Glass'
  - family-names: 'Shefchek'
    given-names: 'Kent'
  - family-names: 'Moxon'
    given-names: 'Sierra'
  - family-names: 'Mungall'
    given-names: 'Chris'
date-released: 2022-06-15
identifiers:
  - type: 'url'
    value: 'https://github.com/monarch-initiative/koza'
title: 'monarch-initiative/koza'
url: 'https://github.com/monarch-initiative/koza'
version: '0.1.14'

GitHub Events

Total
  • Create event: 14
  • Release event: 2
  • Issues event: 11
  • Watch event: 7
  • Delete event: 6
  • Member event: 8
  • Issue comment event: 30
  • Push event: 37
  • Pull request review comment event: 2
  • Pull request review event: 8
  • Pull request event: 12
Last Year
  • Create event: 14
  • Release event: 2
  • Issues event: 11
  • Watch event: 7
  • Delete event: 6
  • Member event: 8
  • Issue comment event: 30
  • Push event: 37
  • Pull request review comment event: 2
  • Pull request review event: 8
  • Pull request event: 12

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 268
  • Total Committers: 10
  • Avg Commits per committer: 26.8
  • Development Distribution Score (DDS): 0.489
Past Year
  • Commits: 12
  • Committers: 5
  • Avg Commits per committer: 2.4
  • Development Distribution Score (DDS): 0.583
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| glass-ships | g****s@o****m | 137 |
| Kevin Schaper | k****r@g****m | 82 |
| kshefchek | k****k@g****m | 37 |
| DnlRKorn | 6****n | 3 |
| Sierra Taylor Moxon | s****r@g****m | 2 |
| Harshad Hegde | h****b@g****m | 2 |
| Daniel Korn | d****n@h****r | 2 |
| amc-corey-cox | 6****x | 1 |
| Nomi Harris | n****s | 1 |
| Chris Mungall | c****m@b****g | 1 |
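
The all-time figures above can be reproduced from the Top Committers counts. The sketch assumes the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is not stated on this page, but it matches the reported value.

```python
# Commit counts from the Top Committers table, in the order listed.
commits = [137, 82, 37, 3, 2, 2, 2, 1, 1, 1]

total = sum(commits)
average = total / len(commits)

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits) / total

print(total, round(average, 1), round(dds, 3))
# → 268 26.8 0.489
```

Under the same assumed definition, the past-year score of 0.583 with 12 commits would correspond to a top committer with 5 commits.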

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 82
  • Total pull requests: 79
  • Average time to close issues: 4 months
  • Average time to close pull requests: 9 days
  • Total issue authors: 11
  • Total pull request authors: 12
  • Average comments per issue: 1.22
  • Average comments per pull request: 0.65
  • Merged pull requests: 64
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 13
  • Pull requests: 14
  • Average time to close issues: 10 months
  • Average time to close pull requests: about 1 hour
  • Issue authors: 4
  • Pull request authors: 6
  • Average comments per issue: 0.92
  • Average comments per pull request: 0.43
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • kevinschaper (22)
  • glass-ships (18)
  • kshefchek (11)
  • DnlRKorn (7)
  • ptgolden (7)
  • caufieldjh (6)
  • putmantime (4)
  • RichardBruskiewich (3)
  • amc-corey-cox (2)
  • hrshdhgd (2)
  • matentzn (1)
Pull Request Authors
  • kevinschaper (36)
  • glass-ships (24)
  • kshefchek (10)
  • DnlRKorn (6)
  • amc-corey-cox (4)
  • ptgolden (2)
  • caufieldjh (1)
  • nlharris (1)
  • hrshdhgd (1)
  • cmungall (1)
  • dependabot[bot] (1)
  • sierra-moxon (1)
Top Labels
Issue Labels
  • enhancement (19)
  • good first issue (3)
  • documentation (2)
  • question (2)
  • bug (2)
  • blocked (1)
Pull Request Labels
  • enhancement (2)
  • blocked (1)
  • dependencies (1)
  • github_actions (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 8,348 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 5
  • Total versions: 37
  • Total maintainers: 4
pypi.org: koza

Data transformation framework for LinkML data models

  • Versions: 37
  • Dependent Packages: 0
  • Dependent Repositories: 5
  • Downloads: 8,348 Last month
Rankings
Dependent repos count: 6.6%
Average: 9.4%
Dependent packages count: 10.1%
Downloads: 11.4%
Last synced: 6 months ago

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/documentation.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
poetry.lock pypi
  • 125 dependencies
pyproject.toml pypi
  • autoflake ^1.3.1 develop
  • biolink-model >=3.0.1 develop
  • black 22.3.0 develop
  • dask >=2022.5.2 develop
  • isort ^5.0.6 develop
  • mkdocs >=1.3.0 develop
  • mkdocs-material >=8.3.4 develop
  • pytest >=6.0.0 develop
  • click 8.0.4
  • linkml-validator >=0.4.4
  • ordered-set >=4.1.0
  • pydantic ^1.0.0
  • python ^3.8
  • pyyaml ^5.3.1
  • requests ^2.24.0
  • typer ^0.7.0