koza
Data transformation framework for LinkML data models
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.5%) to scientific vocabulary
Repository
Data transformation framework for LinkML data models
Basic Info
- Host: GitHub
- Owner: monarch-initiative
- License: bsd-3-clause
- Language: Python
- Default Branch: main
- Homepage: https://koza.monarchinitiative.org/
- Size: 5.07 MB
Statistics
- Stars: 55
- Watchers: 22
- Forks: 5
- Open Issues: 43
- Releases: 27
README.md
Koza - a data transformation framework
Disclaimer: Koza is in beta - we are looking for testers!
Overview
- Transform csv, json, yaml, jsonl, and xml files and convert them to a target csv, json, or jsonl format based on your dataclass model
- Koza can also output data in the KGX format
- Write data transforms in semi-declarative Python
- Configure source files, expected columns/json properties and path filters, field filters, and metadata in yaml
- Create or import mapping files to be used in ingests (e.g., id mappings, type mappings)
- Create and use translation tables to map between source and target vocabularies
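To make the last point concrete, here is a minimal sketch of what a translation-table lookup amounts to. It is not Koza's API: the `translate` helper and the table entries are hypothetical, and the sketch assumes a table is simply a mapping from source terms to target identifiers.

```python
# Hypothetical translation table: source vocabulary term -> target vocabulary ID.
# The entries are placeholders and are not taken from Koza's example tables.
TRANSLATION_TABLE = {
    "source term a": "TARGET:0001",
    "source term b": "TARGET:0002",
}


def translate(term, table, strict=True):
    """Map a source term to its target identifier.

    With strict=True an unmapped term raises KeyError, so gaps in the table
    surface during the ingest. With strict=False the term passes through.
    """
    if term in table:
        return table[term]
    if strict:
        raise KeyError(f"No translation found for {term!r}")
    return term


mapped = translate("source term a", TRANSLATION_TABLE)
passthrough = translate("unmapped term", TRANSLATION_TABLE, strict=False)
```

Failing loudly on unmapped terms is a design choice in this sketch, not a statement about how Koza behaves.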
Installation
Koza is available on PyPI and can be installed via pip/pipx:
[pip|pipx] install koza
Usage
NOTE: As of version 0.2.0, there is a new method for getting your ingest's KozaApp instance. Please see the updated documentation for details.
See the Koza documentation for usage information
Try the Examples
Validate
Give Koza a local or remote csv file, and get some basic information (headers, number of rows)
```bash
koza validate \
  --file https://raw.githubusercontent.com/monarch-initiative/koza/main/examples/data/string.tsv \
  --delimiter ' '
```
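For readers who want to see what "headers, number of rows" means in practice, the following standard-library sketch computes the same kind of summary. It does not reproduce Koza's implementation: `summarize_delimited` is a hypothetical helper, and the sample column names and values are made up.

```python
import csv
import io


def summarize_delimited(text_stream, delimiter=","):
    """Return (headers, number_of_data_rows) for a delimited text stream."""
    reader = csv.reader(text_stream, delimiter=delimiter)
    headers = next(reader, [])          # first row is treated as the header
    row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return headers, row_count


# Made-up, space-delimited sample standing in for a real source file.
sample = io.StringIO("col_a col_b score\nx1 y1 900\nx2 y2 850\n")
headers, row_count = summarize_delimited(sample, delimiter=" ")
print(headers, row_count)
```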
Sending a json or jsonl formatted file will confirm if the file is valid json or jsonl
```bash
koza validate \
  --file ./examples/data/ZFIN_PHENOTYPE_0.jsonl.gz \
  --format jsonl
```
```bash
koza validate \
  --file ./examples/data/ddpheno.json.gz \
  --format json
```
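The jsonl check can be pictured as parsing each line independently. The sketch below uses only the standard library and is an assumption about what such a check involves, not Koza's own code. Both `first_invalid_jsonl_line` and `check_jsonl_gz` are hypothetical names.

```python
import gzip
import json


def first_invalid_jsonl_line(lines):
    """Return the 1-based number of the first non-blank line that is not
    valid JSON, or None if every non-blank line parses."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped in this sketch
        try:
            json.loads(line)
        except json.JSONDecodeError:
            return number
    return None


def check_jsonl_gz(path):
    """Apply the same check to a gzip-compressed jsonl file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return first_invalid_jsonl_line(handle)


valid_result = first_invalid_jsonl_line(['{"a": 1}', '{"b": [1, 2]}'])
invalid_result = first_invalid_jsonl_line(['{"a": 1}', "", '{"b": '])
```

A plain json file differs only in that the whole document is parsed once with `json.load` rather than line by line.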
Transform
Run the example ingest, "string/protein-links-detailed"
```bash
koza transform \
  --source examples/string/protein-links-detailed.yaml \
  --global-table examples/translation_table.yaml

koza transform \
  --source examples/string-declarative/protein-links-detailed.yaml \
  --global-table examples/translation_table.yaml
```
Note:
Koza expects a directory structure as described in the above example
with the source config file and transform code in the same directory:
.
├── ...
│ ├── your_source
│ │ ├── your_ingest.yaml
│ │ └── your_ingest.py
│ └── some_translation_table.yaml
└── ...
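A layout like the one above can be checked before running an ingest. The sketch below assumes, based only on the tree shown, that the transform code shares the config file's name with a `.py` extension. `find_transform_code` is a hypothetical helper and not part of Koza.

```python
import tempfile
from pathlib import Path


def find_transform_code(source_config):
    """Return the sibling .py file for a source config, or None if absent.

    Assumes the transform code sits next to the config and shares its stem,
    e.g. your_ingest.yaml -> your_ingest.py.
    """
    candidate = Path(source_config).with_suffix(".py")
    return candidate if candidate.is_file() else None


# Build a throwaway directory mirroring the tree above and run the check.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp) / "your_source"
    source_dir.mkdir()
    config = source_dir / "your_ingest.yaml"
    config.write_text("# placeholder config\n")

    missing = find_transform_code(config)  # no .py file exists yet

    (source_dir / "your_ingest.py").write_text("# placeholder transform\n")
    found_name = find_transform_code(config).name
```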
Owner
- Name: Monarch Initiative
- Login: monarch-initiative
- Kind: organization
- Location: Globally-distributed team (see https://monarchinitiative.org/page/team)
- Website: https://github.com/monarch-initiative/monarch-app/blob/master/README.md#about-monarch
- Repositories: 118
- Profile: https://github.com/monarch-initiative
Cross-species disease discovery and diagnosis
Citation (CITATION.cff)
cff-version: '1.1.0'
message: 'Please cite the following works when using this software.'
abstract: 'Data transformation framework for LinkML data models'
authors:
  - family-names: 'Schaper'
    given-names: 'Kevin'
  - family-names: 'Ships'
    given-names: 'Glass'
  - family-names: 'Shefchek'
    given-names: 'Kent'
  - family-names: 'Moxon'
    given-names: 'Sierra'
  - family-names: 'Mungall'
    given-names: 'Chris'
date-released: 2022-06-15
identifiers:
  - type: 'url'
    value: 'https://github.com/monarch-initiative/koza'
title: 'monarch-initiative/koza'
url: 'https://github.com/monarch-initiative/koza'
version: '0.1.14'
GitHub Events
Total
- Create event: 14
- Release event: 2
- Issues event: 11
- Watch event: 7
- Delete event: 6
- Member event: 8
- Issue comment event: 30
- Push event: 37
- Pull request review comment event: 2
- Pull request review event: 8
- Pull request event: 12
Last Year
- Create event: 14
- Release event: 2
- Issues event: 11
- Watch event: 7
- Delete event: 6
- Member event: 8
- Issue comment event: 30
- Push event: 37
- Pull request review comment event: 2
- Pull request review event: 8
- Pull request event: 12
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| glass-ships | g****s@o****m | 137 |
| Kevin Schaper | k****r@g****m | 82 |
| kshefchek | k****k@g****m | 37 |
| DnlRKorn | 6****n | 3 |
| Sierra Taylor Moxon | s****r@g****m | 2 |
| Harshad Hegde | h****b@g****m | 2 |
| Daniel Korn | d****n@h****r | 2 |
| amc-corey-cox | 6****x | 1 |
| Nomi Harris | n****s | 1 |
| Chris Mungall | c****m@b****g | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 82
- Total pull requests: 79
- Average time to close issues: 4 months
- Average time to close pull requests: 9 days
- Total issue authors: 11
- Total pull request authors: 12
- Average comments per issue: 1.22
- Average comments per pull request: 0.65
- Merged pull requests: 64
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 13
- Pull requests: 14
- Average time to close issues: 10 months
- Average time to close pull requests: about 1 hour
- Issue authors: 4
- Pull request authors: 6
- Average comments per issue: 0.92
- Average comments per pull request: 0.43
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- kevinschaper (22)
- glass-ships (18)
- kshefchek (11)
- DnlRKorn (7)
- ptgolden (7)
- caufieldjh (6)
- putmantime (4)
- RichardBruskiewich (3)
- amc-corey-cox (2)
- hrshdhgd (2)
- matentzn (1)
Pull Request Authors
- kevinschaper (36)
- glass-ships (24)
- kshefchek (10)
- DnlRKorn (6)
- amc-corey-cox (4)
- ptgolden (2)
- caufieldjh (1)
- nlharris (1)
- hrshdhgd (1)
- cmungall (1)
- dependabot[bot] (1)
- sierra-moxon (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi 8,348 last-month
- Total dependent packages: 0
- Total dependent repositories: 5
- Total versions: 37
- Total maintainers: 4
pypi.org: koza
Data transformation framework for LinkML data models
- Documentation: https://koza.readthedocs.io/
- License: BSD License
- Latest release: 2.0.0 (published 6 months ago)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v3 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- 125 dependencies
- autoflake ^1.3.1 develop
- biolink-model >=3.0.1 develop
- black 22.3.0 develop
- dask >=2022.5.2 develop
- isort ^5.0.6 develop
- mkdocs >=1.3.0 develop
- mkdocs-material >=8.3.4 develop
- pytest >=6.0.0 develop
- click 8.0.4
- linkml-validator >=0.4.4
- ordered-set >=4.1.0
- pydantic ^1.0.0
- python ^3.8
- pyyaml ^5.3.1
- requests ^2.24.0
- typer ^0.7.0