pycsvy

Python reader/writer for CSV files with YAML header information

https://github.com/imperialcollegelondon/pycsvy

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    4 of 7 committers (57.1%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary

Keywords

hacktoberfest

Keywords from Contributors

battery decarbonisation hydraulic-modelling hydrology energy-system-model energy-system swmmanywhere swmm5 swmm stormwater
Last synced: 10 months ago

Repository

Python reader/writer for CSV files with YAML header information

Basic Info
  • Host: GitHub
  • Owner: ImperialCollegeLondon
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.53 MB
Statistics
  • Stars: 7
  • Watchers: 2
  • Forks: 7
  • Open Issues: 22
  • Releases: 2
Topics
hacktoberfest
Created about 4 years ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

CSVY for Python


CSV is a popular format for storing tabular data in many disciplines. Metadata about the contents of the file is often included in the header, but it rarely follows a machine-readable format, and sometimes it is not even human readable! In some cases, such information is provided in a separate file, which is not ideal because data and metadata can easily become separated.

CSVY is a small Python package for handling CSV files whose header metadata is formatted in YAML. It supports reading and writing tabular data held in numpy arrays, pandas DataFrames, polars DataFrames and nested lists, with the metadata held in a standard Python dictionary. Ultimately, it aims to incorporate information about the CSV dialect used and a Table Schema describing the contents of each column, to aid reading and interpreting the data.

Installation

pycsvy is available on PyPI and conda-forge, so installing it is as easy as:

```bash
pip install pycsvy
```

or

```bash
conda install --channel=conda-forge pycsvy
```

To read data into numpy arrays, pandas DataFrames or polars DataFrames, you will also need to install those packages. This can be done by specifying extras, e.g.:

```bash
pip install "pycsvy[pandas,polars]"
```

Usage

In the simplest case, to save the tabular data held in a variable named data, together with a metadata dictionary named metadata, to a CSVY file called important_data.csv (the file extension is irrelevant), just do the following:

```python
import csvy

csvy.write("important_data.csv", data, metadata)
```

The resulting file will have the YAML-formatted header between two --- markers, with each header line optionally starting with a comment character. It could look something like this:

```text
---
name: my-dataset
title: Example file of csvy
description: Show a csvy sample file.
encoding: utf-8
schema:
  fields:
  - name: Date
    type: object
  - name: WTI
    type: number
---
Date,WTI
1986-01-02,25.56
1986-01-03,26.00
1986-01-06,26.53
1986-01-07,25.85
1986-01-08,25.87
```
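To make the layout concrete, here is a minimal, standard-library-only sketch of how such text could be assembled. It is not pycsvy's implementation: the function name write_csvy_text is hypothetical, the YAML header is assumed to be already serialised into text lines (pycsvy itself relies on a YAML library for that step), and the comment prefix is assumed to apply to the two --- markers as well as to the header lines.

```python
import csv
import io


def write_csvy_text(rows, header_lines, comment=""):
    """Build CSVY text: header lines between '---' markers, then CSV rows.

    Hypothetical helper. `header_lines` is assumed to be YAML that has
    already been serialised to text lines; `comment` is prefixed to every
    header line, including both '---' markers (an assumption).
    """
    out = io.StringIO()
    for line in ["---", *header_lines, "---"]:
        out.write(f"{comment}{line}\n")
    # The tabular part is plain CSV, written with the standard csv module.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


text = write_csvy_text(
    [["Date", "WTI"], ["1986-01-02", "25.56"]],
    ["name: my-dataset", "encoding: utf-8"],
    comment="# ",
)
print(text)
```

Keeping the header and the table in a single text stream is what stops data and metadata from drifting apart, which is the motivation given at the top of this README.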

For reading the information back:

```python
import csvy

# To read into a numpy array
data, metadata = csvy.read_to_array("important_data.csv")

# To read into a pandas DataFrame
data, metadata = csvy.read_to_dataframe("important_data.csv")

# To read into a polars LazyFrame
data, metadata = csvy.read_to_polars("important_data.csv")

# To read into a polars DataFrame
data, metadata = csvy.read_to_polars("important_data.csv", eager=True)
```

The appropriate writer/reader will be selected based on the type of data:

  • numpy array: np.savetxt and np.loadtxt
  • pandas DataFrame: pd.DataFrame.to_csv and pd.read_csv
  • polars DataFrame/LazyFrame: pl.DataFrame.write_csv and pl.scan_csv
  • nested lists: csv.writer and csv.reader

Options can be passed to the tabular data writer/reader by setting the csv_options dictionary. Likewise, you can set the yaml_options dictionary with any options you want to pass to the yaml.safe_load and yaml.safe_dump functions, which read and write the YAML-formatted header, respectively.
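The type-based selection of a writer and the forwarding of csv_options can be sketched with functools.singledispatch. This is an illustration under assumptions, not pycsvy's internal code: write_table is a hypothetical name, and only the nested-list case (via csv.writer) is registered so that the example needs nothing beyond the standard library. Writers for numpy, pandas and polars objects would be registered the same way. The keyword arguments lineterminator and delimiter are standard csv formatting parameters.

```python
import csv
import io
from functools import singledispatch


@singledispatch
def write_table(data, stream, **csv_options):
    """Hypothetical writer that dispatches on the type of `data`."""
    # Fallback: no writer has been registered for this type.
    raise TypeError(f"unsupported data type: {type(data).__name__}")


@write_table.register
def _(data: list, stream, **csv_options):
    # Nested lists: each inner list is one row, written by csv.writer with
    # any csv_options forwarded as formatting parameters.
    csv.writer(stream, **csv_options).writerows(data)


buf = io.StringIO()
write_table([["a", "b"], [1, 2]], buf, lineterminator="\n", delimiter=";")
print(repr(buf.getvalue()))  # 'a;b\n1;2\n'
```

Registering one function per type keeps each writer independent, and unsupported types fail loudly with a TypeError instead of being written incorrectly.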

You can also instruct a writer to use line buffering, instead of the usual chunk buffering.

Finally, you can control the character(s) used to mark comments by setting the comment keyword when writing a file. By default, no comment character is used (an empty string, ""). When reading, the comment character is detected automatically.
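Automatic detection is possible if every header line, including the opening marker, carries the same prefix: whatever precedes the first --- is then the comment string. The sketch below assumes exactly that layout. split_csvy is a hypothetical helper, not part of pycsvy's API; it returns the header as raw text lines (parsing them as YAML is left to a library such as PyYAML's yaml.safe_load) and forwards any extra keyword arguments to csv.reader, in the spirit of csv_options.

```python
import csv
import io


def split_csvy(text, **csv_options):
    """Split CSVY text into (comment, header_lines, rows).

    Hypothetical helper. Assumes the first line is the opening '---'
    marker, possibly preceded by the comment string, and that the same
    prefix starts every header line, including the closing marker.
    """
    lines = text.splitlines()
    start = lines[0].find("---")
    if start == -1:
        raise ValueError("missing opening '---' marker")
    # Everything before the opening marker is treated as the comment prefix.
    comment = lines[0][:start]
    marker = comment + "---"
    for end in range(1, len(lines)):
        if lines[end].rstrip() == marker:
            break
    else:
        raise ValueError("missing closing '---' marker")
    # Drop the comment prefix from each header line.
    header = [line[len(comment):] for line in lines[1:end]]
    # The remaining lines are parsed as CSV, forwarding any csv options.
    body = io.StringIO("\n".join(lines[end + 1:]))
    rows = list(csv.reader(body, **csv_options))
    return comment, header, rows


sample = "# ---\n# name: my-dataset\n# ---\nDate,WTI\n1986-01-02,25.56\n"
comment, header, rows = split_csvy(sample)
print(repr(comment), header, rows)
# '# ' ['name: my-dataset'] [['Date', 'WTI'], ['1986-01-02', '25.56']]
```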

Note that, by default, the reader and writer functions assume UTF-8 encoding. You can choose a different character encoding by setting the encoding keyword argument of any of these functions. For example, Windows-1252, which is common on Windows, can be specified via encoding='cp1252'.

Contributors ✨

Thanks go to these wonderful people (emoji key):

Diego Alonso Álvarez

🚇 🤔 🚧 ⚠️ 🐛 💻
Alex Dewar

🤔 ⚠️ 💻
Adrian D'Alessandro

🐛 💻 📖
James Paul Turner

🚇 💻
Dan Cummins

🚇 💻
mikeheyns

🚇

This project follows the all-contributors specification. Contributions of any kind welcome!

Owner

  • Name: Imperial College London
  • Login: ImperialCollegeLondon
  • Kind: organization
  • Email: icgithub-support@imperial.ac.uk
  • Location: Imperial College London

Imperial College main code repository

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: CSVY for Python
message: Please cite this software using these metadata.
type: software
authors:
  - given-names: Diego
    family-names: Alonso-Álvarez
    email: d.alonso-alvarez@imperial.ac.uk
    affiliation: Imperial College London
    orcid: 'https://orcid.org/0000-0002-0060-9495'
abstract: >-
  CSV is a popular format for storing tabular data
  used in many disciplines. Metadata concerning the
  contents of the file is often included in the
  header, but it rarely follows a format that is
  machine readable - sometimes is not even human
  readable! In some cases, such information is
  provided in a separate file, which is not ideal as
  it is easy for data and metadata to get separated.


  CSVY is a small Python package to handle CSV files
  in which the metadata in the header is formatted in
  YAML. It supports reading/writing tabular data
  contained in numpy arrays, pandas DataFrames and
  nested lists, as well as metadata using a standard
  python dictionary. 
license: BSD-3-Clause

GitHub Events

Total
  • Create event: 173
  • Release event: 2
  • Issues event: 14
  • Watch event: 1
  • Delete event: 157
  • Issue comment event: 58
  • Push event: 346
  • Pull request review comment event: 76
  • Pull request review event: 358
  • Pull request event: 325
  • Fork event: 1
Last Year
  • Create event: 173
  • Release event: 2
  • Issues event: 14
  • Watch event: 1
  • Delete event: 157
  • Issue comment event: 58
  • Push event: 346
  • Pull request review comment event: 76
  • Pull request review event: 358
  • Pull request event: 325
  • Fork event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 125
  • Total Committers: 7
  • Avg Commits per committer: 17.857
  • Development Distribution Score (DDS): 0.384
Past Year
  • Commits: 61
  • Committers: 7
  • Avg Commits per committer: 8.714
  • Development Distribution Score (DDS): 0.787
Top Committers
Name Email Commits
Diego d****z@i****k 77
Alex Dewar a****r@i****k 12
allcontributors[bot] 4****] 10
James Paul Turner j****r@i****k 8
Adrian D'Alessandro a****o@i****k 8
dependabot[bot] 4****] 7
pre-commit-ci[bot] 6****] 3

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 48
  • Total pull requests: 355
  • Average time to close issues: 3 months
  • Average time to close pull requests: 13 days
  • Total issue authors: 5
  • Total pull request authors: 11
  • Average comments per issue: 0.75
  • Average comments per pull request: 0.43
  • Merged pull requests: 278
  • Bot issues: 1
  • Bot pull requests: 324
Past Year
  • Issues: 32
  • Pull requests: 303
  • Average time to close issues: 20 days
  • Average time to close pull requests: 3 days
  • Issue authors: 5
  • Pull request authors: 9
  • Average comments per issue: 0.69
  • Average comments per pull request: 0.23
  • Merged pull requests: 249
  • Bot issues: 1
  • Bot pull requests: 283
Top Authors
Issue Authors
  • dalonsoa (36)
  • alexdewar (8)
  • AdrianDAlessandro (3)
  • Yogeshkarma (2)
  • dependabot[bot] (1)
Pull Request Authors
  • dependabot[bot] (286)
  • pre-commit-ci[bot] (67)
  • dalonsoa (20)
  • allcontributors[bot] (13)
  • alexdewar (7)
  • AdrianDAlessandro (6)
  • github-actions[bot] (4)
  • Kaos599 (4)
  • dc2917 (3)
  • mikeheyns (2)
  • jamesturner246 (1)
Top Labels
Issue Labels
enhancement (11) Hacktoberfest (11) infrastructure (6) good first issue (5) bug (4) documentation (3) Tests (3) github_actions (3) complex (2) dependencies (2) python (1) hacktoberfest-accepted (1)
Pull Request Labels
dependencies (286) python (268) hacktoberfest-accepted (18) github_actions (18)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 559 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 8
  • Total maintainers: 2
pypi.org: pycsvy

Python reader/writer for CSV files with YAML header information.

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 559 Last month
Rankings
Downloads: 6.5%
Dependent packages count: 10.1%
Forks count: 15.3%
Average: 15.7%
Dependent repos count: 21.6%
Stargazers count: 25.0%
Maintainers (2)
Last synced: 10 months ago

Dependencies

poetry.lock pypi
  • atomicwrites 1.4.0 develop
  • attrs 21.4.0 develop
  • black 22.3.0 develop
  • certifi 2022.5.18 develop
  • cfgv 3.3.1 develop
  • charset-normalizer 2.0.12 develop
  • click 8.1.3 develop
  • codecov 2.1.12 develop
  • colorama 0.4.4 develop
  • coverage 6.3.3 develop
  • distlib 0.3.4 develop
  • filelock 3.7.0 develop
  • flake8 4.0.1 develop
  • identify 2.5.0 develop
  • idna 3.3 develop
  • iniconfig 1.1.1 develop
  • isort 5.10.1 develop
  • mccabe 0.6.1 develop
  • mypy 0.950 develop
  • mypy-extensions 0.4.3 develop
  • nodeenv 1.6.0 develop
  • numpy 1.22.3 develop
  • packaging 21.3 develop
  • pandas 1.4.2 develop
  • pathspec 0.9.0 develop
  • platformdirs 2.5.2 develop
  • pluggy 1.0.0 develop
  • pre-commit 2.19.0 develop
  • py 1.11.0 develop
  • pycodestyle 2.8.0 develop
  • pyflakes 2.4.0 develop
  • pyparsing 3.0.9 develop
  • pytest 7.1.2 develop
  • pytest-cov 3.0.0 develop
  • pytest-flake8 1.1.1 develop
  • pytest-mock 3.7.0 develop
  • pytest-mypy 0.9.1 develop
  • python-dateutil 2.8.2 develop
  • pytz 2022.1 develop
  • requests 2.27.1 develop
  • six 1.16.0 develop
  • toml 0.10.2 develop
  • tomli 2.0.1 develop
  • types-pyyaml 6.0.7 develop
  • typing-extensions 4.2.0 develop
  • urllib3 1.26.9 develop
  • virtualenv 20.14.1 develop
  • pyyaml 6.0
pyproject.toml pypi
  • black ^22.3.0 develop
  • codecov ^2.1.12 develop
  • coverage ^6.3.3 develop
  • flake8 ^4.0.1 develop
  • isort ^5.10.1 develop
  • mypy ^0.950 develop
  • numpy ^1.22.3 develop
  • pandas ^1.4.1 develop
  • pre-commit ^2.18.1 develop
  • pytest ^7.0 develop
  • pytest-cov ^3.0.0 develop
  • pytest-flake8 ^1.1.1 develop
  • pytest-mock ^3.7.0 develop
  • pytest-mypy ^0.9.1 develop
  • types-PyYAML ^6.0.7 develop
  • PyYAML ^6.0
  • python ^3.8
.github/workflows/check-links.yml actions
  • actions/checkout master composite
  • gaurav-nelson/github-action-markdown-link-check v1 composite
.github/workflows/ci.yml actions
  • abatilo/actions-poetry v2.0.0 composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • gaurav-nelson/github-action-markdown-link-check v1 composite
  • pre-commit/action v2.0.2 composite
.github/workflows/auto-merge.yml actions
.github/workflows/ci_template.yml actions
  • abatilo/actions-poetry v4.0.0 composite
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • codecov/codecov-action v5 composite
.github/workflows/publish.yml actions
  • abatilo/actions-poetry v4.0.0 composite
  • actions/attest-build-provenance v2 composite
  • actions/checkout v3 composite
  • actions/checkout v4 composite
  • actions/download-artifact v4 composite
  • actions/setup-python v5 composite
  • hynek/build-and-inspect-python-package v2 composite
  • pypa/gh-action-pypi-publish release/v1 composite