cow

Integrated CSV to RDF converter, using CSVW and nanopublications

https://github.com/clariah/cow

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.0%) to scientific vocabulary

Keywords

converters csv2rdf linked-data rdf semantic-web
Last synced: 4 months ago

Repository

Integrated CSV to RDF converter, using CSVW and nanopublications

Basic Info
  • Host: GitHub
  • Owner: CLARIAH
  • License: mit
  • Language: Python
  • Default Branch: base
  • Homepage:
  • Size: 31.1 MB
Statistics
  • Stars: 47
  • Watchers: 17
  • Forks: 9
  • Open Issues: 24
  • Releases: 0
Topics
converters csv2rdf linked-data rdf semantic-web
Created over 10 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Codemeta

README.md

CSV on the Web (CoW)

CoW is a tool to convert a .csv file into Linked Data. Specifically, CoW is an integrated CSV to RDF converter that uses the W3C standard CSVW for rich semantic table specifications and produces nanopublications as its output RDF model. CoW converts any CSV file into an RDF dataset.

Features

Documentation and support

For user documentation see the basic introduction video and the GitHub wiki. Technical details are provided below. If you encounter an issue then please report it. Also feel free to create pull requests.

Quick Start Guide

There are two ways to run CoW. The quickest is via Docker; the more flexible is via pip.

Docker Image

Several data science tools, including CoW, are available via a Docker image.

Install

First, install the Docker virtualisation engine on your computer. Instructions on how to accomplish this can be found on the official Docker website. Use the following command in the Docker terminal:

```
# docker pull wxwilcke/datalegend
```

Here, the #-symbol refers to the terminal of a user with administrative privileges on your machine and is not part of the command.

After the image has successfully been downloaded (or 'pulled'), the container can be run as follows:

```
docker run --rm -p 3000:3000 -it wxwilcke/datalegend
```

The virtual system can now be accessed by opening http://localhost:3000/wetty in your preferred browser, and by logging in using username datalegend and password datalegend.

For detailed instructions on this Docker image, see DataLegend Playground. For instructions on how to use the tool, see usage below.

Command Line Interface (CLI)

The Command Line Interface (CLI) is the recommended way of using CoW for most users.

Install

Check whether the latest version of Python is installed on your device. For Windows/macOS we recommend installing Python via the official distribution page.

The recommended method of installing CoW on your system is pip3:

pip3 install cow-csvw

You can upgrade your currently installed version with:

pip3 install cow-csvw --upgrade

Possible installation issues:

  • Permission issues. You can get around them by installing CoW in user space: pip3 install cow-csvw --user.
  • Cannot find command: make sure your binary user directory (typically something like /Users/user/Library/Python/3.7/bin in MacOS or /home/user/.local/bin in Linux) is in your PATH (in MacOS: /etc/paths).
  • Please report your unlisted issue.
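
The "cannot find command" case can be checked from Python. The sketch below is a minimal diagnostic, assuming a POSIX-style layout in which pip's user-space scripts land in the `bin` directory under the user base; the helper name `is_on_path` is illustrative and not part of CoW.

```python
import os
import shutil
import site


def is_on_path(directory, path_env):
    """Return True if `directory` is one of the entries in a PATH-style string."""
    wanted = os.path.normpath(directory)
    entries = [os.path.normpath(p) for p in path_env.split(os.pathsep) if p]
    return wanted in entries


# Assumption: on Linux and macOS, `pip3 install --user` puts scripts in <user base>/bin.
user_bin = os.path.join(site.getuserbase(), "bin")

print("cow_tool_cli found at:", shutil.which("cow_tool_cli"))  # None if not on PATH
print("user bin directory:", user_bin)
print("user bin on PATH:", is_on_path(user_bin, os.environ.get("PATH", "")))
```

If the last line prints `False`, adding the printed directory to your PATH is the likely fix.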

Usage

Start the graphical interface by entering the following command:

cow_tool

Select a CSV file and click build to generate a file named myfile.csv-metadata.json (JSON schema file) with your mappings. Edit this file (optional) and then click convert to convert the CSV file to RDF. The output should be a myfile.csv.nq RDF file (nquads by default).

Command Line Interface

The straightforward CSV to RDF conversion is done by entering the following commands:

cow_tool_cli build myfile.csv

This will create a file named myfile.csv-metadata.json (JSON schema file). Next:

cow_tool_cli convert myfile.csv

This command will output a myfile.csv.nq RDF file (nquads by default).
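
The two steps can also be scripted. The following is a minimal sketch, assuming `cow_tool_cli` is installed and on your PATH; the helper names (`metadata_path`, `output_path`, `cow_command`, `build_and_convert`) are made up for this example, and the file names follow the default naming described above.

```python
import subprocess


def metadata_path(csv_file):
    """Schema file that the build step writes next to the CSV."""
    return csv_file + "-metadata.json"


def output_path(csv_file):
    """RDF file that the convert step writes with default settings (N-Quads)."""
    return csv_file + ".nq"


def cow_command(mode, csv_file):
    """Argument list for one CoW step; mode is 'build' or 'convert'."""
    if mode not in ("build", "convert"):
        raise ValueError("mode must be 'build' or 'convert'")
    return ["cow_tool_cli", mode, csv_file]


def build_and_convert(csv_file):
    """Run both steps in order; raise if a command exits with a non-zero status."""
    for mode in ("build", "convert"):
        subprocess.run(cow_command(mode, csv_file), check=True)
    return metadata_path(csv_file), output_path(csv_file)
```

Calling `build_and_convert("myfile.csv")` would run both commands and return the names `myfile.csv-metadata.json` and `myfile.csv.nq`.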

You don't need to worry about the JSON file, unless you want to change the metadata schema. To control the base URI namespace, URIs used in predicates, virtual columns, etcetera, edit the myfile.csv-metadata.json file and/or use CoW commands. For instance, you can control the output RDF serialization (with e.g. --format turtle). Have a look at the options below, the examples in the GitHub wiki, and the technical documentation.
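
Editing the schema file can be scripted with the standard `json` module. The sketch below is illustrative only: the JSON text is a hand-written, simplified stand-in that uses property names from the W3C CSVW vocabulary (`tableSchema`, `columns`, `name`, `datatype`, `propertyUrl`). The file CoW generates for your data may be structured differently, and the column names and the example.org URI are invented.

```python
import json

# Simplified, hand-written stand-in for a CSVW schema (an assumption, not real CoW output).
schema_text = """
{
  "url": "myfile.csv",
  "tableSchema": {
    "columns": [
      {"name": "person", "datatype": "string"},
      {"name": "age", "datatype": "string"}
    ]
  }
}
"""


def set_column_property(schema, column_name, key, value):
    """Set one property on the column called `column_name`; raise if it is absent."""
    for column in schema["tableSchema"]["columns"]:
        if column.get("name") == column_name:
            column[key] = value
            return schema
    raise KeyError("no column named " + repr(column_name))


schema = json.loads(schema_text)
set_column_property(schema, "age", "datatype", "integer")
set_column_property(schema, "age", "propertyUrl", "http://example.org/age")  # invented URI

print(json.dumps(schema, indent=2))
```

For a real file you would read `myfile.csv-metadata.json` with `json.load`, apply the change, write it back with `json.dump`, and then run the convert step again.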

Options

Check the --help for a complete list of options:

```
usage: cow_tool_cli [-h] [--dataset DATASET] [--delimiter DELIMITER]
                    [--quotechar QUOTECHAR] [--encoding ENCODING]
                    [--processes PROCESSES] [--chunksize CHUNKSIZE]
                    [--base BASE]
                    [--format [{xml,n3,turtle,nt,pretty-xml,trix,trig,nquads}]]
                    [--gzip] [--version]
                    {convert,build} file [file ...]

Not nearly CSVW compliant schema builder and RDF converter

positional arguments:
  {convert,build}       Use the schema of the file specified to convert it
                        to RDF, or build a schema from scratch.
  file                  Path(s) of the file(s) that should be used for
                        building or converting. Must be a CSV file.

optional arguments:
  -h, --help            show this help message and exit
  --dataset DATASET     A short name (slug) for the name of the dataset
                        (will use input file name if not specified)
  --delimiter DELIMITER
                        The delimiter used in the CSV file(s)
  --quotechar QUOTECHAR
                        The character used as quotation character in the
                        CSV file(s)
  --encoding ENCODING   The character encoding used in the CSV file(s)
  --processes PROCESSES
                        The number of processes the converter should use
  --chunksize CHUNKSIZE
                        The number of rows processed at each time
  --base BASE           The base for URIs generated with the schema (only
                        relevant when building a schema)
  --gzip                Compress the output file using gzip
  --format [{xml,n3,turtle,nt,pretty-xml,trix,trig,nquads}], -f [{xml,n3,turtle,nt,pretty-xml,trix,trig,nquads}]
                        RDF serialization format
  --version             show program's version number and exit
```
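
To illustrate what `--chunksize` and `--processes` control, here is a generic sketch of splitting rows into fixed-size chunks and handing them to a pool of worker processes. It is not CoW's actual implementation: `chunked`, `convert_chunk` and `convert_in_parallel` are hypothetical names, and the "conversion" is a placeholder that only counts rows.

```python
from itertools import islice
from multiprocessing import Pool


def chunked(rows, chunksize):
    """Yield successive lists of at most `chunksize` rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


def convert_chunk(chunk):
    """Placeholder for per-chunk work; here it only counts the rows."""
    return len(chunk)


def convert_in_parallel(rows, processes, chunksize):
    """Spread the chunks over `processes` workers and total the results."""
    with Pool(processes) as pool:
        return sum(pool.map(convert_chunk, chunked(rows, chunksize)))


# Seven rows with a chunk size of 3 give chunks of 3, 3 and 1 rows.
print([len(c) for c in chunked(range(7), 3)])
```

On platforms that start workers by spawning a fresh interpreter, call `convert_in_parallel` from under an `if __name__ == "__main__":` guard. Larger chunks mean fewer hand-offs between processes but more rows held in memory at once.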

Library

Once installed, CoW can be used as a library as follows:

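```
from cow_csvw.csvw_tool import COW
import os

# Placeholder location of the input file; adjust to your own data.
path = '.'
filename = 'myfile.csv'

# Step 1: build the metadata schema for the CSV file.
COW(mode='build', files=[os.path.join(path, filename)], dataset='My dataset', delimiter=';', quotechar='\"')

# Step 2: convert the CSV file to RDF using that schema.
COW(mode='convert', files=[os.path.join(path, filename)], dataset='My dataset', delimiter=';', quotechar='\"', processes=4, chunksize=100, base='http://example.org/my-dataset', format='turtle', gzipped=False)
```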
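
As a sketch of how the library calls might be combined, the following helper builds and converts every CSV file in a folder. It assumes the `COW` class accepts the keyword arguments shown in the snippet above and that the module path is `cow_csvw.csvw_tool`; `find_csv_files` and `build_and_convert_folder` are hypothetical names, and the import is deferred so the file-discovery part runs even where CoW is not installed.

```python
import glob
import os


def find_csv_files(folder):
    """Return the .csv files directly inside `folder`, sorted by name."""
    return sorted(glob.glob(os.path.join(folder, "*.csv")))


def build_and_convert_folder(folder, dataset, base, delimiter=";", quotechar='"'):
    """Build a schema for the CSV files in `folder`, then convert them."""
    # Deferred import: requires `pip3 install cow-csvw`.
    from cow_csvw.csvw_tool import COW

    files = find_csv_files(folder)
    # Same keyword arguments as the two calls in the snippet above.
    COW(mode="build", files=files, dataset=dataset,
        delimiter=delimiter, quotechar=quotechar)
    COW(mode="convert", files=files, dataset=dataset,
        delimiter=delimiter, quotechar=quotechar,
        processes=4, chunksize=100, base=base,
        format="turtle", gzipped=False)
    return files
```

A call such as `build_and_convert_folder("data", "My dataset", "http://example.org/my-dataset")` would process every `.csv` file in a hypothetical `data` folder.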

Further Information

Examples

The GitHub wiki provides more hands-on examples of transposing CSVs into Linked Data.

Technical documentation

Technical documentation for CoW is maintained in this GitHub repository (under ), and published through Read the Docs at http://csvw-converter.readthedocs.io/en/latest/.

To build the documentation from source, change into the docs directory, and run make html. This should produce an HTML version of the documentation in the _build/html directory.

License

MIT License (see license.txt)

Acknowledgements

Authors: Albert Meroño-Peñuela, Roderick van der Weerdt, Rinke Hoekstra, Kathrin Dentler, Auke Rijpma, Richard Zijdeman, Melvin Roest, Xander Wilcke

Copyright: Vrije Universiteit Amsterdam, Utrecht University, International Institute of Social History

CoW is developed and maintained by the CLARIAH project and funded by NWO.

Owner

  • Name: CLARIAH
  • Login: CLARIAH
  • Kind: organization

CLARIAH offers humanities scholars a Common Lab providing access to large collections of digital resources and innovative tools for research

CodeMeta (codemeta.json)

{
  "@context": [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "https://w3id.org/software-iodata",
    "https://w3id.org/nwo-research-fields",
    "https://raw.githubusercontent.com/jantman/repostatus.org/master/badges/latest/ontology.jsonld",
    "https://w3id.org/research-technology-readiness-levels",
    "https://schema.org",
    "https://w3id.org/software-types"
  ],
  "@id": "https://tools.dev.clariah.nl/cow/1.21",
  "@type": "SoftwareSourceCode",
  "author": [
    {
      "@id": "https://tools.dev.clariah.nl/person/albert-meroño-peñuela",
      "@type": "Person",
      "email": [
        "albert.merono@vu.nl",
        "albert.meronyo@gmail.com"
      ],
      "familyName": "Meroño-Peñuela",
      "givenName": "Albert"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/roderick-van-der-weerdt",
      "@type": "Person",
      "email": "rvanderweerdt@hotmail.com",
      "familyName": "van der Weerdt",
      "givenName": "Roderick"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/rinke-hoekstra",
      "@type": "Person",
      "email": "rinke.hoekstra@vu.nl",
      "familyName": "Hoekstra",
      "givenName": "Rinke"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/kathrin-dentler",
      "@type": "Person",
      "email": "kathrin@dentler.org",
      "familyName": "Dentler",
      "givenName": "Kathrin"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/auke-rijpma",
      "@type": "Person",
      "familyName": "Rijpma",
      "givenName": "Auke"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/richard-zijdeman",
      "@type": "Person",
      "email": "richard.zijdeman@iisg.nl",
      "familyName": "Zijdeman",
      "givenName": "Richard"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/melvin-roest",
      "@type": "Person",
      "email": "melvinroest@gmail.com",
      "familyName": "Roest",
      "givenName": "Melvin"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/xander-wilcke",
      "@type": "Person",
      "email": "w.x.wilcke@vu.nl",
      "familyName": "Wilcke",
      "givenName": "Xander"
    }
  ],
  "contributor": [
    {
      "@id": "https://tools.dev.clariah.nl/person/rinke-hoekstra",
      "@type": "Person",
      "email": "rinke.hoekstra@vu.nl",
      "familyName": "Hoekstra",
      "givenName": "Rinke"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/albert-meroño-peñuela",
      "@type": "Person",
      "email": [
        "albert.merono@vu.nl",
        "albert.meronyo@gmail.com"
      ],
      "familyName": "Meroño-Peñuela",
      "givenName": "Albert"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/rijpma",
      "@type": "Person",
      "email": "auke.rijpma@gmail.com",
      "familyName": "",
      "givenName": "rijpma"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/rlzijdeman",
      "@type": "Person",
      "email": "richard.zijdeman@iisg.nl",
      "familyName": "",
      "givenName": "rlzijdeman"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/kathrinrin",
      "@type": "Person",
      "email": "k.dentler@vu.nl",
      "familyName": "",
      "givenName": "kathrinrin"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/roderick-van-der-weerdt",
      "@type": "Person",
      "email": "rvanderweerdt@hotmail.com",
      "familyName": "van der Weerdt",
      "givenName": "Roderick"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/melvin-roest",
      "@type": "Person",
      "email": "melvinroest@gmail.com",
      "familyName": "Roest",
      "givenName": "Melvin"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/richard-zijdeman",
      "@type": "Person",
      "email": "richard.zijdeman@gmail.com",
      "familyName": "Zijdeman",
      "givenName": "Richard"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/xander-wilcke",
      "@type": "Person",
      "email": "w.x.wilcke@vu.nl",
      "familyName": "Wilcke",
      "givenName": "Xander"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/kathrin-dentler",
      "@type": "Person",
      "email": "kathrin@dentler.org",
      "familyName": "Dentler",
      "givenName": "Kathrin"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/melvinroest",
      "@type": "Person",
      "email": "44729293+melvinroest@users.noreply.github.com",
      "familyName": "",
      "givenName": "melvinroest"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/rubenschalk",
      "@type": "Person",
      "email": "r.schalk@uu.nl",
      "familyName": "",
      "givenName": "RubenSchalk"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/roderickvanderweerdt",
      "@type": "Person",
      "email": "14040777+RoderickvanderWeerdt@users.noreply.github.com",
      "familyName": "",
      "givenName": "RoderickvanderWeerdt"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/kathrin",
      "@type": "Person",
      "email": "Kathrin@kathrins-mbp.home",
      "familyName": "",
      "givenName": "Kathrin"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/joe",
      "@type": "Person",
      "email": "raad.joe@hotmail.com",
      "familyName": "",
      "givenName": "Joe"
    },
    {
      "@id": "https://tools.dev.clariah.nl/person/ivo-zandhuis",
      "@type": "Person",
      "email": "ivo@zandhuis.nl",
      "familyName": "Zandhuis",
      "givenName": "Ivo"
    }
  ],
  "maintainer": {
    "@id": "https://tools.dev.clariah.nl/person/richard-zijdeman",
    "@type": "Person",
    "email": "richard.zijdeman@gmail.com",
    "familyName": "Zijdeman",
    "givenName": "Richard"
  },
  "codeRepository": "https://github.com/CLARIAH/COW",
  "description": "Integrated CSV to RDF converter, using CSVW and nanopublications",
  "developmentStatus": {
    "@id": "https://www.repostatus.org/#inactive",
    "@type": "skos:Concept",
    "og:image": "https://www.repostatus.org/badges/latest/inactive.svg",
    "skos:definition": "The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.",
    "skos:inScheme": {
      "@id": "https://www.repostatus.org",
      "@type": "skos:ConceptScheme",
      "dct:creator": "Jason Antman",
      "dct:description": "A standard to easily communicate to humans and machines the development/support and usability status of software repositories/projects.",
      "dct:title": "repostatus.org"
    },
    "skos:prefLabel": "Inactive"
  },
  "downloadUrl": "https://github.com/CLARIAH/COW/archive/refs/tags/1.21.zip",
  "issueTracker": "https://github.com/CLARIAH/COW/issues",
  "identifier": "cow",
  "keywords": [
    "csv",
    "csvw",
    "rdf"
  ],
  "license": "http://spdx.org/licenses/MIT",
  "name": "cow-csvw",
  "owl:sameAs": [
    {
      "@id": "https://tools.dev.clariah.nl/cow/snapshot"
    },
    {
      "@id": "https://tools.dev.clariah.nl/cow.contributors/snapshot"
    },
    {
      "@id": "https://tools.dev.clariah.nl/cow-csvw/1.21"
    }
  ],
  "producer": {
    "@id": "https://tools.dev.clariah.nl/org/clariah",
    "@type": "Organization",
    "name": "CLARIAH",
    "url": "http://www.clariah.nl"
  },
  "programmingLanguage": "Python",
  "readme": "https://github.com/CLARIAH/COW/blob/1.21/README.md",
  "releaseNotes": "https://github.com/CLARIAH/COW/releases/tag/1.21",
  "review": {
    "@id": "https://tools.dev.clariah.nl/validation/N01043db934fab402ca5df3a3b7c322ba",
    "@type": "Review",
    "author": "codemetapy validator using software.ttl",
    "datePublished": "2023-02-10 03:04:13",
    "name": "Automatic software metadata validation report for cow-csvw 1.21",
    "reviewBody": "Please consult the CLARIAH Software Metadata Requirements at https://github.com/CLARIAH/clariah-plus/blob/main/requirements/software-metadata-requirements.md for an in-depth explanation of any found problems\n\nValidation of cow-csvw 1.21 was successful (score=3/5), but there are some warnings which should be addressed:\n\n1. Warning: Software source code *SHOULD* link to a continuous integration service that builds the software and runs the software's tests (This is missing in the metadata)\n2. Info: Reference publications *SHOULD* be expressed (This is missing in the metadata)\n3. Info: The funder *SHOULD* be acknowledged (This is missing in the metadata)\n4. Info: The technology readiness level *SHOULD* be expressed (This is missing in the metadata)",
    "reviewRating": 3
  },
  "runtimePlatform": [
    "Python",
    "Python 3",
    "Python 3.10"
  ],
  "funding": {
    "@type": "Grant",
    "name": "CLARIAH-PLUS (NWO grant 184.034.023)",
    "funder": {
      "@type": "Organization",
      "name": "NWO",
      "url": "https://www.nwo.nl"
    }
  },
  "softwareHelp": {
    "@id": "http://csvw-converter.readthedocs.io/en/latest/",
    "@type": "WebSite",
    "name": "CoW: Converter for CSV on the Web — CSVW Converters 1.0.0 documentation",
    "url": "http://csvw-converter.readthedocs.io/en/latest/"
  },
  "softwareRequirements": [
    {
      "@id": "https://tools.dev.clariah.nl/dependency/jinja23.0.3",
      "@type": "SoftwareApplication",
      "identifier": "Jinja2",
      "name": "Jinja2",
      "runtimePlatform": "Python 3",
      "version": "3.0.3"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/js2py0.71",
      "@type": "SoftwareApplication",
      "identifier": "Js2Py",
      "name": "Js2Py",
      "runtimePlatform": "Python 3",
      "version": "0.71"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/pyyaml6.0",
      "@type": "SoftwareApplication",
      "identifier": "PyYAML",
      "name": "PyYAML",
      "runtimePlatform": "Python 3",
      "version": "6.0"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/werkzeug2.0.2",
      "@type": "SoftwareApplication",
      "identifier": "Werkzeug",
      "name": "Werkzeug",
      "runtimePlatform": "Python 3",
      "version": "2.0.2"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/chardet4.0.0",
      "@type": "SoftwareApplication",
      "identifier": "chardet",
      "name": "chardet",
      "runtimePlatform": "Python 3",
      "version": "4.0.0"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/iribaker0.2",
      "@type": "SoftwareApplication",
      "identifier": "iribaker",
      "name": "iribaker",
      "runtimePlatform": "Python 3",
      "version": "0.2"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/isodate0.6.1",
      "@type": "SoftwareApplication",
      "identifier": "isodate",
      "name": "isodate",
      "runtimePlatform": "Python 3",
      "version": "0.6.1"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/pyjsparser2.7.1",
      "@type": "SoftwareApplication",
      "identifier": "pyjsparser",
      "name": "pyjsparser",
      "runtimePlatform": "Python 3",
      "version": "2.7.1"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/pytz2021.3",
      "@type": "SoftwareApplication",
      "identifier": "pytz",
      "name": "pytz",
      "runtimePlatform": "Python 3",
      "version": "2021.3"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/rdflib6.0.2",
      "@type": "SoftwareApplication",
      "identifier": "rdflib",
      "name": "rdflib",
      "runtimePlatform": "Python 3",
      "version": "6.0.2"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/rfc39871.3.8",
      "@type": "SoftwareApplication",
      "identifier": "rfc3987",
      "name": "rfc3987",
      "runtimePlatform": "Python 3",
      "version": "1.3.8"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/tzlocal4.1",
      "@type": "SoftwareApplication",
      "identifier": "tzlocal",
      "name": "tzlocal",
      "runtimePlatform": "Python 3",
      "version": "4.1"
    },
    {
      "@id": "https://tools.dev.clariah.nl/dependency/unicodecsv0.14.1",
      "@type": "SoftwareApplication",
      "identifier": "unicodecsv",
      "name": "unicodecsv",
      "runtimePlatform": "Python 3",
      "version": "0.14.1"
    }
  ],
  "targetProduct": {
    "@id": "https://tools.dev.clariah.nl/commandlineapplication/cow_tool/1.21",
    "@type": "CommandLineApplication",
    "executableName": "cow_tool",
    "name": "cow_tool",
    "runtimePlatform": "Python 3"
  },
  "url": "https://github.com/CLARIAH/COW",
  "version": "1.21"
}

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 9
  • Total Committers: 2
  • Avg Commits per committer: 4.5
  • Development Distribution Score (DDS): 0.222
Past Year
  • Commits: 7
  • Committers: 1
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Xander Wilcke w****e@v****l 7
harshpundhir p****6@g****m 2
Committer Domains (Top 20 + Academic)
vu.nl: 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 93
  • Total pull requests: 13
  • Average time to close issues: 10 months
  • Average time to close pull requests: 3 months
  • Total issue authors: 23
  • Total pull request authors: 4
  • Average comments per issue: 1.58
  • Average comments per pull request: 0.31
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 5
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • rlzijdeman (29)
  • Bramvdhout (14)
  • albertmeronyo (13)
  • Sozialarchiv (6)
  • Thunnisvanoort (4)
  • RubenSchalk (4)
  • RoderickvanderWeerdt (2)
  • RinkeHoekstra (2)
  • sytzevh (2)
  • raadjoe (2)
  • tkuhn (2)
  • melvinroest (2)
  • rschalkrce (1)
  • matiasf (1)
  • wagonhelm (1)
Pull Request Authors
  • dependabot[bot] (7)
  • rlzijdeman (4)
  • melvinroest (3)
  • sytzevh (1)
Top Labels
Issue Labels
enhancement (19) bug (19) question (4) feature (4) converting (2) deployment (2) outsidescope (1) task (1) documentation (1) help wanted (1) Todo (1)
Pull Request Labels
dependencies (7)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 62 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 66
  • Total maintainers: 4
pypi.org: cow-csvw

Integrated CSV to RDF converter, using CSVW and nanopublications

  • Versions: 66
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 62 Last month
Rankings
Dependent packages count: 10.1%
Stargazers count: 10.3%
Forks count: 11.4%
Average: 14.5%
Downloads: 19.3%
Dependent repos count: 21.6%
Last synced: 4 months ago

Dependencies

requirements.txt pypi
  • Jinja2 ==3.0.3
  • Js2Py ==0.71
  • PyYAML ==6.0
  • Werkzeug ==2.0.2
  • chardet ==4.0.0
  • iribaker ==0.2
  • isodate ==0.6.1
  • pyjsparser ==2.7.1
  • pytz ==2021.3
  • rdflib ==6.0.2
  • rfc3987 ==1.3.8
  • tzlocal ==4.1
  • unicodecsv ==0.14.1
setup.py pypi