datatools

A set of tools for working with JSON, CSV and Excel workbooks

https://github.com/caltechlibrary/datatools

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
    Organization caltechlibrary has institutional domain (www.library.caltech.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

csv data-munging excel-workbook json shell-scripting structured-data xlsx

Keywords from Contributors

opml opml-outline opml-to-json timesheet-notation open-screenplay-format screenplay screenwriting fountain markup markup-language
Last synced: 4 months ago

Repository

A set of tools for working with JSON, CSV and Excel workbooks

Basic Info
Statistics
  • Stars: 78
  • Watchers: 14
  • Forks: 10
  • Open Issues: 0
  • Releases: 56
Created almost 9 years ago · Last pushed 5 months ago
Metadata Files
Readme License Citation Codemeta

README.md

datatools

datatools is a rich collection of command line programs targeting data conversion, cleanup and analysis directly from your favorite POSIX shell. It has proven useful for data collaborations where individual members of a project may prefer different toolsets in their analysis (e.g. Julia, R, Python) but want to work from a common baseline. It has also been used intensively for internal reporting from various Caltech Library metadata sources.

The tools fall into three broad categories:

  • data transformation and conversion
  • shell scripting helpers
  • "string", a tool providing the common string operations missing from shell

See the user manual for a complete list of the command line programs. The data transformation tools support formats such as Excel XML, CSV, tab-delimited files, JSON, YAML and TOML.

Compiled versions of the datatools collection are provided for Linux (amd64), Mac OS X (amd64), Windows 10 (amd64) and Raspbian (ARM7). See https://github.com/caltechlibrary/datatools/releases.

Use the "-help" option to see the full list of options for each utility (e.g. csv2json -help).

Data transformation

The transformation tooling includes data conversion tools that work with CSV, tab-delimited, JSON, TOML, YAML and Excel XML.

There is also tooling to change data shapes using JSON as the intermediate data format.
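As a rough illustration of what using JSON as the intermediate format can look like, the Go sketch below turns CSV text with a header row into a JSON array of objects. It is not the datatools implementation; the function name csvToJSON and the sample data are assumptions made for this example.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"strings"
)

// csvToJSON is a hypothetical helper (not part of datatools). It reads CSV
// text whose first row is a header and returns a JSON array with one object
// per data row, keyed by the header names. All values stay strings.
func csvToJSON(src string) (string, error) {
	rows, err := csv.NewReader(strings.NewReader(src)).ReadAll()
	if err != nil {
		return "", err
	}
	if len(rows) == 0 {
		return "[]", nil
	}
	header := rows[0]
	records := make([]map[string]string, 0, len(rows)-1)
	for _, row := range rows[1:] {
		rec := make(map[string]string, len(header))
		for i, name := range header {
			if i < len(row) {
				rec[name] = row[i]
			}
		}
		records = append(records, rec)
	}
	out, err := json.Marshal(records)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := csvToJSON("name,year\nAda,1843\nGrace,1952\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Once the rows are JSON objects, reshaping them (selecting fields, nesting, flattening) can be done with ordinary JSON tooling before converting back to a tabular format.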

For the shell

Various utilities for simplifying work on the command line.

  • findfile - find files based on prefix, suffix or contained string
  • finddir - find directories based on prefix, suffix or contained string
  • mergepath - prefix, append, clip path variables
  • range - emit a range of integers (useful for numbered loops in Bash)
  • reldate - display a relative date in YYYY-MM-DD format
  • reltime - display a relative time in 24 hour notation, HH:MM:SS format
  • timefmt - format a time value based on Golang's time format language
  • urlparse - split a URL into parts

For strings

datatools provides the string command for working with text strings (limited to available memory). This is commonly needed when cleaning up data for analysis. The string command was created for cases where the old Unix standbys (grep, awk, sed, tr) are unwieldy or inconvenient. string provides operations that are common in most languages, such as trimming, splitting and transforming letter case. The string command also makes it easy to join JSON string arrays into a single string using a delimiter, or to split a string into a JSON array based on a delimiter. The form of the command is string [OPTIONS] [ACTION] [ACTION_PARAMETERS...]

    string toupper "one two three"

Would yield "ONE TWO THREE".

Some of the features included:

  • change case (upper, lower, title, English title)
  • length, position and count of substrings
  • has prefix, suffix or contains
  • trim prefix, suffix and cutsets
  • split and join to/from JSON string arrays

See string for full details.

Installation

See INSTALL.md for details on installing pre-compiled versions of the programs.

Owner

  • Name: Caltech Library
  • Login: caltechlibrary
  • Kind: organization
  • Email: helpdesk@library.caltech.edu
  • Location: Pasadena, CA 91125

We manage the physical and digital holdings of the California Institute of Technology, provide services and training, and develop open-source software.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
type: software
title: "datatools"
abstract: "A set of command line tools for working with CSV, Excel
Workbooks, JSON and structured text documents."
authors:
  - family-names: Doiel
    given-names: R. S.
    orcid: ""

maintainers:
  - family-names: Doiel
    given-names: R. S.
    orcid: ""

repository-code: "https://github.com/caltechlibrary/datatools"
version: 1.3.4
license-url: "https://data.caltech.edu/license"
keywords: [ "csv", "excel", "sql", "json", "yaml", "xlsx", "golang", "bash" ]
date-released: 2025-05-15

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "type": "SoftwareSourceCode",
  "codeRepository": "https://github.com/caltechlibrary/datatools",
  "author": [
    {
      "id": "https://orcid.org/0000-0003-0900-6903",
      "type": "Person",
      "givenName": "R. S.",
      "familyName": "Doiel",
      "affiliation": {
        "@type": "Organization",
        "name": "Caltech Library"
      },
      "email": "rsdoiel@caltech.edu"
    }
  ],
  "maintainer": [
    {
      "id": "https://orcid.org/0000-0003-0900-6903",
      "type": "Person",
      "givenName": "R. S.",
      "familyName": "Doiel",
      "affiliation": {
        "@type": "Organization",
        "name": "Caltech Library"
      },
      "email": "rsdoiel@caltech.edu"
    }
  ],
  "dateCreated": "2017-02-06",
  "dateModified": "2025-05-15",
  "datePublished": "2025-05-15",
  "description": "A set of command line tools for working with CSV, Excel Workbooks, JSON and structured text documents.",
  "funder": {
    "@id": "https://ror.org/5dxps055",
    "@type": "Organization",
    "name": "Caltech Library"
  },
  "keywords": [
    "csv",
    "excel",
    "sql",
    "json",
    "yaml",
    "xlsx",
    "golang",
    "bash"
  ],
  "name": "datatools",
  "license": "https://data.caltech.edu/license",
  "operatingSystem": [
    "Linux",
    "Windows",
    "macOS"
  ],
  "programmingLanguage": [
    "Go >= 1.23.5"
  ],
  "softwareRequirements": [
    "Golang >= 1.23.5",
    "Pandoc >= 3.1"
  ],
  "version": "1.3.4",
  "developmentStatus": "active",
  "issueTracker": "https://github.com/caltechlibrary/datatools/issues",
  "downloadUrl": "https://github.com/caltechlibrary/datatools/releases/",
  "releaseNotes": "Added json2jsonl. It will render a JSON array document as JSON lines. A `-as-dataset` option is included. If\nan top level attribute name is provided and matches the object it will render the result as a dataset load\ncomponatible object. Of the top level attribute name is not found then that object is skipped with a message\nwritten to standard error."
}

GitHub Events

Total
  • Create event: 7
  • Release event: 6
  • Issues event: 6
  • Watch event: 1
  • Issue comment event: 6
  • Push event: 23
Last Year
  • Create event: 7
  • Release event: 6
  • Issues event: 6
  • Watch event: 1
  • Issue comment event: 6
  • Push event: 23

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 433
  • Total Committers: 4
  • Avg Commits per committer: 108.25
  • Development Distribution Score (DDS): 0.076
Past Year
  • Commits: 29
  • Committers: 2
  • Avg Commits per committer: 14.5
  • Development Distribution Score (DDS): 0.069
Top Committers
Name         Email          Commits
R. S. Doiel  r****l@g****m  400
R. S. Doiel  r****l@g****l  27
R. S. Doiel  =              4
Tom Morrell  t****l@c****u  2

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 20
  • Total pull requests: 3
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 22 hours
  • Total issue authors: 6
  • Total pull request authors: 1
  • Average comments per issue: 1.45
  • Average comments per pull request: 0.67
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 0
  • Average time to close issues: 5 days
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • rsdoiel (12)
  • tmorrell (4)
  • jtheletter (1)
  • c33s (1)
  • broeder-j (1)
  • mhucka (1)
Pull Request Authors
  • tmorrell (3)
Top Labels
Issue Labels
enhancement (4) bug (4) wontfix (3) documentation (3) question (2) help wanted (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 1
  • Total dependent repositories: 0
  • Total versions: 53
proxy.golang.org: github.com/caltechlibrary/datatools

datatools package is a collection of Go based command line tools for working with JSON content @Author R. S. Doiel, <rsdoiel@caltech.edu> Copyright (c) 2021, Caltech All rights not granted herein are expressly reserved by Caltech. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. datatools.go is a package for working with various types of data (e.g. CSV, XLSX, JSON) in support of the utilities included in the datatools.go package. Copyright (c) 2021, Caltech All rights not granted herein are expressly reserved by Caltech. 

  • Versions: 53
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Stargazers count: 3.0%
Forks count: 3.8%
Dependent packages count: 4.6%
Average: 5.2%
Dependent repos count: 9.3%
Last synced: 4 months ago

Dependencies

go.mod go
  • github.com/BurntSushi/toml v0.3.1
  • github.com/caltechlibrary/cli v0.0.16
  • github.com/caltechlibrary/doitools v0.0.1
  • github.com/caltechlibrary/dotpath v0.0.2
  • github.com/caltechlibrary/tmplfn v0.0.21
  • github.com/dexyk/stringosim v0.0.0-20170922105913-9d0b3e91a842
  • github.com/ghodss/yaml v1.0.0
  • github.com/google/uuid v1.2.0
  • github.com/tealeg/xlsx v1.0.5
  • gopkg.in/yaml.v2 v2.4.0
go.sum go
  • github.com/BurntSushi/toml v0.3.1
  • github.com/caltechlibrary/cli v0.0.16
  • github.com/caltechlibrary/doitools v0.0.1
  • github.com/caltechlibrary/dotpath v0.0.2
  • github.com/caltechlibrary/tmplfn v0.0.21
  • github.com/dexyk/stringosim v0.0.0-20170922105913-9d0b3e91a842
  • github.com/ghodss/yaml v1.0.0
  • github.com/google/uuid v1.2.0
  • github.com/kr/pretty v0.1.0
  • github.com/kr/pty v1.1.1
  • github.com/kr/text v0.1.0
  • github.com/tealeg/xlsx v1.0.5
  • gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405
  • gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15
  • gopkg.in/yaml.v2 v2.4.0