frictionless

Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data

https://github.com/frictionlessdata/frictionless-py

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 768
  • Watchers: 29
  • Forks: 155
  • Open Issues: 221
  • Releases: 0
Created about 11 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog Contributing License Citation Authors

README.md

frictionless-py

Migrating from an older version? Please read the **[v5](blog/2022/08-22-frictionless-framework-v5.html)** announcement and migration guide.

Data management framework for Python that provides functionality to describe, extract, validate, and transform tabular data (DEVT Framework). It supports a wide range of data sources and formats and provides integrations with popular platforms. The framework is powered by the lightweight yet comprehensive Frictionless Standards.

Purpose

  • Describe your data: You can infer, edit, and save metadata for your data tables. This is a first step toward ensuring data quality and usability. Frictionless metadata includes general information about your data, such as a textual description, as well as field types and other tabular data details.
  • Extract your data: You can read your data through a unified tabular interface, with data quality and consistency guaranteed by a schema. Frictionless supports various file schemes such as HTTP, FTP, and S3, and data formats such as CSV, XLS, JSON, SQL, and others.
  • Validate your data: You can validate data tables, resources, and datasets. Frictionless generates a unified validation report and supports many options for customizing the validation process.
  • Transform your data: You can clean, reshape, and transfer your data tables and datasets. Frictionless provides a pipeline capability and a lower-level interface for working with the data.
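
To make the "describe" step more concrete, the following is a minimal standard-library sketch of what inferring field types from a table can look like. It does not use or reproduce the Frictionless API: the helper names `infer_type` and `describe_csv`, the sample data, and the simplified type rules are all assumptions made for illustration, and the type labels are only loosely borrowed from Table Schema.

```python
import csv
import io


def infer_type(values):
    """Guess a field type from its non-empty cell values (illustrative rules only)."""
    non_empty = [value for value in values if value != ""]
    if not non_empty:
        return "any"
    # Try the narrowest type first; fall back to "string" if nothing fits.
    for name, cast in (("integer", int), ("number", float)):
        try:
            for value in non_empty:
                cast(value)
            return name
        except ValueError:
            continue
    return "string"


def describe_csv(text):
    """Return a small schema-like dict for CSV text that has a header row."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    fields = []
    for index, name in enumerate(header):
        column = [row[index] for row in body if index < len(row)]
        fields.append({"name": name, "type": infer_type(column)})
    return {"fields": fields}


# Hypothetical sample table, not taken from the project.
SAMPLE = "id,name,price\n1,apple,0.5\n2,pear,1\n"
schema = describe_csv(SAMPLE)
print(schema)
```

A real framework has to handle far more cases (dates, booleans, missing-value markers, dialects), which is why the inferred metadata is meant to be edited and saved rather than trusted blindly.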

Features

  • Open Source (MIT)
  • Powerful Python framework
  • Convenient command-line interface
  • Low memory consumption for data of any size
  • Reasonable performance on big data
  • Support for compressed files
  • Custom checks and formats
  • Fully pluggable architecture
  • More than 1000 tests

Installation

$ pip install frictionless

Example

```bash
$ frictionless validate data/invalid.csv
[invalid] data/invalid.csv

  row    field  code              message
-----  -------  ----------------  --------------------------------------------
             3  blank-header      Header in field at position "3" is blank
             4  duplicate-header  Header "name" in field "4" is duplicated
    2        3  missing-cell      Row "2" has a missing cell in field "field3"
    2        4  missing-cell      Row "2" has a missing cell in field "name2"
    3        3  missing-cell      Row "3" has a missing cell in field "field3"
    3        4  missing-cell      Row "3" has a missing cell in field "name2"
    4           blank-row         Row "4" is completely blank
    5        5  extra-cell        Row "5" has an extra value in field  "5"
```
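
The error codes in the report above can be read as simple structural checks on the header and on each row. The sketch below shows one way such checks might be written with only the Python standard library. It is not how Frictionless implements validation: `check_table` is a hypothetical helper, and `SAMPLE` is an assumed input chosen to trigger the same codes, not necessarily the contents of `data/invalid.csv`.

```python
import csv
import io


def check_table(text):
    """Return (row, field, code) tuples for a few structural problems."""
    rows = list(csv.reader(io.StringIO(text)))
    header, errors, seen = rows[0], [], set()

    # Header checks: blank labels and labels that repeat an earlier one.
    for position, label in enumerate(header, start=1):
        if label == "":
            errors.append((None, position, "blank-header"))
        elif label in seen:
            errors.append((None, position, "duplicate-header"))
        seen.add(label)

    # Row checks: the header is row 1, so data rows start at row 2.
    for row_number, cells in enumerate(rows[1:], start=2):
        if not any(cells):
            errors.append((row_number, None, "blank-row"))
            continue
        # Fewer cells than header fields: each absent position is a missing cell.
        for position in range(len(cells) + 1, len(header) + 1):
            errors.append((row_number, position, "missing-cell"))
        # More cells than header fields: each surplus position is an extra cell.
        for position in range(len(header) + 1, len(cells) + 1):
            errors.append((row_number, position, "extra-cell"))
    return errors


# Assumed input for illustration only.
SAMPLE = "id,name,,name\n1,english\n1,english\n\n2,german,1,2,3\n"
errors = check_table(SAMPLE)
for row, field, code in errors:
    print(row, field, code)
```

Each tuple corresponds to one line of a report like the one shown: header problems carry no row number, and a blank row carries no field number.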

Documentation

Please visit our documentation portal: https://framework.frictionlessdata.io

Owner

  • Name: Frictionless Data
  • Login: frictionlessdata
  • Kind: organization
  • Location: Internet

Lightweight specifications and software to shorten the path from data to insight. Code of Conduct: https://frictionlessdata.io/code-of-conduct/

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: 'frictionless: Python library for Data Packages'
message: >-
  To cite the Frictionless Python Framework in publications
  please use:
type: software
authors:
  - given-names: Evgeny
    family-names: Karev
    affiliation: Datist
    email: eskarev@gmail.com
  - given-names: Pierre
    family-names: Camilleri
    affiliation: multi.coop
    email: pierre.camilleri@multi.coop
  - given-names: Vitor
    family-names: Baptista
    affiliation: Fiquem Sabendo
  - given-names: Georgiana
    family-names: Bere
  - given-names: Andrea
    family-names: Borruso
    affiliation: OnData
  - given-names: Peter
    family-names: Desmet
    orcid: 'https://orcid.org/0000-0002-8442-8025'
    affiliation: Research Institute for Nature and Forest (INBO)
  - given-names: Shashi
    family-names: Gharti
    affiliation: Robust IT Concepts
  - given-names: Augusto
    family-names: Herrmann
    affiliation: >-
      Ministry of Management and Innovation in Public
      Services in Brazil
  - given-names: Adam
    family-names: Kariv
    affiliation: While True Industries
  - given-names: Chris
    family-names: Shaw
    affiliation: Democracy Club
  - given-names: Paul
    family-names: Walsh
    affiliation: LinkDigital
  - given-names: Lilly
    family-names: Winfree
    affiliation: Anaconda, Inc.
    orcid: 'https://orcid.org/0000-0001-7120-8536'
  - given-names: Edgar
    family-names: Zanella Alvarenga
    affiliation: Digi Sapiens
  - given-names: Jesper
    family-names: Zedlitz
    orcid: 'https://orcid.org/0000-0003-2664-5010'
  - name: Open Knowledge Foundation
    city: London
    country: GB
  - given-names: Sara
    family-names: Petti
    affiliation: Open Knowledge Foundation
    email: sara.petti@okfn.org
identifiers:
  - type: doi
    value: https://doi.org/10.5281/zenodo.4663759
repository: 'https://pypi.org/project/frictionless/'
repository-code: 'https://github.com/frictionlessdata/frictionless-py'
url: 'https://framework.frictionlessdata.io/'
abstract: >-
  Data management framework for Python that provides
  functionality to describe, extract, validate, and
  transform tabular data (DEVT Framework). It supports a
  great deal of data sources and formats, as well as
  provides popular platforms integrations. The framework is
  powered by the lightweight yet comprehensive Frictionless
  Data Package (https://datapackage.org/).
license: MIT

GitHub Events

Total
  • Create event: 11
  • Commit comment event: 1
  • Release event: 1
  • Issues event: 48
  • Watch event: 54
  • Delete event: 7
  • Issue comment event: 63
  • Push event: 54
  • Pull request review event: 21
  • Pull request review comment event: 18
  • Pull request event: 24
  • Fork event: 7
Last Year
  • Create event: 11
  • Commit comment event: 1
  • Release event: 1
  • Issues event: 48
  • Watch event: 54
  • Delete event: 7
  • Issue comment event: 63
  • Push event: 54
  • Pull request review event: 21
  • Pull request review comment event: 18
  • Pull request event: 24
  • Fork event: 7

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 17
  • Total pull requests: 9
  • Average time to close issues: 3 months
  • Average time to close pull requests: 4 days
  • Total issue authors: 12
  • Total pull request authors: 5
  • Average comments per issue: 0.29
  • Average comments per pull request: 0.22
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 16
  • Pull requests: 9
  • Average time to close issues: 10 days
  • Average time to close pull requests: 4 days
  • Issue authors: 11
  • Pull request authors: 5
  • Average comments per issue: 0.31
  • Average comments per pull request: 0.22
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • megin1989 (16)
  • pierrecamilleri (11)
  • amelie-rondot (5)
  • jze (4)
  • richardt-engineb (3)
  • diego-oncoramedical (3)
  • pdelboca (2)
  • fjuniorr (2)
  • dafeder (2)
  • mingjiecn (2)
  • ebAbhay (1)
  • davidgasquez (1)
  • adrien-owkin (1)
  • samqi (1)
  • paulgirard (1)
Pull Request Authors
  • pierrecamilleri (14)
  • amelie-rondot (7)
  • dependabot[bot] (6)
  • jze (5)
  • roll (3)
  • afuetterer (3)
  • richardt-engineb (2)
  • barbuz (1)
  • hansendx (1)
  • sapetti9 (1)
  • lwjohnst86 (1)
  • areleu (1)
  • megin1989 (1)
  • pdelboca (1)
  • LincolnPuzey (1)
Top Labels
Issue Labels
bug (4) general (1) feature (1) comms (1)
Pull Request Labels
dependencies (6) work-in-progress (2) general (1) feature (1)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 6
  • Total dependent repositories: 2
  • Total versions: 73
conda-forge.org: frictionless
  • Versions: 73
  • Dependent Packages: 6
  • Dependent Repositories: 2
Rankings
Dependent packages count: 9.0%
Average: 14.6%
Dependent repos count: 20.2%
Last synced: 7 months ago

Dependencies

.github/workflows/general.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v3 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v2 composite
  • pypa/gh-action-pypi-publish release/v1 composite
  • softprops/action-gh-release v1 composite
  • stefanzweifel/git-auto-commit-action v4 composite
  • mysql 8 docker
  • postgres 12 docker
.github/workflows/project.yaml actions
  • leonsteinhaeuser/project-beta-automations v1.2.1 composite
Dockerfile docker
  • ubuntu 22.04 build