Tabbed: A Python package for reading variably structured text files at scale

Tabbed: A Python package for reading variably structured text files at scale - Published in JOSS (2025)

https://github.com/mscaudill/tabbed

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
✓ Academic publication links: Links to zenodo.org
✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Keywords

csv csv-files csv-parser delimited-data parsing reader text-reader txt

Last synced: 3 months ago

Repository

A Python package for reading variably structured text files at scale

Basic Info

Host: GitHub
Owner: mscaudill
License: bsd-3-clause
Language: Python
Default Branch: master
Homepage: https://mscaudill.github.io/tabbed/
Size: 2.47 MB

Statistics

Stars: 1
Watchers: 1
Forks: 2
Open Issues: 0
Releases: 5

Topics

csv csv-files csv-parser delimited-data parsing reader text-reader txt

Created about 1 year ago · Last pushed 3 months ago

Metadata Files

Readme Contributing License Code of conduct

A Python package for reading variably structured text files at scale


Tabbed is a Python library for reading variably structured text files. It automatically locates where the data begin, infers column data types, and supports iterative, value-based conditional reading of data rows.

Key Features

Structural Inference:
A common variant of the standard text file contains metadata before the header or data section. Tabbed can locate the metadata, header, and data sections in a file.
Type inference:
Tabbed can parse int, float, complex, time, date, and datetime values at high speed using a polling strategy.
Conditional Reading:
Tabbed can filter rows during reading with equality, membership, rich comparison, regular expression matching and custom callables via simple keyword arguments.
Partial and Iterative Reading:
Tabbed can read large text files in chunks, consuming only as much memory as you choose.
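The polling idea behind type inference can be illustrated with a small standard-library sketch. This is not Tabbed's implementation: the helper names `parse_datetime` and `poll_type`, the candidate order, and the timestamp formats are assumptions made for this example. Each column is tested against the first `poll` values, and the first candidate type that converts all of them wins, falling back to `str`.

```python
from datetime import datetime

# Assumed timestamp formats; the first matches annotations.txt (02/09/22 09:17:38.948).
DATETIME_FORMATS = ['%m/%d/%y %H:%M:%S.%f', '%m/%d/%Y %H:%M:%S']


def parse_datetime(text):
    """Parse text with the first matching assumed format, else raise ValueError."""
    for fmt in DATETIME_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f'no datetime format matches {text!r}')


# Hypothetical candidate order: narrower types are tried before wider ones.
CANDIDATES = [(int, int), (float, float), (complex, complex), (datetime, parse_datetime)]


def poll_type(values, poll=10):
    """Return the first type whose converter accepts each of the first `poll` values."""
    sample = values[:poll]
    if not sample:
        return str
    for kind, convert in CANDIDATES:
        try:
            for value in sample:
                convert(value)
        except ValueError:
            continue  # this candidate rejected a sampled value; try the next one
        return kind
    return str  # nothing else fit, so keep the values as strings


columns = {
    'Number': ['0', '1', '2'],
    'Time From Start': ['0.0000', '1161.0520', '1169.8360'],
    'Start Time': ['02/09/22 09:17:38.948', '02/09/22 09:37:00.000'],
    'Channel': ['ALL', 'ALL'],
}
for name, values in columns.items():
    print(name, poll_type(values).__name__)
```

Trying `int` before `float` matters because every integer string also parses as a float. Polling only a sample keeps inference fast, but a column whose first rows are unrepresentative can be mislabeled.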

Usage

Below is a sample file with a metadata section and a header that uses the tab character as its delimiter.

annotations.txt
```AsciiDoc
Experiment ID	Experiment
Animal ID	Animal
Researcher	Test
Directory path	

Number	Start Time	End Time	Time From Start	Channel	Annotation
0	02/09/22 09:17:38.948	02/09/22 09:17:38.948	0.0000	ALL	Started Recording
1	02/09/22 09:37:00.000	02/09/22 09:37:00.000	1161.0520	ALL	start
2	02/09/22 09:37:00.000	02/09/22 09:37:08.784	1161.0520	ALL	exploring
3	02/09/22 09:37:08.784	02/09/22 09:37:13.897	1169.8360	ALL	grooming
4	02/09/22 09:37:13.897	02/09/22 09:38:01.262	1174.9490	ALL	exploring
5	02/09/22 09:38:01.262	02/09/22 09:38:07.909	1222.3140	ALL	grooming
6	02/09/22 09:38:07.909	02/09/22 09:38:20.258	1228.9610	ALL	exploring
7	02/09/22 09:38:20.258	02/09/22 09:38:25.435	1241.3100	ALL	grooming
8	02/09/22 09:38:25.435	02/09/22 09:40:07.055	1246.4870	ALL	exploring
9	02/09/22 09:40:07.055	02/09/22 09:40:22.334	1348.1070	ALL	grooming
10	02/09/22 09:40:22.334	02/09/22 09:41:36.664	1363.3860	ALL	exploring
```

Dialect and Type Inference

Tabbed can detect the dialect via clevercsv and infer the data types.

```python
from tabbed.reading import Reader
from tabbed.samples import paths

infile = open(paths.annotations, 'r')
reader = Reader(infile)
dialect = reader.sniffer.dialect
types, _ = reader.sniffer.types(poll=10)

print(dialect)  # a clevercsv SimpleDialect
print('---')
print(types)
```

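Output
```
SimpleDialect('\t', '"', None)
---
[<class 'int'>, <class 'datetime.datetime'>, <class 'datetime.datetime'>, <class 'float'>, <class 'str'>, <class 'str'>]
```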
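Tabbed delegates dialect detection to clevercsv. To show the general idea without that dependency, the sketch below uses the standard library's `csv.Sniffer` as a simpler, heuristic stand-in; it is not what Tabbed calls and it will not always agree with clevercsv. The sample rows are copied from annotations.txt above, and the restricted `delimiters` set is an assumption chosen for this file.

```python
import csv

# A few data rows copied from annotations.txt (tab-delimited).
SAMPLE = (
    '0\t02/09/22 09:17:38.948\t02/09/22 09:17:38.948\t0.0000\tALL\tStarted Recording\n'
    '1\t02/09/22 09:37:00.000\t02/09/22 09:37:00.000\t1161.0520\tALL\tstart\n'
    '2\t02/09/22 09:37:00.000\t02/09/22 09:37:08.784\t1161.0520\tALL\texploring\n'
    '3\t02/09/22 09:37:08.784\t02/09/22 09:37:13.897\t1169.8360\tALL\tgrooming\n'
)

# Only consider tab, comma and semicolon so characters inside the timestamps
# (':' and '/') are never candidates.
dialect = csv.Sniffer().sniff(SAMPLE, delimiters='\t,;')
print(repr(dialect.delimiter))

# Re-read the sample with the sniffed dialect to check that every row splits cleanly.
rows = list(csv.reader(SAMPLE.splitlines(), dialect))
print(len(rows), 'rows of', len(rows[0]), 'fields')
```

Restricting the candidate delimiters is a guard: ':' and '/' appear the same number of times in every row, which is exactly the consistency signal a frequency-based sniffer looks for.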

Metadata and Header detection

Tabbed can automatically locate the metadata, header and data rows.

```python
print(reader.header)
print('---')
print(reader.metadata())
```

Output
```
Header(line=6, names=['Number', 'Start_Time', 'End_Time', 'Time_From_Start', 'Channel', 'Annotation'],
    string='Number\tStart Time\tEnd Time\tTime From Start\tChannel\tAnnotation')
---
MetaData(lines=(0, 6), string='Experiment ID\tExperiment\nAnimal ID\tAnimal\nResearcher\tTest\nDirectory path\t\n\n')
```
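One way to picture header detection is a field-count heuristic, sketched below. This is a hypothetical rule written for illustration, not Tabbed's algorithm: it assumes the last non-empty line is a data row, treats the run of lines ending there with the same number of fields as header plus data, and labels everything above that run as metadata. All ranges are half-open.

```python
def locate_sections(lines, delimiter='\t'):
    """Split lines into metadata, header and data using a field-count heuristic.

    Hypothetical rule: the last non-empty line is data, the run of lines ending
    there with the same field count is header + data, and everything above that
    run is metadata. Ranges are (start, stop) with stop excluded.
    """
    # Blank lines get a field count of 0 so they never join the data run.
    counts = [len(line.split(delimiter)) if line.strip() else 0 for line in lines]
    end = len(counts)
    while end > 0 and counts[end - 1] == 0:  # ignore trailing blank lines
        end -= 1
    if end == 0:
        raise ValueError('no non-empty lines to inspect')
    width = counts[end - 1]
    top = end - 1
    while top > 0 and counts[top - 1] == width:  # walk up while widths match
        top -= 1
    return {'metadata': (0, top), 'header': top, 'data': (top + 1, end)}


LINES = [  # a shortened excerpt of annotations.txt
    'Experiment ID\tExperiment',
    'Animal ID\tAnimal',
    'Researcher\tTest',
    'Directory path\t',
    '',
    'Number\tStart Time\tEnd Time\tTime From Start\tChannel\tAnnotation',
    '0\t02/09/22 09:17:38.948\t02/09/22 09:17:38.948\t0.0000\tALL\tStarted Recording',
    '1\t02/09/22 09:37:00.000\t02/09/22 09:37:00.000\t1161.0520\tALL\tstart',
]

print(locate_sections(LINES))
```

Because this excerpt has a single blank line, its indices differ from the `line=6` reported for the full file. The heuristic also fails when a metadata line happens to have the same field count as the data, so a more robust detector would need additional cues, such as whether a line's fields parse as the inferred column types.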

Filtered Reading with Tabs

Tabbed supports row and column filtering with equality, membership, rich comparison, and regular expression matching. It is also fully iterative, so users can choose how much memory to consume while reading a file.

```python
from itertools import chain

# tab rows whose Start_Time is between 9:38 and 9:40 and set reader to read
# only the Number and Start_Time columns
reader.tab(
    Start_Time='>= 2/09/2022 9:38:00 and <2/09/2022 9:40:00',
    columns=['Number', 'Start_Time'],
)

# read the data to an iterator reading only 2 rows at a time
gen = reader.read(chunksize=2)

# convert to an in-memory list
data = list(chain.from_iterable(gen))
print(data)

# close the reader when done or open under context-management
reader.close()
```

Output
```
{'Number': 5, 'Start_Time': datetime.datetime(2022, 2, 9, 9, 38, 1, 262000)}
{'Number': 6, 'Start_Time': datetime.datetime(2022, 2, 9, 9, 38, 7, 909000)}
{'Number': 7, 'Start_Time': datetime.datetime(2022, 2, 9, 9, 38, 20, 258000)}
{'Number': 8, 'Start_Time': datetime.datetime(2022, 2, 9, 9, 38, 25, 435000)}
```
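For comparison, the same filter and column selection can be written with the standard library alone. This sketch does not reproduce how Tabbed evaluates its condition strings; the `filtered` and `chunked` helpers, the hard-coded bounds, and the in-memory `ROWS` (copied from annotations.txt) are illustrative assumptions. It keeps rows whose start time lies in the half-open interval [9:38, 9:40), projects two columns, and yields lists of at most `chunksize` rows so only one chunk needs to be held at a time.

```python
from datetime import datetime
from itertools import chain, islice

FMT = '%m/%d/%y %H:%M:%S.%f'  # timestamp format used in annotations.txt
ROWS = [  # (Number, Start Time) pairs copied from annotations.txt
    ('3', '02/09/22 09:37:08.784'),
    ('4', '02/09/22 09:37:13.897'),
    ('5', '02/09/22 09:38:01.262'),
    ('6', '02/09/22 09:38:07.909'),
    ('7', '02/09/22 09:38:20.258'),
    ('8', '02/09/22 09:38:25.435'),
    ('9', '02/09/22 09:40:07.055'),
]

LOWER = datetime(2022, 2, 9, 9, 38)  # inclusive bound
UPPER = datetime(2022, 2, 9, 9, 40)  # exclusive bound


def filtered(rows):
    """Lazily yield Number/Start_Time dicts for rows with LOWER <= start < UPPER."""
    for number, start in rows:
        start_time = datetime.strptime(start, FMT)
        if LOWER <= start_time < UPPER:
            yield {'Number': int(number), 'Start_Time': start_time}


def chunked(iterable, chunksize):
    """Yield lists of at most `chunksize` items from any iterable (hypothetical helper)."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, chunksize)):
        yield chunk


data = list(chain.from_iterable(chunked(filtered(ROWS), chunksize=2)))
for row in data:
    print(row)
```

Because `filtered` and `chunked` are both generators, no row is parsed until a chunk is requested; replacing `ROWS` with rows from `csv.reader` over an open file would keep the same memory profile.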

Documentation

The official documentation is hosted on GitHub Pages at https://mscaudill.github.io/tabbed/.

Dependencies

Tabbed depends on the excellent clevercsv package for dialect detection. The rest is pure Python.

Installation

Tabbed is hosted on PyPI and can be installed with pip into a virtual environment:

```bash
pip install tabbed
```

To get a development version of Tabbed from source, start by cloning the repository:

```bash
git clone git@github.com:mscaudill/tabbed.git
```

Go to the directory you just cloned and create an editable install with pip:

```bash
pip install -e .[dev]
```

Contributing

We're excited you want to contribute! Please check out our Contribution guide.

Acknowledgements

We are grateful for the support of the Ting Tsung and Wei Fong Chao Foundation and the Jan and Dan Duncan Neurological Research Institute at Texas Children's, which generously support Tabbed.

Owner

Name: Matt Caudill
Login: mscaudill
Kind: user
Location: Houston, TX
Company: Baylor College of Medicine & Texas Children's NRI

Repositories: 3
Profile: https://github.com/mscaudill

JOSS Publication

Tabbed: A Python package for reading variably structured text files at scale

Published

November 11, 2025

DOI

10.21105/joss.08964

Volume 10, Issue 115, Page 8964

Authors

Matthew S. Caudill

Department of Neuroscience, Baylor College of Medicine, Houston, TX, United States of America, Jan and Dan Duncan Neurological Research Institute at Texas Children's Hospital, Houston, TX, United States of America

Editor

Neea Rusch

GitHub Events

Total

Create event: 4
Issues event: 20
Release event: 1
Issue comment event: 23
Push event: 127
Pull request event: 2
Fork event: 1

Last Year

Create event: 4
Issues event: 20
Release event: 1
Issue comment event: 23
Push event: 127
Pull request event: 2
Fork event: 1

Committers

Last synced: 5 months ago

All Time

Total Commits: 287
Total Committers: 2
Avg Commits per committer: 143.5
Development Distribution Score (DDS): 0.038

Past Year

Commits: 287
Committers: 2
Avg Commits per committer: 143.5
Development Distribution Score (DDS): 0.038

Top Committers

Name	Email	Commits
mscaudill	m**l@g**m	276
Brad Sheppard	b**d@b**u	11

Committer Domains (Top 20 + Academic)

brads-mbp-2.dyndns.rice.edu: 1

Issues and Pull Requests

Last synced: 4 months ago

All Time

Total issues: 13
Total pull requests: 1
Average time to close issues: 5 days
Average time to close pull requests: N/A
Total issue authors: 2
Total pull request authors: 1
Average comments per issue: 0.92
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 13
Pull requests: 1
Average time to close issues: 5 days
Average time to close pull requests: N/A
Issue authors: 2
Pull request authors: 1
Average comments per issue: 0.92
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

jolars (10)
ymahlau (3)

Pull Request Authors

BradShepps (1)

Top Labels

Issue Labels

bug (5) enhancement (3) question (1) documentation (1)

Pull Request Labels

Dependencies

pyproject.toml pypi

ipython *
matplotlib *
notebook *
numpy *
psutil *
requests *
scikit-learn *
scipy *
wget *
