instancelib

A generic dataset interface for Machine Learning models

https://github.com/mpbron/instancelib

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.8%) to scientific vocabulary
Last synced: 7 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: mpbron
  • License: lgpl-3.0
  • Language: Python
  • Default Branch: master
  • Size: 10.9 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created over 4 years ago · Last pushed about 1 year ago
Metadata Files
Readme, Changelog, Contributing, License, Citation, Zenodo

README.md

A generic interface for datasets and Machine Learning models



instancelib provides a generic architecture for datasets and machine learning algorithms such as classification algorithms.

© Michiel Bron, 2021

Quick tour

Load dataset: load the dataset into an environment
```python
import instancelib as il

textenv = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                data_cols=["fulltext"],
                                label_cols=["label"])

ds = textenv.dataset        # A dict-like interface for instances
labels = textenv.labels     # An object that stores all labels
labelset = labels.labelset  # All labels that can be given to instances

ins = ds[20]                # Get the instance with identifier key 20
ins_data = ins.data         # Get the raw data for instance 20
ins_vector = ins.vector     # Get the vector representation for instance 20, if any

ins_labels = labels.get_labels(ins)  # Get the labels assigned to instance 20
```
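The quick tour describes the dataset as a dict-like mapping from identifiers to instances, with labels held in a separate object rather than on the instances themselves. The following stdlib-only sketch illustrates that separation under simple assumptions; the classes `Instance`, `ToyProvider`, and `ToyLabelStore` and their methods are hypothetical illustrations, not instancelib's actual classes.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, Iterator, Optional, Set


@dataclass(frozen=True)
class Instance:
    """Hypothetical instance: an identifier, raw data, and an optional vector."""
    identifier: int
    data: Any
    vector: Optional[tuple] = None


class ToyProvider(Mapping):
    """Hypothetical dict-like container mapping identifier -> Instance."""

    def __init__(self, instances):
        self._store: Dict[int, Instance] = {ins.identifier: ins for ins in instances}

    def __getitem__(self, key: int) -> Instance:
        return self._store[key]

    def __iter__(self) -> Iterator[int]:
        return iter(self._store)

    def __len__(self) -> int:
        return len(self._store)


class ToyLabelStore:
    """Hypothetical label store, kept separate from the instances it describes."""

    def __init__(self, labelset):
        self.labelset: FrozenSet[str] = frozenset(labelset)
        self._labels: Dict[int, Set[str]] = {}

    def set_labels(self, ins: Instance, *labels: str) -> None:
        # Reject labels that are not part of the allowed label set
        unknown = set(labels) - self.labelset
        if unknown:
            raise ValueError(f"Unknown labels: {unknown}")
        self._labels[ins.identifier] = set(labels)

    def get_labels(self, ins: Instance) -> FrozenSet[str]:
        # Unlabelled instances get an empty set instead of an error
        return frozenset(self._labels.get(ins.identifier, set()))


toy_ds = ToyProvider([Instance(i, f"document {i}") for i in range(30)])
toy_labels = ToyLabelStore({"pos", "neg"})
toy_labels.set_labels(toy_ds[20], "pos")
print(toy_ds[20].data, toy_labels.get_labels(toy_ds[20]))  # document 20 frozenset({'pos'})
```

Keeping labels outside the instances means the same dataset can carry several label sets (for example, gold labels and predictions) without copying the data.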

Dataset manipulation: divide the dataset into a training and a test set
```python
train, test = textenv.train_test_split(ds, train_size=0.70)

print(20 in train)  # May be True or False, because of random sampling
```
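Because the split is random, whether instance 20 ends up in the training set can change between runs. A minimal sketch of a reproducible split over identifiers, using only the standard library, is shown below; the function `split_keys` and its `seed` parameter are hypothetical and not part of instancelib.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def split_keys(keys: Sequence[Hashable], train_size: float,
               seed: int = 0) -> Tuple[List[Hashable], List[Hashable]]:
    """Hypothetical helper: shuffle identifiers with a fixed seed, then cut them."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be between 0 and 1")
    shuffled = list(keys)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global random state is untouched
    cut = round(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


train_keys, test_keys = split_keys(range(100), train_size=0.70, seed=42)
print(len(train_keys), len(test_keys))  # 70 30
print(20 in train_keys)  # same answer on every run with seed=42
```

Splitting identifiers rather than the data itself keeps the split cheap and lets the two parts be looked up in the original dict-like dataset.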

Train a model:
```python
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer

# Bag-of-words counts -> TF-IDF weighting -> Naive Bayes classifier
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

model = il.SkLearnDataClassifier.build(pipeline, textenv)
model.fit_provider(train, labels)
predictions = model.predict(test)
```
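`SkLearnDataClassifier.build` wraps a scikit-learn estimator so that it can be trained from an instance provider and a label store. The sketch below illustrates the general adapter idea only; `ToyDataClassifier` is hypothetical, plain dicts stand in for the providers, and a trivial majority-class estimator stands in for the pipeline so the example runs with the standard library alone. It is not how instancelib implements its wrapper.

```python
from collections import Counter
from typing import Dict, List, Sequence


class MajorityEstimator:
    """Stand-in with a scikit-learn-like fit(X, y) / predict(X) interface."""

    def fit(self, X: Sequence[str], y: Sequence[str]) -> "MajorityEstimator":
        self.label_ = Counter(y).most_common(1)[0][0]  # most frequent training label
        return self

    def predict(self, X: Sequence[str]) -> List[str]:
        return [self.label_ for _ in X]


class ToyDataClassifier:
    """Hypothetical adapter: turns {id: raw data} and {id: label} into X and y."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit_provider(self, data: Dict[int, str], labels: Dict[int, str]) -> None:
        keys = [k for k in data if k in labels]  # train only on labelled instances
        self.estimator.fit([data[k] for k in keys], [labels[k] for k in keys])

    def predict(self, data: Dict[int, str]) -> Dict[int, str]:
        keys = list(data)
        # Re-attach identifiers so predictions can be matched to instances
        return dict(zip(keys, self.estimator.predict([data[k] for k in keys])))


train_data = {1: "good film", 2: "great cast", 3: "dull plot"}
train_labels = {1: "pos", 2: "pos", 3: "neg"}
clf = ToyDataClassifier(MajorityEstimator())
clf.fit_provider(train_data, train_labels)
print(clf.predict({4: "boring", 5: "lovely"}))  # {4: 'pos', 5: 'pos'}
```

Returning predictions keyed by identifier, instead of a bare list, keeps them aligned with the instances they belong to.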

Installation

See installation.md for an extended installation guide.

| Method | Instructions |
|--------|--------------|
| pip | Install from PyPI via `pip install instancelib`. |
| Local | Clone this repository and install via `pip install -e .` or locally run `python setup.py install`. |
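After either installation method, one way to confirm which version is installed is the standard library's `importlib.metadata` module (Python 3.8+). This check is a suggestion, not part of the project's own instructions; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Hypothetical helper: installed version of a distribution, or None if missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("instancelib"))  # a version string, or None if instancelib is not installed
```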

Documentation

Full documentation of the latest version is provided at https://instancelib.readthedocs.org.

Example usage

See usage.py for an example of how the package can be used.

Releases

instancelib is officially released through PyPI.

See CHANGELOG.md for a full overview of the changes for each version.

Citation

@misc{instancelib,
  title        = {Python package instancelib},
  author       = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year         = {2021}
}

Library usage

This library is used in the following projects:
  • python-allib: A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
  • text_explainability: A generic explainability architecture for explaining text machine learning models.
  • text_sensitivity: Sensitivity testing (fairness and robustness) for text machine learning models.

Maintenance

Contributors

Owner

  • Name: Michiel Bron
  • Login: mpbron
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Bron
    given-names: Michiel Pieter
    orcid: https://orcid.org/0000-0002-4823-6085
title: mpbron/instancelib: v0.5.0
version: v0.5.0
date-released: 2023-09-01

GitHub Events

Total
  • Watch event: 1
  • Push event: 1
  • Create event: 1
Last Year
  • Watch event: 1
  • Push event: 1
  • Create event: 1

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 198
  • Total Committers: 2
  • Avg Commits per committer: 99.0
  • Development Distribution Score (DDS): 0.005
Past Year
  • Commits: 3
  • Committers: 1
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Michiel Bron (m****n@u****l): 197 commits
  • Michiel Bron (m****n@u****l): 1 commit
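Assuming the common definition of the Development Distribution Score (one minus the share of all commits made by the most active committer), the reported figures follow directly from the commit counts under Top Committers. The function name below is illustrative only.

```python
def development_distribution_score(commits_per_committer):
    """1 - (commits by the top committer / total commits); 0.0 means one person made every commit."""
    total = sum(commits_per_committer)
    if total == 0:
        return 0.0
    return 1 - max(commits_per_committer) / total


all_time = [197, 1]  # commit counts of the two committers
past_year = [3]      # a single committer with three commits

print(round(development_distribution_score(all_time), 3))  # 0.005
print(development_distribution_score(past_year))           # 0.0
print(sum(all_time) / len(all_time))                       # 99.0, the average per committer
```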

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 267 last month
  • Total dependent packages: 4
  • Total dependent repositories: 1
  • Total versions: 59
  • Total maintainers: 2
pypi.org: instancelib

A generic interface for datasets and Machine Learning models

  • Versions: 59
  • Dependent Packages: 4
  • Dependent Repositories: 1
  • Downloads: 267 Last month
Rankings
  • Dependent packages count: 1.9%
  • Average: 15.4%
  • Dependent repos count: 21.6%
  • Downloads: 22.7%
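The Average ranking appears to be the arithmetic mean of the three individual percentile rankings; this is an assumption, but it is consistent with the reported numbers, as a quick check shows.

```python
# Percentile rankings as reported for the pypi.org package
rankings = {
    "Dependent packages count": 1.9,
    "Dependent repos count": 21.6,
    "Downloads": 22.7,
}

average = sum(rankings.values()) / len(rankings)
print(round(average, 1))  # 15.4
```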
Maintainers (2)
Last synced: 8 months ago

Dependencies

docs/requirements.txt pypi
  • gensim *
  • h5py *
  • numpy *
  • openpyxl *
  • pandas *
  • scikit-learn *
  • sphinx *
  • sphinx-autodoc-typehints *
  • sphinx-rtd-theme *
  • sphinx-toolbox *
  • sphinxcontrib-apidoc *
  • xlrd *
requirements.txt pypi
  • gensim *
  • h5py *
  • numpy *
  • openpyxl *
  • pandas *
  • scikit-learn *
  • tables *
  • tqdm *
  • xlrd *
setup.py pypi
  • h5py *
  • more-itertools *
  • numpy *
  • openpyxl *
  • pandas *
  • scikit-learn *
  • tqdm *
  • xlrd *