instancelib
A generic dataset interface for Machine Learning models
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: mpbron
- License: lgpl-3.0
- Language: Python
- Default Branch: master
- Size: 10.9 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
A generic interface for datasets and Machine Learning models
instancelib provides a generic architecture for datasets and machine learning algorithms such as classification algorithms.
© Michiel Bron, 2021
Quick tour
Load dataset: Load the dataset in an environment
```python
import instancelib as il

text_env = il.read_excel_dataset("./datasets/testdataset.xlsx",
                                 data_cols=["fulltext"],
                                 label_cols=["label"])

ds = text_env.dataset       # A dict-like interface for instances
labels = text_env.labels    # An object that stores all labels
labelset = labels.labelset  # All labels that can be given to instances

ins = ds[20]                # Get instance with identifier key 20
ins_data = ins.data         # Get the raw data for instance 20
ins_vector = ins.vector     # Get the vector representation for 20, if any
ins_labels = labels.get_labels(ins)  # Get the labels assigned to instance 20
```
Dataset manipulation: Divide the dataset into a train and test set
```python
train, test = text_env.train_test_split(ds, train_size=0.70)

print(20 in train)  # May be True or False, because of random sampling
```
Train a model:
```python
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer

# Bag-of-words counts -> TF-IDF weighting -> Naive Bayes classifier
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),
])

# Wrap the scikit-learn pipeline so it operates on instancelib instances
model = il.SkLearnDataClassifier.build(pipeline, text_env)
model.fit_provider(train, labels)
predictions = model.predict(test)
```
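The split step in the tour can be illustrated without the library. The following is a minimal standard-library sketch of a random split over a dict-like dataset keyed by instance identifier. The function `split_by_key`, its `seed` parameter, and the toy data are hypothetical and are not part of instancelib; the sketch only shows why a given key such as `20` may end up in either part.

```python
import random
from typing import Dict, Hashable, Tuple, TypeVar

V = TypeVar("V")


def split_by_key(
    dataset: Dict[Hashable, V], train_size: float, seed: int = 0
) -> Tuple[Dict[Hashable, V], Dict[Hashable, V]]:
    """Randomly partition a dict-like dataset into train and test parts.

    Hypothetical helper for illustration; not instancelib's implementation.
    """
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be strictly between 0 and 1")
    keys = list(dataset)
    # Shuffle identifiers with a seeded generator so the split is repeatable
    random.Random(seed).shuffle(keys)
    cut = round(len(keys) * train_size)
    train = {k: dataset[k] for k in keys[:cut]}
    test = {k: dataset[k] for k in keys[cut:]}
    return train, test


# Toy data standing in for the instances of a real dataset
toy = {i: f"document {i}" for i in range(100)}
train, test = split_by_key(toy, train_size=0.70)

print(len(train), len(test))          # 70 30
print((20 in train) != (20 in test))  # True: key 20 is in exactly one part
```

Because the keys are shuffled before cutting, membership of any single identifier depends on the random seed, which matches the comment in the tour that `20 in train` may be true or false.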
Installation
See installation.md for an extended installation guide.
| Method | Instructions |
|--------|--------------|
| pip | Install from PyPI via `pip install instancelib`. |
| Local | Clone this repository and install via `pip install -e .` or locally run `python setup.py install`. |
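After installing by either method, it can be useful to confirm which version ended up in the active environment. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is an illustrative assumption and not something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if instancelib is installed, otherwise None
print(installed_version("instancelib"))
```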
Documentation
Full documentation of the latest version is provided at https://instancelib.readthedocs.org.
Example usage
See usage.py to see an example of how the package can be used.
Releases
instancelib is officially released through PyPI.
See CHANGELOG.md for a full overview of the changes for each version.
Citation
```bibtex
@misc{instancelib,
  title = {Python package instancelib},
  author = {Michiel Bron},
  howpublished = {\url{https://github.com/mpbron/instancelib}},
  year = {2021}
}
```
Library usage
This library is used in the following projects:
- python-allib. A typed Active Learning framework for Python for both Classification and Technology-Assisted Review systems.
- text_explainability. A generic explainability architecture for explaining text machine learning models.
- text_sensitivity. Sensitivity testing (fairness & robustness) for text machine learning models.
Maintenance
Contributors
- Michiel Bron (@mpbron)
Owner
- Name: Michiel Bron
- Login: mpbron
- Kind: user
- Repositories: 2
- Profile: https://github.com/mpbron
Citation (CITATION.cff)
cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Bron
    given-names: Michiel Pieter
    orcid: https://orcid.org/0000-0002-4823-6085
title: mpbron/instancelib: v0.5.0
version: v0.5.0
date-released: 2023-09-01
GitHub Events
Total
- Watch event: 1
- Push event: 1
- Create event: 1
Last Year
- Watch event: 1
- Push event: 1
- Create event: 1
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Michiel Bron | m****n@u****l | 197 |
| Michiel Bron | m****n@u****l | 1 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Packages
- Total packages: 1
- Total downloads:
  - pypi: 267 last month
- Total dependent packages: 4
- Total dependent repositories: 1
- Total versions: 59
- Total maintainers: 2
pypi.org: instancelib
A generic interface for datasets and Machine Learning models
- Documentation: https://instancelib.readthedocs.io/
- License: GNU LGPL v3
- Latest release: 0.5.2 (published about 1 year ago)
Dependencies
- gensim *
- h5py *
- numpy *
- openpyxl *
- pandas *
- scikit-learn *
- sphinx *
- sphinx-autodoc-typehints *
- sphinx-rtd-theme *
- sphinx-toolbox *
- sphinxcontrib-apidoc *
- xlrd *
- gensim *
- h5py *
- numpy *
- openpyxl *
- pandas *
- scikit-learn *
- tables *
- tqdm *
- xlrd *
- h5py *
- more-itertools *
- numpy *
- openpyxl *
- pandas *
- scikit-learn *
- tqdm *
- xlrd *