gismo

GISMO is an NLP tool to rank and organize a corpus of documents according to a query.

https://github.com/balouf/gismo

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary

Keywords

nlp package python research
Last synced: 6 months ago

Repository

GISMO is an NLP tool to rank and organize a corpus of documents according to a query.

Basic Info
  • Host: GitHub
  • Owner: balouf
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 9.21 MB
Statistics
  • Stars: 7
  • Watchers: 3
  • Forks: 1
  • Open Issues: 1
  • Releases: 14
Topics
nlp package python research
Created almost 6 years ago · Last pushed 8 months ago
Metadata Files
Readme Changelog Contributing Citation Authors

README.md

A Generic Information Search... With a Mind of its Own!

License: MIT

GISMO is an NLP tool to rank and organize a corpus of documents according to a query.

Gismo stands for Generic Information Search... with a Mind of its Own.

Features

Gismo combines three main ideas:

  • TF-IDTF: a symmetric version of the TF-IDF embedding.
  • DIteration: a fast, push-based variant of the PageRank algorithm.
  • Fuzzy dendrogram: a variant of the Louvain clustering algorithm.
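
To give an intuition of what "push-based" means in the second item, here is a minimal sketch of a push-style PageRank computation. It is not Gismo's actual implementation: the function name `push_pagerank`, the dictionary-based graph format, and the `damping` and `tol` parameters are assumptions made for illustration. Each node holds some residual "fluid"; the node with the most fluid is repeatedly selected, its fluid is added to its score, and a damped share is pushed to its out-neighbors.

```python
def push_pagerank(graph, source, damping=0.85, tol=1e-9):
    """Personalized PageRank computed by pushing residual fluid along edges.

    `graph` maps each node to the list of its out-neighbors; every node
    must appear as a key (hypothetical format chosen for this sketch).
    """
    fluid = {node: 0.0 for node in graph}   # residual mass still to diffuse
    score = {node: 0.0 for node in graph}   # mass already accounted for
    fluid[source] = 1.0 - damping           # all initial mass sits on the source

    while True:
        # Select the node that currently holds the most fluid.
        node = max(fluid, key=fluid.get)
        amount = fluid[node]
        if amount < tol:
            break                           # nothing significant left to push
        score[node] += amount
        fluid[node] = 0.0
        neighbors = graph[node]
        if not neighbors:
            continue                        # dangling node: its fluid is dropped
        share = damping * amount / len(neighbors)
        for neighbor in neighbors:
            fluid[neighbor] += share
    return score


# Tiny two-node example: "a" and "b" point to each other.
scores = push_pagerank({"a": ["b"], "b": ["a"]}, source="a")
```

Because only nodes that still hold fluid are processed, the work concentrates around the source, which is what makes this family of methods attractive for query-dependent ranking.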

Quickstart

Install gismo:

```console
$ pip install gismo
```

Use gismo in a Python project:

```pycon
>>> from gismo.common import toy_source_dict
>>> from gismo import Corpus, Embedding, CountVectorizer, Gismo
>>> corpus = Corpus(toy_source_dict, to_text=lambda x: x['content'])
>>> embedding = Embedding(vectorizer=CountVectorizer(dtype=float))
>>> embedding.fit_transform(corpus)
>>> gismo = Gismo(corpus, embedding)
>>> gismo.rank("Mogwaï")
>>> gismo.get_features_by_rank()
['mogwaï', 'gizmo', 'chinese', 'in', 'demon', 'folklore', 'is']
```

To get the hang of a typical Gismo workflow, check the Toy Example notebook. For more advanced uses, see the other tutorials or go directly to the reference section.

Credits

Thanks to Thomas Bonald, Anne Bouillard, Marc-Olivier Buob, and Dohy Hong for their helpful contributions.

This package was created with Cookiecutter and the francois-durand/package_helper project template.

Owner

  • Name: Fabien Mathieu
  • Login: balouf
  • Kind: user
  • Location: Paris, France
  • Company: LINCS

Researcher at Lincs

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Generic Information Search with a Mind of its Own (Gismo)
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - family-names: Mathieu
    given-names: Fabien
    email: fabien.mathieu@normalesup.org
url: "https://balouf.github.io/gismo/"

GitHub Events

Total
  • Release event: 1
  • Watch event: 2
  • Delete event: 8
  • Issue comment event: 8
  • Push event: 23
  • Pull request event: 12
  • Create event: 7
Last Year
  • Release event: 1
  • Watch event: 2
  • Delete event: 8
  • Issue comment event: 8
  • Push event: 23
  • Pull request event: 12
  • Create event: 7

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 225
  • Total Committers: 3
  • Avg Commits per committer: 75.0
  • Development Distribution Score (DDS): 0.56
Top Committers
Name Email Commits
fabien f****u@n****m 99
Fabien f****u@n****g 79
pyup-bot g****t@p****o 47
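
The committer statistics above can be cross-checked from the per-committer counts. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; that definition is an assumption, not something stated on this page, although it does reproduce the reported values.

```python
# Commit counts copied from the "Top Committers" table.
commits = {"fabien": 99, "Fabien": 79, "pyup-bot": 47}

total = sum(commits.values())            # total commits across all committers
average = total / len(commits)           # average commits per committer
# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits.values()) / total

print(total, average, round(dds, 2))
```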
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 103
  • Average time to close issues: N/A
  • Average time to close pull requests: 11 days
  • Total issue authors: 1
  • Total pull request authors: 3
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.88
  • Merged pull requests: 38
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 0
  • Pull requests: 25
  • Average time to close issues: N/A
  • Average time to close pull requests: 21 days
  • Issue authors: 0
  • Pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • pyup-bot (1)
  • hadifar (1)
Pull Request Authors
  • pyup-bot (132)
  • balouf (3)
  • dependabot[bot] (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 403 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 2
  • Total versions: 16
  • Total maintainers: 1
pypi.org: gismo

GISMO is an NLP tool to rank and organize a corpus of documents according to a query.

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 403 Last month
Rankings
Dependent packages count: 10.0%
Dependent repos count: 11.6%
Downloads: 13.4%
Average: 15.1%
Forks count: 19.1%
Stargazers count: 21.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi
  • IPython >=7.15.0
  • beautifulsoup4 >=4.9.0
  • bs4 >=0.0.1
  • dill >=0.3.1.1
  • gismo >=0.3.0
  • lxml >=4.5.0
  • nbsphinx >=0.6.1
  • pytest >=5.4.1
  • requests >=2.23.0
  • setuptools >=46.1.3
requirements.txt pypi
  • beautifulsoup4 >=4.9.0
  • bs4 >=0.0.1
  • dill >=0.3.1.1
  • gismo >=0.3.0
  • lxml >=4.5.0
  • nbsphinx >=0.6.1
  • numba >=0.49.0
  • numpy >=1.18.4
  • pytest >=5.4.1
  • requests >=2.23.0
  • scikit-learn >=0.23.1
  • scipy >=1.4.1
  • setuptools >=46.1.3
  • spacy >=2.3.4
requirements_dev.txt pypi
  • IPython >=7.15.0 development
  • Sphinx >=3.1.1 development
  • bump2version >=1.0.0 development
  • coverage >=5.1 development
  • flake8 >=3.8.3 development
  • nbsphinx >=0.7.1 development
  • pip >=20.2.4 development
  • pytest >=5.4.3 development
  • pytest-cov >=2.10.0 development
  • pytest-runner >=5.2 development
  • sphinx_rtd_theme >=0.5.0 development
  • tox >=3.15.2 development
  • twine >=3.4.1 development
  • watchdog >=0.10.2 development
  • wheel >=0.34.2 development
.github/workflows/build.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
.github/workflows/docs.yml actions
  • JamesIves/github-pages-deploy-action v4 composite
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/publish_on_pypi.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
setup.py pypi