Talisman

Talisman: a JavaScript archive of fuzzy matching, information retrieval and record linkage building blocks - Published in JOSS (2020)

https://github.com/yomguithereal/talisman

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

clustering deduplication fuzzy-matching information-retrieval machine-learning natural-language-processing record-linkage
Last synced: 4 months ago

Repository

Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.

Statistics
  • Stars: 716
  • Watchers: 19
  • Forks: 47
  • Open Issues: 86
  • Releases: 28
Created almost 10 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog Funding License

README.md

Talisman

Talisman is a JavaScript library collecting algorithms, functions and various building blocks for fuzzy matching, information retrieval and natural language processing.

Installation

You can install Talisman through npm:

```bash
npm install talisman
```

Documentation

The library's full documentation can be found here.

Bibliography

An extensive bibliography of the methods & functions implemented by the library can be found here.

Goals

  • :package: Modular: the library is completely modular. This means that if you only need to compute a Levenshtein distance, you will only load the relevant code.
  • :bulb: Straightforward & simple: just want to compute a Jaccard index? No need to instantiate a class and use two methods to pass options and then finally succeed in getting the index. Just apply the jaccard function and get going.
  • :dango: Consistent API: the library's API is fully consistent and one should not struggle to understand how to apply two different distance metrics.
  • :postal_horn: Functional: except for cases where classes might be useful (clustering notably), Talisman only uses functions, consumes raw data and orders function arguments to make partial application, currying, etc. as easy as possible.
  • :zap: Performant: the library should be as performant as possible for a high-level programming language library.
  • :globe_with_meridians: Cross-platform: the library is cross-platform and can be used both with Node.js and in the browser.
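
To make the "straightforward" and "functional" goals above more concrete, here is a minimal, self-contained sketch of the style they describe: a plain function that takes raw data and returns a number, and that is easy to partially apply. The `jaccard` function below is a hypothetical stand-in written for illustration, not Talisman's own implementation, and `similarityToContext` is an assumed helper name. For the real module paths, refer to the library's documentation.

```javascript
// Hypothetical stand-in for a Jaccard index function; NOT Talisman's source code.
// Accepts strings or arrays and compares their sets of distinct elements.
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);

  // Assumption: two empty sequences are treated as identical.
  if (setA.size === 0 && setB.size === 0) return 1;

  // Count the elements of A that also appear in B.
  let intersection = 0;
  for (const item of setA) {
    if (setB.has(item)) intersection++;
  }

  const union = setA.size + setB.size - intersection;
  return intersection / union;
}

// Raw data in, number out: no class to instantiate, no options object.
const similarity = jaccard('context', 'contact'); // 4 shared letters out of 7 distinct

// Plain functions are easy to partially apply.
const similarityToContext = other => jaccard('context', other);
const scores = ['contact', 'content', 'banana'].map(similarityToContext);

console.log(similarity, scores);
```

Because the function consumes raw strings or arrays directly, it can be passed straight to `map`, `filter` or a sorting comparator without any adapter code.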

How to cite

Talisman has been published as a paper in the Journal of Open Source Software (JOSS).

Contribution

Contributions are of course welcome :)

Be sure to lint & pass the unit tests before submitting your pull request.

```bash
# Cloning the repo
git clone git@github.com:Yomguithereal/talisman.git
cd talisman

# Installing the deps
npm install

# Running the tests
npm test

# Linting the code
npm run lint
```

License

This project is available as open source under the terms of the MIT License.

Owner

  • Name: Guillaume Plique
  • Login: Yomguithereal
  • Kind: user
  • Location: France
  • Company: médialab - Sciences Po

JOSS Publication

Talisman: a JavaScript archive of fuzzy matching, information retrieval and record linkage building blocks
Published
November 10, 2020
Volume 5, Issue 55, Page 2405
Authors
Guillaume Plique ORCID
médialab, SciencesPo Paris
Editor
Kakia Chatsiou ORCID
Tags
javascript fuzzy matching natural language processing phonetic algorithms stemmers inflectors deduplication record linkage entity resolution similarity metrics information retrieval search engines tokenizers

GitHub Events

Total
  • Watch event: 14
  • Fork event: 1
Last Year
  • Watch event: 14
  • Fork event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 546
  • Total Committers: 11
  • Avg Commits per committer: 49.636
  • Development Distribution Score (DDS): 0.022
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Yomguithereal g****e@g****m 534
Paul Girard p****d@s****r 3
tuzepoito 4****o 1
kakiac k****c@g****m 1
jfseb j****c@g****m 1
cbbfcd 2****9@q****m 1
Philippe Rivière f****l@r****t 1
Michael Henretty m****y@g****m 1
Mark Vasilkov m****v@g****m 1
Emily Marigold Klassen f****l@g****m 1
Aaron Meese a****7@g****m 1
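
The all-time figures above can be reproduced from the commit counts in the table. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits. The page does not state that definition, but it is consistent with the reported value.

```javascript
// Commit counts per committer, copied from the "Top Committers" table.
const commits = [534, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1];

const total = commits.reduce((sum, n) => sum + n, 0);
const average = total / commits.length;

// Assumed definition: DDS = 1 - (commits by the top committer / total commits).
const dds = 1 - Math.max(...commits) / total;

console.log(total);              // 546
console.log(average.toFixed(3)); // 49.636
console.log(dds.toFixed(3));     // 0.022
```

A score this close to zero indicates that nearly all commits come from a single committer.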

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 84
  • Total pull requests: 22
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 2 months
  • Total issue authors: 23
  • Total pull request authors: 16
  • Average comments per issue: 1.51
  • Average comments per pull request: 1.36
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 4
Past Year
  • Issues: 2
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 0.5
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • Yomguithereal (61)
  • giorgio79 (2)
  • crinklywrappr (1)
  • etler (1)
  • david-raine-mns (1)
  • markharwood (1)
  • emmanuelvlad (1)
  • vilchesalves (1)
  • tkg-codes (1)
  • dongyuwei (1)
  • harold (1)
  • chrislit (1)
  • jfseb (1)
  • sgu-fai (1)
  • victoryosiobe (1)
Pull Request Authors
  • knokknok (5)
  • dependabot[bot] (4)
  • ajmeese7 (2)
  • cbbfcd (1)
  • mikehenrty (1)
  • drzraf (1)
  • kakiac (1)
  • mvasilkov (1)
  • garyhtou (1)
  • boblannon (1)
  • ghost (1)
  • Fil (1)
  • forivall (1)
  • jfseb (1)
  • Realignist (1)
Top Labels
Issue Labels
enhancement (44) bug (13) performance (8) documentation (4) question (4) refactor (2)
Pull Request Labels
dependencies (4)

Packages

  • Total packages: 3
  • Total downloads:
    • npm 70,476 last-month
  • Total docker downloads: 39,224
  • Total dependent packages: 48
    (may contain duplicates)
  • Total dependent repositories: 20,812
    (may contain duplicates)
  • Total versions: 32
  • Total maintainers: 1
npmjs.org: talisman

  • Versions: 30
  • Dependent Packages: 48
  • Dependent Repositories: 20,812
  • Downloads: 70,476 Last month
  • Docker Downloads: 39,224
Rankings
Dependent repos count: 0.2%
Dependent packages count: 0.6%
Docker downloads count: 0.6%
Downloads: 0.7%
Average: 1.5%
Stargazers count: 2.7%
Forks count: 4.1%
Maintainers (1)
Last synced: about 1 year ago
proxy.golang.org: github.com/Yomguithereal/talisman
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.5%
Average: 5.7%
Dependent repos count: 5.8%
Last synced: 4 months ago
proxy.golang.org: github.com/yomguithereal/talisman
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.5%
Average: 5.7%
Dependent repos count: 5.8%
Last synced: 4 months ago

Dependencies

package-lock.json npm
  • 429 dependencies
package.json npm
  • @yomguithereal/eslint-config ^4.0.0 development
  • babel-cli ^6.6.5 development
  • babel-core ^6.7.4 development
  • babel-plugin-transform-es2015-classes ^6.18.0 development
  • babel-plugin-transform-es2015-destructuring ^6.6.5 development
  • babel-preset-es2015 ^6.6.0 development
  • chai ^4.3.4 development
  • citation-js ^0.5.0 development
  • csv ^5.5.0 development
  • csv-parse ^4.15.4 development
  • eslint ^7.25.0 development
  • leven ^3.1.0 development
  • matcha ^0.7.0 development
  • mocha ^8.3.2 development
  • rimraf ^3.0.2 development
  • seedrandom ^3.0.5 development
  • html-entities ^1.4.0
  • lodash ^4.17.21
  • long ^4.0.0
  • mnemonist ^0.38.3
  • obliterator ^1.6.1
  • pandemonium ^2.0.0