https://github.com/barahona-research-group/lgde

Python code for the paper "LGDE: Local Graph-based Dictionary Expansion" by Juni Schindler, Sneha Jha, Xixuan Zhang, Kilian Buehling, Annett Heft and Mauricio Barahona: https://doi.org/10.1162/coli_a_00562


Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.3%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: barahona-research-group
  • License: gpl-3.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 68.8 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created almost 2 years ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md


Local Graph-based Dictionary Expansion (LGDE)

Python code for the paper "LGDE: Local Graph-based Dictionary Expansion" by Juni Schindler, Sneha Jha, Xixuan Zhang, Kilian Buehling, Annett Heft and Mauricio Barahona: https://doi.org/10.1162/coli_a_00562

Installation

Clone the repository and open the folder in your terminal.

    $ git clone https://github.com/barahona-research-group/LGDE.git
    $ cd LGDE/

Then, to install the package with pip, execute the following command:

    $ pip install .

Using the code

To use LGDE we require a list of seed keywords seed_dict, a list of all candidate keywords word_list (for example from a domain-specific corpus) and their word embeddings word_vecs. We can then initialise a new LGDE object and expand the seed dictionary.

```python
from lgde import LGDE

# expand seed dictionary using LGDE method
# (seed_dict, word_list and word_vecs are prepared as described above)
lgde = LGDE(seed_dict, word_list, word_vecs)
lgde.expand(k=5, t=1)

# the discovered keywords are stored in a new attribute
print(lgde.discovered_dict)
```

To discover new keywords, we first construct a semantic similarity graph using CkNN [1] and then compute the semantic community of each seed keyword using fast local community detection with Severability [2]. The parameter $k$ (the number of nearest neighbours used by CkNN) determines the graph density, and the Markov time $t$ determines the size of the semantic communities. See the documentation for more details and for additional functionality for dictionary evaluation and plotting.
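The two steps described above can be illustrated with a small, self-contained NumPy sketch. It is not the `lgde` implementation: it assumes cosine distance between embeddings and the CkNN rule with δ = 1 (words i and j are linked if d(i, j) < δ·sqrt(d_k(i)·d_k(j)), where d_k(i) is the distance from i to its k-th nearest neighbour). For the second step it deliberately swaps Severability for a much simpler stand-in that takes every word within t hops of a seed, so that larger t gives larger communities. The function names `cknn_graph`, `t_hop_community` and `expand_sketch` are hypothetical, and the toy 2-D "embeddings" are made up for illustration.

```python
import numpy as np


def cknn_graph(vecs: np.ndarray, k: int = 5, delta: float = 1.0) -> np.ndarray:
    """Adjacency matrix of a continuous k-nearest-neighbour (CkNN) graph.

    Assumption: cosine distance between embeddings. Words i and j are linked
    if d(i, j) < delta * sqrt(d_k(i) * d_k(j)).
    """
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    dist = np.clip(1.0 - unit @ unit.T, 0.0, None)  # cosine distance
    dist = (dist + dist.T) / 2  # enforce exact symmetry
    # Column 0 of each sorted row is the word itself (assuming no duplicate
    # vectors), so column k holds the distance to the k-th nearest neighbour.
    d_k = np.sort(dist, axis=1)[:, k]
    adj = dist < delta * np.sqrt(np.outer(d_k, d_k))
    np.fill_diagonal(adj, False)  # no self-loops
    return adj.astype(float)


def t_hop_community(adj: np.ndarray, seed: int, t: int) -> set:
    """Hypothetical stand-in for Severability (NOT the Severability algorithm):
    all nodes reachable from the seed in at most t steps."""
    community, frontier = {seed}, {seed}
    for _ in range(t):
        frontier = {int(j) for i in frontier for j in np.flatnonzero(adj[i])} - community
        community |= frontier
    return community


def expand_sketch(seed_dict, word_list, word_vecs, k=5, t=1):
    """Union of the local communities of all seeds, minus the seeds themselves."""
    adj = cknn_graph(np.asarray(word_vecs, dtype=float), k)
    index = {w: i for i, w in enumerate(word_list)}
    found = set()
    for seed in seed_dict:
        if seed in index:  # seeds without an embedding are skipped
            found |= t_hop_community(adj, index[seed], t)
    return sorted(word_list[i] for i in found if word_list[i] not in seed_dict)


# Toy example: two tight clusters of unit vectors in 2-D.
words = ["car", "truck", "van", "cake", "pie", "bread"]
angles = np.deg2rad([0, 4, 10, 90, 93, 101])
vecs = np.column_stack([np.cos(angles), np.sin(angles)])
print(expand_sketch(["car"], words, vecs, k=2, t=2))  # ['truck', 'van']
```

With k=2 the two toy clusters remain disconnected, so expanding the seed "car" only picks up words from its own cluster. In the method described above, the t-hop stand-in would be replaced by a genuine Severability-based local community.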

Experiments

⚠️ Content warning: Our experiments include text and word phrases derived from Reddit, Gab and 4chan posts with potentially triggering content including anti-Semitism, racism, homophobia, misogyny and other forms of violent or hateful language.

  • Our application to a corpus of hate speech-related communication on Reddit and Gab can be found in the experiments/redgab directory.
  • Our application to the benchmark 20 Newsgroups dataset can be found in the experiments/20newsgroups directory.
  • Our additional experiment of LGDE applied to a corpus of conspiracy-related communication on 4chan can be found in the experiments/4chan directory.

Contributors

  • Juni Schindler, GitHub: juni-schindler <https://github.com/juni-schindler>

We are always looking for individuals who are interested in contributing to this open-source project. Even if you are just using LGDE and have made some minor updates, we would be interested in your input.

Cite

Please cite our paper if you use our code or data in your own work:

    @article{schindlerLGDELocalGraphbased2025,
      author = {Schindler, Juni and Jha, Sneha and Zhang, Xixuan and Buehling, Kilian and Heft, Annett and Barahona, Mauricio},
      title = {LGDE: Local Graph-based Dictionary Expansion},
      publisher = {Computational Linguistics},
      year = {2025},
      doi = {10.1162/coli_a_00562},
      pages = {1--32},
      issn = {0891-2017}
    }

References

[1] T. Berry and T. Sauer, 'Consistent manifold representation for topological data analysis', Foundations of Data Science, vol. 1, no. 1, pp. 1-38, Feb. 2019, doi: 10.3934/fods.2019001.

[2] Y. W. Yu, J.-C. Delvenne, S. N. Yaliraki and M. Barahona, 'Severability of mesoscale components and local time scales in dynamical networks', arXiv:2006.02972, Jun. 2020, doi: 10.48550/arXiv.2006.02972.

Licence

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Owner

  • Name: Barahona Research - Applied Math - Imperial
  • Login: barahona-research-group
  • Kind: organization
  • Email: m.barahona@imperial.ac.uk

Research codes developed in the Barahona research group - Department of Mathematics - Imperial College London

GitHub Events

Total
  • Release event: 1
  • Watch event: 1
  • Push event: 4
  • Create event: 1
Last Year
  • Release event: 1
  • Watch event: 1
  • Push event: 4
  • Create event: 1