https://github.com/barahona-research-group/lgde
Python code for the paper "LGDE: Local Graph-based Dictionary Expansion" by Juni Schindler, Sneha Jha, Xixuan Zhang, Kilian Buehling, Annett Heft and Mauricio Barahona: https://doi.org/10.1162/coli_a_00562
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Local Graph-based Dictionary Expansion (LGDE)
Python code for the paper "LGDE: Local Graph-based Dictionary Expansion" by Juni Schindler, Sneha Jha, Xixuan Zhang, Kilian Buehling, Annett Heft and Mauricio Barahona: https://doi.org/10.1162/coli_a_00562
Installation
Clone the repository and open the folder in your terminal.
```bash
$ git clone https://github.com/barahona-research-group/LGDE.git
$ cd LGDE/
```
Then, to install the package with pip, execute the following command:
```bash
$ pip install .
```
Using the code
To use LGDE we require a list of seed keywords `seed_dict`, a list of all candidate keywords `word_list` (for example from a domain-specific corpus) and their word embeddings `word_vecs`. We can then initialise a new `LGDE` object and expand the seed dictionary.
```python
from lgde import LGDE

# expand seed dictionary using LGDE method
lgde = LGDE(seed_dict, word_list, word_vecs)
lgde.expand(k=5, t=1)

# the discovered keywords are stored in a new attribute
print(lgde.discovered_dict_)
```
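Before constructing the object it can help to check that the three inputs line up. The following is a minimal sketch, not part of the package: the helper name `check_lgde_inputs` is hypothetical, the toy words and vectors are made up, and it assumes that `word_vecs` holds exactly one vector per entry of `word_list`, in the same order, and that seed keywords are expected to appear among the candidates.

```python
def check_lgde_inputs(seed_dict, word_list, word_vecs):
    """Hypothetical helper: sanity-check inputs and return seeds missing from word_list."""
    # Assumption: one embedding per candidate keyword, in the same order.
    if len(word_list) != len(word_vecs):
        raise ValueError("word_vecs needs exactly one vector per entry in word_list")
    if len(set(word_list)) != len(word_list):
        raise ValueError("word_list contains duplicate keywords")
    if len({len(vec) for vec in word_vecs}) > 1:
        raise ValueError("all word vectors must have the same dimension")
    # Assumption: seeds should also be candidates so that they have an embedding.
    candidates = set(word_list)
    return [word for word in seed_dict if word not in candidates]


# Toy data, made up for illustration only.
word_list = ["storm", "rain", "thunder", "piano", "violin"]
word_vecs = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]]
seed_dict = ["storm", "hail"]

missing_seeds = check_lgde_inputs(seed_dict, word_list, word_vecs)
print(missing_seeds)  # ['hail']
```

The helper only uses `len` and iteration, so it should behave the same for nested lists and for a two-dimensional array of embeddings.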
To discover new keywords we first construct a semantic similarity graph using CkNN [1] and then compute the semantic community of each seed keyword using fast local community detection with Severability [2]. The parameter $k$ determines the graph density and $t$ the size of the semantic communities. See documentation for more details and additional functionality for dictionary evaluation and plotting.
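To give a feel for the graph construction step, here is a minimal pure-Python sketch of the CkNN rule from [1]: two points $i$ and $j$ are connected when $d(i,j) < \delta \sqrt{d_k(i)\, d_k(j)}$, where $d_k(i)$ is the distance from $i$ to its $k$-th nearest neighbour. This is not the package's implementation. The function name `cknn_edges`, the choice of cosine distance, the default $\delta = 1$ and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cknn_edges(vectors, k=2, delta=1.0):
    """Return the set of undirected CkNN edges (i, j) with i < j."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    # Distance from each point to its k-th nearest neighbour (excluding itself).
    kth = [sorted(dist[i][j] for j in range(n) if j != i)[k - 1]
           for i in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # CkNN rule: compare the distance with the local scales of both points.
            if dist[i][j] < delta * math.sqrt(kth[i] * kth[j]):
                edges.add((i, j))
    return edges


# Toy embeddings: two groups of unit vectors, given by their angles in degrees.
angles = [0, 10, 25, 45, 90, 100, 115, 135]
vectors = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in angles]

edges = cknn_edges(vectors, k=2)
print(sorted(edges))  # [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7)]
```

In this toy example the two groups end up in separate connected components, because the connection threshold adapts to the local density around each point. The local community detection with Severability [2] is not sketched here.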
Experiments
:warning: Content warning: Our experiments include text and word phrases derived from Reddit, Gab and 4chan posts with potentially triggering content including anti-Semitism, racism, homophobia, misogyny and other forms of violent or hateful language.
- Our application to a corpus of hate speech-related communication on Reddit and Gab can be found in the `experiments/redgab` directory.
- Our application to a benchmark 20 Newsgroups dataset can be found in the `experiments/20newsgroups` directory.
- Our additional experiment of LGDE applied to a corpus of conspiracy-related communication on 4chan can be found in the `experiments/4chan` directory.
Contributors
- Juni Schindler, GitHub: juni-schindler <https://github.com/juni-schindler>
We are always looking for people who are interested in contributing to this open-source project. Even if you are just using LGDE and have made some minor updates, we would be interested in your input.
Cite
Please cite our paper if you use our code or data in your own work:
```bibtex
@article{schindlerLGDELocalGraphbased2025,
  author = {Schindler, Juni and Jha, Sneha and Zhang, Xixuan and Buehling, Kilian and Heft, Annett and Barahona, Mauricio},
  title = {LGDE: Local Graph-based Dictionary Expansion},
  publisher = {Computational Linguistics},
  year = {2025},
  doi = {10.1162/coli_a_00562},
  pages = {1--32},
  issn = {0891-2017}
}
```
References
[1] T. Berry and T. Sauer, 'Consistent manifold representation for topological data analysis', Foundations of Data Science, vol. 1, no. 1, pp. 1-38, Feb. 2019, doi: 10.3934/fods.2019001.
[2] Y. W. Yu, J.-C. Delvenne, S. N. Yaliraki and M. Barahona, 'Severability of mesoscale components and local time scales in dynamical networks', arXiv: 2006.02972, Jun. 2020, doi: 10.48550/arXiv.2006.02972.
Licence
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
Owner
- Name: Barahona Research - Applied Math - Imperial
- Login: barahona-research-group
- Kind: organization
- Email: m.barahona@imperial.ac.uk
- Website: https://scholar.google.co.uk/citations?user=weulBoAAAAAJ&hl=en
- Repositories: 9
- Profile: https://github.com/barahona-research-group
Research codes developed in the Barahona research group - Department of Mathematics - Imperial College London
GitHub Events
Total
- Release event: 1
- Watch event: 1
- Push event: 4
- Create event: 1
Last Year
- Release event: 1
- Watch event: 1
- Push event: 4
- Create event: 1