stripnet
STriP Net: Semantic Similarity of Scientific Papers (S3P) Network
Science Score: 49.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ✓ Academic publication links: links to arxiv.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.4%) to scientific vocabulary
Statistics
- Stars: 86
- Watchers: 3
- Forks: 8
- Open Issues: 1
- Releases: 7
README.md
💡 STriP Net: Semantic Similarity of Scientific Papers (S3P) Network
Do you read a lot of scientific papers? Have you ever wondered what the overarching themes are in the papers you've read, and how those papers are semantically connected to one another? Look no further!
Leverage the power of NLP Topic Modeling, Semantic Similarity, and Network analysis to study the themes and semantic relations within a corpus of research papers.
✅ Generate STriP Network on your own collection of research papers with just three lines of code!
✅ Interactive plots to quickly identify research themes and most important papers
✅ This repo was hacked together over the weekend of New Year 2022. This is only the initial release, with lots of work planned.
💪 Please leave a ⭐ to let me know that STriP Net has been helpful to you so that I can dedicate more of my time working on it.
⚡ Install
Install with conda
This is perhaps the most hassle-free option for installing stripnet.
```sh
conda install -c conda-forge stripnet
```
Install with pip
If you want to install stripnet using pip, it is highly recommended to install it in a conda environment.
- Create a conda environment (here we choose `stripnet` as the environment name) and activate it.
```sh
conda create -n stripnet python=3.8 jupyterlab -y
conda activate stripnet
```
- Pip install this library
```sh
pip install stripnet
```
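To confirm that the install landed in the active environment, a quick check like the following may help. This is a small sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical and not part of stripnet.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("stripnet")
print(f"stripnet {version}" if version else "stripnet is not installed in this environment")
```

If this reports that stripnet is missing, the most likely cause is that the conda environment was not activated before running `pip install`.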
🔥🚀 Generate the STriP network analysis on default settings
- STriP can essentially run on any pandas dataframe column containing text.
- However, the pretrained model is hardcoded (for now), so you'll see the best results while running it on a column that combines the `title` and `abstract` of papers separated by the `[SEP]` keyword. Please see below.
```python
# Load some data
import pandas as pd
data = pd.read_csv('data.csv')

# Keep only title and abstract columns
data = data[['title', 'abstract']]

# Concat the title and abstract columns separated with [SEP] keyword
data['text'] = data['title'] + '[SEP]' + data['abstract']
```
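One caveat with the concatenation above: in pandas, adding a missing value (`NaN`) to a string typically yields `NaN`, so a paper without an abstract would end up with no text at all. Here is a minimal sketch of a more forgiving combiner, written in plain Python so it runs without any data file. The helper names `_clean` and `combine_title_abstract` and the sample rows are hypothetical and not part of stripnet.

```python
SEP = "[SEP]"  # separator keyword used in the README example


def _clean(value):
    """Return a stripped string, or '' for missing values (None, NaN, non-strings)."""
    if isinstance(value, str):
        return value.strip()
    return ""


def combine_title_abstract(title, abstract, sep=SEP):
    """Join title and abstract with the separator, tolerating missing fields."""
    title, abstract = _clean(title), _clean(abstract)
    if title and abstract:
        return f"{title}{sep}{abstract}"
    # Fall back to whichever field is present (may be an empty string).
    return title or abstract


# Made-up rows for illustration only.
rows = [
    {"title": "Paper A", "abstract": "We study X."},
    {"title": "Paper B", "abstract": float("nan")},  # missing abstract
    {"title": None, "abstract": None},               # nothing usable
]
texts = [combine_title_abstract(r["title"], r["abstract"]) for r in rows]
texts = [t for t in texts if t]  # drop rows with no text at all
print(texts)
```

With a DataFrame, the same helper could be applied row-wise, for example `data.apply(lambda r: combine_title_abstract(r['title'], r['abstract']), axis=1)`.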
```python
# Instantiate the StripNet
from stripnet import StripNet
stripnet = StripNet()

# Run the StripNet pipeline
stripnet.fit_transform(data['text'])
```
- If everything ran well, your browser should open a new window with the network graph similar to below. The graph is fully interactive! Have fun playing around by hovering over the nodes and moving them around!
- If you are not satisfied with the topics you get, just restart the kernel and rerun it. The Topic Modeling framework has some level of randomness so the topics will change slightly with every run.
- You can also tweak the parameters of the various models. Please look out for the full documentation for the details!

🏅 Find the most important paper
- After you fit the model using the above steps, you can plot the most important papers with one line of code
- The plot is fully interactive too! Hovering over any bar shows the relevant information of the paper.
```python
stripnet.most_important_docs()
```
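The README does not say how "importance" is measured. A common choice in network analysis is a centrality score, so the sketch below uses degree centrality (the fraction of other nodes a node is linked to) on a toy graph. Treat the metric, the edge list, and the helper `degree_centrality` as assumptions for illustration, not as stripnet's actual ranking method.

```python
# Toy undirected graph: each edge links two semantically similar papers.
EDGES = [("p1", "p2"), ("p1", "p3"), ("p1", "p4"), ("p2", "p3")]


def degree_centrality(edges):
    """Degree of each node divided by (number of nodes - 1).

    Only nodes that appear in at least one edge are counted.
    """
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    n = len(degree)
    if n < 2:
        return {node: 0.0 for node in degree}
    return {node: count / (n - 1) for node, count in degree.items()}


scores = degree_centrality(EDGES)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], scores[ranking[0]])
```

In this toy graph `p1` ranks first because it is linked to every other paper, which matches the intuition that a highly connected paper is central to the corpus.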

🛠️ Common Issues
- If your StripNet graph is just one big ball of moving fireflies, try these steps
    - Check the value of `threshold` currently used by stripnet
    ```python
    current_threshold = stripnet.threshold
    print(current_threshold)
    ```
    - Increase the value of `threshold` in steps of 0.05 and try again until you see a good-looking network. Remember the max value of threshold is 1! If your threshold is already 0.95, then try increasing in steps of 0.01 instead.
    ```python
    stripnet.fit_transform(data['text'], threshold=current_threshold+0.05)
    ```
- If your dataset is small (<500 rows) and the number of topics generated seems too low
    - Try tweaking the value of `min_topic_size` to a value lower than the default value of 10 until you get topics that look reasonable to you
    ```python
    stripnet.fit_transform(data['text'], min_topic_size=5)
    ```
- After the above two steps, if your graph looks messy, try removing isolated nodes (those nodes that don't have any connections)
    ```python
    stripnet.fit_transform(data['text'], remove_isolated_nodes=True)
    ```
- In practice, you might have to tweak all three at the same time!
    ```python
    stripnet.fit_transform(data['text'], threshold=current_threshold+0.05, min_topic_size=5, remove_isolated_nodes=True)
    ```
I'm testing out the network on a variety of data to pick better default values. Do let me know if some specific values worked the best for you!
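To build intuition for why these knobs help, here is a minimal pure-Python sketch. It assumes, as the advice above suggests but the README does not spell out, that `threshold` acts as a similarity cutoff: two papers are linked only when their similarity exceeds it. The similarity values and the helpers `edges_above` and `isolated_nodes` are made up for illustration and are not part of stripnet.

```python
from itertools import combinations

# Hypothetical pairwise similarities between four papers (values in [0, 1]).
SIMILARITY = {
    ("p1", "p2"): 0.97,
    ("p1", "p3"): 0.91,
    ("p1", "p4"): 0.86,
    ("p2", "p3"): 0.93,
    ("p2", "p4"): 0.84,
    ("p3", "p4"): 0.88,
}
PAPERS = ["p1", "p2", "p3", "p4"]


def edges_above(threshold):
    """Keep only the pairs whose similarity is strictly above the threshold."""
    return [pair for pair in combinations(PAPERS, 2) if SIMILARITY[pair] > threshold]


def isolated_nodes(edges):
    """Papers that do not appear in any remaining edge."""
    connected = {node for edge in edges for node in edge}
    return [p for p in PAPERS if p not in connected]


for threshold in (0.80, 0.85, 0.90, 0.95):
    edges = edges_above(threshold)
    print(f"threshold={threshold:.2f} edges={len(edges)} isolated={isolated_nodes(edges)}")
```

At a low cutoff every paper is linked to every other (the "ball of fireflies"). Raising it thins the graph, and at some point papers lose all their links, which is where `remove_isolated_nodes=True` becomes useful.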
🎓 Citation
To cite STriP Net in your work, please use the following bibtex reference:
```bibtex
@software{marie_stephen_leo_2022_5823822,
  author    = {Marie Stephen Leo},
  title     = {STriP Net: Semantic Similarity of Scientific Papers (S3P) Network},
  month     = jan,
  year      = 2022,
  publisher = {Zenodo},
  version   = {v0.0.5.zenodo},
  doi       = {10.5281/zenodo.5823822},
  url       = {https://doi.org/10.5281/zenodo.5823822}
}
```
🤩 Acknowledgements
STriP Net stands on the shoulders of giants and builds on several prior works, most notably:
1. Sentence Transformers [Paper] [Code]
2. AllenAI Specter pretrained model [Paper] [Code]
3. BERTopic [Code]
4. Networkx [Code]
5. Pyvis [Code]
🙏 Buy me a coffee
If this work helped you in any way, please consider the following ways to give me feedback so I can spend more time on this project:
1. ⭐ this repository
2. ❤️ the Huggingface space
3. 👏 the Medium post (Coming End Jan 2022!)
4. ☕ Buy me a Coffee!
Owner
- Name: Marie Stephen Leo
- Login: stephenleo
- Kind: user
- Location: Singapore
- Website: https://stephen-leo.medium.com/
- Repositories: 10
- Profile: https://github.com/stephenleo
Head of Data | Towards Data Science | Towards AI contributor
Committers
Last synced: almost 3 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Marie Stephen Leo | 3****o@u****m | 15 |
| stephenleo | s****7@g****m | 11 |
| Sugato Ray | s****y@u****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 4
- Average time to close issues: 11 days
- Average time to close pull requests: 8 minutes
- Total issue authors: 3
- Total pull request authors: 3
- Average comments per issue: 4.25
- Average comments per pull request: 0.5
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sugatoray (2)
- kpkhushi15 (1)
- doubianimehdi (1)
Pull Request Authors
- stephenleo (2)
- sugatoray (1)
- sourcery-ai[bot] (1)
Packages
- Total packages: 2
- Total downloads: pypi 22 last-month
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 1 (may contain duplicates)
- Total versions: 7
- Total maintainers: 1
pypi.org: stripnet
STriP Net: Semantic Similarity of Scientific Papers (S3P) Network
- Homepage: https://github.com/stephenleo/stripnet
- Documentation: https://stripnet.readthedocs.io/
- License: apache-2.0
- Latest release: 0.0.7 (published almost 4 years ago)
conda-forge.org: stripnet
Leverage the power of NLP Topic Modeling, Semantic Similarity and Network analysis to study the themes and semantical relations within a corpus of research papers. PyPI: [https://pypi.org/project/stripnet/](https://pypi.org/project/stripnet/)
- Homepage: https://github.com/stephenleo/stripnet
- License: Apache-2.0
- Latest release: 0.0.7 (published almost 4 years ago)
Dependencies
- bertopic *
- ipywidgets *
- networkx *
- numpy *
- pandas *
- plotly *
- pyvis *
- scikit_learn *
- sentence_transformers *
- setuptools *