SIRITVIS
SIRITVIS: Social Interaction Research Insights Topic Visualisation - Published in JOSS (2024)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 7 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Basic Info
- Host: GitHub
- Owner: CodeEagle22
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 21.5 MB
Statistics
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
SIRITVIS
Social Interaction Research Insights Topic Visualisation

📋 Summary
The SIRITVIS Python package helps you analyse text from social media platforms such as Instagram and Reddit, or from any other text data source. It applies topic modelling to find hidden patterns in large amounts of text. SIRITVIS includes tools for gathering data, cleaning it, analysing it, and visualising the results. You can see on a map where certain topics are being discussed and how often they are mentioned.
SIRITVIS uses well-known methods from data science, machine learning, and mapping to ensure accurate results. It cleans the data thoroughly and uses reliable models to find meaningful topics. You can evaluate the quality of these topics using built-in tools. The package also includes visual tools to help you easily see the distribution of topics on a map.
A key feature of SIRITVIS is its ability to show where on a world map people are talking about different topics. It can categorize these places by the sentiment of the posts, such as positive, negative, or neutral. You can also search for specific keywords and see where they appear on the map.
SIRITVIS is helpful in various areas, like marketing, politics, and disaster response, by providing tools to analyze the spread of topics. It helps users understand their audience better and make informed decisions based on the analysis of social media data.
📝 How to cite
Narwade, S., Kant, G., Säfken, B., and Leiding, B. (2023), SIRITVIS: Social Interaction Research Insights Topic Visualisation. Journal of Open Source Software, https://joss.theoj.org/papers/b51be70e9634e45d8035ee20b6147d76.
Advisory
- Ensure Python version '>=3.10, <3.11'.
- Utilize IDEs like Visual Studio or platforms like Google Colab for enhanced plot visualization.
- Refer to the provided sample dataset for better comprehension.
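The version constraint above can be checked before installing. The following is a minimal sketch using only the standard library; the helper name `is_supported_python` is hypothetical and not part of SIRITVIS.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter satisfies '>=3.10, <3.11'."""
    # Only the major and minor components matter for this constraint.
    return (version_info[0], version_info[1]) == (3, 10)


if not is_supported_python():
    found = f"{sys.version_info[0]}.{sys.version_info[1]}"
    print(f"Warning: SIRITVIS targets Python 3.10, but this is Python {found}.")
```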
💡 Features
- Data Streaming 💾
- Data Cleaning 🧹
- Topic Model Training and Evaluation 🎯
- Topic Visual Insights 🔍
- Trending Topic Geo Visualisation 🌏
🛠 Installation
Attention: SIRITVIS is specifically tailored for operation on Python 3.10, and its visualization capabilities are optimized for Python notebooks. Extensive testing has been conducted under these specifications. For the best compatibility and performance, we advise setting up a fresh (conda) environment utilizing Python 3.10.10.
The package can be installed via pip:
```bash
pip install SIRITVIS
```
👩‍💻 Usage (documentation: https://siritvis.readthedocs.io/)
Import Libraries
```python
from SIRITVIS import insta_streamer, reddit_streamer, cleaner, topic_model, topic_visualise, topic_mapper
```
Streaming Reddit Data
- For authentication with the Reddit Streaming API, follow the steps outlined in this tutorial.
```python
# Run the streaming process to retrieve raw data based on the specified keywords
client_id = "XXXXXXXXXX"
client_secret = "XXXXXXXXX"
user_agent = "XXXXXXXXXX"
keywords = ['Specific', 'Keywords']  # default is None
# Use multiple keywords for a more varied dataset during streaming data collection.
save_path = '../folder/path/to/store/the/data/'
raw_data = reddit_streamer.RedditStreamer(client_id, client_secret, user_agent, save_path, keywords).run()
```
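Rather than hard-coding credentials in a notebook, they can be read from environment variables. This is a minimal sketch, not part of SIRITVIS: the helper `load_reddit_credentials` and the variable names `REDDIT_CLIENT_ID`, `REDDIT_CLIENT_SECRET` and `REDDIT_USER_AGENT` are assumptions chosen for illustration.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
ENV_NAMES = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "user_agent": "REDDIT_USER_AGENT",
}


def load_reddit_credentials(env=os.environ):
    """Return the three Reddit credentials, or raise if any is missing."""
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in ENV_NAMES.items()}


# Usage (assuming the variables are set in your shell):
# creds = load_reddit_credentials()
# creds["client_id"], creds["client_secret"] and creds["user_agent"] can then
# be passed to the streamer in place of the literal strings shown above.
```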
Streaming Instagram Data
- For authentication with the Instagram Streaming API, sign up on the Apify page.
```python
# Run the streaming process to retrieve raw data based on the specified keywords
api_token = 'apifyapiXXXXXXXXX'
save_path = '../folder/path/to/store/the/data/'
instagram_username = 'XXXXXXXXX'
instagram_password = 'XXXXXXXXX'
hashtags = ['Specific', 'Keywords']  # default is ['instagram']
# Use multiple keywords for a more varied dataset during streaming data collection.
limit = 20  # number of post captions to extract; default is 100
raw_data = insta_streamer.InstagramStreamer(api_token, save_path, instagram_username, instagram_password, hashtags, limit).run()
```
Clean Streamed Data or Any External Text Data
```python
# The raw_data variable might also be used as the data_source value
cleaner_obj = cleaner.Cleaner(data_source='../folder/path/or/csv/file/path/to/load/data/')
cleaner_obj.clean_data  # get the cleaned dataset without saving it
cleaned_file = cleaner_obj.saving('../folder/path/to/store/the/cleaned/data/', data_save_name='datasetfilename')
```
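The training and visualisation steps recommend a cleaned file of at least 500 KB. A quick size check on the saved file might look like the sketch below; the helper `is_large_enough` is hypothetical, and reading "500 KB" as 500 × 1024 bytes is an assumption.

```python
import os

# "At least 500 KB", interpreted here as 500 * 1024 bytes (an assumption).
MIN_BYTES = 500 * 1024


def is_large_enough(path, min_bytes=MIN_BYTES):
    """Return True when the file at `path` meets the recommended minimum size."""
    return os.path.getsize(path) >= min_bytes


# Usage with a cleaned CSV file (path shown is a placeholder):
# if not is_large_enough('../csv/file/path/to/load/data.csv'):
#     print("Cleaned file is below 500 KB; topics may be unreliable.")
```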
Train a topic model on a corpus of short texts
- Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)
```python
# The cleaned_file variable might also be used as the dataset_source value
model = topic_model.TopicModeling(
    num_topics=10,
    dataset_source='../csv/file/path/to/load/data.csv',
    learning_rate=0.001,
    batch_size=32,
    activation='softplus',
    num_layers=3,
    num_neurons=100,
    dropout=0.2,
    num_epochs=100,
    save_model=False,
    model_path=None,
    train_model='NeuralLDA',
    evaluation=['topicdiversity', 'invertedrbo', 'jaccardsimilarity'],
)
saved_model = model.run()
```
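To give a feel for two of the evaluation metrics named above, here is a minimal sketch of topic diversity and pairwise Jaccard similarity computed on each topic's top words. These are the commonly used textbook definitions, written from scratch; they are not taken from SIRITVIS and may differ in detail from what the package computes. The toy topics are invented for illustration.

```python
from itertools import combinations


def topic_diversity(topics):
    """Share of unique words among all top words (1.0 means no overlap)."""
    all_words = [word for topic in topics for word in topic]
    return len(set(all_words)) / len(all_words)


def mean_jaccard_similarity(topics):
    """Average Jaccard overlap between every pair of topics (lower is better).

    Requires at least two non-empty topics.
    """
    scores = [
        len(set(a) & set(b)) / len(set(a) | set(b))
        for a, b in combinations(topics, 2)
    ]
    return sum(scores) / len(scores)


# Invented example: three topics, each described by its top three words.
topics = [
    ["flood", "rain", "storm"],
    ["rain", "vote", "election"],
    ["match", "goal", "team"],
]
print(topic_diversity(topics))
print(mean_jaccard_similarity(topics))
```

A diversity close to 1 and a Jaccard similarity close to 0 both suggest that the topics share few words.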
Topic Insights Visualisation
- To investigate the internal structure of topics and their relations to words and individual documents, we recommend using pyLDAvis.
- Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)
```python
# The cleaned_file variable could also be used as the data_source value
vis_model = topic_visualise.PyLDAvis(data_source='../csv/file/path/to/load/data.csv', num_topics=5, text_column='text')
vis_model.visualize()
```
A word cloud is a graphical display of text data in which the prominence of each word reflects its frequency or significance within the text.
- Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)
```python
# The cleaned_file variable might also be used as the data_source value
# Please wait a while for the word cloud to appear.
vis_model = topic_visualise.Wordcloud(data_source='../csv/file/path/to/load/data.csv', text_column='text', save_image=False)
vis_model.visualize()
```
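The word sizes in such a cloud are driven by word frequencies. The sketch below shows one simple way to count them with the standard library; it illustrates the idea only and is not how SIRITVIS builds its word cloud. The function `word_frequencies` and the sample texts are hypothetical.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count lowercase word occurrences across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Keep runs of letters and apostrophes; drop punctuation and digits.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Invented sample texts standing in for a cleaned 'text' column.
freqs = word_frequencies([
    "Storm warning issued",
    "storm hits the coast",
    "Coast guard on alert",
])
print(freqs.most_common(3))
```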
Trending Topic Geo Visualisation
Topic Mapper excels at mapping the spatial distribution of Instagram posts and other text data globally. It accomplishes this by associating each location with its top trending topics and their frequencies, all using pre-trained topic models. Furthermore, it categorizes and color-codes these locations based on sentiment, providing users with a quick overview of sentiment distribution, including counts for positive, negative, and neutral posts.
Users can effortlessly explore specific keywords through a dropdown interface, allowing them to see how frequently these keywords appear on the world map. This feature simplifies the process of grasping and navigating research findings.
- Notice: Reddit data cannot be visualized on the topic_mapper due to the absence of coordinate values.
```python
# The cleaned_file variable might also be used as the data_source value
# The saved_model variable might also be used as the model_source value, for example, model_source = saved_model
data_source = '../file/path/of/data.csv'
model_source = '../file/path/of/model.pkl'
topic_mapper.TopicMapper(data_source, model_source)
```
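The per-location sentiment counts described above can be pictured with a small aggregation. This is a sketch under stated assumptions, not the Topic Mapper implementation: the record fields `location` and `sentiment`, both helper functions, and the sample posts are invented for illustration.

```python
from collections import Counter, defaultdict


def sentiment_counts_by_location(posts):
    """Map each location to a Counter of sentiment labels."""
    summary = defaultdict(Counter)
    for post in posts:
        summary[post["location"]][post["sentiment"]] += 1
    return summary


def dominant_sentiment(counts):
    """Return the most frequent sentiment label, e.g. to pick a marker colour."""
    return counts.most_common(1)[0][0]


# Invented posts with hypothetical field names.
posts = [
    {"location": "Berlin", "sentiment": "positive"},
    {"location": "Berlin", "sentiment": "positive"},
    {"location": "Berlin", "sentiment": "negative"},
    {"location": "Mumbai", "sentiment": "neutral"},
]
summary = sentiment_counts_by_location(posts)
for location, counts in summary.items():
    print(location, dict(counts), dominant_sentiment(counts))
```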
📣 Community guidelines
We encourage and welcome contributions to the SIRITVIS package. If you have any questions, want to report bugs, or have ideas for new features, please file an issue.
Additionally, we appreciate pull requests via GitHub. There are several areas where potential contributions can make a significant impact, such as enhancing the quality of topics in topic models when dealing with noisy data from Reddit, Instagram or any external data sources, and improving the topic_mapper function to make it more interactive and independent from the notebook.
🖊️ Authors
- Sagar Narwade
- Gillian Kant
- Benjamin Säfken
- Benjamin Leiding
🎓 References
In our project, we used "OCTIS" ^1^, a library by Terragni et al., which provided essential functionality. We also incorporated "pyLDAvis" ^2^, a Python library by Ben Mabey, for interactive topic model visualisation.
📜 License
This project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). See the LICENSE file for details.
Owner
- Login: CodeEagle22
- Kind: user
- Repositories: 1
- Profile: https://github.com/CodeEagle22
JOSS Publication
SIRITVIS: Social Interaction Research Insights Topic Visualisation
Authors
Technische Universität Clausthal, Clausthal-Zellerfeld, Germany
Tags
Text analysis tool, Reddit, Instagram, Topic Modelling, Geospatial mapping, Natural Language Processing, Machine Learning
Committers
Last synced: 5 months ago
Top Committers
| Name | Email (masked) | Commits |
|---|---|---|
| Sagar Narwade | 1****2 | 189 |
| Gillian Kant | 5****g | 3 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- n3mo (1)
Packages
- Total packages: 1
- Total downloads: 4 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 11
- Total maintainers: 1
pypi.org: siritvis
SIRITVIS: Social Media Interaction Reaction Insights Topic Visualisation
- Homepage: https://github.com/CodeEagle22/SIRITVIS
- Documentation: https://siritvis.readthedocs.io/
- License: MIT License
- Latest release: 2.0.0 (published over 1 year ago)
