SIRITVIS

SIRITVIS: Social Interaction Research Insights Topic Visualisation - Published in JOSS (2024)

https://github.com/codeeagle22/siritvis

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 60% confidence

Biology Life Sciences - 40% confidence

Engineering Computer Science - 40% confidence

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: CodeEagle22
License: mit
Language: Jupyter Notebook
Default Branch: main
Size: 21.5 MB

Statistics

Stars: 4
Watchers: 2
Forks: 0
Open Issues: 1
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme License

SIRITVIS

Social Interaction Research Insights Topic Visualisation


📋 Summary

The SIRITVIS Python package helps you understand text data from social media platforms such as Instagram and Reddit, or from any other text source. It uses topic models to find hidden patterns in large amounts of text data. SIRITVIS includes tools for gathering data, cleaning it, analyzing it, and visualizing the results. You can see on a map where certain topics are being discussed and how often they are mentioned.

SIRITVIS uses well-known methods from data science, machine learning, and mapping to ensure accurate results. It cleans the data thoroughly and uses reliable models to find meaningful topics. You can evaluate the quality of these topics using built-in tools. The package also includes visual tools to help you easily see the distribution of topics on a map.

A key feature of SIRITVIS is its ability to show where on a world map people are talking about different topics. It can categorize these places by the sentiment of the posts, such as positive, negative, or neutral. You can also search for specific keywords and see where they appear on the map.

SIRITVIS is helpful in various areas, like marketing, politics, and disaster response, by providing tools to analyze the spread of topics. It helps users understand their audience better and make informed decisions based on the analysis of social media data.

📝 How to cite

Narwade, S., Kant, G., Säfken, B., and Leiding, B. (2023), SIRITVIS: Social Interaction Research Insights Topic Visualisation. Journal of Open Source Software, https://joss.theoj.org/papers/b51be70e9634e45d8035ee20b6147d76.


Advisory

Ensure Python version '>=3.10, <3.11'.
Utilize IDEs like Visual Studio or platforms like Google Colab for enhanced plot visualization.
Refer to the provided sample dataset for better comprehension.
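
The version constraint above can be checked before installing. The following is a minimal sketch using only the standard library; the helper name `is_supported_python` is hypothetical and not part of SIRITVIS.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter satisfies '>=3.10, <3.11'."""
    major, minor = version_info[0], version_info[1]
    return (major, minor) == (3, 10)


if not is_supported_python():
    # Only a warning: installation may still work, but it is untested.
    print("Warning: SIRITVIS is tested on Python 3.10; found",
          ".".join(str(part) for part in sys.version_info[:3]))
```

Running this in a fresh environment before `pip install` gives an early hint if the interpreter falls outside the tested range.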

💡 Features

Data Streaming 💾
Data Cleaning 🧹
Topic Model Training and Evaluation 🎯
Topic Visual Insights 🔍
Trending Topic Geo Visualisation 🌏

🛠 Installation

Attention: SIRITVIS is specifically tailored for operation on Python 3.10, and its visualization capabilities are optimized for Python notebooks. Extensive testing has been conducted under these specifications. For the best compatibility and performance, we advise setting up a fresh (conda) environment utilizing Python 3.10.10.

The package can be installed via pip:

```bash
pip install SIRITVIS
```

👩‍💻 Usage ([documentation])

Import Libraries

```python
from SIRITVIS import insta_streamer, reddit_streamer, cleaner, topic_model, topic_visualise, topic_mapper
```

Streaming Reddit Data

For authentication with the Reddit Streaming API, follow the steps outlined in this tutorial.

```python
# Run the streaming process to retrieve raw data based on the specified keywords
client_id = "XXXXXXXXXX"
client_secret = "XXXXXXXXX"
user_agent = "XXXXXXXXXX"
keywords = ['Specific', 'Keywords']  # default is None
# Use multiple keywords for a more varied dataset during streaming data collection.
save_path = '../folder/path/to/store/the/data/'
raw_data = reddit_streamer.RedditStreamer(client_id, client_secret, user_agent, save_path, keywords).run()
```

Streaming Instagram Data

For authentication with the Instagram Streaming API, sign up on the Apify page.

```python
# Run the streaming process to retrieve raw data based on the specified keywords
api_token = 'apify_api_XXXXXXXXX'
save_path = '../folder/path/to/store/the/data/'
instagram_username = 'XXXXXXXXX'
instagram_password = 'XXXXXXXXX'
hashtags = ['Specific', 'Keywords']  # default is ['instagram']
# Use multiple keywords for a more varied dataset during streaming data collection.
limit = 20  # number of post captions to extract; default is 100
raw_data = insta_streamer.InstagramStreamer(api_token, save_path, instagram_username, instagram_password, hashtags, limit).run()
```

Clean Streamed Data or Any External Text Data

```python
# The raw_data variable can also be passed as the data_source value
cleaner_obj = cleaner.Cleaner(data_source='../folder/path/or/csv/file/path/to/load/data/')

cleaner_obj.clean_data  # get the cleaned dataset without saving it

cleaned_file = cleaner_obj.saving('../folder/path/to/store/the/cleaned/data/', data_save_name='dataset_file_name')
```
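
To give a feel for what cleaning social media text typically involves, here is a minimal standard-library sketch. It is an illustration under assumptions, not the logic of the SIRITVIS cleaner: the function `basic_clean` and the exact rules (lowercasing, dropping URLs, mentions, hashtags, and non-letter characters) are hypothetical.

```python
import re

# Hypothetical cleaning rules, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"[@#]\w+")          # @mentions and #hashtags
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, emoji


def basic_clean(text):
    """Lowercase the text and strip URLs, mentions, hashtags, and non-letters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    # Collapse the whitespace left behind by the substitutions.
    return " ".join(text.split())


print(basic_clean("Check THIS out: https://example.com @user #wow!!"))
```

Short, noisy posts often shrink considerably after such a pass, which is one reason a larger input file tends to give better topics.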

Train a topic model on a corpus of short texts

Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)

```python
# The cleaned_file variable can also be passed as the dataset_source value
model = topic_model.TopicModeling(
    num_topics=10,
    dataset_source='../csv/file/path/to/load/data.csv',
    learning_rate=0.001,
    batch_size=32,
    activation='softplus',
    num_layers=3,
    num_neurons=100,
    dropout=0.2,
    num_epochs=100,
    save_model=False,
    model_path=None,
    train_model='NeuralLDA',
    evaluation=['topicdiversity', 'invertedrbo', 'jaccardsimilarity'],
)

saved_model = model.run()
```
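
Two of the evaluation options named above, topic diversity and Jaccard similarity, can be illustrated with a few lines of plain Python. This is a simplified sketch of what such metrics commonly measure, not the implementation SIRITVIS or OCTIS uses; the function names and the example topics are hypothetical.

```python
from itertools import combinations


def topic_diversity(topics, top_k=10):
    """Share of unique words among the top-k words of all topics (1.0 = no overlap)."""
    top_words = [topic[:top_k] for topic in topics]
    unique = {word for words in top_words for word in words}
    total = sum(len(words) for words in top_words)
    return len(unique) / total if total else 0.0


def mean_jaccard_similarity(topics, top_k=10):
    """Average pairwise Jaccard similarity of the topics' top-k word sets (0.0 = disjoint)."""
    sets = [set(topic[:top_k]) for topic in topics]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    scores = [len(a & b) / len(a | b) if (a | b) else 0.0 for a, b in pairs]
    return sum(scores) / len(pairs)


# Hypothetical topics, each a list of its highest-ranked words.
example_topics = [
    ["cat", "dog", "pet"],
    ["dog", "park", "walk"],
    ["stock", "market", "trade"],
]
print(topic_diversity(example_topics))
print(mean_jaccard_similarity(example_topics))
```

Higher diversity and lower pairwise similarity both suggest that the topics cover distinct themes rather than repeating the same words.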

Topic Insights Visualisation

To investigate the internal structure of topics and their relations to words and individual documents, we recommend using pyLDAvis.
Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)

```python
# The cleaned_file variable can also be passed as the data_source value
vis_model = topic_visualise.PyLDAvis(data_source='../csv/file/path/to/load/data.csv', num_topics=5, text_column='text')
vis_model.visualize()
```

A word cloud is a graphical display of text data in which the size of each word reflects its frequency or significance within the text.
Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)

```python
# The cleaned_file variable can also be passed as the data_source value
# Please wait a moment for the word cloud to appear.
vis_model = topic_visualise.Wordcloud(data_source='../csv/file/path/to/load/data.csv', text_column='text', save_image=False)
vis_model.visualize()
```
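
The word frequencies that drive a word cloud can be sketched with the standard library alone. This is an illustration of the idea, assuming simple whitespace tokenisation; `word_frequencies` and `relative_sizes` are hypothetical helpers, not SIRITVIS functions.

```python
from collections import Counter


def word_frequencies(texts, top_n=5):
    """Count whitespace-separated, lowercased words and return the top_n most common."""
    counts = Counter(word for text in texts for word in text.lower().split())
    return counts.most_common(top_n)


def relative_sizes(frequencies):
    """Scale each count by the largest one, so the most frequent word gets size 1.0."""
    if not frequencies:
        return {}
    largest = max(count for _, count in frequencies)
    return {word: count / largest for word, count in frequencies}


texts = ["solar power", "solar panels", "wind power solar"]
top_words = word_frequencies(texts, top_n=2)
print(top_words)
print(relative_sizes(top_words))
```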

Trending Topic Geo Visualisation

```python
# The cleaned_file variable can also be passed as the data_source value
# The saved_model variable can also be passed as the model_source value, for example, model_source = saved_model
data_source = '../file/path/of/data.csv'
model_source = '../file/path/of/model.pkl'
topic_mapper.TopicMapper(data_source, model_source)
```
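
The Summary describes grouping map locations by sentiment and searching for keywords. The aggregation behind such a view might look like the sketch below. It assumes each post is a dictionary with hypothetical `text`, `location`, and `sentiment` keys; this record layout and the function `sentiment_by_location` are assumptions for illustration and do not reflect how `topic_mapper` stores its data.

```python
from collections import Counter, defaultdict


def sentiment_by_location(posts, keyword=None):
    """Count posts per location and sentiment, optionally keeping only keyword matches."""
    summary = defaultdict(Counter)
    for post in posts:
        # Case-insensitive substring match when a keyword is given.
        if keyword and keyword.lower() not in post["text"].lower():
            continue
        summary[post["location"]][post["sentiment"]] += 1
    return {location: dict(counts) for location, counts in summary.items()}


# Hypothetical sample records.
posts = [
    {"text": "Flood warning issued", "location": "Germany", "sentiment": "negative"},
    {"text": "Flood relief arrived", "location": "Germany", "sentiment": "positive"},
    {"text": "Great concert tonight", "location": "India", "sentiment": "positive"},
]
print(sentiment_by_location(posts))
print(sentiment_by_location(posts, keyword="flood"))
```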

📣 Community guidelines

We encourage and welcome contributions to the SIRITVIS package. If you have any questions, want to report bugs, or have ideas for new features, please file an issue.

Additionally, we appreciate pull requests via GitHub. There are several areas where potential contributions can make a significant impact, such as enhancing the quality of topics in topic models when dealing with noisy data from Reddit, Instagram or any external data sources, and improving the topic_mapper function to make it more interactive and independent from the notebook.

🖊️ Authors

Sagar Narwade
Gillian Kant
Benjamin Säfken
Benjamin Leiding

🎓 References

In our project, we utilised the "OCTIS" ^1^ library by Terragni et al., which provided essential functionality. Additionally, we incorporated the "pyLDAvis" ^2^ Python library by Ben Mabey for interactive topic model visualisation.

📜 License

This project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). See the LICENSE file for details.

Owner

Login: CodeEagle22
Kind: user

Repositories: 1
Profile: https://github.com/CodeEagle22

JOSS Publication

SIRITVIS: Social Interaction Research Insights Topic Visualisation

Published

August 08, 2024

DOI

10.21105/joss.06243

Volume 9, Issue 100, Page 6243

Authors

Sagar Narwade

Technische Universität Clausthal, Clausthal-Zellerfeld, Germany

Gillian Kant

Georg-August-Universität Göttingen, Göttingen, Germany

Benjamin Säfken

Technische Universität Clausthal, Clausthal-Zellerfeld, Germany

Benjamin Leiding

Technische Universität Clausthal, Clausthal-Zellerfeld, Germany

Editor

Olivia Guest


Committers

Last synced: 7 months ago

All Time

Total Commits: 192
Total Committers: 2
Avg Commits per committer: 96.0
Development Distribution Score (DDS): 0.016

Past Year

Commits: 6
Committers: 1
Avg Commits per committer: 6.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Sagar Narwade	1****2	189
Gillian Kant	5****g	3

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 1
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

n3mo (1)


Packages

Total packages: 1
Total downloads:
- pypi 4 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 11
Total maintainers: 1

pypi.org: siritvis

SIRITVIS: Social Media Interaction Reaction Insights Topic Visualisation

Homepage: https://github.com/CodeEagle22/SIRITVIS
Documentation: https://siritvis.readthedocs.io/
License: MIT License
Latest release: 2.0.0
published over 1 year ago

Versions: 11
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 4 Last month

Rankings

Dependent packages count: 7.4%

Average: 38.3%

Dependent repos count: 69.1%

Maintainers (1)

codeeagle22

Last synced: 6 months ago

SIRITVIS

Science Score: 93.0%
