SIRITVIS

SIRITVIS: Social Interaction Research Insights Topic Visualisation - Published in JOSS (2024)

https://github.com/codeeagle22/siritvis

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 60% confidence
Biology Life Sciences - 40% confidence
Engineering Computer Science - 40% confidence
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: CodeEagle22
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 21.5 MB
Statistics
  • Stars: 4
  • Watchers: 2
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

SIRITVIS

Social Interaction Research Insights Topic Visualisation

📋 Summary

The SIRITVIS Python package helps you analyse text data from social media platforms such as Instagram and Reddit, or from any other text source. It applies topic models to uncover latent themes in large text collections. SIRITVIS includes tools for gathering data, cleaning it, analysing it, and visualising the results. You can see on a map where certain topics are being discussed and how often they are mentioned.

SIRITVIS uses well-known methods from data science, machine learning, and mapping to ensure accurate results. It cleans the data thoroughly and uses reliable models to find meaningful topics. You can evaluate the quality of these topics using built-in tools. The package also includes visual tools to help you easily see the distribution of topics on a map.

A key feature of SIRITVIS is its ability to show where on a world map people are talking about different topics. It can categorize these places by the sentiment of the posts, such as positive, negative, or neutral. You can also search for specific keywords and see where they appear on the map.

SIRITVIS is helpful in various areas, like marketing, politics, and disaster response, by providing tools to analyze the spread of topics. It helps users understand their audience better and make informed decisions based on the analysis of social media data.

📝 How to cite

Narwade, S., Kant, G., Säfken, B., and Leiding, B. (2024). SIRITVIS: Social Interaction Research Insights Topic Visualisation. Journal of Open Source Software, 9(100), 6243. https://joss.theoj.org/papers/b51be70e9634e45d8035ee20b6147d76

Advisory

  • Use Python 3.10 (version constraint '>=3.10, <3.11').
  • Use an IDE such as Visual Studio or a platform such as Google Colab for better plot rendering.
  • Refer to the provided sample dataset for better comprehension.

💡 Features

  • Data Streaming 💾
  • Data Cleaning 🧹
  • Topic Model Training and Evaluation 🎯
  • Topic Visual Insights 🔍
  • Trending Topic Geo Visualisation 🌏

🛠 Installation

Attention: SIRITVIS is specifically tailored for operation on Python 3.10, and its visualization capabilities are optimized for Python notebooks. Extensive testing has been conducted under these specifications. For the best compatibility and performance, we advise setting up a fresh (conda) environment utilizing Python 3.10.10.

The package can be installed via pip:

    pip install SIRITVIS
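
Because the package targets Python 3.10 only, it can help to confirm the interpreter version before installing. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and is not part of SIRITVIS.

```python
import sys


def is_supported_python(version_info=None):
    """Return True if the version satisfies '>=3.10, <3.11'."""
    # Hypothetical helper for illustration; not part of the SIRITVIS API.
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return (major, minor) == (3, 10)


if not is_supported_python():
    print(
        "Warning: SIRITVIS is tested on Python 3.10; found "
        f"{sys.version_info.major}.{sys.version_info.minor}"
    )
```

Running this in the environment you plan to use prints a warning only when the interpreter is outside the supported range.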

👩‍💻 Usage (documentation)

Import Libraries

    from SIRITVIS import insta_streamer, reddit_streamer, cleaner, topic_model, topic_visualise, topic_mapper

Streaming Reddit Data

  • For authentication with the Reddit Streaming API, follow the steps outlined in this tutorial.

```python
# Run the streaming process to retrieve raw data based on the specified keywords
client_id = "XXXXXXXXXX"
client_secret = "XXXXXXXXX"
user_agent = "XXXXXXXXXX"
# Use multiple keywords for a more varied dataset during streaming data collection.
keywords = ['Specific', 'Keywords']  # default is None
save_path = '../folder/path/to/store/the/data/'
raw_data = reddit_streamer.RedditStreamer(client_id, client_secret, user_agent, save_path, keywords).run()
```

Streaming Instagram Data

  • For authentication with the Instagram Streaming API, sign up on the Apify website.

```python
# Run the streaming process to retrieve raw data based on the specified keywords
api_token = 'apifyapiXXXXXXXXX'
save_path = '../folder/path/to/store/the/data/'
instagram_username = 'XXXXXXXXX'
instagram_password = 'XXXXXXXXX'
# Use multiple keywords for a more varied dataset during streaming data collection.
hashtags = ['Specific', 'Keywords']  # default is ['instagram']
limit = 20  # number of post captions to extract; default is 100
raw_data = insta_streamer.InstagramStreamer(api_token, save_path, instagram_username, instagram_password, hashtags, limit).run()
```

Clean Streamed Data or Any External Text Data

```python
# The raw_data variable can also be used as the data_source value
cleaner_obj = cleaner.Cleaner(data_source='../folder/path/or/csv/file/path/to/load/data/')

# Get the cleaned dataset without saving it
cleaner_obj.clean_data

# Save the cleaned dataset to disk
cleaned_file = cleaner_obj.saving('../folder/path/to/store/the/cleaned/data/', data_save_name='datasetfilename')
```

Train a topic model on a corpus of short texts

  • Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)
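
The size recommendation above can be checked before training. The sketch below uses only the standard library; the helper name `is_large_enough` and the constant `MIN_SIZE_BYTES` are hypothetical and are not part of SIRITVIS.

```python
import os

# 500 KB, taken from the recommendation above.
MIN_SIZE_BYTES = 500 * 1024


def is_large_enough(csv_path, min_size=MIN_SIZE_BYTES):
    """Return True if the file at csv_path is at least min_size bytes."""
    # Hypothetical helper for illustration; not part of the SIRITVIS API.
    return os.path.getsize(csv_path) >= min_size
```

For example, `is_large_enough('../csv/file/path/to/load/data.csv')` returns False for a cleaned file smaller than 500 KB, which suggests collecting more data before training.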

```python
# The cleaned_file variable can also be used as the dataset_source value
model = topic_model.TopicModeling(
    num_topics=10,
    dataset_source='../csv/file/path/to/load/data.csv',
    learning_rate=0.001,
    batch_size=32,
    activation='softplus',
    num_layers=3,
    num_neurons=100,
    dropout=0.2,
    num_epochs=100,
    save_model=False,
    model_path=None,
    train_model='NeuralLDA',
    evaluation=['topicdiversity', 'invertedrbo', 'jaccardsimilarity'],
)

saved_model = model.run()
```
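
The evaluation argument above names three topic-quality metrics. As a rough illustration of what two of them measure, the sketch below computes topic diversity (the share of unique words among the top words of all topics) and the average pairwise Jaccard similarity between topics. These are simplified, assumed definitions written in plain Python; the function names are hypothetical, and the package's own implementations may differ in detail.

```python
from itertools import combinations


def topic_diversity(topics):
    """Fraction of unique words among the top words of all topics."""
    # Assumes at least one non-empty topic.
    all_words = [word for topic in topics for word in topic]
    return len(set(all_words)) / len(all_words)


def mean_jaccard_similarity(topics):
    """Average Jaccard similarity over all pairs of topics."""
    # Assumes at least two non-empty topics.
    pairs = list(combinations(topics, 2))
    scores = [len(set(a) & set(b)) / len(set(a) | set(b)) for a, b in pairs]
    return sum(scores) / len(scores)


# Toy example: three topics, each given by its top three words.
topics = [
    ["climate", "energy", "policy"],
    ["energy", "market", "price"],
    ["football", "league", "goal"],
]
print(topic_diversity(topics))
print(mean_jaccard_similarity(topics))
```

Higher diversity and lower average Jaccard similarity both indicate that the topics overlap less in their top words.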

Topic Insights Visualisation

  • To investigate the internal structure of topics and their relations to words and individual documents, we recommend using pyLDAvis.
  • Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)

```python
# The cleaned_file variable can also be used as the data_source value
vis_model = topic_visualise.PyLDAvis(data_source='../csv/file/path/to/load/data.csv', num_topics=5, text_column='text')
vis_model.visualize()
```

A word cloud displays text data graphically, with the prominence of each word reflecting its frequency or significance within the text.

  • Recommendation: Consider using a larger cleaned file with more data (at least 500 KB)

```python
# The cleaned_file variable can also be used as the data_source value.
# The word cloud may take a while to appear.
vis_model = topic_visualise.Wordcloud(data_source='../csv/file/path/to/load/data.csv', text_column='text', save_image=False)
vis_model.visualize()
```
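
A word cloud is driven by word frequencies. The sketch below shows that underlying idea with the standard library only; it is an illustration and makes no claim about how Wordcloud is implemented internally. The function name `word_frequencies` and the whitespace tokenisation are assumptions.

```python
from collections import Counter


def word_frequencies(texts, top_n=5):
    """Count lower-cased, whitespace-separated words across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts.most_common(top_n)


texts = ["storm warning issued", "storm damage reported", "warning lifted"]
print(word_frequencies(texts, top_n=2))
```

The most frequent words returned here are the ones a word cloud would typically draw largest.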

Trending Topic Geo Visualisation

Topic Mapper excels at mapping the spatial distribution of Instagram posts and other text data globally. It accomplishes this by associating each location with its top trending topics and their frequencies, all using pre-trained topic models. Furthermore, it categorizes and color-codes these locations based on sentiment, providing users with a quick overview of sentiment distribution, including counts for positive, negative, and neutral posts.

Users can effortlessly explore specific keywords through a dropdown interface, allowing them to see how frequently these keywords appear on the world map. This feature simplifies the process of grasping and navigating research findings.

  • Notice: Reddit data cannot be visualized on the topic_mapper due to the absence of coordinate values.

```python
# The cleaned_file variable can also be used as the data_source value.
# The saved_model variable can also be used as the model_source value,
# for example, model_source = saved_model
data_source = '../file/path/of/data.csv'
model_source = '../file/path/of/model.pkl'
topic_mapper.TopicMapper(data_source, model_source)
```
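
The per-location sentiment counts described in this section can be pictured as a simple aggregation. The sketch below assumes each post has already been geotagged and labelled with a sentiment; the record layout (location, sentiment) and the function name are hypothetical and are not taken from the SIRITVIS data format.

```python
from collections import Counter, defaultdict


def sentiment_counts_by_location(posts):
    """Map each location to the number of posts per sentiment label."""
    counts = defaultdict(Counter)
    for post in posts:
        counts[post["location"]][post["sentiment"]] += 1
    return {location: dict(c) for location, c in counts.items()}


# Hypothetical records; real data would come from the cleaned dataset.
posts = [
    {"location": "Berlin", "sentiment": "positive"},
    {"location": "Berlin", "sentiment": "negative"},
    {"location": "Berlin", "sentiment": "positive"},
    {"location": "Göttingen", "sentiment": "neutral"},
]
print(sentiment_counts_by_location(posts))
```

A map layer could then colour each location by its dominant sentiment, which is the kind of overview the Topic Mapper description refers to.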

📣 Community guidelines

We encourage and welcome contributions to the SIRITVIS package. If you have any questions, want to report bugs, or have ideas for new features, please file an issue.

Additionally, we appreciate pull requests via GitHub. There are several areas where potential contributions can make a significant impact, such as enhancing the quality of topics in topic models when dealing with noisy data from Reddit, Instagram or any external data sources, and improving the topic_mapper function to make it more interactive and independent from the notebook.

🖊️ Authors

  • Sagar Narwade
  • Gillian Kant
  • Benjamin Säfken
  • Benjamin Leiding

🎓 References

This project builds on OCTIS ^1^, a library by Terragni et al. that provided essential functionality, and on pyLDAvis ^2^, a Python library by Ben Mabey for interactive topic model visualisation. Integrating these resources contributed significantly to the project, adding interactive data insights and research capabilities.

📜 License

This project is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). See the LICENSE file for details.

Owner

  • Login: CodeEagle22
  • Kind: user

JOSS Publication

SIRITVIS: Social Interaction Research Insights Topic Visualisation
Published
August 08, 2024
Volume 9, Issue 100, Page 6243
Authors
Sagar Narwade ORCID
Technische Universität Clausthal, Clausthal-Zellerfeld, Germany
Gillian Kant ORCID
Georg-August-Universität Göttingen, Göttingen, Germany
Benjamin Säfken ORCID
Technische Universität Clausthal, Clausthal-Zellerfeld, Germany
Benjamin Leiding
Technische Universität Clausthal, Clausthal-Zellerfeld, Germany
Editor
Olivia Guest ORCID
Tags
Text analysis tool Reddit Instagram Topic Modelling Geospatial mapping Natural Language Processing Machine Learning

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 192
  • Total Committers: 2
  • Avg Commits per committer: 96.0
  • Development Distribution Score (DDS): 0.016
Past Year
  • Commits: 6
  • Committers: 1
  • Avg Commits per committer: 6.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Sagar Narwade 1****2 189
Gillian Kant 5****g 3

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • n3mo (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 4 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 11
  • Total maintainers: 1
pypi.org: siritvis

SIRITVIS: Social Media Interaction Reaction Insights Topic Visualisation

  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 4 Last month
Rankings
Dependent packages count: 7.4%
Average: 38.3%
Dependent repos count: 69.1%
Maintainers (1)
Last synced: 4 months ago