ConTEXT Explorer

ConTEXT Explorer: a web-based text analysis tool for exploring and visualizing concepts across time - Published in JOSS (2021)

https://github.com/alicia-ziying-yang/context-explorer

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    2 of 6 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

data-analysis data-science data-visualization information-retrieval nature-language-process webapp
Last synced: 6 months ago

Repository

ConTEXT Explorer is an open web-based system for exploring and visualizing concepts (combinations of co-occurring words and phrases) over time in text documents.

Basic Info
  • Host: GitHub
  • Owner: alicia-ziying-yang
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 44.5 MB
Statistics
  • Stars: 9
  • Watchers: 2
  • Forks: 3
  • Open Issues: 5
  • Releases: 1
Topics
data-analysis data-science data-visualization information-retrieval nature-language-process webapp
Created almost 5 years ago · Last pushed about 4 years ago
Metadata Files
Readme License

README.md

ConTEXT-Explorer


ConTEXT Explorer is an open web-based system for exploring and visualizing concepts (combinations of co-occurring words and phrases) over time in text documents. It is designed to lower the barriers to applying information retrieval and machine learning for text analysis, including:

  • preprocessing text with the sentencizer and tokenizer in a spaCy pipeline;
  • building a Gensim word2vec model for discovering similar terms, which can be used to expand queries;
  • indexing the cleaned text and creating a search engine with Whoosh, which allows sentences to be ranked using the Okapi BM25F function;
  • visualizing results across time in interactive plots using Plotly.
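To give a feel for the sentence-ranking step, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the app's code: the app relies on Whoosh's BM25F, whereas this is a simplified single-field BM25, the regex tokenizer is a crude stand-in for spaCy, and the function names and sample sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def bm25_rank(query, sentences, k1=1.2, b=0.75):
    """Return (score, sentence) pairs sorted by descending BM25 score."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scored = []
    for sentence, doc in zip(sentences, docs):
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer sentences need more hits to score as high.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample sentences, not taken from the bundled corpus.
sentences = [
    "Climate change policy was debated in parliament.",
    "The budget included new funding for hospitals.",
    "Members raised climate concerns about coastal erosion.",
]
ranking = bm25_rank("climate policy", sentences)
for score, sentence in ranking:
    print(f"{score:.3f}  {sentence}")
```

The sentence matching both query terms ranks first, the one matching only "climate" second, and the unrelated sentence scores zero.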

It is designed to be user-friendly, enabling researchers to make sense of their data without technical knowledge. Users may:

  • upload (and save) a text corpus, and customize search fields;
  • add terms to the query using input from the word2vec model, sentence ranking, or manually;
  • check term frequencies across time;
  • group terms with "ALL" or "ANY" operator, and compound the groups to form more complex queries;
  • view results across time for each query (using raw counts or proportion of relevant documents);
  • save and reload results for further analysis;
  • download a subset of a corpus filtered by user-defined terms.
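The "ALL"/"ANY" grouping and the per-period results can be sketched as follows. This is an illustrative reading of the features listed above, not the app's implementation: the function names, the whitespace tokenization, the way groups are compounded, and the interpretation of "proportion" as matching documents divided by all documents in the same year are assumptions.

```python
from collections import defaultdict


def group_matches(tokens, terms, operator):
    """True if the token set satisfies one term group under "ALL" or "ANY"."""
    hits = [term in tokens for term in terms]
    if operator == "ALL":
        return all(hits)
    if operator == "ANY":
        return any(hits)
    raise ValueError(f"unknown operator: {operator}")


def query_matches(text, groups, combine="ALL"):
    """Evaluate every (operator, terms) group, then combine the group results."""
    tokens = set(text.lower().split())
    results = [group_matches(tokens, terms, op) for op, terms in groups]
    return all(results) if combine == "ALL" else any(results)


def results_by_year(docs, groups, combine="ALL"):
    """Map each year to (raw count, proportion) of documents matching the query."""
    totals, hits = defaultdict(int), defaultdict(int)
    for year, text in docs:
        totals[year] += 1
        if query_matches(text, groups, combine):
            hits[year] += 1
    return {year: (hits[year], hits[year] / totals[year]) for year in sorted(totals)}


# Hypothetical mini-corpus of (year, text) pairs.
docs = [
    (2019, "carbon tax proposal announced"),
    (2019, "hospital funding increased"),
    (2020, "emissions trading scheme and carbon price"),
    (2020, "carbon tax repealed"),
]
# Compound query: (tax OR price) AND (carbon).
groups = [("ANY", ["tax", "price"]), ("ALL", ["carbon"])]
summary = results_by_year(docs, groups)
print(summary)  # {2019: (1, 0.5), 2020: (2, 1.0)}
```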

More details can be found in the user manual below.

How to install

Get the app

Clone this repo to your local environment:

git clone https://github.com/alicia-ziying-yang/conTEXT-explorer.git

Set up environment

ConTEXT Explorer is developed using Plotly Dash in Python. We use Python 3.7.5 with the required packages listed in requirements.txt. To help you install the application correctly, we provide a conda environment file, ce-env.yml, for setting up a virtual environment. Enter the folder:

cd conTEXT-explorer

and run:

conda env create -f ce-env.yml

To activate this environment, use:

conda activate ce-env

Install the application

ConTEXT Explorer can then be installed with:

pip install . 

Run the app

  • If you want to run ConTEXT Explorer on your local computer, comment out the line for the Ubuntu server and uncomment the last line in app.py:

    # app.run_server(debug=False, host="0.0.0.0") # ubuntu server
    app.run_server(debug=False, port="8010") # local test

To start the application, use:

  start-ce

or

  python app.py

The address at which the app can be accessed will be displayed in the output.

  • If you want to run ConTEXT Explorer on an Ubuntu server, use:

    nohup python app.py &
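As an alternative to commenting and uncommenting lines in app.py, the launch mode could be selected at run time. The sketch below is only a suggestion and is not part of the project: the environment variable name CE_MODE and the helper server_kwargs are hypothetical, and the host and port values are copied from the two launch lines in app.py.

```python
import os


def server_kwargs(mode=None):
    """Return keyword arguments for Dash's app.run_server for a launch mode."""
    # CE_MODE is a hypothetical variable name; the project does not define it.
    mode = mode or os.environ.get("CE_MODE", "local")
    if mode == "server":
        # Listen on all interfaces, as in the Ubuntu server line.
        return {"debug": False, "host": "0.0.0.0"}
    if mode == "local":
        # Same port as the local test line.
        return {"debug": False, "port": "8010"}
    raise ValueError(f"unknown mode: {mode!r}")


# In app.py, the two hand-edited lines could then become:
# app.run_server(**server_kwargs())
print(server_kwargs("local"))
print(server_kwargs("server"))
```

With this in place, setting CE_MODE=server before `nohup python app.py &` would select the server settings, and leaving it unset would keep the local ones.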

How to use

A sample corpus with a saved analysis is preloaded in this app. Feel free to explore the app features using this example. See the manual below for more details.

Click here to view the paged PDF version


Contact and Contribution

This application is designed and developed by Ziying (Alicia) Yang, Gosia Mikolajczak, and Andrew Turpin from the University of Melbourne in Australia.

If you encounter any errors while using the app, have suggestions for improvement, or want to contribute to this project by adding new functions or features, please submit an issue or a pull request.

JOSS Publication

ConTEXT Explorer: a web-based text analysis tool for exploring and visualizing concepts across time
Published
December 09, 2021
Volume 6, Issue 68, Page 3347
Authors
Ziying Yang ORCID
University of Melbourne
Gosia Mikolajczak ORCID
University of Melbourne
Andrew Turpin
University of Melbourne
Editor
Fabian-Robert Stöter ORCID
Tags
Dash Data Analysis Data Visualization

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 152
  • Total Committers: 6
  • Avg Commits per committer: 25.333
  • Development Distribution Score (DDS): 0.138
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name | Email | Commits
Ziying Alicia Yang | 8****g | 131
GosiaMi | 4****i | 11
Alicia Yang | z****3@4****l | 6
Daniel S. Katz | d****z@i****g | 2
Fabian-Robert Stöter | m****l@f****m | 1
Andrew Turpin | a****n@u****u | 1
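The all-time figures above can be reproduced from the per-committer counts. The short check below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is an assumption, since this page does not state it, but it agrees with the reported value.

```python
# Per-committer commit counts, copied from the Top Committers table.
commits = [131, 11, 6, 2, 1, 1]

total = sum(commits)
average = total / len(commits)
# Assumed definition: share of commits NOT made by the most active committer.
dds = 1 - max(commits) / total

print(total, round(average, 3), round(dds, 3))  # 152 25.333 0.138
```

A score this close to zero indicates that nearly all commits come from a single contributor.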

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 18
  • Total pull requests: 3
  • Average time to close issues: 6 days
  • Average time to close pull requests: about 18 hours
  • Total issue authors: 4
  • Total pull request authors: 3
  • Average comments per issue: 2.28
  • Average comments per pull request: 0.33
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • baileythegreen (8)
  • alicia-ziying-yang (5)
  • faroit (4)
  • sara-02 (1)
Pull Request Authors
  • baileythegreen (1)
  • faroit (1)
  • danielskatz (1)
Top Labels
Issue Labels
enhancement (7)

Dependencies

requirements.txt pypi
  • Whoosh ==2.7.4
  • backports.csv ==1.0.7
  • dash ==1.14.0
  • dash-bootstrap-components ==0.10.5
  • dash-core-components ==1.10.2
  • dash-daq ==0.5.0
  • dash-html-components ==1.0.3
  • dash-table ==4.9.0
  • dash-uploader ==0.4.1
  • gensim ==3.8.1
  • gunicorn >=19.9.0
  • nltk ==3.4.5
  • numpy >=1.16.2
  • pandas ==1.2.3
  • pytest ==5.3.5
  • selenium ==3.141.0
  • spacy ==2.2.4
  • urllib3 ==1.25.8
  • wordcloud ==1.8.1