quotation-tool

A tool to extract quotes and other useful information from a text.

https://github.com/australian-text-analytics-platform/quotation-tool

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: plos.org
  • Academic email domains
  • Institutional organization owner
    Organization australian-text-analytics-platform has institutional domain (atap.edu.au)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: Australian-Text-Analytics-Platform
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 17.5 MB
Statistics
  • Stars: 9
  • Watchers: 3
  • Forks: 6
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed about 1 year ago

README.md

Quotation Tool

Abstract: This QuotationTool can be used to extract quotes from a text. In addition to extracting the quotes, the tool also provides information about who the speakers are, the location of the quotes (and the speakers) within the text, the identified named entities, etc., which can be useful for your text analysis.

Setup

This tool has been designed for use with minimal setup. You can run it in the cloud, and any dependencies on other packages will be installed for you automatically. To launch and use the tool, just click the icon below.

Binder

Note: CILogon authentication is required. You can use your institutional, Google or Microsoft account to log in. If you have trouble authenticating, please refer to the CILogon troubleshooting guide.

If you do not have access to any of the above accounts, you can use the link below to access the tool (this is a free Binder version, limited to 2 GB of memory).

Binder

It may take a few minutes for Binder to launch the notebook and install the dependencies for the tool. Please be patient.

User Guide

For instructions on how to use the Quotation Tool, please refer to the Quotation Tool User Guide.

Load the data

Using this tool, you can extract quotes directly from a text file (or a number of text files). Alternatively, you can extract quotes from a text column inside an Excel spreadsheet. You just need to upload your files (.txt, .xlsx or .csv) and access them via the Notebook.
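
For readers who want to check their files before uploading, the following is a minimal sketch of gathering texts from .txt and .csv files using only the Python standard library. It is not the tool's own loading code: the function name `load_texts`, the `text_column` default and the `text_name`/`text` keys are all assumptions made for this illustration.

```python
import csv
from pathlib import Path


def load_texts(folder, text_column="text"):
    """Collect texts from the .txt and .csv files found directly in `folder`.

    Returns a list of dicts with the (hypothetical) keys `text_name` and `text`.
    """
    records = []
    for path in sorted(Path(folder).iterdir()):
        suffix = path.suffix.lower()
        if suffix == ".txt":
            # One plain-text file becomes one record.
            records.append({"text_name": path.stem,
                            "text": path.read_text(encoding="utf-8")})
        elif suffix == ".csv":
            # Each row of the spreadsheet becomes one record,
            # taken from the column named by `text_column`.
            with path.open(newline="", encoding="utf-8") as handle:
                for i, row in enumerate(csv.DictReader(handle)):
                    records.append({"text_name": f"{path.stem}_{i}",
                                    "text": row[text_column]})
    return records
```

An .xlsx file could be handled in a similar way with `pandas.read_excel`, which needs an Excel engine such as openpyxl installed; that case is left out here to keep the sketch dependency-free.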

Note: If you have a large number of text files (more than 10MB in total), we suggest you compress (zip) them and upload the zip file instead. If you need assistance on how to compress your file, please check the user guide.
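
If you prefer to create the zip file with a script rather than through your operating system, a minimal sketch with Python's standard `zipfile` module could look like this. The function name `zip_text_files` and the choice to include only .txt files are assumptions for this example; the user guide remains the reference for the recommended steps.

```python
import zipfile
from pathlib import Path


def zip_text_files(folder, archive_path):
    """Write every .txt file directly inside `folder` into one zip archive."""
    folder = Path(folder)
    with zipfile.ZipFile(archive_path, "w",
                         compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.glob("*.txt")):
            # arcname stores only the file name, not the full local path.
            archive.write(path, arcname=path.name)
    return Path(archive_path)
```

Calling `zip_text_files("my_texts", "my_texts.zip")` would produce a single archive that can be uploaded in place of the individual files (both paths here are placeholders).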

Extract and Display the Quotes

Once your files have been uploaded, you can use the QuotationTool to extract quotes from the text. The quotes, along with their metadata, will be stored in a table format inside a pandas dataframe.
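
To give a feel for what "quotes along with their metadata" can look like as table rows, here is a deliberately naive sketch that finds text between double quotation marks and records where each quote sits in the text. This is not the QuotationTool's method (the tool adapts the GenderGapTracker code and also identifies speakers and named entities); the regular expression, the function name `extract_quotes` and the column names are assumptions chosen for illustration only.

```python
import re

# Matches text between straight or curly double quotes (no nesting).
QUOTE_PATTERN = re.compile(r'["“]([^"“”]+)["”]')


def extract_quotes(text_name, text):
    """Return one dict per quote, with its character span in `text`."""
    rows = []
    for i, match in enumerate(QUOTE_PATTERN.finditer(text)):
        rows.append({
            "text_name": text_name,      # which text the quote came from
            "quote_id": i,               # position of the quote within the text
            "quote": match.group(1),     # quote content without the marks
            "quote_index": (match.start(), match.end()),  # span incl. marks
        })
    return rows
```

A list of rows like this can be passed to `pandas.DataFrame(rows)` to obtain a table with one quote per row, which is the general shape of output described above.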

Additionally, using the interactive tool, you can display the text, along with the extracted quotes, speakers and named entities, on the Notebook for further analysis.

Reference

This code has been adapted (with permission) from the GenderGapTracker GitHub page and modified to run on a Jupyter Notebook. The quotation tool’s accuracy rate is evaluated in this article.

Citation

If you find the Quotation Tool useful in your research, please cite the following:

Jufri, Sony & Sun, Chao (2022). Quotation Tool. v1.0. Australian Text Analytics Platform. Software. https://github.com/Australian-Text-Analytics-Platform/quotation-tool

Owner

  • Name: Australian-Text-Analytics-Platform
  • Login: Australian-Text-Analytics-Platform
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Quotation Tool
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Sony
    family-names: Jufri
    email: sony.jufri@sydney.edu.au
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
  - given-names: Chao
    family-names: Sun
    email: chao.sun@sydney.edu.au
    affiliation: >-
      Sydney Informatics Hub, a core research facility of
      the University of Sydney
repository-code: >-
  https://github.com/Australian-Text-Analytics-Platform/quotation-tool
abstract: >-
  This Quotation Tool is developed for algorithmically
  extracting quotes from plain texts. In addition to quote
  extraction, the tool performs Named Entity Recognition
  (NER) on the extracted speakers and quotations,
  in order to generate more insights for the
  text analysis task. As an output of collaborative work
  between the Sydney Informatics Hub (SIH) and the Sydney
  Corpus Lab (SCL), the quotation tool is part of the
  Australian Text Analytics Platform program and the HASS
  Research Data Commons and Indigenous Research Capability
  Program.
keywords:
  - quote
  - quotation tool
  - named entity recognition
  - extract quote
  - quote extraction
license: Apache-2.0
version: '1.0'
date-released: '2022-11-29'

GitHub Events

Total
  • Watch event: 3
  • Push event: 5
  • Fork event: 1
Last Year
  • Watch event: 3
  • Push event: 5
  • Fork event: 1

Dependencies

GenderGapTracker/api/requirements.txt pypi
  • fastapi >=0.92.0,<0.93.0
  • gunicorn >=20.1.0,<20.2.0
  • pandas >=1.5.3,<1.6.0
  • pydantic <2.0.0
  • pymongo <4.0.0
  • requests >=2.28.1
  • uvicorn >=0.20.0,<0.21.0
GenderGapTracker/nlp/english/requirements.txt pypi
  • dash ==2.8.1
  • dash_auth ==1.4.1
  • dash_bootstrap_components ==1.2.1
  • neuralcoref ==4.0
  • pandas >=1.1.5
  • pymongo >=3.10.0,<4.0.0
  • requests >=2.27.1
  • spacy ==2.1.3
  • statsmodels >=0.12.2
GenderGapTracker/nlp/english/topic_model/corpus_analysis/requirements.txt pypi
  • corpus_toolkit ==0.29
  • matplotlib ==3.1.0
  • pandas ==1.0.3
  • pymongo ==3.8.0
  • pyspark ==2.4.0
  • seaborn ==0.10.0
  • spacy ==2.3.2
  • tqdm ==4.32.1
  • wordcloud ==1.6.0
GenderGapTracker/nlp/english/topic_model/requirements.txt pypi
  • matplotlib ==3.3.4
  • pandas ==1.1.5
  • py4j ==0.10.7
  • pymongo ==3.11.3
  • pyspark ==2.4.5
  • scipy ==1.5.4
  • seaborn ==0.11.1
  • tqdm ==4.59.0
  • wordcloud ==1.8.1
GenderGapTracker/nlp/french/requirements.txt pypi
  • Levenshtein >=0.16.0
  • coreferee ==1.3.1
  • pandas >=1.5.3,<1.6.0
  • pydantic <2.0.0
  • pymongo >=3.12.0,<4.0.0
  • requests >=2.28.1
  • spacy ==3.2.5
  • statsmodels >=0.12.2
GenderGapTracker/scraper/requirements.txt pypi
  • Pillow >=3.3.0
  • PyYAML >=3.11
  • beautifulsoup4 >=4.4.1
  • cssselect >=0.9.2
  • feedfinder2 >=0.0.4
  • feedparser >=5.2.1
  • jieba3k >=0.35.1
  • lxml >=3.6.0
  • nltk >=3.2.1
  • python-dateutil >=2.5.3
  • requests >=2.10.0
  • tinysegmenter ==0.3
  • tldextract >=2.0.1
GenderGapTracker/statistics/requirements.txt pypi
  • pandas >=1.1.5
  • pymongo >=3.10.0,<4.0.0
  • requests >=2.27.1
environment.yml pypi
  • pyexcelerate *
  • pymongo ==4.1.1