legiscrawler

An automated web crawler, built on the Selenium library, for retrieving parliamentary questions from the website of Taiwan's Legislative Yuan (https://lis.ly.gov.tw/). 🕸️ 🕸️ A crawler helper for scraping legislators' parliamentary question records 🛠️🧰

https://github.com/davidycliao/legiscrawler

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.2%) to scientific vocabulary

Keywords

chromedriver legislation legislative-yuan legislators parliamentary-questions python selenium selenium-webdriver web-scraping
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: davidycliao
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 5.95 MB
Statistics
  • Stars: 11
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 4 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

legisCrawler: An Automation Webcrawling Toolkit for Retrieving Taiwan Parliamentary Questions 🛠️🧰

An automated web-crawling framework for retrieving parliamentary questions from the website of Taiwan's Legislative Yuan 立法院 (https://lis.ly.gov.tw/), built on the Selenium library for Python and the Chrome browser.

Requirements

  • python>=3.7.3 🐍
  • pip>=19.2
  • numpy=1.16.2
  • pandas=0.24.2
  • matplotlib=3.0.3
  • selenium
  • webdriver-manager
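
The last two requirements work together: webdriver-manager fetches a ChromeDriver binary that matches the installed Chrome, and Selenium drives the browser through it. The following is a minimal sketch of that setup, not the project's own code. It assumes the Selenium 4 API (`Service` and `Options`), and the names `chrome_arguments`, `open_lis`, and `LIS_URL` are hypothetical helpers introduced only for illustration.

```python
from typing import List

LIS_URL = "https://lis.ly.gov.tw/"  # site named in this README


def chrome_arguments(headless: bool = True) -> List[str]:
    """Return the command-line switches to pass to Chrome (illustrative choice)."""
    args = ["--window-size=1280,900"]
    if headless:
        args.append("--headless")  # run without opening a visible window
    return args


def open_lis(headless: bool = True):
    """Start Chrome via webdriver-manager and load the site's landing page."""
    # Imported lazily so chrome_arguments() can be used without a browser stack.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)

    # webdriver-manager downloads (or reuses a cached) ChromeDriver binary
    # and returns its path; Selenium 4 takes that path through Service.
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service, options=options)
    driver.get(LIS_URL)
    return driver
```

Calling `driver = open_lis()` followed by `driver.quit()` when finished would open and close one browser session. With Selenium 3, the driver path is passed to `webdriver.Chrome` directly instead of through `Service`.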

Instruction

  • Copy the commands below and paste them into the terminal:

```
# Clone the legisCrawler repository.
git clone git@github.com:davidycliao/legisCrawler.git

# Change into the repository directory once it has been downloaded.
cd legisCrawler

# Create a conda environment named legisCrawler.
conda create -n legisCrawler python=3.7

# Activate the environment.
conda activate legisCrawler

# Install the dependencies listed in requirements.txt with pip.
pip install -r requirements.txt
```

  • Run legisCrawler with Python:

```
# Note: run this in the same terminal where you activated the environment.
python legisCrawler.py
```

  • When legisCrawler starts, it asks which legislative term (2nd to 10th) you would like to scrape; type a single number from 2 to 10. legisCrawler then automatically creates a folder and stores the retrieved parliamentary questions in it, organized by individual member.
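
The prompt-and-folder step can be sketched as below. This is an illustration under stated assumptions rather than the code in legisCrawler.py: the function names `parse_term` and `make_term_folder` and the `term_NN` folder naming scheme are hypothetical. Only the 2 to 10 range comes from this README.

```python
from pathlib import Path

FIRST_TERM, LAST_TERM = 2, 10  # terms this README says can be scraped


def parse_term(raw: str) -> int:
    """Convert user input to a term number, rejecting anything outside 2-10."""
    text = raw.strip()
    if not text.isdigit():
        raise ValueError(f"expected a number, got {raw!r}")
    term = int(text)
    if not FIRST_TERM <= term <= LAST_TERM:
        raise ValueError(f"term must be between {FIRST_TERM} and {LAST_TERM}")
    return term


def make_term_folder(base_dir: Path, term: int) -> Path:
    """Create the output folder for one term if needed, and return its path."""
    folder = Path(base_dir) / f"term_{term:02d}"  # hypothetical naming scheme
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

In an interactive script, this could be wired up as `term = parse_term(input("Which term (2-10)? "))` followed by `make_term_folder(Path.cwd(), term)`. Validating the input before any browser is launched means a typo fails immediately instead of partway through a crawl.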

Workflow

What legisCrawler Scrapes

The crawler automatically scrapes parliamentary questions (專案質詢) from the Legislative Yuan website, along with information on each question's topic, keywords, and type. An additional module for collecting a corpus of grand parliamentary debates (總質詢) is in progress and will be available soon.

Note

If you need any help running legisCrawler, please don’t hesitate to post a message in Discussion 📣 or to email me. I will make time to help solve the problem!

Cite

To cite this work, you can refer to the present GitHub project. For example, with BibTeX:

    @misc{legisCrawler,
      howpublished = {\url{https://github.com/davidycliao/legisCrawler}},
      title = {An Automation Webcrawling Toolkit for Retrieving Taiwan Parliamentary Questions},
      author = {David Yen-Chieh Liao and Calvin Yu-Ceng Liao},
      publisher = {GitHub},
      year = {2021}
    }

Owner

  • Name: David Liao
  • Login: davidycliao
  • Kind: user
  • Location: Birmingham | Colchester
  • Company: @Connected-Politics-Lab

Researcher at UoB + member of @Connected-Politics-Lab

Citation (CITATION.cff)

cff-version: 0.0.1
message: "If you use this software, please cite it as below."
authors:
- family-names: "Liao"
  given-names: "David Yen-Chieh"
  orcid: ""
title: "legisCrawler: An Automation Webcrawling Toolkit for Retrieving  Taiwan Parliamentary Questions"
version: 0.0.1
doi: 
date-released: 2022-01-10
url: "https://github.com/davidycliao/legisCrawler"


GitHub Events

Total
  • Watch event: 3
Last Year
  • Watch event: 3

Dependencies

requirements.txt pypi
  • matplotlib ==3.0.3
  • numpy ==1.16.5
  • pandas ==1.2.4
  • selenium *
  • webdriver-manager ==3.4.2
setup.py pypi
  • matplotlib *
  • numpy >1.16.2
  • pandas *
  • scipy >=1.5.1
  • selenium *
  • webdriver_manager *
.github/workflows/codeql-analysis.yml actions
  • actions/checkout v2 composite
  • github/codeql-action/analyze v1 composite
  • github/codeql-action/autobuild v1 composite
  • github/codeql-action/init v1 composite
.github/workflows/main.yml actions
  • actions/checkout v2 composite
.github/workflows/python-app.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/python-package-conda.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite