https://github.com/anthesevenants/gij-bent

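Scripts for the article 'Zijt gij dat of bent gij dat?' – Een alternantiestudie van de tweede persoon enkelvoud van zijn in Vlaamse tussentaal ('Is that you or is that you?' – An alternation study of the second person singular of zijn 'to be' in Flemish tussentaal).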

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

CITATION.cff file: not detected
codemeta.json file: detected
.zenodo.json file: not detected
DOI references: not detected
Academic publication links: not detected
Academic email domains: not detected
Institutional organization owner: not detected
JOSS paper metadata: not detected
Scientific vocabulary similarity: low (10.3%)

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: AntheSevenants
Language: TeX
Default Branch: master
Homepage:
Size: 21.6 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme

gij bent

Scripts for the article 'Zijt gij dat of bent gij dat?' – Een alternantiestudie van de tweede persoon enkelvoud van zijn in Vlaamse tussentaal.

This repository houses all the scripts that were used for the analysis in my article on gij bent in Dutch. There are three general components:

Tweet retrieval and sorting. This was done in Python using snscrape. Unfortunately, the library no longer works for Twitter, so you won't be able to replicate my output. All files pertaining to this process are in the root directory.
Statistical analysis. This was done in R. All files for the analysis can be found in the analysis/ directory.
Article. The Quarto document containing the write-up can be found in the paper/ directory. It uses files from analysis/.

Tweet retrieval and sorting

All .py files in the root directory are used for the tweet retrieval and sorting process. These scripts are included for transparency only: you can no longer run them, for two reasons:

It is no longer possible to use the Twitter API or to scrape data from Twitter.
I cannot share the dataset in full, because it contains personal information.

These are the files:

1-retrieve-tweets.py: used to query Twitter. Output is written to jsonl files in output/. Now defunct.
2-sort-tweets.py: used to create a TSV dataset from the jsonl files. Outputs to TSV.
3-geoguess.py: used to attach geolocation to every tweet. Outputs to a separate geo information dataset.
4-gender-detect.py: used to guess the gender of tweet authors. Outputs to a separate gender information dataset.
5-correct.py: used to find incorrectly retrieved tweets. Outputs a meta information dataset.
6-merge.py: used to merge all datasets together and filter out incorrectly retrieved tweets. Outputs a final dataset.
7-anonymise.py: used to anonymise the dataset so it can be shared without personal data. Outputs the anonymised final dataset.
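The scripts can no longer be run. The data flow of steps 2 and 6 can still be illustrated, though. The sketch below is hypothetical and is not taken from the repository. It assumes the following:
- Each JSONL line holds one tweet object with id, user and text fields.
- The geo, gender and meta datasets can be loaded as dictionaries keyed by tweet id.
- Meta marks correctly retrieved tweets with True.
The function names jsonl_to_rows, merge and to_tsv and the location column are illustrative only.

```python
import csv
import io
import json

# Hypothetical column names; the real scripts may use different fields.
FIELDS = ["id", "user", "text"]


def jsonl_to_rows(lines):
    """Parse JSON Lines input (one tweet object per line) into flat rows (cf. step 2)."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append({field: tweet.get(field) for field in FIELDS})
    return rows


def merge(rows, geo, gender, meta):
    """Attach geo and gender info by tweet id and drop incorrect tweets (cf. step 6).

    geo, gender and meta are dicts keyed by tweet id (hypothetical format);
    meta maps an id to True if the tweet was retrieved correctly.
    """
    merged = []
    for row in rows:
        tweet_id = row["id"]
        if not meta.get(tweet_id, False):
            continue  # flagged as incorrect (or never checked): filter out
        merged.append({**row,
                       "location": geo.get(tweet_id),
                       "gender": gender.get(tweet_id)})
    return merged


def to_tsv(rows):
    """Serialise the merged rows to a tab-separated string."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

With toy input, merge keeps only the tweets flagged as correct. As a result, to_tsv(merge(jsonl_to_rows(lines), geo, gender, meta)) yields one TSV row per retained tweet. Every intermediate dataset is keyed by tweet id. That lets steps 3 to 5 each write their own separate output, and step 6 then joins those outputs.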

Statistical analysis

All .R files in the analysis/ directory are used for the statistical analysis. All of them are embedded in the report, except for gij-bent-gam2.R.

geo-map.R: prints a map of Flanders. Embedded in the report.
gij-bent2.R: loads the dataset. Embedded in the report.
kloeke.R: prints a map of the Low Countries with forms for 'you are' in dialect. Embedded in the report.
gij-bent-gam2.R: prints the map of gij bent. Not embedded in the report, since generating the map takes a good four minutes. You therefore need to run this script first to generate the image, which the report then includes.

Article

I wrote the article in Quarto. The idea behind Quarto is that you write your paper once and can then export it to HTML, Word and PDF. The paper is generated dynamically: all regression analyses, graphs and numbers are computed when the document is rendered. It uses the files from analysis/.

Reproducibility

I have anonymised the dataset with tweets. If you need to consult the full dataset with personal information, send me an email.
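The README does not describe how 7-anonymise.py works. The sketch below shows one common approach, purely as an illustration; it is an assumption, not the author's method. It replaces each username with a salted SHA-256 pseudonym and masks @mentions and URLs in the tweet text. The field names user and text and the placeholders @USER and URL are hypothetical.

```python
import hashlib
import re

# Hypothetical masking rules; the real anonymisation script may differ.
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")


def pseudonym(user, salt):
    """Map a username to a stable pseudonym via a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + user).encode("utf-8")).hexdigest()
    return "user_" + digest[:10]


def anonymise(row, salt):
    """Return a copy of the row with the author pseudonymised and mentions/URLs masked."""
    text = MENTION.sub("@USER", row["text"])
    text = URL.sub("URL", text)
    return {**row, "user": pseudonym(row["user"], salt), "text": text}
```

With the same salt, a username always maps to the same pseudonym, so tweets by one author can still be grouped without revealing who wrote them. Keeping the salt secret prevents anyone from recovering usernames by hashing candidate names.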

Owner

Name: Anthe Sevenants
Login: AntheSevenants
Kind: user
Location: Leuven, Belgium
Company: KU Leuven

Website: anthe.sevenants.net
Repositories: 39
Profile: https://github.com/AntheSevenants

AI & linguistics master. Linguistics PhD candidate @QLVL

GitHub Events

Total

Push event: 1

Last Year

Push event: 1

