https://github.com/anthesevenants/gij-bent
Scripts for the article 'Zijt gij dat of bent gij dat?' – Een alternantiestudie van de tweede persoon enkelvoud van zijn in Vlaamse tussentaal.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.3%) to scientific vocabulary
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
gij bent
Scripts for the article 'Zijt gij dat of bent gij dat?' -- Een alternantiestudie van de tweede persoon enkelvoud van zijn in Vlaamse tussentaal.
This repository houses all the scripts that were used for the analysis in my article on gij bent in Dutch. There are three general components:
- Tweet retrieval and sorting. This was done in Python using snscrape. Unfortunately, the library no longer works for Twitter, so you won't be able to replicate my output. All files pertaining to this process are in the root directory.
- Statistical analysis. This was done in R. All files for the analysis can be found in the analysis/ directory.
- Article. The Quarto document with the reporting can be found in the paper/ directory. It uses files from analysis/.
Tweet retrieval and sorting
All .py files in the root directory are used for the tweet retrieval and sorting process. These scripts are included for transparency only. You can no longer run them, for two reasons:
- It is no longer possible to use the Twitter API / scrape data from Twitter.
- I cannot share the dataset in full, because it contains personal information.
These are the files:
- 1-retrieve-tweets.py: used to query Twitter. Output is written to jsonl files in output/. Now defunct.
- 2-sort-tweets.py: used to create a TSV dataset from the jsonl files. Outputs to TSV.
- 3-geoguess.py: used to attach geolocation to every tweet. Outputs to a separate geo information dataset.
- 4-gender-detect.py: used to guess the gender of tweet authors. Outputs to a separate gender information dataset.
- 5-correct.py: used to find incorrectly retrieved tweets. Outputs a meta information dataset.
- 6-merge.py: used to merge all datasets together and filter wrong tweets. Outputs a final dataset.
- 7-anonymise.py: used to anonymise the dataset so it can be shared without personal data. Outputs the anonymised final dataset.
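To give an idea of what the jsonl-to-TSV step involves, here is a minimal sketch. It is not the code from 2-sort-tweets.py: the field names (id, date, username, content) and the function name jsonl_to_tsv are assumptions made for illustration, and the real scraper output may use different keys.

```python
import csv
import io
import json

# Hypothetical field names; the actual jsonl files may use different keys.
FIELDS = ["id", "date", "username", "content"]


def jsonl_to_tsv(jsonl_lines, out_file, fields=FIELDS):
    """Write one TSV row per JSON line, keeping only the chosen fields.

    Returns the number of rows written (header excluded).
    """
    writer = csv.DictWriter(
        out_file,
        fieldnames=fields,
        delimiter="\t",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    count = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Collapse tabs and newlines so every tweet stays on a single row.
        row = {f: " ".join(str(record.get(f, "")).split()) for f in fields}
        writer.writerow(row)
        count += 1
    return count


# Made-up records, only to show the shape of the input and output.
sample = [
    '{"id": 1, "date": "2021-05-01", "username": "a", "content": "gij bent\\nlaat"}',
    "",
    '{"id": 2, "date": "2021-05-02", "username": "b", "content": "zijt gij dat?"}',
]
buffer = io.StringIO()
n = jsonl_to_tsv(sample, buffer)
print(buffer.getvalue())
```

With real files you would pass an open jsonl file as jsonl_lines and an open output file (opened with newline="") as out_file. Flattening whitespace inside the tweet text is a design choice in this sketch, made so that downstream tools can read the TSV line by line.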
Statistical analysis
All .R files in the analysis/ directory are used for statistical analysis. All of them are designed to be embedded in the report, except for gij-bent-gam2.R.
- geo-map.R: prints a map of Flanders. Embedded in the report.
- gij-bent2.R: loads the dataset. Embedded in the report.
- kloeke.R: prints a map of the Low Countries with forms for 'you are' in dialect. Embedded in the report.
- gij-bent-gam2.R: prints the map of gij bent. Not embedded in the report, since it takes a solid four minutes to generate the map. This means you need to generate the image first, which is then used in the report.
Article
I wrote the article in Quarto. The idea of Quarto is that you write your paper once, which you can then export to HTML, Word and PDF. The paper is generated dynamically, and all regression analyses, graphs and numbers are included on the fly. It uses the files from analysis/.
Reproducibility
I have anonymised the dataset with tweets. If you need to consult the full dataset with personal information, send me an email.
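For readers wondering what anonymising a tweet dataset can look like in practice, here is a minimal sketch. It does not reproduce 7-anonymise.py: the column names (username, content, url, author_id), the salted-hash approach and the function names are all assumptions made for illustration.

```python
import hashlib

# Hypothetical names of columns that hold personal information.
PERSONAL_COLUMNS = frozenset({"username", "content", "url"})


def pseudonym(value, salt):
    """Return a short, stable pseudonym for a value using a salted SHA-256 hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def anonymise(rows, salt, id_column="username", drop=PERSONAL_COLUMNS):
    """Drop personal columns and add a pseudonymous author identifier."""
    result = []
    for row in rows:
        clean = {key: value for key, value in row.items() if key not in drop}
        # The same author always maps to the same pseudonym, so per-author
        # grouping still works without exposing the account name.
        clean["author_id"] = pseudonym(row[id_column], salt)
        result.append(clean)
    return result


# Made-up rows, only to show the effect.
rows = [
    {"username": "alice", "content": "gij bent laat", "form": "bent"},
    {"username": "bob", "content": "zijt gij dat?", "form": "zijt"},
    {"username": "alice", "content": "gij zijt daar", "form": "zijt"},
]
shared = anonymise(rows, salt="replace-with-a-secret")
for row in shared:
    print(row)
```

Keeping a stable author_id is useful if the analysis needs per-author grouping, for example as a random effect. Note that a salted hash is pseudonymisation rather than full anonymisation: the salt has to stay private, and the tweet text itself has to be left out, because it can be searched for.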
Owner
- Name: Anthe Sevenants
- Login: AntheSevenants
- Kind: user
- Location: Leuven, Belgium
- Company: KU Leuven
- Website: anthe.sevenants.net
- Repositories: 39
- Profile: https://github.com/AntheSevenants
AI & linguistics master. Linguistics PhD candidate @QLVL
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1