south-africa-street-history-mapping
A small repo to generate a map of South Africa's streets colour coded by the name's origin.
https://github.com/emily-rosesteyn/south-africa-street-history-mapping
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Emily-RoseSteyn
- License: gpl-3.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 343 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
South Africa Street History Mapping
We were curious about visualising how street names correlate to their language or place of origin in South Africa - a country whose history is marked by significant power struggles and complex race relations. This repo provides the code for creating maps of street networks colour coded by place of origin and language.
Outputs
- Peer Reviewed Extended Abstract Submission for IC2S2 2024
- Poster for IC2S2 2024
- Paper coming soon!
Results
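| Area | Dictionary Lookup | Language Detector |
|--------------|---------------------------|---------------------------|
| Johannesburg | (map image not available) | (map image not available) |
| Soweto | (map image not available) | (map image not available) |
| Sandton | (map image not available) | (map image not available) |
| Cape Town | (map image not available) | (map image not available) |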
Running the code
This section describes how to run the code. Feel free to open an issue if you have any questions!
Prerequisites
Setup
Poetry is used to manage packages and virtual environments.
```shell
poetry shell
```
```shell
poetry install
```
Data Download Pipeline
1. Retrieve streets for relevant countries
Core Code: download_country_streets.py
We first need to download all street names for South Africa and for selected countries that have played a role in South Africa's history (see countries). Data is downloaded using the Overpass API, a read-only API for querying OpenStreetMap data.
To retrieve street data, check which countries are set to be retrieved, then run:
```shell
python ./src/street_list_download/main.py
```
If you're on a Slurm-enabled cluster, you can run:
```shell
sbatch ./scripts/1_retrieve-streets.sbatch
```
The outputs of this script are saved as CSV files in streets.
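As a rough illustration of what a street-name download from the Overpass API can look like, here is a minimal sketch. It is not the code in this repo: the function names, the public endpoint URL, and the choice of filtering ways by the `highway` and `name` tags are all assumptions made for the example.

```python
import urllib.parse
import urllib.request

# Public Overpass endpoint (assumption: the repo may use a different instance).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_street_query(iso_code: str, timeout_s: int = 900) -> str:
    """Build an Overpass QL query for all named highways in one country."""
    return (
        f"[out:csv(name)][timeout:{timeout_s}];\n"
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;\n'
        'way(area.country)["highway"]["name"];\n'
        "out;"
    )


def fetch_street_names(iso_code: str) -> list[str]:
    """POST the query and return unique street names (needs network access)."""
    body = urllib.parse.urlencode({"data": build_street_query(iso_code)}).encode()
    with urllib.request.urlopen(OVERPASS_URL, data=body) as response:
        lines = response.read().decode("utf-8").splitlines()
    # The first CSV line is the header ("name"); skip it and de-duplicate.
    return sorted({line for line in lines[1:] if line})
```

Calling `fetch_street_names("ZA")` would request South Africa's street names. Country-wide queries are heavy, so expect long run times on the public endpoint.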
2. Process street data
Core Code: preprocess_country_streets.py
We now process the street names for each country so that we end up with a dictionary of terms for that country. Each street name is:
- Split on spaces (e.g. Nottingham Road becomes [Nottingham, Road])
- Converted to lowercase
This results in a dataframe of terms. Empty, NaN, digit-only, and duplicate terms are dropped. Terms shorter than a minimum length are also dropped.
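The steps above can be sketched in plain Python. This is an illustrative version only: the repo works on a dataframe, and the function name and the `min_length` default of 3 are assumptions, not values taken from the code.

```python
def street_names_to_terms(names, min_length=3):
    """Turn raw street names into a de-duplicated list of lowercase terms."""
    terms = []
    seen = set()
    for name in names:
        if not isinstance(name, str):  # skips None and NaN (a float)
            continue
        for term in name.lower().split():
            # Drop digit-only terms, short terms, and duplicates.
            if term.isdigit() or len(term) < min_length or term in seen:
                continue
            seen.add(term)
            terms.append(term)
    return terms


print(street_names_to_terms(["Nottingham Road", "12 Main Road", None]))
# ['nottingham', 'road', 'main']
```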
To process the street data, run:
```shell
python ./src/street_list_preprocessing/main.py
```
If you're on a Slurm-enabled cluster, you can run:
```shell
sbatch ./scripts/2_process-streets.sbatch
```
The outputs of this script are saved as CSV files in streets with the prefix "processed".
Additionally, all terms and their corresponding origin country are saved to a SQLite database in output/street_history.sqlite, in the table street_terms.
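For orientation, the sketch below shows one way terms and their origin country could be written to and read from a `street_terms` table with Python's built-in `sqlite3` module. The column names (`term`, `country`) and the helper `save_terms` are assumptions for illustration; check the repo for the actual schema.

```python
import sqlite3


def save_terms(connection, country, terms):
    """Append (term, country) rows to a street_terms table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS street_terms (term TEXT, country TEXT)"
    )
    connection.executemany(
        "INSERT INTO street_terms (term, country) VALUES (?, ?)",
        [(term, country) for term in terms],
    )
    connection.commit()


# An in-memory database keeps the example side-effect free; the repo writes
# to output/street_history.sqlite instead.
connection = sqlite3.connect(":memory:")
save_terms(connection, "south_africa", ["nottingham", "road"])
save_terms(connection, "united_kingdom", ["nottingham"])
rows = connection.execute(
    "SELECT country FROM street_terms WHERE term = ? ORDER BY country",
    ("nottingham",),
).fetchall()
print(rows)  # [('south_africa',), ('united_kingdom',)]
```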
3. Build Dictionary
Core Code: build_dictionary_for_term.py
Now that we have all the terms for each selected country, we can build a lookup dictionary for each term for a "home" country. In our case, South Africa is the home country.
For each term in South Africa's terms data from the previous step, the term is looked up in the street_terms table. If the term matches one or more countries (including the home country), it is saved in a dictionary table and assigned a likelihood based on how frequently the term appears in each country.
The term, origin, and likelihood are saved to a SQLite database in output/street_history.sqlite, in a table named <country>_terms_dictionary.
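To make the likelihood idea concrete, here is a minimal sketch. It assumes that the likelihood of an origin is the term's count in that country divided by its total count across all countries. That formula, the per-country `count` values, and the function name are assumptions for illustration and may differ from what the repo computes.

```python
from collections import defaultdict


def build_dictionary(street_terms, home_country):
    """Return (term, origin, likelihood) rows for every home-country term.

    street_terms: iterable of (term, country, count) tuples, where count is a
    hypothetical per-country frequency for the term.
    """
    counts = defaultdict(dict)
    for term, country, count in street_terms:
        counts[term][country] = counts[term].get(country, 0) + count

    dictionary = []
    for term, by_country in counts.items():
        if home_country not in by_country:
            continue  # only terms that occur in the home country are looked up
        total = sum(by_country.values())
        for origin, count in by_country.items():
            dictionary.append((term, origin, count / total))
    return dictionary


rows = build_dictionary(
    [
        ("nottingham", "united_kingdom", 3),
        ("nottingham", "south_africa", 1),
        ("paris", "france", 5),
    ],
    home_country="south_africa",
)
print(rows)
# [('nottingham', 'united_kingdom', 0.75), ('nottingham', 'south_africa', 0.25)]
```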
To build a dictionary of terms for a specific country, run:
```shell
python ./src/dictionary_builder/main.py $COUNTRY
```
Here $COUNTRY is south_africa in the case of this repo, but it can be changed to any other country that has been downloaded.
If you're on a Slurm-enabled cluster, you can run:
```shell
sbatch ./scripts/3_build-dictionary-south-africa.sbatch
```
4. Map
Finally, we can map street names for a particular area in the "home" country. To do this, OSMnx is used to retrieve a street network graph for the area. The street names in the network are preprocessed to produce terms for each name. Each term is looked up in the dictionary, excluding "stop" words like road, avenue, etc., and the origin with the highest likelihood among the remaining terms is assigned to the street. The street is then drawn in the colour that matches its assigned origin.
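The origin lookup described above can be sketched as follows. The stop-word set, the dictionary layout (term mapped to origin likelihoods), and the function name are illustrative assumptions rather than the repo's actual data structures.

```python
STOP_WORDS = {"road", "street", "avenue", "drive"}  # illustrative subset


def assign_origin(street_name, dictionary, stop_words=STOP_WORDS):
    """Return the origin of the highest-likelihood non-stop term, or None.

    dictionary maps term -> {origin: likelihood}.
    """
    best_origin, best_likelihood = None, 0.0
    for term in street_name.lower().split():
        if term in stop_words:
            continue
        for origin, likelihood in dictionary.get(term, {}).items():
            if likelihood > best_likelihood:
                best_origin, best_likelihood = origin, likelihood
    return best_origin


dictionary = {
    "nottingham": {"united_kingdom": 0.75, "south_africa": 0.25},
    "road": {"united_kingdom": 0.9},
}
print(assign_origin("Nottingham Road", dictionary))  # united_kingdom
print(assign_origin("Unknown Avenue", dictionary))   # None
```

Streets for which no term matches return `None`, so a mapping step would need a fallback colour for unclassified streets.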
Additionally, there is an option to map the streets by language instead. This needs some further work but already produces interesting results. This second mapping uses lingua to detect the language of the terms provided.
To map all street names in a region, run the end-to-end mapping, e.g.:
```shell
python ./src/mapping/map-e2e.py "Johannesburg, South Africa" --distance 30000 --fig_size 64
```
Helpful Notebooks
The notebooks folder contains a number of Jupyter notebooks that may be useful to play around with.
Contact
Feel free to reach out to me either via this repo or emilyrosesteyn@gmail.com.
Owner
- Name: Emily-Rose Steyn
- Login: Emily-RoseSteyn
- Kind: user
- Location: South Africa
- Repositories: 2
- Profile: https://github.com/Emily-RoseSteyn
Growing 🌱
GitHub Events
Total
- Push event: 3
Last Year
- Push event: 3