south-africa-street-history-mapping

A small repo to generate a map of South Africa's streets colour coded by the name's origin.

https://github.com/emily-rosesteyn/south-africa-street-history-mapping

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (9.7%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: Emily-RoseSteyn
License: gpl-3.0
Language: Jupyter Notebook
Default Branch: main
Size: 343 MB

Statistics

Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

South Africa Street History Mapping

We were curious about visualising how street names correlate to their language or place of origin in South Africa - a country whose history is marked by significant power struggles and complex race relations. This repo provides the code for creating maps of street networks colour coded by place of origin and language.

This readme is divided into:

Outputs
Results
Running the code
Helpful Notebooks
Contact

Outputs

Peer Reviewed Extended Abstract Submission for IC2S2 2024
Poster for IC2S2 2024
Paper coming soon!

Results

| Area         | Dictionary Lookup | Language Detector |
|--------------|-------------------|-------------------|
| Johannesburg | joburg            | joburg-lang       |
| Soweto       | soweto            | soweto-lang       |
| Sandton      | sandton           | sandton-lang      |
| Cape Town    | cape-town         | cape-town-lang    |

Running the code

This section describes how to run the code. Feel free to open an issue if you have any questions!

Prerequisites

Git Bash (if you are on Windows)
Docker
Poetry
~5GB Disk Space (Docker images + data)

Setup

Poetry is used to manage packages and virtual environments.

    poetry shell

    poetry install

Data Download Pipeline

1. Retrieve streets for relevant countries

Core Code: download_country_streets.py

We first need to download all street names for South Africa and for selected countries that have played a role in South Africa's history (see countries). Data is downloaded using the Overpass API, an API for retrieving data from OpenStreetMap.
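As a rough illustration of what a street-name request to the Overpass API can look like, here is a minimal sketch. It is not the code in this repo: the endpoint URL, the function names, and the exact query are assumptions chosen for the example, and the real script may filter or batch differently.

```python
from urllib import parse, request

# Assumption: the public Overpass endpoint; the repo may use a different one.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_street_query(iso_code: str, timeout_s: int = 900) -> str:
    """Build an Overpass QL query for all named highways in one country.

    Hypothetical helper: the country is selected by its ISO 3166-1 code.
    """
    return (
        f"[out:csv(name)][timeout:{timeout_s}];\n"
        f'area["ISO3166-1"="{iso_code}"][admin_level=2]->.country;\n'
        "way(area.country)[highway][name];\n"
        "out;"
    )


def fetch_street_names(iso_code: str) -> list[str]:
    """POST the query and return street names (network call, not run here)."""
    data = parse.urlencode({"data": build_street_query(iso_code)}).encode()
    with request.urlopen(request.Request(OVERPASS_URL, data=data)) as resp:
        lines = resp.read().decode("utf-8").splitlines()
    return lines[1:]  # the first CSV line is the header row


# Only the query text is printed; nothing is downloaded.
print(build_street_query("ZA"))
```

Country-wide queries are heavy, so a long timeout and a polite request rate are sensible when trying this against a public server.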

To retrieve street data, check that you're happy with what countries are being retrieved and run:

    python ./src/street_list_download/main.py

If you're on a Slurm-enabled cluster, you can run:

    sbatch ./scripts/1_retrieve-streets.sbatch

The outputs of this script are saved to streets in CSV format.

2. Process street data

Core Code: preprocess_country_streets.py

We now process the street names for the various countries so that we end up with a dictionary of terms for the country. Each street name is:

Exploded by space (e.g. so that Nottingham Road becomes [Nottingham, Road])
Converted to lowercase

This results in a dataframe of terms. Empty, NaN, digit-only, and duplicate terms are dropped. Words shorter than a certain length are also dropped.
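The preprocessing steps above can be sketched in plain Python as follows. This is only an illustration: the repo works with dataframes, and the function name and the minimum length of 3 are assumptions made for the example.

```python
def street_names_to_terms(names, min_length=3):
    """Split names on whitespace, lowercase them, and keep unique usable terms.

    min_length=3 is an assumed threshold, not the value used in the repo.
    """
    seen = set()
    terms = []
    for name in names:
        if not isinstance(name, str):  # skips None and NaN-style missing values
            continue
        for term in name.lower().split():
            # Drop digit-only terms, short terms, and duplicates.
            if term.isdigit() or len(term) < min_length or term in seen:
                continue
            seen.add(term)
            terms.append(term)
    return terms


names = ["Nottingham Road", "1 Avenue", None, "nottingham close", "De Beer Street"]
print(street_names_to_terms(names))
# ['nottingham', 'road', 'avenue', 'close', 'beer', 'street']
```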

To process the street data, run:

    python ./src/street_list_preprocessing/main.py

If you're on a Slurm-enabled cluster, you can run:

    sbatch ./scripts/2_process-streets.sbatch

The outputs of this script are saved to streets in CSV format with the prefix "processed". Additionally, all terms and their corresponding origin country are saved to a SQLite database at output/street_history.sqlite, in the table street_terms.

3. Build Dictionary

Core Code: build_dictionary_for_term.py

Now that we have all the terms for each selected country, we can build a lookup dictionary for each term for a "home" country. In our case, South Africa is the home country.

For each term in South Africa's terms data from the previous step, the term is looked up in the street_terms table. If the term matches one or more countries (including the home country), it is saved in a dictionary table and assigned a likelihood based on how frequently the term appears in the different countries.

The term, origin, and likelihood are saved to a SQLite database at output/street_history.sqlite, in a table named with the format <country>_terms_dictionary.
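To make the lookup concrete, here is a small sketch using an in-memory SQLite database. The column names, the occurrences column, and the normalisation (a term's count in one country divided by its total count across all countries) are assumptions for illustration; the repo's actual schema and likelihood formula may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: one row per (term, country) with an occurrence count.
conn.execute(
    "CREATE TABLE street_terms (term TEXT, country TEXT, occurrences INTEGER)"
)
conn.executemany(
    "INSERT INTO street_terms VALUES (?, ?, ?)",
    [
        ("nottingham", "south_africa", 10),
        ("nottingham", "united_kingdom", 30),
        ("protea", "south_africa", 5),
    ],
)


def build_dictionary(conn, home_country):
    """Return (term, origin, likelihood) rows for terms seen in the home country."""
    return conn.execute(
        """
        SELECT t.term,
               t.country AS origin,
               1.0 * t.occurrences / totals.total AS likelihood
        FROM street_terms AS t
        JOIN (SELECT term, SUM(occurrences) AS total
              FROM street_terms
              GROUP BY term) AS totals
          ON totals.term = t.term
        WHERE t.term IN (SELECT term FROM street_terms WHERE country = ?)
        ORDER BY t.term, likelihood DESC
        """,
        (home_country,),
    ).fetchall()


for row in build_dictionary(conn, "south_africa"):
    print(row)
```

With this toy data, "nottingham" is attributed mostly to united_kingdom (0.75) and partly to south_africa (0.25), while "protea" is attributed entirely to south_africa.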

To build a dictionary of terms for a specific country, run:

    python ./src/dictionary_builder/main.py $COUNTRY

Where $COUNTRY is south_africa in the case of this repo but could be modified to other countries that have been downloaded.

If you're on a Slurm-enabled cluster, you can run:

    sbatch ./scripts/3_build-dictionary-south-africa.sbatch

4. Map

Finally, we can map street names for a particular area in the "home" country. OSMnx is used to retrieve a street network graph for the area. The street names in the network are preprocessed to produce terms for each name. Each term is looked up in the dictionary, excluding "stop" words such as road and avenue, and the origin with the highest likelihood among the remaining terms is assigned to the street. The street is then drawn in the colour allocated to that origin.
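The origin selection described above might look roughly like the sketch below. The stop-word list, the nested-dictionary layout, the "unknown" fallback, and the function name are all assumptions for illustration rather than the repo's implementation.

```python
# Assumed stop-word list; the repo's list is likely longer.
STOP_WORDS = {"road", "street", "avenue", "drive", "lane"}


def pick_origin(street_name, dictionary, default="unknown"):
    """Return the origin with the highest likelihood over all non-stop terms.

    `dictionary` maps term -> {origin: likelihood} (hypothetical layout).
    """
    best_origin, best_likelihood = default, 0.0
    for term in street_name.lower().split():
        if term in STOP_WORDS:
            continue
        for origin, likelihood in dictionary.get(term, {}).items():
            if likelihood > best_likelihood:
                best_origin, best_likelihood = origin, likelihood
    return best_origin


toy_dictionary = {
    "nottingham": {"united_kingdom": 0.75, "south_africa": 0.25},
    "road": {"united_kingdom": 0.9, "south_africa": 0.1},
}
print(pick_origin("Nottingham Road", toy_dictionary))  # united_kingdom
print(pick_origin("Road", toy_dictionary))             # unknown
```

Skipping stop words matters because generic terms such as "road" appear in every English-speaking country and would otherwise dominate the result.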

Additionally, an option is included to instead map the streets by language which needs some further work but produces interesting results. This second mapping uses lingua to detect the language of the terms provided.

To map all street names in a region, run the end-to-end mapping - e.g.:

    python ./src/mapping/map-e2e.py "Johannesburg, South Africa" --distance 30000 --fig_size 64

Helpful Notebooks

There are a bunch of Jupyter notebooks in the notebooks folder which may be useful for you to play around with.

Contact

Feel free to reach out to me either via this repo or emilyrosesteyn@gmail.com.

Owner

Name: Emily-Rose Steyn
Login: Emily-RoseSteyn
Kind: user
Location: South Africa

Repositories: 2
Profile: https://github.com/Emily-RoseSteyn

Growing 🌱

GitHub Events

Total

Push event: 3

Last Year

Push event: 3
