Predihood
Predihood: an open-source tool for predicting neighbourhoods' information - Published in JOSS (2021)
Science Score: 89.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org
- ✓ Committers with academic emails: 3 of 5 committers (60.0%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Scientific Fields
- Artificial Intelligence and Machine Learning (Computer Science): 62% confidence
- Computer Science (Computer Science): 44% confidence
Last synced: 4 months ago
Repository
Basic Info
- Host: gitlab.com
- Owner: fduchate
- License: gpl-3.0+
- Default Branch: master
Statistics
- Stars: 1
- Forks: 1
- Open Issues: 0
- Releases: 0
Created about 5 years ago
https://gitlab.com/fduchate/predihood/blob/master/
# Predihood

Predihood is an application for predicting information about neighbourhoods (e.g., environment characteristics, bird migration possibilities, health issues). It makes it very easy, even for non-programmers, to use predictive algorithms. Besides, the tool is extensible: new datasets and predictive algorithms can be added to the system.

A cartographic interface enables the visualisation of neighbourhoods along with their indicators (which describe them, such as the number of bakeries, the average income, or the number of houses over 250m^2) and the prediction results for selected neighbourhoods. A tuning interface makes it possible to configure and test different machine learning algorithms on a given dataset.

Predihood includes a dataset of 50,000 French neighbourhoods (`hil`) used in a [research project](https://imu.universite-lyon.fr/appels-en-cours-et-bilans/2017-en-cours/hil-artificial-intelligence-to-facilitate-property-searches-system-of-recommendations-with-spatial-and-non-spatial-visualisation-for-property-search-2017/) to predict the environment of a neighbourhood (e.g., social class, type of landscape) based on hundreds of indicators (about population, shops, buildings, etc.). The tool also includes a small test dataset (`bird-migration`) to demonstrate how to add new datasets.

Predihood is provided under a [GNU General Public License v3.0](https://gitlab.com/fduchate/predihood/-/blob/master/LICENSE). Contributions are welcome, following the [community guidelines](https://gitlab.com/fduchate/predihood/-/blob/master/CONTRIBUTING.md).

Predihood has been published in the Journal of Open Source Software (JOSS). Please cite our work if you use it in a scientific publication: [https://doi.org/10.21105/joss.02805](https://doi.org/10.21105/joss.02805)

## Installation

Predihood includes two components: the Python application ([predihood repository](https://gitlab.com/fduchate/predihood)) and the data management library ([mongiris repository](https://gitlab.liris.cnrs.fr/fduchate/mongiris)). Note that the two repositories take about 2.8 GB (including datasets) on disk.

### Installation using Docker (recommended)

This method requires [Docker](https://www.docker.com/). It builds two Docker images (for a total size of 5.7 GB on disk, including all libraries and datasets).

First, clone the two repositories with the following commands. Note that both cloned repositories should be placed into a new (empty) directory.

```
git clone https://gitlab.liris.cnrs.fr/fduchate/mongiris
git clone https://gitlab.com/fduchate/predihood.git
```

Go into the downloaded `predihood/` directory and run in a terminal:

```
docker-compose up
```

This command deploys two containers, one for the application (`predihood`) and the other for the database (`db-predihood`). On the first run, the database container imports two datasets (`hil` and `bird-migration`), which may take a few minutes. Note that the application container may report a database connection error (if the database container is not ready before the timeout, which may happen on the first run on slower machines), but it automatically restarts. After some logging information, go to [http://127.0.0.1:8081/](http://127.0.0.1:8081/) in your browser (preferably Firefox or Chrome) to use the application.

### Manual installation

Although the Docker method is highly recommended, it is also possible to manually install Predihood and its dependencies. Note that some issues may occur due to version or package conflicts.
Requirements:

- Python, version >= 3.8
- [MongoDB](https://www.mongodb.com/), version >= 4, for importing the database about neighbourhoods.

First, clone the two repositories with the following commands:

```
git clone https://gitlab.liris.cnrs.fr/fduchate/mongiris
git clone https://gitlab.com/fduchate/predihood.git
```

Next, go into the `mongiris` directory and install the mongiris application:

```
python3 -m pip install -e .
```

Note that the download time may be quite long, as the mongiris API includes two datasets (760 MB).

Then import the datasets into the MongoDB database: run the MongoDB server (`mongod`) and execute the following commands (from MongoDB's executable directory if needed):

```
# import dataset 'hil' as a MongoDB dump
./mongorestore --archive=/path/to/dump-dbinsee.bin
# import dataset 'bird-migration' as a collection of JSON documents
./mongoimport --db=dbmigration -c=collmigration --file=/path/to/dump-bird-neighbourhoods.json
./mongoimport --db=dbmigration -c=collindic --file=/path/to/dump-bird-indicators.json
```

where `/path/to/` is the path to the dataset files (provided with the mongiris package in `mongiris/data/dumps/`). A tip is to move the dataset files into the MongoDB binary directory (`PATH/TO/MONGODB/bin`). You may have to create the `data/db` folder under `PATH/TO/MONGODB/bin` and run `./mongod --dbpath=./data/db`.

Finally, go into the `predihood` directory and install the predihood application:

```
python3 -m pip install -e . -r requirements.txt
```

To run Predihood, go into the `predihood/predihood/` directory (which contains `main.py`) and run in a terminal:

```
python3 main.py [path/to/config.json]
```

The application accepts an optional argument: the path to the configuration file of the dataset to be loaded. By default, the dataset _hil_ is loaded (see the _Datasets_ section for more information). After some logging information, go to [http://localhost:8081/](http://localhost:8081/) in your browser (preferably Firefox or Chrome) to use the application.

## Example usage

For the cartographic interface, an example would be:

1. Type a query in the panel on the left, e.g. "Lyon". This will display all neighbourhoods that contain "Lyon" in their name or their township.
2. Click on a neighbourhood (the small blue areas). A tooltip will appear with some information about the neighbourhood. More information (the list of all indicators) is available by clicking on the "More details" link.
3. In order to predict variables of the neighbourhood, you have to choose a classifier. The "Random Forest" classifier is recommended by default. After a few seconds, predictions will appear in the tooltip. Prediction results can be exported as spreadsheets (XLS) by clicking on the download button (in the popup).
4. To get a prediction for several neighbourhoods, select them on the map using a right-click (the list of selected neighbourhoods is updated in the left panel). When all relevant neighbourhoods have been selected, select a classifier in the list and click on the "predict selected neighbourhoods" button. Prediction results can be exported as spreadsheets (XLS) by clicking on the download button (to the right of the button).

For the algorithmic interface, an example would be:

1. Choose an algorithm.
2. Tune it as desired.
3. Click on the "Train, test and evaluate" button. Once the accuracies have been computed, a table shows the results for each environment variable and each list of indicators (a generic code sketch of such a pipeline is shown below).
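If you are curious about what "train, test and evaluate" looks like in code, the following sketch shows a generic supervised pipeline with scikit-learn (already listed in `requirements.txt`). It is only an illustration: the CSV file and column names (`neighbourhood_indicators.csv`, `environment_variable`) are hypothetical placeholders, not part of Predihood's code base.

```python
# Illustrative only: a generic scikit-learn pipeline of the kind described above
# (indicators as features, one environment variable as target). The file and
# column names are hypothetical placeholders, not Predihood's actual code.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# hypothetical table: one row per neighbourhood, indicator columns + one target column
data = pd.read_csv("neighbourhood_indicators.csv")
X = data.drop(columns=["environment_variable"])   # indicators
y = data["environment_variable"]                  # variable to predict

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```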
## Tests

Tests are in the `predihood/predihood/tests.py` file.

Within a Docker installation, tests can be run as follows:

```
# docker-compose up is running
# docker ps lists running containers to obtain ID_CONTAINER for predihood
docker exec -it ID_CONTAINER bash
cd predihood/predihood/
python3 tests.py
```

With a local installation, run the tests using:

```
cd predihood/predihood/
python3 tests.py
```

## Documentation

The documentation of the code is in `predihood/doc/`. It is also available online at [https://nellybarret.gitlab.io/documentation-for-predihood](https://nellybarret.gitlab.io/documentation-for-predihood).

## Datasets

Dataset configuration files are stored in the `predihood/predihood/datasets/` directory. Predihood currently includes two datasets: _hil_ (50,000 neighbourhoods, 550 indicators and 6 environment variables to predict) and _bird-migration_ (769 neighbourhoods, 3 indicators, 1 variable to predict). Datasets are stored in MongoDB according to the [GeoJSON format](https://geojson.org/).
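Predihood accesses these collections through the mongiris library, but for a quick look at the stored documents you can also query MongoDB directly. Below is a minimal, illustrative sketch assuming the `pymongo` package is available, using the database and collection names of the _bird-migration_ example described later in this README:

```python
# Illustrative only: inspect a stored neighbourhood document directly with pymongo
# (Predihood itself goes through the mongiris library; the database and collection
# names are those of the bird-migration example below).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
collection = client["dbmigration"]["collmigration"]

# fetch one GeoJSON Feature and print its name and raw indicators
doc = collection.find_one()
print(doc["properties"]["NAME"])
print(doc["properties"]["raw_indicators"])
```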
### Using another dataset

To use an existing dataset (i.e., data already loaded into MongoDB), it is necessary to specify the path to the configuration file for this dataset.

- Using Docker, edit `predihood/docker-compose.yml` to change the `CONFIG` environment option:

```
CONFIG=datasets/hil/config.json # to use dataset hil
CONFIG=datasets/bird-migration/config.json # to use dataset bird-migration
```

Then run `docker-compose up` to use the selected dataset.

- With the manual installation, run Predihood with an argument referring to the configuration file of the desired dataset:

```
python3 main.py datasets/hil/config.json # to use dataset hil
python3 main.py datasets/bird-migration/config.json # to use dataset bird-migration
```

### Importing a new dataset

To import another dataset, follow these instructions:

1. Create a MongoDB database (called `DATABASE_NAME`) and add the following collections:
   - a collection called `COLLECTION_NAME`, which contains data about neighbourhoods. Each document describes a single neighbourhood in the [GeoJSON format](https://geojson.org/) and includes indicators (short name and value);
   - a collection called `collindic`, which contains the indicators used in your neighbourhoods (both short name and full name).
2. Create a CSV file containing human expertise (stored as `predihood/datasets/<dataset-name>/expertise.csv`). Each line contains a neighbourhood's identifier and the expert-assigned value for each variable (see `VARIABLES_VALUES` below).
3. Create a configuration file for your dataset (in `predihood/datasets/<dataset-name>/config.json`), in the JSON format, which contains the following information:
   - `DATABASE_NAME`: the name of your database in MongoDB;
   - `COLLECTION_NAME`: the name of the collection in the database that contains information about neighbourhoods;
   - `VARIABLES_VALUES`: a dictionary which contains (at least in English) the set of variables to predict. For each variable, there are the 'label' (the description of the variable), the 'values' (the values that the variable can take), the 'low_influence_value' (the value, among the possible values, that has the least impact on the dataset when used to fill its missing values) and the 'median_value' (the median of the possible values, used for filling missing values of character-string variables in the dataset);
   - `NORMALIZATION`: the quantity with which the dataset will be normalised. It can be "None", "population" or "density". We recommend using density (if possible) to get better results;
   - `VARIABLE_REMOVE_LOW_REPRESENTATIVITY`: the name of the variable used for removing the neighbourhoods with the lowest representativity when predicting in the cartographic interface.

Examples of these required files are presented in the next part for the _bird-migration_ dataset. A minimal import sketch is also given below.

To load a new dataset into a Docker image, check the file [mongiris/import-data.sh](https://gitlab.liris.cnrs.fr/fduchate/mongiris/blob/master/import-data.sh), which is automatically run when deploying the container. Data can be loaded either as a MongoDB dump (command `mongorestore`, as shown for database _hil_) or as a sequence of JSON documents (command `mongoimport`, as shown for database _bird-migration_).
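As a complement to step 1 above, here is a minimal, illustrative sketch of creating the two collections directly with the `pymongo` driver. Predihood itself does not use this code; `DATABASE_NAME`, `COLLECTION_NAME` and the input file are placeholders to be replaced with your own names.

```python
# Illustrative only: create the database and the two collections described in step 1
# using pymongo directly. All names below (DATABASE_NAME, COLLECTION_NAME,
# my-neighbourhoods.json) are placeholders.
import json
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["DATABASE_NAME"]

# neighbourhoods: one GeoJSON Feature per neighbourhood, with indicators in its properties
with open("my-neighbourhoods.json") as f:
    neighbourhoods = json.load(f)          # a list of GeoJSON Feature documents
db["COLLECTION_NAME"].insert_many(neighbourhoods)

# indicators: short and full labels, as in the collindic example below
db["collindic"].insert_many([
    {"short_label": "percent_greens", "full_label": "Percentage of greens areas"},
    {"short_label": "percent_built", "full_label": "Percentage of building areas"},
])
```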
### Example of the _bird-migration_ dataset

A fake dataset about bird migration is provided (769 neighbourhoods, 3 indicators, 1 variable). Its configuration file is in `predihood/predihood/datasets/bird-migration` and its dump files are in `mongiris/mongiris/data/dumps/`. The objective is to predict whether a neighbourhood is suitable for migrating birds to stop by. The single variable accepts 4 values, from _favorable_ to _unfavorable_. The 3 indicators represent the _percent of greens_, the _percent of buildings_ and the _degree of human pressure_ in a neighbourhood.

The following commands create a MongoDB database with the two required collections. They are already loaded when using the Docker installation.

```
./mongoimport --db=dbmigration -c=collmigration --file=mongiris/mongiris/data/dump-bird-neighbourhoods.json # neighbourhoods' collection
./mongoimport --db=dbmigration -c=collindic --file=mongiris/mongiris/data/dump-bird-indicators.json # indicators' collection
```

The file `predihood/datasets/bird-migration/example-neighbourhood.json` shows an example of a neighbourhood (including the value for each of the three indicators). Here is a simplified extract from this file:

```
{
  "_id": "5be32b9df3f0b960b1f8afb2",
  "geometry": {
    "type": "Polygon",
    "coordinates": [ [ [ 4.8261667, 45.7619681 ], ... [ 4.8261667, 45.7619681 ] ] ]
  },
  "type": "Feature",
  "properties": {
    "NAME": "Saint-Georges",
    "CITY_NAME": "Lyon 5e Arrondissement",
    "ID": "693850103",
    "raw_indicators": {
      "percent_greens": 1,
      "human_pressure": 110,
      "percent_built": 99
    }
  }
}
```

The file `predihood/datasets/bird-migration/example-collindic.txt` shows the content of the `collindic` collection (3 documents, one for each indicator):

```
{ "_id" : "5ff1e6f3b0c86c7361a2637e", "short_label" : "human_pressure", "full_label" : "Human pressure on the area" }
{ "_id" : "5ff1e6eeb0c86c7361a2637d", "short_label" : "percent_built", "full_label" : "Percentage of building areas" }
{ "_id" : "5ff1e6e6b0c86c7361a2637c", "short_label" : "percent_greens", "full_label" : "Percentage of greens areas" }
```

Following is an example of an `expertise.csv` file (simplified from `predihood/datasets/bird-migration/expertise.csv`), which contains manually expertized neighbourhoods:

```
id_neighbourhood;variable1
693860101;Favorable
693860102;Very favorable
690340801;Favorable
692660201;Favorable
693860104;Not much favorable
693860103;Not much favorable
690340402;Favorable
690340602;Not much favorable
692560101;Very favorable
693860302;Unfavorable
693860303;Unfavorable
```

Here is a commented example of a `config.json` file (simplified version of `bird-migration/config.json`):

```
{
  "DATABASE_NAME": "dbmigration",       # name of the database to connect to
  "COLLECTION_NAME": "collmigration",   # name of the collection containing neighbourhoods
  "VARIABLES_VALUES": {
    "fr": {                             # variables to predict (in the main language)
      "variable1": {                    # first variable
        "label": "Zone de migration",   # label of the first variable
        "values": ["Favorable", "Défavorable"],  # possible values for the first variable
        "low_influence_value": "Défavorable",    # optional parameter, used for missing values in expertise, which are filled in with this value
        "median_value": "Favorable"     # optional parameter, used for missing values in expertise (only for character-string variables)
      },
      ...                               # next variables (if any)
    },
    "en": {                             # variables to predict (in another language)
      "variable1": {                    # variables should be in the same order as in the main language
        "label": "Migration zone",
        "values": ["Favorable", "Unfavorable"],  # values should be in the same order as in the main language
        "low_influence_value": "Unfavorable",
        "median_value": "Favorable"
      },
      ...                               # next variables (if any)
    },
    ...                                 # next languages (if any)
  },
  "NORMALIZATION": "None",              # optional parameter, used for normalising all indicators using the provided indicator (e.g., density)
  "VARIABLE_REMOVE_LOW_REPRESENTATIVITY": "None"  # optional parameter, used for removing neighbourhoods with the lowest representative value for the mentioned variable (e.g., variable1)
}
```
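As a quick sanity check of these auxiliary files, the sketch below loads `expertise.csv` with pandas (listed in `requirements.txt`) and compares its values with those declared in `config.json`. It is only an illustration: the paths and key names (`en`, `variable1`) follow the simplified examples above and may differ in the actual files.

```python
# Illustrative only: compare the values used in expertise.csv with those declared
# in config.json, assuming the file layout shown above (semicolon-separated CSV,
# VARIABLES_VALUES dictionary). This is not part of Predihood's code base.
import json
import pandas as pd

expertise = pd.read_csv("predihood/datasets/bird-migration/expertise.csv", sep=";")
with open("predihood/predihood/datasets/bird-migration/config.json") as f:
    config = json.load(f)

# values declared for the first variable, in English
declared = set(config["VARIABLES_VALUES"]["en"]["variable1"]["values"])

# values actually used by the human expertise
used = set(expertise["variable1"])

print("values in expertise.csv but not declared in config.json:", used - declared)
```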
## Note for configuring Predihood in PyCharm

Instead of running Predihood from a console, you can configure your IDE; here is an example for PyCharm. Create a new configuration and set the following parameters:

- Script path: `path/to/predihood/predihood/main.py`
- Python interpreter: add the path to your current Python interpreter
- Working directory: `path/to/predihood/predihood`

You can also run the tests by creating a second configuration and setting:

- Script path: `path/to/predihood/predihood/tests.py`
- Python interpreter: add the path to your current Python interpreter
- Working directory: `path/to/predihood/predihood`

JOSS Publication
Predihood: an open-source tool for predicting neighbourhoods' information
Published: May 09, 2021
Volume 6, Issue 61, Page 2805
Tags: MongoDB, data management, neighbourhood, prediction, machine learning
Committers
Top Committers
| Name | Email | Commits |
|---|---|---|
| Nelly Barret | n****t@e****r | 147 |
| Duchateau Fabien | f****u@u****r | 83 |
| Nelly Barret | n****t@i****r | 46 |
| Nelly Barret | n****y@N****l | 15 |
| Gabriela A | a****a@g****m | 1 |
Dependencies
predihood/static/css/package/package.json
npm
- object-assign ^4.1.1
requirements.txt
pypi
- Flask *
- StringDist *
- area *
- matplotlib *
- numpy *
- pandas *
- requests *
- scikit-learn *
- seaborn *
- setuptools *
setup.py
pypi
- scikit-learn *