eurostat-api

A Python API to make simple Eurostat database requests

https://github.com/cr1337/eurostat-api

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.4%) to scientific vocabulary


Repository


Basic Info

Host: GitHub
Owner: CR1337
License: other
Language: Python
Default Branch: main
Homepage:
Size: 36.1 KB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed 12 months ago

Metadata Files

Readme License Citation

eurostat-api

A Python API to make simple Eurostat database requests.

If you use this software, please cite it using the button in the right column of the repository's main page. Thank you! (See also the CITATION.cff file.)

You are free to use this software under the terms of the CC BY-NC 4.0 License. (See also the LICENSE file.)

Setup

This section only describes how to get example.py running. If you want to use the API in your own project, you can simply copy the eurostat_api folder into your project directory and install the dependencies from requirements.txt into your environment. Make sure to also copy the LICENSE and CITATION.cff files.

Linux

Clone this repository:
```bash
git clone https://github.com/CR1337/eurostat-api.git
```
Change directory:
```bash
cd eurostat-api
```
Create a virtual environment:
```bash
python3 -m venv .venv
```
Activate the virtual environment:
```bash
source .venv/bin/activate
```
Install the requirements:
```bash
pip3 install -r requirements.txt
```
Test if example.py works:
```bash
python3 example.py
```

Windows (not tested)

Clone this repository:
```bat
git clone https://github.com/CR1337/eurostat-api.git
```
Change directory:
```bat
cd eurostat-api
```
Create a virtual environment:
```bat
python -m venv .venv
```
Activate the virtual environment:
```bat
.venv\Scripts\activate
```
Install the requirements:
```bat
pip install -r requirements.txt
```
Test if example.py works:
```bat
python example.py
```

Usage

Requesting data

Import the classes EurostatDataset, DimensionFilter and TimePeriodFilter:
```python
from eurostat_api.dataset import EurostatDataset
from eurostat_api.filters import DimensionFilter, TimePeriodFilter
```

Create a new EurostatDataset object with a dataset ID and a language. The supported languages are de, en and fr.
```python
dataset = EurostatDataset('lfsi_emp_a', 'de')
```

Define a dimension filter and specify which values should be included:
```python
dimension_filter = DimensionFilter(dataset)

# Restrict single dimensions to one value each
dimension_filter.add_dimension_value('age', 'Y20-64')
dimension_filter.add_dimension_value('indic_em', 'EMP_LFS')
dimension_filter.add_dimension_value('unit', 'PC_POP')

# Allow several values for one dimension
dimension_filter.add('sex', ['M', 'F'])
```
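Conceptually, a dimension filter keeps only the observations whose value in each filtered dimension is one of the allowed values, while dimensions that are not mentioned (such as `geo` here) stay unrestricted. The stdlib-only sketch below illustrates that idea on plain dictionaries. It is not the library's implementation: `allowed` and `matches_dimension_filter` are hypothetical names, and the second sample row is invented purely to show an exclusion.

```python
from typing import Collection, Dict, Mapping

# Hypothetical illustration of dimension-filter semantics; not part of eurostat_api.
# Allowed values per dimension, mirroring the filter defined above.
allowed: Dict[str, Collection[str]] = {
    "age": ["Y20-64"],
    "indic_em": ["EMP_LFS"],
    "unit": ["PC_POP"],
    "sex": ["M", "F"],
}


def matches_dimension_filter(
    row: Mapping[str, str], allowed_values: Mapping[str, Collection[str]]
) -> bool:
    """Return True if every filtered dimension of `row` holds one of its allowed values."""
    # Dimensions that do not appear in `allowed_values` are not checked at all.
    return all(row[dim] in values for dim, values in allowed_values.items())


rows = [
    {"age": "Y20-64", "indic_em": "EMP_LFS", "unit": "PC_POP", "sex": "F", "geo": "AT"},
    # Invented row with a different age group, only to show an exclusion.
    {"age": "Y15-64", "indic_em": "EMP_LFS", "unit": "PC_POP", "sex": "F", "geo": "AT"},
]
kept = [row for row in rows if matches_dimension_filter(row, allowed)]
print(len(kept))  # 1
```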

Define a time period filter and specify the time period. The available operators are EQUALS, GREATER_OR_EQUALS, GREATER, LOWER_OR_EQUALS and LOWER.
```python
time_period_filter = TimePeriodFilter(dataset)

# Keep only time periods from 2022 onwards
time_period_filter.add(TimePeriodFilter.Operators.GREATER_OR_EQUALS, "2022")
```
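The five operators read as comparisons between an observation's time period and the given bound. For annual data such as this dataset, periods like "2022" can be compared as integers. The following minimal sketch maps the operator names listed above to Python comparisons; the `OPERATORS` dictionary and `period_matches` are hypothetical and only mirror the names, so the library's actual `TimePeriodFilter.Operators` may work differently (for example, for monthly or quarterly periods).

```python
import operator
from typing import Callable, Dict

# Hypothetical mapping from the operator names above to integer comparisons.
OPERATORS: Dict[str, Callable[[int, int], bool]] = {
    "EQUALS": operator.eq,
    "GREATER_OR_EQUALS": operator.ge,
    "GREATER": operator.gt,
    "LOWER_OR_EQUALS": operator.le,
    "LOWER": operator.lt,
}


def period_matches(op_name: str, period: str, bound: str) -> bool:
    """Compare an annual period such as '2021' with a bound such as '2022'."""
    return OPERATORS[op_name](int(period), int(bound))


periods = ["2020", "2021", "2022"]
print([p for p in periods if period_matches("GREATER_OR_EQUALS", p, "2022")])  # ['2022']
```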

Perform the request:
```python
dataset.request_data()
```

There you have the data:
```python
print(dataset.data.dataframe)
```
```
   freq indic_em sex     age    unit geo  time status  observation
0     A  EMP_LFS   F  Y20-64  PC_POP  AT  2022      -         73.4
1     A  EMP_LFS   F  Y20-64  PC_POP  BE  2022      -         68.1
2     A  EMP_LFS   F  Y20-64  PC_POP  BG  2022      -         71.8
3     A  EMP_LFS   F  Y20-64  PC_POP  CH  2022      -         77.8
4     A  EMP_LFS   F  Y20-64  PC_POP  CY  2022      -         72.1
..  ...      ...  ..     ...     ...  ..   ...    ...          ...
61    A  EMP_LFS   M  Y20-64  PC_POP  RO  2022      -         77.7
62    A  EMP_LFS   M  Y20-64  PC_POP  RS  2022      -         76.2
63    A  EMP_LFS   M  Y20-64  PC_POP  SE  2022      -         85.0
64    A  EMP_LFS   M  Y20-64  PC_POP  SI  2022      -         81.2
65    A  EMP_LFS   M  Y20-64  PC_POP  SK  2022      -         80.7

[66 rows x 9 columns]
```
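This README does not state which endpoint `request_data()` queries. Assuming it uses Eurostat's public dissemination API (which returns JSON-stat), an equivalent query for the filters above could be expressed as a URL like the one built below. The base URL and the `sinceTimePeriod` parameter reflect our understanding of that API and should be checked against Eurostat's official documentation; `build_query_url` is a hypothetical helper. The sketch only builds the URL and sends nothing over the network.

```python
from typing import Dict, List, Optional, Union
from urllib.parse import urlencode

# Assumed base URL of Eurostat's dissemination API; verify against the official docs.
BASE_URL = "https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/"


def build_query_url(
    dataset_id: str,
    language: str,
    dimensions: Dict[str, Union[str, List[str]]],
    since: Optional[str] = None,
) -> str:
    """Build (but do not send) a query URL; list values become repeated parameters."""
    params: Dict[str, Union[str, List[str]]] = {"format": "JSON", "lang": language.upper()}
    params.update(dimensions)
    if since is not None:
        # Assumed parameter meaning "time period >= since".
        params["sinceTimePeriod"] = since
    # doseq=True turns ["M", "F"] into sex=M&sex=F
    return BASE_URL + dataset_id + "?" + urlencode(params, doseq=True)


url = build_query_url(
    "lfsi_emp_a",
    "de",
    {"age": "Y20-64", "indic_em": "EMP_LFS", "unit": "PC_POP", "sex": ["M", "F"]},
    since="2022",
)
print(url)
```

Repeating a parameter name for several values (here `sex`) is the usual way to express "one of these values" in a query string, which matches the `dimension_filter.add('sex', ['M', 'F'])` call above.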

Metadata and structure data

Time of last update

```python
print(dataset.data.updated)
```
```
2023-12-14 23:00:00+01:00
```

All dimensions of the dataset

```python
print(dataset.data.dimension_ids)
```
```
['freq', 'indic_em', 'sex', 'age', 'unit', 'geo', 'time']
```

All columns of the dataframe

```python
print(dataset.data.dataframe_columns)
```
```
['freq', 'indic_em', 'sex', 'age', 'unit', 'geo', 'time', 'status', 'observation']
```

The number of available values per dimension

```python
print(dataset.data.data_shape)
```
```
(1, 1, 2, 1, 1, 36, 1)
```
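Because `data_shape` lists the number of values per dimension in the order of `dimension_ids`, its product is the number of possible value combinations. Comparing it with the 66 rows of the filtered dataframe suggests that 6 combinations have no observation; in the pivot table further down, ME, MK and TR do not appear, which would account for 3 missing countries per sex. A small stdlib calculation using only the values printed in this README:

```python
from math import prod

# Values copied from the outputs above.
dimension_ids = ['freq', 'indic_em', 'sex', 'age', 'unit', 'geo', 'time']
data_shape = (1, 1, 2, 1, 1, 36, 1)
row_count = 66  # rows of the filtered dataframe shown earlier

values_per_dimension = dict(zip(dimension_ids, data_shape))
possible_combinations = prod(data_shape)  # 2 sexes x 36 geo codes = 72
print(values_per_dimension['geo'], possible_combinations, possible_combinations - row_count)  # 36 72 6
```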

The meanings of the dimension values (language-dependent)

```python
print(dataset.data.dimension_value_labels)
```
```
{'freq': {'A': 'Jährlich'}, 'indic_em': {'EMP_LFS': 'Beschäftigung insgesamt (Wohnbevölkerung - AKE)'}, 'sex': {'M': 'Männer', 'F': 'Frauen'}, 'age': {'Y20-64': '20 bis 64 Jahre'}, 'unit': {'PC_POP': 'Prozent der Bevölkerung insgesamt'}, 'geo': {'EU27_2020': 'Europäische Union - 27 Länder (ab 2020)', 'EA20': 'Euroraum - 20 Länder (ab 2023)', 'BE': 'Belgien', 'BG': 'Bulgarien', 'CZ': 'Tschechien', 'DK': 'Dänemark', 'DE': 'Deutschland', 'EE': 'Estland', 'IE': 'Irland', 'EL': 'Griechenland', 'ES': 'Spanien', 'FR': 'Frankreich', 'HR': 'Kroatien', 'IT': 'Italien', 'CY': 'Zypern', 'LV': 'Lettland', 'LT': 'Litauen', 'LU': 'Luxemburg', 'HU': 'Ungarn', 'MT': 'Malta', 'NL': 'Niederlande', 'AT': 'Österreich', 'PL': 'Polen', 'PT': 'Portugal', 'RO': 'Rumänien', 'SI': 'Slowenien', 'SK': 'Slowakei', 'FI': 'Finnland', 'SE': 'Schweden', 'IS': 'Island', 'NO': 'Norwegen', 'CH': 'Schweiz', 'ME': 'Montenegro', 'MK': 'Nordmazedonien', 'RS': 'Serbien', 'TR': 'Türkei'}, 'time': {'2022': '2022'}}
```
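`dimension_value_labels` is a nested dictionary (dimension ID, then value code, then label), so translating a code into its label takes two lookups. The sketch below works on a small excerpt of the output above; `label_for` is a hypothetical helper, and falling back to the raw code for unknown entries is just one possible design choice.

```python
# Excerpt of dataset.data.dimension_value_labels from the output above (language 'de').
dimension_value_labels = {
    'sex': {'M': 'Männer', 'F': 'Frauen'},
    'geo': {'AT': 'Österreich', 'BE': 'Belgien', 'SK': 'Slowakei'},
}


def label_for(dimension_id: str, code: str) -> str:
    """Return the label of a dimension value, or the code itself if no label is known."""
    return dimension_value_labels.get(dimension_id, {}).get(code, code)


print(label_for('geo', 'AT'))  # Österreich
print(label_for('sex', 'F'))   # Frauen
```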

The selected language

```python
print(dataset.data.language)
```
```
de
```

The number of observations (number of rows in the dataframe)

```python
print(dataset.data.observation_count)
```
```
35952
```

The latest point in time for which data is available

```python
print(dataset.data.latest_period)
```
```
2022
```

The earliest point in time for which data is available

```python
print(dataset.data.oldest_period)
```
```
2003
```

The meanings of the status labels (language-dependent)

```python
print(dataset.data.status_labels)
```
```
{'d': 'abweichende Definition (siehe Metadaten)'}
```

Other functions

Latest point in time with a specific data "fill level"

The get_latest_time_value_with function finds the latest point in time at which the data reaches at least a given "fill level". Pass the desired fill level (a fraction, e.g. 0.8 for 80 %) and, if necessary, restrictions on the dimension values as a dictionary. If the dimensions need no further restriction, pass {}.
```python
fill_level = 0.8  # 80 % filled
time_period = dataset.data.get_latest_time_value_with(fill_level, {'sex': 'F'})
print(time_period)
```
```
2022
```
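The README does not define "fill level" precisely. One plausible reading, used in the hypothetical sketch below, is the share of expected observations that are actually present in a given period; the function then returns the latest period whose share reaches the threshold. `latest_period_with` and the toy data are invented for illustration, and `get_latest_time_value_with` may compute its fill level differently.

```python
from typing import Dict, Optional, Set


def latest_period_with(
    fill_level: float,
    observed: Dict[str, Set[str]],
    expected: Set[str],
) -> Optional[str]:
    """Return the latest period whose observed share of `expected` keys is >= fill_level."""
    # Sorting the period strings works for annual periods such as '2022'.
    for period in sorted(observed, reverse=True):
        share = len(observed[period] & expected) / len(expected)
        if share >= fill_level:
            return period
    return None  # no period reaches the requested fill level


# Invented toy data: which countries reported a value in each year.
expected = {"AT", "BE", "BG", "CH", "CY"}
observed = {
    "2021": {"AT", "BE", "BG", "CH", "CY"},  # 5/5 = 100 %
    "2022": {"AT", "BE", "BG", "CH"},        # 4/5 = 80 %
    "2023": {"AT", "BE", "BG"},              # 3/5 = 60 %
}
print(latest_period_with(0.8, observed, expected))  # 2022
```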

Index dataframe

In addition to the actual DataFrame, a so-called index DataFrame can be created. In it, each dimension value is given not as its code but as its index in the corresponding dictionary of dataset.data.dimension_value_labels.
```python
print(dataset.data.index_dataframe)
```
```
    freq  indic_em  sex  age  unit  geo  time status  observation
0      0         0    1    0     0   21     0      -         73.4
1      0         0    1    0     0    2     0      -         68.1
2      0         0    1    0     0    3     0      -         71.8
3      0         0    1    0     0   31     0      -         77.8
4      0         0    1    0     0   14     0      -         72.1
..   ...       ...  ...  ...   ...  ...   ...    ...          ...
61     0         0    0    0     0   24     0      -         77.7
62     0         0    0    0     0   34     0      -         76.2
63     0         0    0    0     0   28     0      -         85.0
64     0         0    0    0     0   25     0      -         81.2
65     0         0    0    0     0   26     0      -         80.7

[66 rows x 9 columns]
```
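The indices in this output line up with each code's position in the corresponding dictionary of `dimension_value_labels`: AT is the 22nd `geo` entry (index 21), and F is the second `sex` entry (index 1). Since Python dictionaries preserve insertion order, converting between codes and indices can be sketched with two lookup tables. The names below are hypothetical; with a real dataset, the code list could presumably be taken from `list(dataset.data.dimension_value_labels['geo'])`.

```python
from typing import Dict, List

# Ordered 'geo' codes, copied from dimension_value_labels above.
geo_codes: List[str] = [
    'EU27_2020', 'EA20', 'BE', 'BG', 'CZ', 'DK', 'DE', 'EE', 'IE', 'EL', 'ES', 'FR',
    'HR', 'IT', 'CY', 'LV', 'LT', 'LU', 'HU', 'MT', 'NL', 'AT', 'PL', 'PT',
    'RO', 'SI', 'SK', 'FI', 'SE', 'IS', 'NO', 'CH', 'ME', 'MK', 'RS', 'TR',
]

# Hypothetical lookup tables for both directions.
geo_to_index: Dict[str, int] = {code: i for i, code in enumerate(geo_codes)}
index_to_geo: Dict[int, str] = dict(enumerate(geo_codes))

print(geo_to_index['AT'], index_to_geo[31])  # 21 CH
```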

Pivot table

A pivot table compares one dimension with another. In this case, the only meaningful combination is location (geo) against time. Such a table can be created with get_pivot_table. All other dimensions must be restricted to a single value. Restrictions on the dimension values are passed to the function as a dictionary; if no restriction is necessary, pass {}.
```python
print(dataset.data.get_pivot_table({'sex': 'F'}))
```
```
time       2022
geo
AT         73.4
BE         68.1
BG         71.8
CH         77.8
CY         72.1
CZ         73.7
DE         76.8
DK         77.4
EA20       69.0
EE         80.4
EL         55.9
ES         64.1
EU27_2020  69.3
FI         77.8
FR         71.2
HR         65.0
HU         75.3
IE         72.6
IS         82.1
IT         55.0
LT         78.6
LU         71.5
LV         75.5
MT         74.1
NL         79.0
NO         78.0
PL         70.2
PT         74.3
RO         59.1
RS         62.3
SE         79.2
SI         74.3
SK         72.6
```
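The single-value requirement exists because each (geo, time) cell can hold only one number: if both sexes were left in, every cell would receive two observations. The stdlib sketch below shows the reshaping on three female values from the output above. The `pivot` helper is hypothetical, and raising an error on duplicate cells (rather than, say, averaging them) is an illustrative design choice, not necessarily what get_pivot_table does.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def pivot(rows: Iterable[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Arrange (geo, time, value) triples as {geo: {time: value}}."""
    table: DefaultDict[str, Dict[str, float]] = defaultdict(dict)
    for geo, time, value in rows:
        if time in table[geo]:
            # A second value for the same cell means another dimension was not restricted.
            raise ValueError(f"duplicate cell {geo}/{time}: restrict other dimensions to one value")
        table[geo][time] = value
    return dict(table)


rows = [("AT", "2022", 73.4), ("BE", "2022", 68.1), ("BG", "2022", 71.8)]
table = pivot(rows)
print(table["AT"]["2022"])  # 73.4
```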

Status pivot table

The get_status_pivot_table function works exactly like get_pivot_table, but the resulting table contains the status of each observation instead of its value.
```python
print(dataset.data.get_status_pivot_table({'sex': 'F'}))
```
```
time       2022
geo
AT            -
BE            -
BG            -
CH            -
CY            -
CZ            -
DE            -
DK            -
EA20          -
EE            -
EL            -
ES            d
EU27_2020     -
FI            -
FR            d
HR            -
HU            -
IE            -
IS            -
IT            -
LT            -
LU            -
LV            -
MT            -
NL            -
NO            -
PL            -
PT            -
RO            -
RS            -
SE            -
SI            -
SK            -
```
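In this output, `-` appears to mark observations without a status flag (an assumption based on the table), while `d` is explained by `status_labels` ("abweichende Definition", i.e. a different definition, see the metadata). Listing only the flagged countries together with their labels can be sketched with a dictionary comprehension over an excerpt of the table:

```python
# Status codes for women in 2022, excerpted from the status pivot table above.
status_2022 = {"AT": "-", "EL": "-", "ES": "d", "FI": "-", "FR": "d", "HR": "-"}
# Status labels as printed by dataset.data.status_labels (language 'de').
status_labels = {"d": "abweichende Definition (siehe Metadaten)"}

# Keep flagged entries only; assume '-' means "no flag" and fall back to the raw code.
flagged = {geo: status_labels.get(code, code) for geo, code in status_2022.items() if code != "-"}
print(flagged)
```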

Owner

Name: Christian Raue
Login: CR1337
Kind: user
Location: Berlin/Germany

Website: https://www.linkedin.com/in/christian-raue-35591a196/
Repositories: 3
Profile: https://github.com/CR1337

Studying IT-Systems Engineering M.Sc. at Hasso-Plattner-Institute Potsdam

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Raue
    given-names: Christian Johannes
title: "eurostat-api"
version: 1.0.0
url: https://github.com/CR1337/eurostat-api
date-released: 2024-01-12


Dependencies

requirements.txt pypi

certifi ==2023.11.17
charset-normalizer ==3.3.2
idna ==3.6
numpy ==1.26.3
pandas ==2.1.4
python-dateutil ==2.8.2
pytz ==2023.3.post1
requests ==2.31.0
six ==1.16.0
tzdata ==2023.4
urllib3 ==2.1.0
