eurostat-api

A Python API to make simple Eurostat database requests

https://github.com/cr1337/eurostat-api

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary
Last synced: 6 months ago

Repository

A Python API to make simple Eurostat database requests

Basic Info
  • Host: GitHub
  • Owner: CR1337
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 36.1 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.md

eurostat-api

A Python API to make simple Eurostat database requests.

If you use this software, please cite it using the button in the right column of the repository's main page. Thank you! (See also the CITATION.cff file.)

You are free to use this software under the terms of the CC BY-NC 4.0 License. (See also the LICENSE file.)

Setup

This section only describes how to get example.py running. If you want to use the API in your own project, you can simply copy the eurostat_api folder into your project directory and install the dependencies from requirements.txt into your environment. Make sure to also copy the LICENSE and CITATION.cff files.

Linux

  1. Clone this repository
     ```bash
     git clone https://github.com/CR1337/eurostat-api.git
     ```

  2. Change directory
     ```bash
     cd eurostat-api
     ```

  3. Create a virtual environment
     ```bash
     python3 -m venv .venv
     ```

  4. Activate the virtual environment
     ```bash
     source .venv/bin/activate
     ```

  5. Install the requirements
     ```bash
     pip3 install -r requirements.txt
     ```

  6. Test if example.py works
     ```bash
     python3 example.py
     ```

Windows (not tested)

  1. Clone this repository
     ```bash
     git clone https://github.com/CR1337/eurostat-api.git
     ```

  2. Change directory
     ```bash
     cd eurostat-api
     ```

  3. Create a virtual environment
     ```bash
     python -m venv .venv
     ```

  4. Activate the virtual environment
     ```bash
     .venv\Scripts\activate
     ```

  5. Install the requirements
     ```bash
     pip install -r requirements.txt
     ```

  6. Test if example.py works
     ```bash
     python example.py
     ```
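If you copy the eurostat_api folder into another project instead, it can be useful to confirm that the environment matches the pinned versions. The following is a minimal sketch and not part of this repository: `parse_pins` and `find_mismatches` are hypothetical helper names, and the sketch assumes that requirements.txt contains only `name==version` lines.

```python
from importlib import metadata


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for every unsatisfied pin."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Passing the text of requirements.txt through `parse_pins` and then `find_mismatches` yields an empty dictionary when every pinned version is installed.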

Usage

Requesting data

Import the classes EurostatDataset, DimensionFilter and TimePeriodFilter.
```python
from eurostat_api.dataset import EurostatDataset
from eurostat_api.filters import DimensionFilter, TimePeriodFilter
```

Create a new EurostatDataset object with a dataset id and a language. The languages de, en and fr are supported.
```python
dataset = EurostatDataset('lfsi_emp_a', 'de')
```

Define a dimension filter and specify what values should be included.
```python
dimension_filter = DimensionFilter(dataset)

dimension_filter.add_dimension_value('age', 'Y20-64')
dimension_filter.add_dimension_value('indic_em', 'EMP_LFS')
dimension_filter.add_dimension_value('unit', 'PC_POP')

dimension_filter.add('sex', ['M', 'F'])
```

Define a time period filter and specify the time period. Possible operators are EQUALS, GREATER_OR_EQUALS, GREATER, LOWER_OR_EQUALS and LOWER.
```python
time_period_filter = TimePeriodFilter(dataset)

time_period_filter.add(TimePeriodFilter.Operators.GREATER_OR_EQUALS, "2022")
```
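To make the operator names concrete, here is a minimal local sketch of what each one would select from a list of yearly periods. It does not call eurostat_api: `COMPARISONS` and `select_periods` are stand-ins written for illustration, and the mapping of each name to a comparison is an assumption based on the names alone.

```python
import operator

# Assumed meaning of each operator name listed in the README.
COMPARISONS = {
    "EQUALS": operator.eq,
    "GREATER_OR_EQUALS": operator.ge,
    "GREATER": operator.gt,
    "LOWER_OR_EQUALS": operator.le,
    "LOWER": operator.lt,
}


def select_periods(periods, operator_name, value):
    """Keep the periods that satisfy `period <operator> value`.

    Periods are compared as integers, so this sketch only covers yearly data.
    """
    compare = COMPARISONS[operator_name]
    return [p for p in periods if compare(int(p), int(value))]


years = [str(y) for y in range(2019, 2024)]
print(select_periods(years, "GREATER_OR_EQUALS", "2022"))  # ['2022', '2023']
print(select_periods(years, "LOWER", "2020"))              # ['2019']
```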

Perform the request.
```python
dataset.request_data()
```

There you have the data:
```python
print(dataset.data.dataframe)
```

```
   freq indic_em sex     age    unit geo  time status  observation
0     A  EMP_LFS   F  Y20-64  PC_POP  AT  2022      -         73.4
1     A  EMP_LFS   F  Y20-64  PC_POP  BE  2022      -         68.1
2     A  EMP_LFS   F  Y20-64  PC_POP  BG  2022      -         71.8
3     A  EMP_LFS   F  Y20-64  PC_POP  CH  2022      -         77.8
4     A  EMP_LFS   F  Y20-64  PC_POP  CY  2022      -         72.1
..  ...      ...  ..     ...     ...  ..   ...    ...          ...
61    A  EMP_LFS   M  Y20-64  PC_POP  RO  2022      -         77.7
62    A  EMP_LFS   M  Y20-64  PC_POP  RS  2022      -         76.2
63    A  EMP_LFS   M  Y20-64  PC_POP  SE  2022      -         85.0
64    A  EMP_LFS   M  Y20-64  PC_POP  SI  2022      -         81.2
65    A  EMP_LFS   M  Y20-64  PC_POP  SK  2022      -         80.7

[66 rows x 9 columns]
```

Metadata and structure data

Time of last update

```python
print(dataset.data.updated)
```

```
2023-12-14 23:00:00+01:00
```

All dimensions of the dataset

```python
print(dataset.data.dimension_ids)
```

```
['freq', 'indic_em', 'sex', 'age', 'unit', 'geo', 'time']
```

All columns of the dataframe

```python
print(dataset.data.dataframe_columns)
```

```
['freq', 'indic_em', 'sex', 'age', 'unit', 'geo', 'time', 'status', 'observation']
```

The number of available values per dimension

```python
print(dataset.data.data_shape)
```

```
(1, 1, 2, 1, 1, 36, 1)
```
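As a quick worked example, the product of the shape gives the number of possible dimension-value combinations, which can be compared with the number of rows in the DataFrame shown earlier. Both input numbers below are copied from the outputs in this README. That the missing combinations are simply the ones without an observation is an assumption, not something the README states.

```python
import math

data_shape = (1, 1, 2, 1, 1, 36, 1)  # from dataset.data.data_shape
row_count = 66                       # from "[66 rows x 9 columns]"

cell_count = math.prod(data_shape)   # number of possible combinations
missing = cell_count - row_count     # combinations without a row

print(cell_count, missing)  # 72 6
```

This matches the pivot table further down, which lists 33 of the 36 geo codes for one sex.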

The meaning of the values of the dimensions (language-dependent)

```python
print(dataset.data.dimension_value_labels)
```

```
{'freq': {'A': 'Jährlich'}, 'indic_em': {'EMP_LFS': 'Beschäftigung insgesamt (Wohnbevölkerung - AKE)'}, 'sex': {'M': 'Männer', 'F': 'Frauen'}, 'age': {'Y20-64': '20 bis 64 Jahre'}, 'unit': {'PC_POP': 'Prozent der Bevölkerung insgesamt'}, 'geo': {'EU27_2020': 'Europäische Union - 27 Länder (ab 2020)', 'EA20': 'Euroraum - 20 Länder (ab 2023)', 'BE': 'Belgien', 'BG': 'Bulgarien', 'CZ': 'Tschechien', 'DK': 'Dänemark', 'DE': 'Deutschland', 'EE': 'Estland', 'IE': 'Irland', 'EL': 'Griechenland', 'ES': 'Spanien', 'FR': 'Frankreich', 'HR': 'Kroatien', 'IT': 'Italien', 'CY': 'Zypern', 'LV': 'Lettland', 'LT': 'Litauen', 'LU': 'Luxemburg', 'HU': 'Ungarn', 'MT': 'Malta', 'NL': 'Niederlande', 'AT': 'Österreich', 'PL': 'Polen', 'PT': 'Portugal', 'RO': 'Rumänien', 'SI': 'Slowenien', 'SK': 'Slowakei', 'FI': 'Finnland', 'SE': 'Schweden', 'IS': 'Island', 'NO': 'Norwegen', 'CH': 'Schweiz', 'ME': 'Montenegro', 'MK': 'Nordmazedonien', 'RS': 'Serbien', 'TR': 'Türkei'}, 'time': {'2022': '2022'}}
```

The selected language

```python
print(dataset.data.language)
```

```
de
```

The number of observations (number of rows in the dataframe)

```python
print(dataset.data.observation_count)
```

```
35952
```

The latest period for which data is available

```python
print(dataset.data.latest_period)
```

```
2022
```

The earliest period for which data is available

```python
print(dataset.data.oldest_period)
```

```
2003
```

The meanings of the status labels (language dependent)

```python
print(dataset.data.status_labels)
```

```
{'d': 'abweichende Definition (siehe Metadaten)'}
```

Other functions

Last point in time with specific data "fill level"

The get_latest_time_value_with function can be used to find the last time at which the data has at least a certain "fill level". To do this, you only need to pass the desired "fill level" and, if necessary, a restriction for the values of the dimensions. If the dimensions are not to be further restricted, it is sufficient to pass {}.
```python
fill_level = 0.8  # 80 % filled
time_period = dataset.data.get_latest_time_value_with(fill_level, {'sex': 'F'})
print(time_period)
```

```
2022
```
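The README does not define "fill level" precisely. One plausible reading is the share of cells in a period that actually contain an observation. The sketch below illustrates that reading in plain Python. `latest_period_with_fill_level` is a hypothetical helper, not the library's implementation, and the sample values are made up.

```python
def latest_period_with_fill_level(observations, fill_level):
    """Return the latest period whose share of non-missing values is >= fill_level.

    `observations` maps a period label to a list of values, with None marking
    a missing observation. Returns None if no period qualifies.
    Period labels are sorted as strings, which works for four-digit years.
    """
    for period in sorted(observations, reverse=True):
        values = observations[period]
        if not values:
            continue
        filled = sum(value is not None for value in values) / len(values)
        if filled >= fill_level:
            return period
    return None


# Made-up numbers, for illustration only.
sample = {
    "2021": [1.0, 2.0, 3.0, 4.0],      # 100 % filled
    "2022": [1.0, 2.0, 3.0, None],     # 75 % filled
    "2023": [None, None, 3.0, None],   # 25 % filled
}
print(latest_period_with_fill_level(sample, 0.8))  # 2021
print(latest_period_with_fill_level(sample, 0.5))  # 2022
```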

Index dataframe

In addition to the actual DataFrame, a so-called index DataFrame can also be created. In this, the dimension values are not specified as actual values, but as indices for dataset.data.dimension_value_labels.
```python
print(dataset.data.index_dataframe)
```

```
    freq  indic_em  sex  age  unit  geo  time status  observation
0      0         0    1    0     0   21     0      -         73.4
1      0         0    1    0     0    2     0      -         68.1
2      0         0    1    0     0    3     0      -         71.8
3      0         0    1    0     0   31     0      -         77.8
4      0         0    1    0     0   14     0      -         72.1
..   ...       ...   ..   ..   ...   ..   ...    ...          ...
61     0         0    0    0     0   24     0      -         77.7
62     0         0    0    0     0   34     0      -         76.2
63     0         0    0    0     0   28     0      -         85.0
64     0         0    0    0     0   25     0      -         81.2
65     0         0    0    0     0   26     0      -         80.7

[66 rows x 9 columns]
```
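The indices can be translated back into codes and labels. The sketch below assumes that an index is the position of a code within the corresponding dictionary of dataset.data.dimension_value_labels, which is consistent with the outputs shown in this README (row 1 is BE and F in the regular DataFrame). `decode` is a hypothetical helper, and the dictionary is an excerpt of the real one.

```python
# Excerpt of dataset.data.dimension_value_labels (only the first 'geo' entries).
dimension_value_labels = {
    "sex": {"M": "Männer", "F": "Frauen"},
    "geo": {
        "EU27_2020": "Europäische Union - 27 Länder (ab 2020)",
        "EA20": "Euroraum - 20 Länder (ab 2023)",
        "BE": "Belgien",
        "BG": "Bulgarien",
    },
}


def decode(dimension, index):
    """Translate a positional index into (code, label) using dict insertion order."""
    code = list(dimension_value_labels[dimension])[index]
    return code, dimension_value_labels[dimension][code]


# Row 1 of the index DataFrame has sex=1 and geo=2.
print(decode("sex", 1))  # ('F', 'Frauen')
print(decode("geo", 2))  # ('BE', 'Belgien')
```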

Pivot table

A pivot table compares one dimension with another. The only meaningful combination in this case is location and time. Such a table can be created with get_pivot_table. Every other dimension may only contain one value. Restrictions for the dimension values can simply be passed to the function. If no restriction is necessary, it is sufficient to pass {}.
```python
print(dataset.data.get_pivot_table({'sex': 'F'}))
```

```
time       2022
geo
AT         73.4
BE         68.1
BG         71.8
CH         77.8
CY         72.1
CZ         73.7
DE         76.8
DK         77.4
EA20       69.0
EE         80.4
EL         55.9
ES         64.1
EU27_2020  69.3
FI         77.8
FR         71.2
HR         65.0
HU         75.3
IE         72.6
IS         82.1
IT         55.0
LT         78.6
LU         71.5
LV         75.5
MT         74.1
NL         79.0
NO         78.0
PL         70.2
PT         74.3
RO         59.1
RS         62.3
SE         79.2
SI         74.3
SK         72.6
```
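The reshaping idea behind such a table can be sketched in plain Python: keep the rows that match the restrictions, then arrange the observations by location and time. This is only an illustration of the concept, not how get_pivot_table is implemented. `pivot` is a hypothetical helper, and the rows are a subset of the DataFrame output shown earlier.

```python
from collections import defaultdict

# Subset of the rows from the DataFrame output earlier in this README.
rows = [
    {"sex": "F", "geo": "AT", "time": "2022", "observation": 73.4},
    {"sex": "F", "geo": "BE", "time": "2022", "observation": 68.1},
    {"sex": "M", "geo": "RO", "time": "2022", "observation": 77.7},
    {"sex": "M", "geo": "SE", "time": "2022", "observation": 85.0},
]


def pivot(rows, restrictions):
    """Build {geo: {time: observation}} from the rows matching `restrictions`."""
    table = defaultdict(dict)
    for row in rows:
        if all(row[key] == value for key, value in restrictions.items()):
            table[row["geo"]][row["time"]] = row["observation"]
    return dict(table)


print(pivot(rows, {"sex": "F"}))
# {'AT': {'2022': 73.4}, 'BE': {'2022': 68.1}}
```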

Status pivot table

The get_status_pivot_table function works in exactly the same way as get_pivot_table. The resulting table does not contain the data, but the status of the data.
```python
print(dataset.data.get_status_pivot_table({'sex': 'F'}))
```

```
time      2022
geo
AT           -
BE           -
BG           -
CH           -
CY           -
CZ           -
DE           -
DK           -
EA20         -
EE           -
EL           -
ES           d
EU27_2020    -
FI           -
FR           d
HR           -
HU           -
IE           -
IS           -
IT           -
LT           -
LU           -
LV           -
MT           -
NL           -
NO           -
PL           -
PT           -
RO           -
RS           -
SE           -
SI           -
SK           -
```
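Combined with dataset.data.status_labels, the status codes can be turned into readable notes. The sketch below uses a subset of the table above and the label dictionary shown earlier. Treating '-' as "no flag" is an assumption based on the outputs in this README.

```python
status_labels = {"d": "abweichende Definition (siehe Metadaten)"}

# Subset of the status pivot table for 2022; '-' is assumed to mean "no flag".
status_2022 = {"AT": "-", "EL": "-", "ES": "d", "FI": "-", "FR": "d"}

# Keep only flagged locations and replace each code with its label.
flagged = {
    geo: status_labels.get(status, status)
    for geo, status in status_2022.items()
    if status != "-"
}
print(flagged)
# {'ES': 'abweichende Definition (siehe Metadaten)', 'FR': 'abweichende Definition (siehe Metadaten)'}
```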

Owner

  • Name: Christian Raue
  • Login: CR1337
  • Kind: user
  • Location: Berlin/Germany

Studying IT-Systems Engineering M.Sc. at Hasso-Plattner-Institute Potsdam

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Raue
    given-names: Christian Johannes
title: "eurostat-api"
version: 1.0.0
url: https://github.com/CR1337/eurostat-api
date-released: 2024-01-12

Dependencies

requirements.txt pypi
  • certifi ==2023.11.17
  • charset-normalizer ==3.3.2
  • idna ==3.6
  • numpy ==1.26.3
  • pandas ==2.1.4
  • python-dateutil ==2.8.2
  • pytz ==2023.3.post1
  • requests ==2.31.0
  • six ==1.16.0
  • tzdata ==2023.4
  • urllib3 ==2.1.0