https://github.com/ajruben/sedia-api-fetchers

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (12.3%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: ajruben
License: mit
Language: Python
Default Branch: main
Size: 43 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created 12 months ago · Last pushed 12 months ago

Metadata Files

Readme License

SEDIA API Fetchers

Python classes for fetching data from the European Commission's SEDIA API endpoints.

Overview

This package includes 5 specialized fetchers for different types of EU data:

| Fetcher | Purpose | API Key | Data Type |
|---------|---------|---------|-----------|
| SEDIA_GET_PROJECTS | EU funded projects | SEDIA_NONH2020_PROD | Project details, metadata, participants |
| SEDIA_GET_PARTICIPANTS | Organizations & persons | SEDIA_PERSON | Participant profiles, collaborations |
| SEDIA_GET_FUNDING_TENDERS | Calls & tenders | SEDIA | Grant opportunities, tender notices |
| SEDIA_GET_TOPICS | Topic details | SEDIA | Research topic specifications |
| SEDIA_GET_FAQ | FAQ system | SEDIA_FAQ | Frequently asked questions |

Installation & Setup

Prerequisites

```bash
# pathlib ships with Python 3's standard library, so it is not installed here
pip install requests pandas numpy tqdm urllib3
```

Directory Structure

```
src/EUFT_retrieve/
├── EUFT_retrieve_projects.py          # Projects fetcher
├── EUFT_retrieve_participants.py      # Participants fetcher
├── EUFT_retrieve_funding_tenders.py   # Funding & tenders fetcher
├── EUFT_retrieve_topics.py            # Topics fetcher
├── EUFT_retrieve_faq.py               # FAQ fetcher
├── demo_all_fetchers.py               # Comprehensive demo
├── helpers/
│   └── functions.py                   # Utility functions
└── README.md                          # This file
```

Architecture

Base Classes

The fetchers use an inheritance-based architecture:

- SEDIABaseFetcher (abstract): Common functionality for all fetchers
- SEDIAPaginatedFetcher: For POST-based endpoints with pagination
- SEDIASimpleFetcher: For GET-based endpoints
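
To make the inheritance idea concrete, here is a minimal sketch of how such a three-class hierarchy could be laid out. It is not the package's code: the class names carry a `Sketch` suffix, and the injected `post_page` and `get_one` callables are hypothetical stand-ins for the real HTTP calls.

```python
from abc import ABC, abstractmethod


class BaseFetcherSketch(ABC):
    """Shared behaviour: option storage plus a common get() entry point."""

    def __init__(self, flatten_metadata=True):
        self.flatten_metadata = flatten_metadata

    def get(self, items):
        # Accept a single value or a list, then delegate to the subclass.
        if not isinstance(items, (list, tuple)):
            items = [items]
        return [record for item in items for record in self._fetch(item)]

    @abstractmethod
    def _fetch(self, item):
        """Return a list of records for one programme or identifier."""


class PaginatedFetcherSketch(BaseFetcherSketch):
    """POST-style endpoint: request pages until one comes back empty."""

    def __init__(self, post_page, **options):
        super().__init__(**options)
        # Hypothetical callable: (item, page_number) -> list of records
        self._post_page = post_page

    def _fetch(self, item):
        records, page = [], 1
        while True:
            batch = self._post_page(item, page)
            if not batch:
                return records
            records.extend(batch)
            page += 1


class SimpleFetcherSketch(BaseFetcherSketch):
    """GET-style endpoint: one request per identifier, no pagination."""

    def __init__(self, get_one, **options):
        super().__init__(**options)
        # Hypothetical callable: item -> record dict, or None if not found
        self._get_one = get_one

    def _fetch(self, item):
        record = self._get_one(item)
        return [] if record is None else [record]
```

Keeping the public `get()` in the base class and pushing only `_fetch()` into subclasses is one way to give every fetcher the same calling convention.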

Common Features

All fetchers inherit these capabilities:

Flexible Programme Input

```python
# `fetcher` is an instance of any programme-based fetcher

# Single programme by name
data = fetcher.get('h2020')

# Single programme by ID
data = fetcher.get(31045243)

# Multiple programmes
data = fetcher.get(['h2020', 'horizon'])

# Mixed input
data = fetcher.get(['h2020', 43108390])
```
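
One plausible way to support names, IDs, and mixed lists is to normalise every input to a list of numeric IDs before querying. The sketch below assumes that approach; `PROGRAMME_IDS` and `normalize_programmes` are hypothetical names, and the name-to-ID pairs are copied from the Programme IDs Reference section of this README.

```python
# Name-to-ID pairs copied from the Programme IDs Reference section.
PROGRAMME_IDS = {
    "h2020": 31045243,
    "horizon": 43108390,
    "digital": 43152860,
    "edf": 44181033,
}


def normalize_programmes(programmes):
    """Return numeric programme IDs for a name, an ID, or a mixed list."""
    if isinstance(programmes, (str, int)):
        programmes = [programmes]
    ids = []
    for programme in programmes:
        if isinstance(programme, int):
            # Already an ID: pass it through unchanged.
            ids.append(programme)
        elif programme.lower() in PROGRAMME_IDS:
            ids.append(PROGRAMME_IDS[programme.lower()])
        else:
            raise ValueError(f"Unknown programme: {programme!r}")
    return ids


print(normalize_programmes(['h2020', 43108390]))  # [31045243, 43108390]
```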

Configuration Options

```python
from EUFT_retrieve_projects import SEDIA_GET_PROJECTS

fetcher = SEDIA_GET_PROJECTS(
    flatten_metadata=True,      # Flatten nested JSON structures
    enrich_with_details=False   # Fetch detailed info (projects only)
)
```
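
To illustrate what flattening a nested JSON structure means in practice, here is a small standalone sketch. It is not the package's implementation; `flatten_record` and the dot-separated key format are assumptions chosen for the example.

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict) and value:
            # Recurse into non-empty nested dicts.
            flat.update(flatten_record(value, full_key, sep))
        else:
            # Lists, scalars, and empty dicts are kept as-is.
            flat[full_key] = value
    return flat


nested = {"id": 1, "metadata": {"title": "X", "dates": {"start": "2021"}}}
print(flatten_record(nested))
# {'id': 1, 'metadata.title': 'X', 'metadata.dates.start': '2021'}
```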

Data Management

- Automatic timestamping: Files saved with timestamp
- Progress tracking: Real-time progress bars
- Error handling: Robust retry mechanisms
- Memory efficient: Chunked processing for large datasets

Consistent API Pattern

```python
# Basic usage
data = fetcher.get(programmes, save=True)

# Advanced usage with filters
data = fetcher.get(
    programmes=['h2020', 'horizon'],
    additional_filters='value',
    save=True
)
```

Detailed Usage Guide

1. Projects Fetcher (`SEDIA_GET_PROJECTS`)

Fetches project data with optional enrichment.

```python
from EUFT_retrieve_projects import SEDIA_GET_PROJECTS

# Basic usage
fetcher = SEDIA_GET_PROJECTS(flatten_metadata=True)
data = fetcher.get('edf', save=True)

# With detailed project enrichment
fetcher = SEDIA_GET_PROJECTS(
    flatten_metadata=True,
    enrich_with_details=True  # Fetches detailed project info
)
data = fetcher.get(['h2020', 'horizon'], save=True)
```

Features:

- Handles >10K records via date-range partitioning
- Optional project detail enrichment
- Metadata flattening
- Automatic duplicate handling

Architecture: Inherits from SEDIAPaginatedFetcher
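
The date-range partitioning listed in the features above can be pictured as a recursive split: if a range holds more records than the API will return, halve it and try again. The sketch below shows that general idea only. The package's actual strategy is not documented here, and `count_records` is a hypothetical callable standing in for a count query against the API.

```python
from datetime import date, timedelta

API_RESULT_CAP = 10_000  # limit stated in this README


def partition_date_range(start, end, count_records, cap=API_RESULT_CAP):
    """Split [start, end] into sub-ranges that each hold at most `cap` records.

    `count_records(start, end)` is a hypothetical callable returning how many
    records fall in the inclusive date range.
    """
    if count_records(start, end) <= cap or start >= end:
        # Small enough, or a single day that cannot be split further.
        return [(start, end)]
    midpoint = start + (end - start) // 2
    left = partition_date_range(start, midpoint, count_records, cap)
    right = partition_date_range(midpoint + timedelta(days=1), end, count_records, cap)
    return left + right
```

Because the sub-ranges are disjoint and contiguous, each record falls in exactly one of them, which keeps duplicates from appearing at range boundaries.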

2. Participants Fetcher (`SEDIA_GET_PARTICIPANTS`)

Fetches organization and person data from EU programmes.

```python
from EUFT_retrieve_participants import SEDIA_GET_PARTICIPANTS

fetcher = SEDIA_GET_PARTICIPANTS(flatten_metadata=True)

# Fetch all participants for EDF programme
data = fetcher.get('edf', save=True)

# Multiple programmes
data = fetcher.get(['h2020', 'horizon'], save=True)
```

Features:

- Fetches ORGANISATION and PERSON types
- Participant metadata flattening
- Programme-specific filtering
- Collaboration network data

Architecture: Inherits from SEDIAPaginatedFetcher

3. Funding & Tenders Fetcher (`SEDIA_GET_FUNDING_TENDERS`)

Fetches grant opportunities and tender notices.

```python
from EUFT_retrieve_funding_tenders import SEDIA_GET_FUNDING_TENDERS

fetcher = SEDIA_GET_FUNDING_TENDERS(flatten_metadata=True)

# Open grants for Horizon Europe
data = fetcher.get(
    programmes='horizon',
    funding_type='grants',  # 'grants', 'tenders', 'all'
    status='open',          # 'open', 'closed', 'all'
    save=True
)

# All tenders regardless of programme
data = fetcher.get(
    programmes=None,        # All programmes
    funding_type='tenders',
    status='all',
    save=True
)

# With additional filters
data = fetcher.get(
    programmes='h2020',
    programmePeriod='2014 - 2020',
    crossCuttingPriorities=['OCEAN'],
    save=True
)
```

Available Options:

- Funding types: grants, tenders, all
- Status: open, closed, all
- Additional filters: Any valid API parameter

Architecture: Inherits from SEDIAPaginatedFetcher
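
As a rough illustration of how the documented options and free-form filters might be combined, the sketch below validates `funding_type` and `status` against the values listed above and merges any extra keyword arguments. `build_filters` and the shape of the returned dict are hypothetical. How the fetcher actually translates these into an API query is not shown in this README.

```python
# Allowed values as listed under "Available Options".
FUNDING_TYPES = {"grants", "tenders", "all"}
STATUSES = {"open", "closed", "all"}


def build_filters(programmes=None, funding_type="all", status="all", **extra_filters):
    """Validate the documented options and merge them with extra filters."""
    if funding_type not in FUNDING_TYPES:
        raise ValueError(f"funding_type must be one of {sorted(FUNDING_TYPES)}")
    if status not in STATUSES:
        raise ValueError(f"status must be one of {sorted(STATUSES)}")
    filters = {"programmes": programmes, "funding_type": funding_type, "status": status}
    filters.update(extra_filters)
    # Drop unset values so they do not narrow the query.
    return {key: value for key, value in filters.items() if value is not None}
```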

4. Topics Fetcher (`SEDIA_GET_TOPICS`)

Fetches detailed information about specific research topics.

```python
from EUFT_retrieve_topics import SEDIA_GET_TOPICS

fetcher = SEDIA_GET_TOPICS(flatten_metadata=True)

# Single topic
data = fetcher.get('HORIZON-CL3-2022-BM-01-01', save=True)

# Multiple topics
topics = [
    'HORIZON-CL3-2022-BM-01-01',
    'HORIZON-CL4-2022-RESILIENCE-01-08'
]
data = fetcher.get(topics, save=True)
```

Features:

- Topic-specific detailed information
- Batch processing for multiple topics
- Missing topic tracking
- Research area categorization

Architecture: Inherits from SEDIASimpleFetcher (uses GET requests)
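
Batch processing with missing-topic tracking, as listed in the features above, could look roughly like the following. This is a sketch under assumptions: `fetch_topics` is a hypothetical helper, and `fetch_one` stands in for whatever GET request the real fetcher issues per topic.

```python
def fetch_topics(topic_ids, fetch_one):
    """Fetch each topic and record the identifiers that returned nothing.

    `fetch_one(topic_id)` is a hypothetical callable that returns a dict,
    or None when the topic does not exist.
    """
    if isinstance(topic_ids, str):
        topic_ids = [topic_ids]
    found, missing = [], []
    for topic_id in topic_ids:
        details = fetch_one(topic_id)
        if details is None:
            missing.append(topic_id)
        else:
            found.append(details)
    return found, missing
```

Returning the missing identifiers alongside the results lets the caller decide whether to retry, log, or ignore them.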

5. FAQ Fetcher (`SEDIA_GET_FAQ`)

Fetches FAQ index and detailed FAQ content.

```python
from EUFT_retrieve_faq import SEDIA_GET_FAQ

fetcher = SEDIA_GET_FAQ(flatten_metadata=True)

# FAQ index for specific programme
data = fetcher.get(
    programmes='h2020',
    faq_type='all',      # 'active', 'archived', 'all'
    status='all',        # 'active', 'archived', 'all'
    save=True
)

# FAQ index with detailed content
data = fetcher.get(
    programmes='horizon',
    fetch_details=True,  # Fetch full FAQ content
    save=True
)

# Specific FAQ details by NID
data = fetcher.get(
    nid_list=['755', '12350'],
    save=True
)
```

Available Options:

- FAQ types: active, archived, all
- Status: active, archived, all
- Details: fetch_details=True for complete content

Architecture: Uses SEDIAPaginatedFetcher (when migrated)

Quick Start Demo

Run the demo:

```bash
cd src/EUFT_retrieve
python demo_all_fetchers.py
```

Demonstrates:

- All 5 fetchers with example usage
- Flexible input handling
- Data processing capabilities
- Error handling features
- Advanced usage patterns

Programme IDs Reference

| Programme | Name | ID |
|-----------|------|-----|
| h2020 | Horizon 2020 | 31045243 |
| horizon | Horizon Europe | 43108390 |
| digital | Digital Europe | 43152860 |
| edf | European Defence Fund | 44181033 |

Advanced Usage

Custom Query Parameters

All fetchers support additional query parameters via **kwargs:

```python
# Funding & Tenders with custom filters
data = fetcher.get(
    programmes='horizon',
    funding_type='grants',
    programmePeriod='2021 - 2027',
    crossCuttingPriorities=['CLIMATE'],
    destination=['43650651'],
    save=True
)
```

Error Handling

```python
try:
    data = fetcher.get('invalid_programme')
except ValueError as e:
    print(f"Invalid programme: {e}")
except Exception as e:
    print(f"API error: {e}")
```

Memory Management

For large datasets:

```python
from EUFT_retrieve_projects import SEDIA_GET_PROJECTS

# Disable metadata flattening for faster processing
fetcher = SEDIA_GET_PROJECTS(flatten_metadata=False)

# Process in smaller chunks
data = fetcher.get('h2020', save=True)  # Automatically chunked
```

Data Processing Pipeline

```python
from helpers.functions import Functions

# Load and process cached data
df = Functions.load_cached_dataframe('cache/my_data.feather')

# Apply custom flattening
df_flat = Functions.flatten_dataframe_metadata(df)

# Clean empty containers
df_clean = Functions.clean_empty_containers(df_flat)
```
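
The README does not say what cleaning empty containers does internally. One reasonable reading is that empty lists, dicts, and strings are replaced with a null value so they do not clutter the output. The sketch below shows that interpretation on a plain dict; `blank_empty_containers` is a hypothetical name and is not part of `helpers.functions`.

```python
def blank_empty_containers(record):
    """Replace empty lists, dicts, tuples, sets, and strings with None."""
    container_types = (list, dict, tuple, set, str)
    return {
        key: None if isinstance(value, container_types) and len(value) == 0 else value
        for key, value in record.items()
    }


row = {"title": "X", "partners": [], "extra": {}, "budget": 0}
print(blank_empty_containers(row))
# {'title': 'X', 'partners': None, 'extra': None, 'budget': 0}
```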

Output Files

All fetchers generate timestamped CSV files:

```
data/
├── project_data_44181033_20241201_143022.csv
├── participant_data_44181033_20241201_143155.csv
├── funding_tenders_data_horizon_grants_open_20241201_143301.csv
├── topic_details_HORIZON-CL3-2022-BM-01-01_20241201_143445.csv
└── faq_data_31045243_all_all_20241201_143612.csv
```
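
The file names above follow a `prefix_parts_YYYYMMDD_HHMMSS.csv` pattern. A minimal sketch of a helper that produces names in this shape is shown below; `timestamped_filename` is a hypothetical function, not the package's own naming code.

```python
from datetime import datetime


def timestamped_filename(prefix, *parts, now=None, extension="csv"):
    """Build names such as project_data_44181033_20241201_143022.csv."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    pieces = [prefix, *(str(part) for part in parts), stamp]
    return "_".join(pieces) + f".{extension}"


print(timestamped_filename("project_data", 44181033, now=datetime(2024, 12, 1, 14, 30, 22)))
# project_data_44181033_20241201_143022.csv
```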

Important Notes

API Rate Limits

- Retry mechanisms handle rate limiting
- Automatic backoff for server errors
- Session management
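
For readers unfamiliar with retry and backoff, the following standard-library sketch shows the general pattern: retry a failing call, doubling the wait each time. It does not reproduce the package's own retry logic, which is not shown in this README. `retry_with_backoff` and its parameters are hypothetical, and the built-in `ConnectionError` stands in for whatever exception the HTTP client raises.

```python
import time


def retry_with_backoff(request, retries=3, base_delay=1.0,
                       retry_on=(ConnectionError,), sleep=time.sleep):
    """Call `request()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(retries + 1):
        try:
            return request()
        except retry_on:
            if attempt == retries:
                raise  # Out of attempts: surface the last error.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the behaviour can be exercised without real delays.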

Data Size Considerations

- Projects fetcher handles >10K records via partitioning
- Other fetchers may hit 10K API limits
- Use programme filters to reduce dataset size

Memory Usage

- Metadata flattening increases memory usage
- Disable flattening for very large datasets
- Use chunked processing
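
Chunked processing, in its simplest form, means handling records in fixed-size batches instead of all at once. The generator below is a generic sketch of that idea; `iter_chunks` is a hypothetical helper, not a function exported by this package.

```python
from itertools import islice


def iter_chunks(records, chunk_size):
    """Yield successive lists of at most `chunk_size` items from any iterable."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


for chunk in iter_chunks(range(5), 2):
    print(chunk)
# [0, 1]
# [2, 3]
# [4]
```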

Contributing

To extend the fetchers:

1. Follow the existing class structure
2. Implement consistent API patterns
3. Add error handling
4. Include progress tracking
5. Update this README

License

This project is part of the EU thesis research toolkit. Please refer to the main project license.

Support

For issues or questions:

1. Check the demo script for usage examples
2. Review error messages for specific issues
3. Verify programme IDs and API parameters
4. Ensure network connectivity to EU APIs

Owner

Login: ajruben
Kind: user

Repositories: 1
Profile: https://github.com/ajruben

GitHub Events

Total

Create event: 2

Last Year

Create event: 2

Packages

Total packages: 1
Total downloads: unknown

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 1
Total maintainers: 1

pypi.org: sedia-api-fetchers

Python classes for fetching data from the European Commission's SEDIA API endpoints

Homepage: https://github.com/ajruben/sedia-api-fetchers
Documentation: https://sedia-api-fetchers.readthedocs.io/
License: mit
Latest release: 1.0.0
published 12 months ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 8.9%

Average: 29.4%

Dependent repos count: 50.0%

Maintainers (1)

ajruben

Last synced: 10 months ago

Dependencies

pyproject.toml pypi

numpy >=1.20.0
pandas >=1.3.0
requests >=2.25.0
tqdm >=4.60.0
urllib3 >=1.26.0

setup.py pypi

numpy >=1.20.0
pandas >=1.3.0
requests >=2.25.0
tqdm >=4.60.0
urllib3 >=1.26.0
