APDTFlow: A Modular Forecasting Framework for Time Series Data

https://github.com/yotambraun/apdtflow

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

data-science deep-learning forecasting machine-learning time-series
Last synced: 5 months ago

Repository

APDTFlow: A Modular Forecasting Framework for Time Series Data

Basic Info
Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Topics
data-science deep-learning forecasting machine-learning time-series
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.md

APDTFlow: A Modular Forecasting Framework for Time Series Data

[Figure: APDTFlow logo]


APDTFlow is a modern and extensible forecasting framework for time series data that leverages advanced techniques including neural ordinary differential equations (Neural ODEs), transformer-based components, and probabilistic modeling. Its modular design allows researchers and practitioners to experiment with multiple forecasting models and easily extend the framework for new methods.

[Figure: example APDTFlow forecast]

Experiment Results

In our large-scale experiment we compared multiple forecasting models across different forecast horizons using 3-fold cross-validation. For brevity, we show two key plots below:

  1. Validation Loss Comparison: A bar plot comparing the average validation losses of the models (APDTFlow, TransformerForecaster, TCNForecaster, and EnsembleForecaster) across forecast horizons.
  2. Example Forecast (Horizon 7, CV Split 3): A forecast plot for the APDTFlow model for a 7-step forecast from CV split 3.

Validation Loss Comparison

The bar plot below summarizes the average validation losses (lower is better) for the different models across the forecast horizons (7, 10, and 30 time steps):

[Figure: bar plot of average validation loss per model and forecast horizon]

Discussion:
The APDTFlow model (and, in some cases, the ensemble) generally achieved lower validation losses than the other models, particularly as the forecast horizon increases. This suggests that its multi-scale decomposition and Neural ODE dynamics are well suited to capturing the trends and seasonal patterns in the dataset.

Performance vs. Forecast Horizon

The following line plot illustrates how the performance (average validation loss) of each model changes with different forecast horizons. This visualization helps to assess which models maintain consistent performance as the forecast horizon increases.

[Figure: line plot of average validation loss vs. forecast horizon for each model]

Discussion:
The line plot reveals the trend in model performance across forecast horizons. It helps us understand which models degrade gracefully (or even improve) as the forecast horizon lengthens.

Example Forecast (Horizon 7, CV Split 3)

Below is an example forecast produced by the APDTFlow model for a forecast horizon of 7 time steps on the third cross-validation split.

[Figure: APDTFlow forecast, horizon 7, CV split 3]

Discussion:
- Input Sequence (blue): the historical data (the last 30 time steps) used as input.
- True Future (dashed orange): the actual future values for the next 7 time steps.
- Predicted Future (dotted): the forecast generated by the model.


For a detailed explanation, more plots, and *additional analysis of these results*, please see our Experiment Results and Analysis document.

Table of Contents

  1. Installation
  2. Quick Start
  3. Data Processing and Augmentation
  4. Forecasting Approaches
  5. Evaluation and Metrics
  6. Command-Line Interface (CLI)
  7. Cross-Validation Strategies
  8. Documentation and Examples
  9. License

1. Installation

APDTFlow is published on PyPI. To install:

```bash
pip install apdtflow
```

For development, clone the repository and install it in editable mode:

```bash
git clone https://github.com/yotambraun/APDTFlow.git
cd APDTFlow
pip install -e .
```

New Features in This Release

  • Learnable Time Series Embedding:
    APDTFlow now includes a TimeSeriesEmbedding module that learns to encode temporal information using gated residual networks. It processes raw time indices and periodic signals (and, optionally, calendar features) to produce a rich embedding that improves downstream forecasting performance (a conceptual sketch follows this list).

  • New Configuration Options:
    In apdtflow/config/config.yaml, you can now specify:

    • use_embedding: Set to true to enable the new embedding.
    • embed_dim: The embedding dimension (recommended to match hidden_dim); see the config sketch below.
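
For illustration, the relevant excerpt of config.yaml might look like the following. Only use_embedding and embed_dim are documented above, so the exact key layout here is an assumption:

```yaml
# Hypothetical excerpt of apdtflow/config/config.yaml
use_embedding: true   # enable the learnable TimeSeriesEmbedding
embed_dim: 16         # recommended to match hidden_dim
hidden_dim: 16
```

Conceptually, a gated residual embedding of time indices and periodic signals can be sketched in a few lines of PyTorch. This is purely illustrative and is not the library's TimeSeriesEmbedding; the layer names, the single sine feature, and the period of 12 are all assumptions:

```python
import torch
import torch.nn as nn

class GatedResidualEmbedding(nn.Module):
    """Toy gated residual embedding of time indices (illustration only)."""
    def __init__(self, embed_dim=16):
        super().__init__()
        self.proj = nn.Linear(2, embed_dim)       # raw index + one periodic signal
        self.fc = nn.Linear(embed_dim, embed_dim)
        self.gate = nn.Linear(embed_dim, embed_dim)

    def forward(self, t):
        # t: (batch, seq_len) of raw time indices
        feats = torch.stack([t, torch.sin(2 * torch.pi * t / 12)], dim=-1)
        h = self.proj(feats)
        # gated residual update: keep h, add a gated nonlinear correction
        return h + torch.sigmoid(self.gate(h)) * torch.tanh(self.fc(h))

emb = GatedResidualEmbedding(embed_dim=16)
print(emb(torch.arange(12, dtype=torch.float32).unsqueeze(0)).shape)  # (1, 12, 16)
```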

2. Quick Start

Training

Below is an example script to train the APDTFlow model on your dataset:

```python
import torch
from torch.utils.data import DataLoader
from apdtflow.data import TimeSeriesWindowDataset
from apdtflow.models.apdtflow import APDTFlow

csv_file = "dataset_examples/ElectricProduction.csv"
dataset = TimeSeriesWindowDataset(csv_file, date_col="DATE", value_col="IPG2211A2N", T_in=12, T_out=3)
train_loader = DataLoader(dataset, batch_size=16, shuffle=True)

model = APDTFlow(
    num_scales=3,
    input_channels=1,
    filter_size=5,
    hidden_dim=16,
    output_dim=1,
    forecast_horizon=3,
    use_embedding=True
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

model.train_model(
    train_loader=train_loader,
    num_epochs=15,
    learning_rate=0.001,
    device=device
)
```

Inference

Use the following example to run inference on new data:

```python
import torch
from torch.utils.data import DataLoader
from apdtflow.data import TimeSeriesWindowDataset
from apdtflow.models.apdtflow import APDTFlow

test_dataset = TimeSeriesWindowDataset(
    csv_file="path/to/dataset.csv",
    date_col="DATE",
    value_col="VALUE",
    T_in=12,
    T_out=3
)
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)

model = APDTFlow(
    num_scales=3,
    input_channels=1,
    filter_size=5,
    hidden_dim=16,
    output_dim=1,
    forecast_horizon=3,
    use_embedding=True
)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

checkpoint_path = "path/to/checkpoint.pt"
model.load_state_dict(torch.load(checkpoint_path, map_location=device))

metrics = model.evaluate(test_loader, device, metrics=["MSE", "MAE", "RMSE", "MAPE"])
print("Evaluation Metrics:", metrics)
```

3. Data Processing and Augmentation

APDTFlow provides robust functions to process and augment your time series data. Key features include:

* Date Conversion: automatically converts date columns to datetime objects.
* Gap Filling: reindexes data to ensure a consistent time frequency.
* Missing Value Imputation: supports methods such as forward-fill, backward-fill, mean substitution, and interpolation.
* Feature Engineering: generates lag features and rolling statistics to enhance predictive performance.
* Data Augmentation: offers techniques such as jittering, scaling, and time warping to improve model robustness (a small illustration follows below).
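
To give a flavor of what jittering and scaling do, here is a minimal plain-PyTorch sketch; the actual apdtflow augmentation helpers and their names may differ:

```python
import torch

def jitter(x: torch.Tensor, sigma: float = 0.03) -> torch.Tensor:
    """Add small Gaussian noise to each point of the series."""
    return x + sigma * torch.randn_like(x)

def scale(x: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Multiply the whole series by a random factor drawn around 1.0."""
    return x * (1.0 + sigma * torch.randn(1))

series = torch.sin(torch.linspace(0, 6.28, 100))  # toy seasonal signal
augmented = scale(jitter(series))
```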


4. Forecasting Approaches

APDTFlow includes several advanced forecasting strategies:

APDTFlow

The APDTFlow model integrates:

- Multi-Scale Decomposition: decomposes the input signal into multiple resolutions.
- Neural ODE Dynamics: models continuous latent-state evolution using Neural ODEs.
- Probabilistic Fusion: merges latent representations while quantifying uncertainty.
- Transformer-Based Decoding: generates forecasts using a time-aware attention mechanism.

Key parameters include T_in, T_out, num_scales, filter_size, hidden_dim, and forecast_horizon. A toy illustration of the multi-scale idea follows below.
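
To build intuition for the multi-scale decomposition, here is a toy version using average pooling. The library's decomposer uses a convolutional (dynamic convolution) component instead, so this is an illustration of the idea only:

```python
import torch
import torch.nn.functional as F

def multi_scale_views(x: torch.Tensor, num_scales: int = 3):
    """Toy decomposition: progressively smoother views of the same series."""
    # x: (batch, channels, time)
    views = [x]
    for s in range(1, num_scales):
        k = 2 ** s
        pooled = F.avg_pool1d(x, kernel_size=k, stride=k)
        # upsample back so every scale has the original length
        views.append(F.interpolate(pooled, size=x.shape[-1],
                                   mode="linear", align_corners=False))
    return views

x = torch.randn(1, 1, 32)
for i, v in enumerate(multi_scale_views(x)):
    print(i, v.shape)  # all (1, 1, 32), increasingly smoothed
```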

TransformerForecaster

Leverages the Transformer architecture to capture long-range dependencies using self‑attention. This approach is ideal for complex temporal patterns where context from many time steps is relevant.
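
Stripped to essentials, the mechanism is a self-attention encoder over the input window. The sketch below uses stock PyTorch modules and is not TransformerForecaster's actual implementation; the dimensions and the last-step readout are illustrative choices:

```python
import torch
import torch.nn as nn

d_model = 16
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(d_model, 1)

x = torch.randn(8, 30, d_model)   # (batch, time steps, features)
context = encoder(x)              # every step attends to every other step
forecast = head(context[:, -1])   # read out a prediction from the last step
print(forecast.shape)             # torch.Size([8, 1])
```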

TCNForecaster

Based on Temporal Convolutional Networks, the TCNForecaster uses dilated convolutions and residual connections to efficiently capture local and medium-range dependencies.
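
The sketch below shows how stacked dilated convolutions grow the receptive field: with kernel size 3 and dilations 1, 2, 4, the last output sees 1 + 2*(1 + 2 + 4) = 15 input steps. It omits the causal padding and residual connections a real TCN uses:

```python
import torch
import torch.nn as nn

layers = []
for d in (1, 2, 4):                      # doubling dilation at each layer
    layers += [nn.Conv1d(1, 1, kernel_size=3, dilation=d, padding="same"),
               nn.ReLU()]
tcn = nn.Sequential(*layers)

x = torch.randn(1, 1, 30)                # (batch, channels, time)
print(tcn(x).shape)                      # torch.Size([1, 1, 30])
```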

EnsembleForecaster

Combines predictions from multiple forecasting models (such as APDTFlow, TransformerForecaster, and TCNForecaster) using aggregation strategies (e.g., weighted averaging) to improve overall forecast robustness and accuracy.
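
The aggregation step reduces to a weighted average of per-model forecasts, as in this sketch (the EnsembleForecaster interface itself may differ; the weights here are arbitrary):

```python
import torch

preds = {
    "apdtflow":    torch.tensor([10.2, 10.8, 11.1]),
    "transformer": torch.tensor([10.0, 10.5, 11.4]),
    "tcn":         torch.tensor([10.4, 10.9, 11.0]),
}
weights = {"apdtflow": 0.5, "transformer": 0.3, "tcn": 0.2}

# weighted average across models, per forecast step
ensemble = sum(w * preds[name] for name, w in weights.items())
print(ensemble)
```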

Core Model Parameters Explained:

For a comprehensive description of each model's architecture and additional details, please see the Model Architectures Documentation. When configuring APDTFlow, several parameters play key roles in how the model processes and forecasts time series data. Here’s what they mean:

  • T_in (Input Sequence Length): This parameter specifies the number of past time steps the model will use as input. For example, if T_in=12, the model will use the previous 12 observations to make a forecast.
  • T_out (Forecast Horizon): This parameter defines the number of future time steps to predict. For instance, if T_out=3, the model will output predictions for the next 3 time steps.
  • num_scales: APDTFlow employs a multi-scale decomposition technique to capture both global and local trends in the data. The num_scales parameter determines how many scales (or resolutions) the input signal will be decomposed into. A higher number of scales may allow the model to capture more complex temporal patterns, but it could also increase computational complexity.
  • filter_size: This parameter is used in the convolutional component (or dynamic convolution) within the model’s decomposer module. It defines the size of the convolutional filter applied to the input signal, thereby affecting the receptive field. A larger filter size allows the model to consider a broader context in the time series but may smooth out finer details.
  • forecast_horizon: The number of future time steps the decoder produces. It should match T_out so the model's output aligns with the training targets (see the snippet after this list).
  • hidden_dim: The size of the hidden state in the dynamics module and decoder. This parameter controls the capacity of the model to learn complex representations. Increasing hidden_dim may improve the model’s performance, but at the cost of additional computational resources and potential overfitting if not tuned properly.
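
Tying these together, the dataset windowing and the model's decoder head must agree. The snippet below (values from the Quick Start; constructor signatures as reconstructed there) makes the forecast_horizon == T_out rule explicit:

```python
from apdtflow.data import TimeSeriesWindowDataset
from apdtflow.models.apdtflow import APDTFlow

T_in, T_out = 12, 3   # use 12 past steps, predict 3 future steps

dataset = TimeSeriesWindowDataset("path/to/dataset.csv", date_col="DATE",
                                  value_col="VALUE", T_in=T_in, T_out=T_out)
model = APDTFlow(num_scales=3, input_channels=1, filter_size=5, hidden_dim=16,
                 output_dim=1,
                 forecast_horizon=T_out,   # keep equal to T_out
                 use_embedding=True)
```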

5. Evaluation and Metrics

APDTFlow incorporates a flexible evaluation framework that supports several performance metrics, including:

- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Percentage Error (MAPE)

These metrics are computed via a dedicated evaluator, which can be extended with additional metrics as needed.

Usage Example:

```python
from apdtflow.evaluation.regression_evaluator import RegressionEvaluator

# predictions and targets are tensors of model outputs and ground-truth values
evaluator = RegressionEvaluator(metrics=["MSE", "MAE", "RMSE", "MAPE"])
results = evaluator.evaluate(predictions, targets)
print("MSE:", results["MSE"], "MAE:", results["MAE"],
      "RMSE:", results["RMSE"], "MAPE:", results["MAPE"])
```
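
For reference, the four metrics correspond to the following definitions (a plain-PyTorch sketch, not the library's implementation):

```python
import torch

def mse(pred, target):
    return torch.mean((pred - target) ** 2)

def mae(pred, target):
    return torch.mean(torch.abs(pred - target))

def rmse(pred, target):
    return torch.sqrt(mse(pred, target))

def mape(pred, target):
    # expressed as a percentage; assumes targets are nonzero
    return 100 * torch.mean(torch.abs((target - pred) / target))

pred = torch.tensor([10.2, 10.8, 11.1])
target = torch.tensor([10.0, 11.0, 11.0])
print(mse(pred, target), rmse(pred, target), mape(pred, target))
```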


6. Command-Line Interface (CLI)

For ease of use, APDTFlow provides a command‑line interface that allows you to run training, evaluation, and inference directly from the terminal. The CLI accepts various parameters to configure the forecasting process without modifying the code.

Available Commands:

  • apdtflow train – Train a forecasting model.
  • apdtflow infer – Run inference using a saved checkpoint.

Example Usage:

```bash
# Train a model (using the learnable embedding, which is enabled by default)
apdtflow train --csv_file path/to/dataset.csv --date_col DATE --value_col VALUE \
  --T_in 12 --T_out 3 --num_scales 3 --filter_size 5 --hidden_dim 16 \
  --batch_size 16 --learning_rate 0.001 --num_epochs 15 --checkpoint_dir ./checkpoints

# Alternatively, disable the learnable embedding by adding the --no_embedding flag:
apdtflow train --csv_file path/to/dataset.csv --date_col DATE --value_col VALUE \
  --T_in 12 --T_out 3 --num_scales 3 --filter_size 5 --hidden_dim 16 \
  --batch_size 16 --learning_rate 0.001 --num_epochs 15 --checkpoint_dir ./checkpoints \
  --no_embedding

# Run inference (ensure that the embedding setting matches what was used during training)
apdtflow infer --csv_file path/to/dataset.csv --date_col DATE --value_col VALUE \
  --T_in 12 --T_out 3 --checkpoint_path ./checkpoints/APDTFlow_checkpoint.pt --batch_size 16
```


7. Cross-Validation Strategies

To ensure robust forecasting evaluation, APDTFlow includes a Cross-Validation Factory that supports:

- Rolling Splits: moving training and validation windows over time.
- Expanding Splits: increasing the training window while keeping the validation size constant.
- Blocked Splits: dividing the dataset into contiguous blocks.

The index arithmetic behind the first two strategies is sketched below.
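
A minimal sketch of rolling vs. expanding splits in plain Python (illustrative only; the factory's actual return format may differ):

```python
# 100 samples, 40-step training window, 10-step validation window, stride 10
n, train_size, val_size, step = 100, 40, 10, 10

# rolling: both windows slide forward together
rolling = [(range(s, s + train_size),
            range(s + train_size, s + train_size + val_size))
           for s in range(0, n - train_size - val_size + 1, step)]

# expanding: training always starts at 0 and grows; validation slides
expanding = [(range(0, e), range(e, e + val_size))
             for e in range(train_size, n - val_size + 1, step)]

print(len(rolling), len(expanding))  # 6 splits each with these settings
```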

Usage Example:

```python
from apdtflow.cv_factory import TimeSeriesCVFactory
from torch.utils.data import Dataset

class SampleDataset(Dataset):
    def __init__(self, length=100):
        self.data = list(range(length))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

dataset = SampleDataset()
cv_factory = TimeSeriesCVFactory(dataset, method="rolling",
                                 train_size=40, val_size=10, step_size=10)
splits = cv_factory.get_splits()
print("Cross-Validation Splits:", splits)
```


8. Documentation and Examples

For comprehensive documentation—including user guides, API references, and example notebooks—please visit the docs directory. The examples provide step-by-step instructions for data preprocessing, model training, evaluation, and inference.

APDTFlow Documentation

Model Architectures Documentation

Experiment Results and Analysis

Configuration and YAML Files


9. License

APDTFlow is licensed under the MIT License. For more details, please refer to the LICENSE file.

Owner

  • Name: yotam braun
  • Login: yotambraun
  • Kind: user

GitHub Events

Total
  • Watch event: 5
  • Push event: 66
  • Fork event: 1
  • Create event: 3
Last Year
  • Watch event: 5
  • Push event: 66
  • Fork event: 1
  • Create event: 3

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 162 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 24
  • Total maintainers: 1
pypi.org: apdtflow

APDTFlow: A modular forecasting framework for time series data

  • Versions: 24
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 162 Last month
Rankings
Dependent packages count: 9.7%
Average: 32.0%
Dependent repos count: 54.4%
Maintainers (1)
Last synced: 6 months ago

Dependencies

apdtflow.egg-info/requires.txt pypi
  • matplotlib *
  • numpy *
  • pandas *
  • pyyaml *
  • tensorboard *
  • torch *
  • torchdiffeq *
pyproject.toml pypi
requirements.txt pypi
  • matplotlib *
  • numpy *
  • pandas *
  • pyyaml *
  • tensorboard *
  • torch *
  • torchdiffeq *
setup.py pypi
  • matplotlib *
  • numpy *
  • pandas *
  • pyyaml *
  • tensorboard *
  • torch *
  • torchdiffeq *