recurrence-mimicking-learning

Hyper-efficient Offline Recurrent Reinforcement Learning Algorithm. It solves decision paths of any length without sequential processing. Implemented for Sharpe Ratio optimization as a base problem.

https://github.com/tomwitkowski/recurrence-mimicking-learning

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

deep-learning offline-reinforcement-learning recurrent-reinforcement-learning reinforcement-learning sharpe-ratio-optimization trajectory-optimization
Last synced: 6 months ago

Repository

Hyper-efficient Offline Recurrent Reinforcement Learning Algorithm. It solves decision paths of any length without sequential processing. Implemented for Sharpe Ratio optimization as a base problem.

Basic Info
  • Host: GitHub
  • Owner: tomWitkowski
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 28.3 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
deep-learning offline-reinforcement-learning recurrent-reinforcement-learning reinforcement-learning sharpe-ratio-optimization trajectory-optimization
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

Recurrence Mimicking Learning (RML)

This repository contains code for Recurrence Mimicking Learning (RML) experiments described in our article. The method aims to optimize a global reward (such as the Sharpe Ratio) by mimicking how recurrent decisions would unfold over a time series, but without incurring the usual high cost of repeated model executions.
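For context, the sketch below shows the Sharpe Ratio computed as a trajectory-level (global) reward from per-step strategy returns; the `sharpe_ratio` helper and the toy returns are illustrative assumptions, not code from this repository.

```python
import numpy as np

def sharpe_ratio(returns: np.ndarray, eps: float = 1e-8) -> float:
    """Global reward: mean per-step return divided by its standard deviation.

    The reward depends on the whole decision path, so it can only be
    evaluated once the full trajectory of actions (and returns) is known.
    """
    return float(np.mean(returns) / (np.std(returns) + eps))

# Per-step returns of a hypothetical trading trajectory
returns = np.array([0.01, -0.005, 0.02, 0.0, 0.015])
print(sharpe_ratio(returns))
```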

Method Overview

In brief, RML uses a single feedforward pass to generate actions for each time step as if they were generated recurrently. It does so by stacking the input $X$ multiple times along all possible previous actions $a_{t-1}$. After generating a stacked output, a lightweight re-indexing step ($\phi$-processing) reconstructs a trajectory of decisions that mirrors the recurrent process. This allows a direct calculation of the global reward (e.g., Sharpe Ratio) with only two forward passes, rather than $T$ passes in a traditional offline RRL.
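Below is a minimal, self-contained sketch of the stacking and $\phi$-processing idea, assuming a discrete action space and a policy that maps (features, previous action) rows to actions; the function and variable names are illustrative, and the article's exact two-pass procedure may differ in detail.

```python
import numpy as np

def rml_trajectory(policy, X, actions=(0, 1), a_init=0):
    """Illustrative RML-style decoding (not the repository's exact code).

    1. Stack the inputs along every possible previous action and query the
       policy once on the whole stack, instead of T sequential calls.
    2. phi-processing: walk the resulting action table with cheap indexing
       to reconstruct the trajectory a recurrent rollout would produce.
    """
    T, A = len(X), len(actions)

    # One batched call: one row per (x_t, a_prev) combination.
    stacked = np.array([[*x, a_prev] for x in X for a_prev in actions])
    action_table = policy(stacked).reshape(T, A)  # chosen action per row

    # phi-processing: pure re-indexing, no further model calls.
    trajectory, a_prev = [], a_init
    for t in range(T):
        a_prev = int(action_table[t, a_prev])
        trajectory.append(a_prev)
    return trajectory

# Toy policy: go long (1) when the feature plus previous action is positive.
toy_policy = lambda rows: (rows.sum(axis=1) > 0).astype(int)
X = np.array([[0.3], [-0.8], [0.1], [-0.4]])
print(rml_trajectory(toy_policy, X))
```

The reconstructed trajectory can then be fed directly into a global reward such as the Sharpe Ratio above.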

Repository Structure

  • src/
    Source code with modules for data loading/preprocessing, different reinforcement learning methods (offline RRL, online RRL, RML), and separate training scripts.
  • pyproject.toml
    Project dependencies.
  • .gitignore
    Standard Python and OS ignore patterns.

Minimal Usage Example

  1. Install dependencies: bash install.sh
  2. Configure experiments via config.py
  3. To run the time comparison with Offline RRL (RLSTM-A): python experiment_time.py
  4. To run the exactness comparison with Offline RRL (RLSTM-A): python experiment_exactness.py (see the sketch after this list)
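As a toy counterpart to the exactness comparison, the sketch below reuses `rml_trajectory`, `toy_policy`, and `X` from the Method Overview sketch and checks that the reconstructed trajectory matches a step-by-step recurrent rollout; it is an illustration of the idea, not the repository's experiment_exactness.py.

```python
import numpy as np

def recurrent_rollout(policy, X, a_init=0):
    """Baseline: call the policy sequentially, once per time step (T calls)."""
    trajectory, a_prev = [], a_init
    for x in X:
        a_prev = int(policy(np.array([[*x, a_prev]]))[0])
        trajectory.append(a_prev)
    return trajectory

# The two decision paths should coincide on the toy data.
assert rml_trajectory(toy_policy, X) == recurrent_rollout(toy_policy, X)
print("RML reconstruction matches the sequential rollout")
```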

Reference

For the complete description of the method, mathematical details, and experiments, see our article: THE REFERENCE TO ADD

Owner

  • Login: tomWitkowski
  • Kind: user

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Recurrence Mimicking Learning
message: >-
  This repository contains an implementation of Recurrence
  Mimicking Learning (RML).
type: software
authors:
  - given-names: Tomasz
    family-names: Witkowski
    email: tomasz.witkowski1@edu.uekat.pl
    affiliation: University of Economics in Katowice
    orcid: 'https://orcid.org/0000-0001-9648-9098'
repository-code: >-
  https://github.com/tomWitkowski/recurrence-mimicking-learning
abstract: >+
  In brief, RML uses a single feedforward pass to generate
  actions for each time step as if they were generated
  recurrently. It does so by stacking the input X multiple
  times along all possible previous actions a_(t-1). After generating a stacked output, a lightweight
  re-indexing step (ϕ-processing) reconstructs a trajectory
  of decisions that mirrors the recurrent process. This
  allows a direct calculation of the global reward (e.g.,
  Sharpe Ratio) with only two forward passes, rather than T
  passes in a traditional offline RRL.
keywords:
  - recurrence mimicking learning
  - recurrent reinforcement learning
  - offline reinforcement learning
  - recurrent classification
  - sharpe ratio
license: MIT

GitHub Events

Total
  • Push event: 7
  • Create event: 2
Last Year
  • Push event: 7
  • Create event: 2