https://github.com/lamastex/spark-trend-calculus

To detect trends in time series using Andrew Morgan's trend calculus algorithms in Apache Spark and Scala from Antoine Amend's initial implementation

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.5%) to scientific vocabulary
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: lamastex
  • License: apache-2.0
  • Language: Scala
  • Default Branch: master
  • Size: 5.4 MB
Statistics
  • Stars: 6
  • Watchers: 3
  • Forks: 1
  • Open Issues: 9
  • Releases: 4
Created over 5 years ago · Last pushed almost 3 years ago
Metadata Files
  • Readme
  • License

README.md

Spark-Trend-Calculus

How to cite this work:

  • Antoine Amend, Johannes Graner, Andrew Morgan and Raazesh Sainudiin (2020-2021). A Scalable Library for Trader-Perceived Financial Events in an Interval-valued Time Series for a Trend-Calculus. https://github.com/lamastex/spark-trend-calculus/

Acknowledgements

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. The work in 2020 was partly supported by Combient Mix AB through Data Engineering Science Summer Internships.

Many thanks to Andrew Morgan and Antoine Amend.

Quick Links

This library detects trends in time series using Andrew Morgan's Trend Calculus algorithms. It is written in Scala for Apache Spark and builds on Antoine Amend's initial implementation.

Andrew's code and documentation on how to study trends:

  • https://github.com/ByteSumoLtd/TrendCalculus-lua
  • https://github.com/bytesumo/TrendCalculus/blob/master/HowToStudyTrends_v1.03.pdf

Antoine's code, which was the starting point for this library:

  • https://github.com/aamend/texata-r2-2017

Example use cases:

  • https://github.com/lamastex/spark-gdelt-examples
  • https://github.com/lamastex/spark-trend-calculus-examples

Usage Abstract

See Antoine's GitHub for details on how to use his implementation of Trend Calculus, mainly contained in TrendCalculus.scala.

The scalable and streamable implementation in Apache Spark is given in TrendCalculus2.scala.

The basic use case is to transform the input time series into a Dataset[org.lamastex.spark.trendcalculus.TickerPoint], where TickerPoint is a case class defined in Point.scala with three fields: ticker (String), the identifier of the series; x (java.sql.Timestamp), the timestamp; and y (Double), the value.
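To make the expected record shape concrete, the sketch below defines a local stand-in with the same three fields and builds a few points from raw values. The name TickerPointSketch, the demo object and the sample values are hypothetical and used only for illustration; the real TickerPoint is the one in Point.scala and may differ in detail.

```scala
import java.sql.Timestamp

// Hypothetical stand-in mirroring the three fields described above;
// the library's own TickerPoint is defined in Point.scala.
final case class TickerPointSketch(ticker: String, x: Timestamp, y: Double)

object TickerPointSketchDemo {
  // Made-up sample rows: (ticker, timestamp string, value).
  val raw: Seq[(String, String, Double)] = Seq(
    ("EURUSD", "2020-07-09 18:06:00", 1.1286),
    ("EURUSD", "2020-07-09 18:05:00", 1.1284),
    ("EURUSD", "2020-07-09 18:07:00", 1.1281)
  )

  // Timestamp.valueOf expects the JDBC escape format yyyy-[m]m-[d]d hh:mm:ss[.f...].
  val points: Seq[TickerPointSketch] =
    raw.map { case (t, ts, v) => TickerPointSketch(t, Timestamp.valueOf(ts), v) }

  // Order the points by time within the series.
  val ordered: Seq[TickerPointSketch] = points.sortBy(_.x.getTime)
}
```

With Spark available, a sequence of the real TickerPoint values could be turned into a Dataset through the usual implicits, which is the form the constructor described next expects.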

This Dataset is then used in the constructor of a TrendCalculus2 object together with a window size (minimum 2) and a SparkSession object.

The TrendCalculus2 object has a method reversals that can be called to (lazily) compute the reversals using the Trend Calculus algorithm.

Given input as the DataFrame inputDF with columns ticker, x, y, the whole call looks like this:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.lamastex.spark.trendcalculus._

val spark: SparkSession = ...
val inputDF: DataFrame = ...
val windowSize: Int = ...

// Brings the encoders needed by .as[TickerPoint] into scope.
import spark.implicits._

val reversalDS = new TrendCalculus2(
  inputDF.select("ticker", "x", "y").as[TickerPoint],
  windowSize,
  spark
).reversals
```

This also works when inputDF is a streaming DataFrame using Spark Structured Streaming.

For more detailed examples, see https://github.com/lamastex/spark-trend-calculus-examples.

Included parsers

The library includes parsers for 1-minute foreign exchange data (https://github.com/philipperemy/FX-1-Minute-Data) and for stock market data from yfinance (https://github.com/ranaroussi/yfinance). The yfinance data requires some processing in Python before the parser accepts it.

Foreign Exchange parser

The Scala function is parseFX. It takes as input a string formatted as

"DateTime Stamp;Bar OPEN Bid Quote;Bar HIGH Bid Quote;Bar LOW Bid Quote;Bar CLOSE Bid Quote;Volume"

where DateTime Stamp is formatted as yyyyMMdd HHmmss. Open, High, Low and Close are floating point numbers, and Volume is an integer (which appears to always be 0).
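For readers who want to see the line format spelled out, here is a minimal, Spark-free sketch that splits one such line and parses its fields. It is not the library's parseFX: the names FxBarSketch and FxLineSketch, the choice of LocalDateTime, and the decision to return None on malformed input are all assumptions made for illustration.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical record for one parsed bar; not the library's own type.
final case class FxBarSketch(
  time: LocalDateTime,
  open: Double,
  high: Double,
  low: Double,
  close: Double,
  volume: Long
)

object FxLineSketch {
  // "ss" is the java.time pattern letter for second-of-minute
  // ("SS" would mean fraction-of-second).
  private val fmt = DateTimeFormatter.ofPattern("yyyyMMdd HHmmss")

  // Returns None when the line does not have six ';'-separated fields
  // or when any field fails to parse.
  def parse(line: String): Option[FxBarSketch] =
    line.trim.split(";") match {
      case Array(ts, o, h, l, c, v) =>
        Try(
          FxBarSketch(
            LocalDateTime.parse(ts, fmt),
            o.toDouble, h.toDouble, l.toDouble, c.toDouble,
            v.toLong
          )
        ).toOption
      case _ => None
    }
}
```

Returning an Option keeps a header row or a corrupted line from aborting a whole batch; whether the library handles such rows the same way is not stated here.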

To read an FX-1-Minute CSV file directly into an Apache Spark Dataset, use spark.read.fx1m(filePath).

Yahoo! Finance parser

The Scala function is parseYF. It takes as input a string formatted as

"DateTime Stamp,Open,High,Low,Close,Adj Close,Volume"

where DateTime Stamp is either formatted as yyyy-MM-dd or begins with yyyy-MM-dd HH:mm:ss (for example, 2020-07-09 18:05:00+02:00 is valid but UTC+2 2020-07-09 18:05:00 is not). Open, High, Low, Close and Adj Close are floating point numbers, and Volume can be either an integer or a floating point number.
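The two accepted timestamp shapes can be illustrated with a small, Spark-free sketch. This is not the library's parseYF: the object name YfTimestampSketch is hypothetical, and treating a bare date as midnight and discarding any trailing offset are assumptions made for this example only.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter
import scala.util.Try

object YfTimestampSketch {
  private val dateTimeFmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Accepts either a bare ISO date (assumed here to mean midnight) or a
  // string whose first 19 characters are a date-time; anything after
  // those 19 characters, such as a +02:00 offset, is ignored in this sketch.
  def parse(stamp: String): Option[LocalDateTime] = {
    val s = stamp.trim
    if (s.length == 10) Try(LocalDate.parse(s).atStartOfDay).toOption
    else if (s.length >= 19) Try(LocalDateTime.parse(s.take(19), dateTimeFmt)).toOption
    else None
  }
}
```

A string with a leading zone label is rejected because its first 19 characters do not form a date-time, which matches the valid and invalid examples given above.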

To read a yfinance CSV file directly into an Apache Spark Dataset, use spark.read.yfin(filePath).

Owner

  • Name: Raazesh Sainudiin
  • Login: lamastex
  • Kind: user
  • Location: Uppsala, Sweden
  • Company: lamastex.org

I work at the interface of mathematics, computing and statistics. This inter-disciplinary research aims broadly to use computers to solve real-world problems.

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 96
  • Average time to close issues: N/A
  • Average time to close pull requests: about 1 month
  • Total issue authors: 0
  • Total pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.48
  • Merged pull requests: 36
  • Bot issues: 0
  • Bot pull requests: 74
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • dependabot[bot] (44)
  • johannes-graner (16)
  • AlbertNilsson (1)
Top Labels
Issue Labels
Pull Request Labels
  • dependencies (44)

Dependencies

pom.xml maven
  • joda-time:joda-time 2.10.13 provided
  • org.apache.spark:spark-core_2.12 3.2.1 provided
  • org.apache.spark:spark-mllib_2.12 3.2.1 provided
  • org.scala-lang:scala-library 2.12.14 provided
  • org.scala-lang:scala-reflect 2.12.14 provided
  • org.scalatest:scalatest_2.12 3.3.0-SNAP3 test