https://github.com/lamastex/spark-trend-calculus

Detect trends in time series using Andrew Morgan's trend calculus algorithms in Apache Spark and Scala, building on Antoine Amend's initial implementation.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

CITATION.cff file: not found
codemeta.json file: found
.zenodo.json file: found
DOI references: not found
Academic publication links: not found
Academic email domains: not found
Institutional organization owner: not found
JOSS paper metadata: not found
Scientific vocabulary similarity: low (9.5%)

Last synced: 8 months ago

Repository

Basic Info

Host: GitHub
Owner: lamastex
License: apache-2.0
Language: Scala
Default Branch: master
Size: 5.4 MB

Statistics

Stars: 6
Watchers: 3
Forks: 1
Open Issues: 9
Releases: 4

Created almost 6 years ago · Last pushed about 3 years ago

Metadata Files

README, LICENSE

Spark-Trend-Calculus

How to cite this work:

Antoine Amend, Johannes Graner, Andrew Morgan and Raazesh Sainudiin (2020-2021). A Scalable Library for Trader-Perceived Financial Events in an Interval-valued Time Series for a Trend-Calculus. https://github.com/lamastex/spark-trend-calculus/

Acknowledgements

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. The work in 2020 was partly supported by Combient Mix AB through Data Engineering Science Summer Internships.

Many thanks to Andrew Morgan and Antoine Amend.

Quick Links

Andrew's code and his documentation on how to study trends:

https://github.com/ByteSumoLtd/TrendCalculus-lua
https://github.com/bytesumo/TrendCalculus/blob/master/HowToStudyTrends_v1.03.pdf

Antoine's code, which served as the starting point for this library:

https://github.com/aamend/texata-r2-2017

Example use cases:

https://github.com/lamastex/spark-gdelt-examples
https://github.com/lamastex/spark-trend-calculus-examples

Usage Abstract

See Antoine's GitHub repository for details on how to use his implementation of Trend Calculus, which is mainly contained in TrendCalculus.scala.

The scalable and streamable implementation in Apache Spark is given in TrendCalculus2.scala.

The basic use case is to transform the input time series into a Dataset[org.lamastex.spark.trendcalculus.TickerPoint], where TickerPoint is a case class defined in Point.scala with three fields: the ticker symbol ticker (String), the timestamp x (java.sql.Timestamp) and the value y (Double).

This Dataset is then passed to the constructor of a TrendCalculus2 object, together with a window size (at least 2) and a SparkSession.

The TrendCalculus2 object has a method reversals that lazily computes the trend reversals using the Trend Calculus algorithm.
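To make the idea of a reversal more concrete, here is a deliberately simplified plain-Scala sketch; it is not the algorithm in TrendCalculus2.scala. It assumes a reading of trend calculus in which the series is cut into consecutive non-overlapping windows of size n, each window is summarised by its high and low, a window is labelled up (+1) when both its high and low exceed those of the previous window, down (-1) when both are lower, and 0 otherwise, and a reversal is flagged where a nonzero trend is followed by the opposite nonzero trend. The names TrendSketch, Window, windows, trendSigns and reversalIndices are hypothetical.

```scala
// Simplified illustration only; NOT the algorithm implemented in TrendCalculus2.scala.
object TrendSketch {
  // High and low of one window (hypothetical helper type)
  final case class Window(high: Double, low: Double)

  // Cut ys into consecutive non-overlapping windows of size n; a shorter trailing chunk is dropped
  def windows(ys: Seq[Double], n: Int): List[Window] = {
    require(n >= 2, "window size must be at least 2")
    ys.grouped(n).filter(_.size == n).map(w => Window(w.reduce(_ max _), w.reduce(_ min _))).toList
  }

  // +1: higher high and higher low; -1: lower high and lower low; 0: mixed (no clear trend)
  def trendSigns(ws: Seq[Window]): List[Int] =
    ws.sliding(2).collect { case Seq(prev, cur) =>
      if (cur.high > prev.high && cur.low > prev.low) 1
      else if (cur.high < prev.high && cur.low < prev.low) -1
      else 0
    }.toList

  // Positions in the sign list where a nonzero trend is followed by the opposite nonzero trend
  def reversalIndices(signs: Seq[Int]): List[Int] =
    signs.zipWithIndex.filter(_._1 != 0).sliding(2).collect {
      case Seq((s1, _), (s2, i2)) if s1 != s2 => i2
    }.toList
}
```

Window sizes below 2 are rejected, mirroring the minimum window size that TrendCalculus2 requires. The library itself operates on (possibly streaming) Spark Datasets of TickerPoint, which this sketch does not attempt to model.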

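Given an input DataFrame inputDF with columns ticker, x and y, the whole call looks like this:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.lamastex.spark.trendcalculus._

val spark: SparkSession = ...
import spark.implicits._ // provides the Encoder that .as[TickerPoint] needs
val inputDF: DataFrame = ...
val windowSize: Int = ...

// reversals is computed lazily
val reversalDS = new TrendCalculus2(inputDF.select("ticker", "x", "y").as[TickerPoint], windowSize, spark).reversals
```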

This also works when inputDF is a streaming DataFrame using Spark Structured Streaming.

For more detailed examples, see https://github.com/lamastex/spark-trend-calculus-examples.

Included parsers

The library includes parsers for 1-minute foreign exchange data (https://github.com/philipperemy/FX-1-Minute-Data) and for stock market data from yfinance (https://github.com/ranaroussi/yfinance). Data from yfinance requires some preprocessing in Python before the parser accepts it.

Foreign Exchange parser

The Scala function parseFX takes as input a string formatted as

"DateTime Stamp;Bar OPEN Bid Quote;Bar HIGH Bid Quote;Bar LOW Bid Quote;Bar CLOSE Bid Quote;Volume"

where DateTime Stamp has the format yyyyMMdd HHmmss, Open, High, Low and Close are floating-point numbers, and Volume is an integer (which appears to always be 0).
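The field layout above can be illustrated with a small plain-Scala line parser. This is a hypothetical stand-in written for this explanation (FxLineSketch, FxBar and parseFxLine are invented names), not the library's parseFX, whose return type is not described here. It parses the timestamp as a local date-time because the stated format carries no time-zone information.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical stand-in for illustration; the library's own parser is parseFX.
object FxLineSketch {
  final case class FxBar(time: LocalDateTime, open: Double, high: Double, low: Double, close: Double, volume: Long)

  // "yyyyMMdd HHmmss", e.g. "20200709 180500"
  private val stampFormat = DateTimeFormatter.ofPattern("yyyyMMdd HHmmss")

  def parseFxLine(line: String): FxBar = {
    val f = line.split(";").map(_.trim)
    require(f.length == 6, s"expected 6 ';'-separated fields, got ${f.length}")
    FxBar(
      time = LocalDateTime.parse(f(0), stampFormat),
      open = f(1).toDouble,
      high = f(2).toDouble,
      low = f(3).toDouble,
      close = f(4).toDouble,
      volume = f(5).toLong // Volume is an integer field
    )
  }
}
```

For example, the illustrative line "20200709 180500;1.12830;1.12850;1.12810;1.12840;0" parses to a bar at 2020-07-09T18:05 with a close of 1.1284 and a volume of 0.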

To read an FX-1-Minute CSV file directly into an Apache Spark Dataset, use spark.read.fx1m(filePath).

Yahoo! Finance parser

The Scala function parseYF takes as input a string formatted as

"DateTime Stamp,Open,High,Low,Close,Adj Close,Volume"

where DateTime Stamp is formatted either as yyyy-MM-dd or as a string beginning with yyyy-MM-dd HH:mm:ss (e.g. 2020-07-09 18:05:00+02:00 is valid, but UTC+2 2020-07-09 18:05:00 is not). Open, High, Low, Close and Adj Close are floating-point numbers, and Volume can be either an integer or a floating-point number.
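The two accepted timestamp shapes can be sketched in plain Scala as follows. This is a hypothetical stand-in (YfLineSketch, YfBar, parseYfTimestamp and parseYfLine are invented names), not the library's parseYF. How parseYF treats a trailing offset such as +02:00 is not described here, so the sketch simply assumes the suffix after the first 19 characters is ignored.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical stand-in for illustration; the library's own parser is parseYF.
object YfLineSketch {
  final case class YfBar(time: LocalDateTime, open: Double, high: Double, low: Double,
                         close: Double, adjClose: Double, volume: Double)

  private val dateTimeFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Either "yyyy-MM-dd" (mapped to midnight) or a string whose first 19 characters are "yyyy-MM-dd HH:mm:ss".
  // Assumption: any suffix such as "+02:00" is ignored rather than applied as an offset.
  def parseYfTimestamp(s: String): LocalDateTime =
    if (s.length >= 19) LocalDateTime.parse(s.substring(0, 19), dateTimeFormat)
    else LocalDate.parse(s).atStartOfDay()

  def parseYfLine(line: String): YfBar = {
    val f = line.split(",").map(_.trim)
    require(f.length == 7, s"expected 7 ','-separated fields, got ${f.length}")
    // Volume may be written as an integer or a floating-point number; toDouble accepts both
    YfBar(parseYfTimestamp(f(0)), f(1).toDouble, f(2).toDouble, f(3).toDouble,
          f(4).toDouble, f(5).toDouble, f(6).toDouble)
  }
}
```

With this prefix rule, "2020-07-09 18:05:00+02:00" parses, while "UTC+2 2020-07-09 18:05:00" fails because its first 19 characters are not a valid yyyy-MM-dd HH:mm:ss stamp.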

To read a yfinance CSV file directly into an Apache Spark Dataset, use spark.read.yfin(filePath).

Owner

Name: Raazesh Sainudiin
Login: lamastex
Kind: user
Location: Uppsala, Sweden
Company: lamastex.org

Website: https://lamastex.github.io/
Repositories: 18
Profile: https://github.com/lamastex

I work at the interface of mathematics, computing and statistics. This inter-disciplinary research aims broadly to use computers to solve real-world problems.

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 0
Total pull requests: 96
Average time to close issues: N/A
Average time to close pull requests: about 1 month
Total issue authors: 0
Total pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.48
Merged pull requests: 36
Bot issues: 0
Bot pull requests: 74

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

Pull Request Authors

dependabot[bot] (44)
johannes-graner (16)
AlbertNilsson (1)

Top Labels

Issue Labels

Pull Request Labels

dependencies (44)

Dependencies

pom.xml (Maven)

joda-time:joda-time 2.10.13 provided
org.apache.spark:spark-core_2.12 3.2.1 provided
org.apache.spark:spark-mllib_2.12 3.2.1 provided
org.scala-lang:scala-library 2.12.14 provided
org.scala-lang:scala-reflect 2.12.14 provided
org.scalatest:scalatest_2.12 3.3.0-SNAP3 test
