https://github.com/talariadb/talaria
TalariaDB is a distributed, highly available, and low latency time-series database for Presto
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.0%) to scientific vocabulary
Repository
Statistics
- Stars: 225
- Watchers: 14
- Forks: 31
- Open Issues: 14
- Releases: 37
Metadata Files
README.md
Talaria
This repository contains a fork of TalariaDB, a distributed, highly available, and low-latency time-series database for Big Data systems. It was originally designed and implemented at Grab, where millions of transactions and connections take place every day, which requires a platform for scalable, data-driven decision making.
Introduction
TalariaDB helped us overcome the challenge of retrieving and acting upon information from large amounts of data. It addressed our need to query at least 2-3 terabytes of data per hour with predictably low query latency and low cost. Most importantly, it integrates well with the ecosystems of different tools and lets us query data using SQL.
Building on the original design, we have extended Talaria so that it can be set up in two ways:
- As an event ingestion platform. This allows you to track events using a simple gRPC endpoint from almost anywhere.
- As a data store for hot data. This allows you to query hot data (e.g. last 6 hours) as it goes through the data pipeline and ultimately ends up in your data lake when compacted.
Talaria is designed around an event-based data model. An event is essentially a set of key-value pairs; however, to keep events consistent, we need to define a set of commonly used keys. Each event consists of the following:
- Hash key (e.g. the "event" key). This represents the type of the event and can be prefixed with the source scope (e.g. "table1"), using a dot as the logical separator. Namespacing is not required, but it is strongly recommended to make your system more usable.
- Sort key (e.g. the "time" key). This represents the time at which the update occurred, as a Unix timestamp (as precise as the source allows) encoded as a 64-bit integer.
- Other key-value pairs represent the values of the various columns.
Below is an example of what a payload for an event describing a table update might look like.
| KEY | VALUE | DATA TYPE |
|-------------|---------------------|-------------|
| event | table1.update | string |
| time | 1586500157 | int64 |
| column1 | hello | string |
| column2 | { "name": "roman" } | json |
Talaria supports string, int32, int64, bool, float64, timestamp and json data types which are used to construct columns that can be exposed to Presto/SQL.
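To make the event model concrete, here is a minimal Go sketch that represents an event as a flat key-value map, splits a namespaced hash key on its last dot, and checks that the sort key is an int64 and that every value uses one of the supported types. `Event`, `splitEventName` and `validate` are hypothetical helpers, not part of Talaria's API; the key names follow the example table, and mapping `json` to `json.RawMessage` and `timestamp` to `time.Time` is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// Event is a hypothetical in-memory event: a flat set of key-value pairs.
type Event map[string]interface{}

// splitEventName separates the optional source scope from the event type,
// using the last dot as the logical separator ("table1.update" -> "table1", "update").
func splitEventName(name string) (scope, kind string) {
	if i := strings.LastIndex(name, "."); i >= 0 {
		return name[:i], name[i+1:]
	}
	return "", name // no namespace: the whole name is the event type
}

// validate checks the hash key ("event"), the sort key ("time") and the
// Go types of all values against the supported column types (assumed mapping).
func validate(e Event) error {
	if name, ok := e["event"].(string); !ok || name == "" {
		return errors.New(`missing string hash key "event"`)
	}
	if _, ok := e["time"].(int64); !ok {
		return errors.New(`missing int64 sort key "time"`)
	}
	for k, v := range e {
		switch v.(type) {
		case string, int32, int64, bool, float64, time.Time, json.RawMessage:
			// supported column type
		default:
			return fmt.Errorf("column %q has unsupported type %T", k, v)
		}
	}
	return nil
}

func main() {
	e := Event{
		"event":   "table1.update",
		"time":    int64(1586500157),
		"column1": "hello",
		"column2": json.RawMessage(`{ "name": "roman" }`),
	}
	scope, kind := splitEventName(e["event"].(string))
	fmt.Println(scope, kind, validate(e)) // prints: table1 update <nil>
}
```

Note that a plain Go `int` for "time" is rejected: the sort key must be explicitly 64-bit.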
Event Ingestion with Talaria
If your organisation needs a reliable and scalable data ingestion platform, you can set up Talaria as one. The main advantages are that such a platform is cost-efficient, does not require a complex Kafka setup, and even offers in-flight queries if you also point Presto at it. The basic setup allows you to track events using a simple gRPC endpoint from almost anywhere.

To set up Talaria as an ingestion platform, you need to specify a table (in this case "eventlog") and enable compaction in the configuration, along these lines:
```yaml
mode: staging
env: staging
domain: "talaria-headless.default.svc.cluster.local"
storage:
  dir: "/data"
tables:
  eventlog:
    compact:                                # enable compaction
      interval: 60                          # compact every 60 seconds
      nameFunc: "s3://bucket/namefunc.lua"  # file name function
      s3:                                   # sink to Amazon S3
        region: "ap-southeast-1"
        bucket: "bucket"
...
```
Once this is set up, you can point a gRPC client (see the protobuf definition) directly at the ingestion endpoint. Note that this repository also offers some pre-generated or pre-made ingestion clients.
```protobuf
service Ingress {
  rpc Ingest(IngestRequest) returns (IngestResponse) {}
}
```
Below is a list of currently supported sinks and their example configurations:
- Amazon S3 using the s3 sink.
- DigitalOcean Spaces using the s3 sink with a custom endpoint and the us-east-1 region.
- Google Cloud Storage using the gcs sink.
- Local filesystem using the file sink.
- Microsoft Azure Blob Storage using the azure sink.
- Minio using the s3 sink with a custom endpoint and the us-east-1 region.
- Google BigQuery using the bigquery sink.
- Talaria itself using the talaria sink.
For Microsoft Azure Blob Storage and Azure Data Lake Gen 2, we support writing across multiple storage accounts in two modes:
- Random choice, where each write is directed to a randomly chosen storage account; for this, you simply specify a list of storage accounts.
- Weighted choice, where a set of weights (positive integers) is assigned and each write is directed to a storage account based on the specified weights.
An example of weighted choice is shown below:
```yaml
- azure:
    container: a_container
    prefix: a_prefix
    blobServiceURL: .storage.microsoft.net
    storageAccounts:
      - a_storage_account
      - b_storage_account
    storageAccountWeights: [1, 2]
```
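Weighted choice can be sketched as a cumulative-weight lookup: with weights `[1, 2]`, `b_storage_account` receives roughly twice as many writes as `a_storage_account`, and with all weights equal the scheme reduces to uniform random choice. This minimal Go sketch uses only the standard library; `pickAccount` is a hypothetical helper, and Talaria's own implementation may differ (its dependency list includes `github.com/mroth/weightedrand`).

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickAccount chooses account i with probability weights[i] / sum(weights).
// r must be in [0, 1); passing it in keeps the function deterministic.
func pickAccount(accounts []string, weights []int, r float64) string {
	total := 0
	for _, w := range weights {
		total += w
	}
	target := r * float64(total)
	cumulative := 0
	for i, w := range weights {
		cumulative += w
		if target < float64(cumulative) {
			return accounts[i]
		}
	}
	return accounts[len(accounts)-1] // guard against floating-point edge cases
}

func main() {
	accounts := []string{"a_storage_account", "b_storage_account"}
	weights := []int{1, 2} // matches storageAccountWeights: [1, 2]
	counts := map[string]int{}
	rng := rand.New(rand.NewSource(42))
	for i := 0; i < 30000; i++ {
		counts[pickAccount(accounts, weights, rng.Float64())]++
	}
	fmt.Println(counts) // roughly 10000 writes to a, 20000 to b
}
```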
Random and weighted choice are particularly useful in the following scenarios:
- High-throughput deployments where the I/O generated by Talaria exceeds the limits of a single storage account.
- Deployments on internal endpoints with multiple VPN links, where you want to split the network traffic across those links.
Hot Data Query with Talaria
If your organisation requires querying either hot data (e.g. the last n hours) or in-flight data (i.e. as ingested), you can also configure Talaria to serve it to Presto using the built-in Presto Thrift connector.

In the example configuration below, we set up an s3 + sqs writer to continuously ingest files from an S3 bucket, and an "eventlog" table which will be exposed to Presto.
```yaml
mode: staging
env: staging
domain: "talaria-headless.default.svc.cluster.local"
writers:
  grpc:
    port: 8080
  s3sqs:
    region: "ap-southeast-1"
    queue: "queue-url"
    waitTimeout: 1
    retries: 5
readers:
  presto:
    schema: data
    port: 8042
storage:
  dir: "/data"
tables:
  eventlog:
    ttl: 3600 # data is persisted for 1 hour
    hashBy: event
    sortBy: time
...
```
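The `hashBy`, `sortBy` and `ttl` settings can be illustrated with a toy in-memory model: rows are partitioned by the hash key and kept sorted by the sort key, so a query for one hash key (like the `where event = 'table1.update'` filter in the SQL example) only touches one partition and can binary-search past rows older than the TTL. The `row` and `table` types are hypothetical and this is not Talaria's storage engine; times are Unix seconds, matching `ttl: 3600`.

```go
package main

import (
	"fmt"
	"sort"
)

// row holds the hash key ("event") and sort key ("time") from the config.
type row struct {
	Event string
	Time  int64
}

// table is a toy model of hashBy: event, sortBy: time, ttl: 3600.
type table struct {
	ttl        int64
	partitions map[string][]row // one time-sorted partition per hash key
}

// insert appends a row to its partition and re-sorts it by time
// (kept simple; a real store would avoid a full sort per insert).
func (t *table) insert(r row) {
	p := append(t.partitions[r.Event], r)
	sort.Slice(p, func(i, j int) bool { return p[i].Time < p[j].Time })
	t.partitions[r.Event] = p
}

// query returns the rows for one hash key that are still within the TTL.
// Because rows are sorted by time, binary search finds the first live row.
func (t *table) query(event string, now int64) []row {
	p := t.partitions[event]
	start := sort.Search(len(p), func(i int) bool { return now-p[i].Time < t.ttl })
	return p[start:]
}

func main() {
	tbl := &table{ttl: 3600, partitions: map[string][]row{}}
	tbl.insert(row{"table1.update", 1586500157})
	tbl.insert(row{"table1.update", 1586496000}) // older than 1 hour at query time
	tbl.insert(row{"table2.insert", 1586500100}) // different partition
	fmt.Println(tbl.query("table1.update", 1586500200)) // prints: [{table1.update 1586500157}]
}
```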
Once you have set up Talaria, you need to configure Presto to talk to it using the Thrift connector. Make sure that:
1. The connector properties file is configured to reach Talaria through a Kubernetes load balancer.
2. Presto can also access the Talaria nodes directly, without going through the load balancer.
Once this is done, you should be able to query your data via Presto.
```sql
select *
from talaria.data.eventlog
where event = 'table1.update'
limit 1000
```
Ingesting Files Into Talaria
To ingest existing ORC, CSV or Parquet files from a storage URL (e.g. S3 or Azure Blob Storage), use the Talaria File Ingestion Client:
https://github.com/atris/TalariaFileIngestionClient
Quick Start
The easiest way to get started is to use the provided Helm chart.
Contributing
We are open to contributions; feel free to submit a pull request and we'll review it as quickly as we can. TalariaDB is maintained by:
- Roman Atachiants
- Yichao Wang
- Chun Rong Phang
- Ankit Kumar Sinha
- Atri Sharma
- Qiao Wei
- Oscar Cassetti
- Manoj Babu Katragadda
- Jeffrey Lean
License
TalariaDB is licensed under the MIT License.
Owner
- Name: Talaria
- Login: talariadb
- Kind: organization
- Repositories: 2
- Profile: https://github.com/talariadb
GitHub Events
Total
- Watch event: 6
Last Year
- Watch event: 6
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Roman Atachiants | r****s@g****m | 101 |
| Roman Atachiants | r****s@g****m | 50 |
| Jeffrey lean | 5****n | 11 |
| Phang Chun Rong | c****g@g****m | 10 |
| atlas-booker | c****y@g****m | 10 |
| WangBeyond | w****d@g****m | 9 |
| Yichao Wang | y****g@g****m | 9 |
| Atri Sharma | a****t@g****m | 7 |
| Ankit Sinha | a****a@g****m | 7 |
| Chunrong Phang | c****g@g****m | 7 |
| Manoj Babu | m****t@g****m | 6 |
| TiewKH | t****5@h****m | 5 |
| Ankit Kumar Sinha | 4****a | 5 |
| Oscar Cassetti | o****g@g****m | 2 |
| Wei | 4****g | 2 |
| Ankit kumar sinha | a****n@m****t | 1 |
| Ankit kumar sinha | a****a@g****m | 1 |
| dependabot[bot] | 4****] | 1 |
| Steve M | g****g | 1 |
| Eng Zer Jun | e****n@g****m | 1 |
| stack_underFlow | v****2@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 25
- Total pull requests: 80
- Average time to close issues: 4 months
- Average time to close pull requests: 28 days
- Total issue authors: 7
- Total pull request authors: 13
- Average comments per issue: 2.28
- Average comments per pull request: 0.88
- Merged pull requests: 57
- Bot issues: 0
- Bot pull requests: 4
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- atlas-comstock (17)
- tardunge (3)
- crphang (1)
- panamafrancis (1)
- VicLin66 (1)
- gedw99 (1)
- kumarankit1234 (1)
Pull Request Authors
- atlas-comstock (12)
- jeffreylean (11)
- atris (10)
- kelindar (10)
- tardunge (8)
- ocassetti (6)
- TiewKH (6)
- dependabot[bot] (4)
- kumarankit1234 (4)
- a9kitkumarsinha (3)
- qiaowei-g (3)
- crphang (2)
- Juneezee (1)
- gearcog (1)
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 37
proxy.golang.org: github.com/talariadb/talaria
- Documentation: https://pkg.go.dev/github.com/talariadb/talaria#section-documentation
- License: mit
Latest release: v1.6.1
published over 2 years ago
Dependencies
- cloud.google.com/go v0.108.0
- cloud.google.com/go/bigquery v1.45.0
- cloud.google.com/go/compute v1.15.1
- cloud.google.com/go/compute/metadata v0.2.3
- cloud.google.com/go/iam v0.10.0
- cloud.google.com/go/pubsub v1.27.1
- cloud.google.com/go/storage v1.28.1
- github.com/Azure/azure-pipeline-go v0.2.3
- github.com/Azure/azure-sdk-for-go v42.1.0+incompatible
- github.com/Azure/azure-storage-blob-go v0.13.0
- github.com/Azure/go-autorest v14.2.0+incompatible
- github.com/Azure/go-autorest/autorest v0.11.17
- github.com/Azure/go-autorest/autorest/adal v0.9.11
- github.com/Azure/go-autorest/autorest/azure/auth v0.5.7
- github.com/Azure/go-autorest/autorest/azure/cli v0.4.2
- github.com/Azure/go-autorest/autorest/date v0.3.0
- github.com/Azure/go-autorest/autorest/to v0.3.0
- github.com/Azure/go-autorest/logger v0.2.0
- github.com/Azure/go-autorest/tracing v0.6.0
- github.com/DataDog/datadog-go v3.7.1+incompatible
- github.com/Knetic/govaluate v3.0.0+incompatible
- github.com/apache/thrift v0.13.0
- github.com/armon/go-metrics v0.3.3
- github.com/aws/aws-sdk-go v1.33.0
- github.com/beorn7/perks v1.0.1
- github.com/bool64/shared v0.1.4
- github.com/cespare/xxhash v1.1.0
- github.com/cespare/xxhash/v2 v2.1.1
- github.com/crphang/orc v0.0.7
- github.com/davecgh/go-spew v1.1.1
- github.com/dgraph-io/badger/v3 v3.2103.1
- github.com/dgraph-io/ristretto v0.1.0
- github.com/dgryski/go-farm v0.0.0-20200201041132-a6ae2369ad13
- github.com/dimchansky/utfbom v1.1.1
- github.com/dnaeon/go-vcr v1.0.1
- github.com/dustin/go-humanize v1.0.0
- github.com/emitter-io/address v1.0.0
- github.com/form3tech-oss/jwt-go v3.2.2+incompatible
- github.com/fraugster/parquet-go v0.3.0
- github.com/gogo/protobuf v1.3.2
- github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b
- github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da
- github.com/golang/protobuf v1.5.2
- github.com/golang/snappy v0.0.3
- github.com/google/btree v1.0.0
- github.com/google/flatbuffers v1.12.0
- github.com/google/go-cmp v0.5.9
- github.com/google/uuid v1.3.0
- github.com/googleapis/enterprise-certificate-proxy v0.2.1
- github.com/googleapis/gax-go/v2 v2.7.0
- github.com/gopherjs/gopherjs v0.0.0-20200209183636-89e6cbcd0b6d
- github.com/gorilla/mux v1.8.0
- github.com/grab/async v0.0.5
- github.com/grpc-ecosystem/go-grpc-middleware v1.3.0
- github.com/hako/durafmt v0.0.0-20191009132224-3f39dc1ed9f4
- github.com/hashicorp/errwrap v1.0.0
- github.com/hashicorp/go-immutable-radix v1.2.0
- github.com/hashicorp/go-msgpack v0.5.5
- github.com/hashicorp/go-multierror v1.1.0
- github.com/hashicorp/go-sockaddr v1.0.2
- github.com/hashicorp/golang-lru v0.5.4
- github.com/hashicorp/memberlist v0.2.2
- github.com/iancoleman/orderedmap v0.2.0
- github.com/imroc/req v0.3.0
- github.com/jmespath/go-jmespath v0.3.0
- github.com/kelindar/binary v1.0.9
- github.com/kelindar/loader v0.0.11
- github.com/kelindar/lua v0.0.7
- github.com/klauspost/compress v1.15.10
- github.com/mattn/go-ieproxy v0.0.1
- github.com/matttproud/golang_protobuf_extensions v1.0.1
- github.com/miekg/dns v1.1.29
- github.com/minio/highwayhash v1.0.2
- github.com/mitchellh/go-homedir v1.1.0
- github.com/mroth/weightedrand v0.4.1
- github.com/myteksi/hystrix-go v1.1.3
- github.com/nats-io/jwt/v2 v2.3.0
- github.com/nats-io/nats-server/v2 v2.9.1
- github.com/nats-io/nats.go v1.17.0
- github.com/nats-io/nkeys v0.3.0
- github.com/nats-io/nuid v1.0.1
- github.com/pkg/errors v0.9.1
- github.com/pmezard/go-difflib v1.0.0
- github.com/prometheus/client_golang v1.7.1
- github.com/prometheus/client_model v0.2.0
- github.com/prometheus/common v0.10.0
- github.com/prometheus/procfs v0.1.3
- github.com/samuel/go-thrift v0.0.0-20191111193933-5165175b40af
- github.com/satori/go.uuid v1.2.0
- github.com/sean-/seed v0.0.0-20170313163322-e2103e2c3529
- github.com/sercand/kuberesolver/v3 v3.0.0
- github.com/sergi/go-diff v1.2.0
- github.com/smartystreets/goconvey v1.6.4
- github.com/spf13/afero v1.9.2
- github.com/stretchr/objx v0.5.0
- github.com/stretchr/testify v1.8.1
- github.com/swaggest/assertjson v1.7.0
- github.com/twmb/murmur3 v1.1.3
- github.com/yudai/gojsondiff v1.0.0
- github.com/yudai/golcs v0.0.0-20170316035057-ecda9a501e82
- github.com/yuin/gopher-lua v0.0.0-20191220021717-ab39c6098bdb
- go.nhat.io/grpcmock v0.20.0
- go.nhat.io/matcher/v2 v2.0.0
- go.opencensus.io v0.24.0
- go.uber.org/atomic v1.9.0
- golang.org/x/crypto v0.0.0-20220919173607-35f4265a4bc0
- golang.org/x/net v0.5.0
- golang.org/x/oauth2 v0.4.0
- golang.org/x/sync v0.1.0
- golang.org/x/sys v0.4.0
- golang.org/x/text v0.6.0
- golang.org/x/time v0.1.0
- golang.org/x/xerrors v0.0.0-20220907171357-04be3eba64a2
- google.golang.org/api v0.107.0
- google.golang.org/appengine v1.6.7
- google.golang.org/genproto v0.0.0-20230113154510-dbe35b8444a5
- google.golang.org/grpc v1.52.0
- google.golang.org/protobuf v1.28.1
- gopkg.in/yaml.v2 v2.4.0
- gopkg.in/yaml.v3 v3.0.1
- layeh.com/gopher-luar v1.0.7
- 730 dependencies
- actions/checkout v1 composite
- actions/setup-go v1 composite
- docker/login-action v1 composite
- actions/checkout v1 composite
- actions/setup-go v1 composite
- docker/login-action v1 composite
- actions/checkout v1 composite
- actions/setup-go v1 composite
- docker/login-action v1 composite
- actions/checkout v1 composite
- actions/setup-go v1 composite
- debian latest build
- golang 1.17 build
- org.apache.tomcat:annotations-api 6.0.53 compileOnly
- com.google.protobuf:protobuf-java-util ${protobufVersion} implementation
- io.grpc:grpc-protobuf ${grpcVersion} implementation
- io.grpc:grpc-stub ${grpcVersion} implementation
- io.grpc:grpc-netty-shaded ${grpcVersion} runtimeOnly
- io.grpc:grpc-testing ${grpcVersion} testImplementation
- io.grpc:grpc-testing * testImplementation
- org.junit.jupiter:junit-jupiter-api 5.8.2 testImplementation
- org.mockito:mockito-core 4.6.1 testImplementation
- org.junit.jupiter:junit-jupiter-engine * testRuntimeOnly
- grpcio >=1.36.0
- protobuf *