https://github.com/timescale/outflux

Export data from InfluxDB to TimescaleDB

https://github.com/timescale/outflux

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary

Keywords

influx migration time-series timescaledb
Last synced: 5 months ago · JSON representation

Repository

Export data from InfluxDB to TimescaleDB

Basic Info
  • Host: GitHub
  • Owner: timescale
  • License: apache-2.0
  • Language: Go
  • Default Branch: main
  • Homepage:
  • Size: 320 KB
Statistics
  • Stars: 98
  • Watchers: 17
  • Forks: 23
  • Open Issues: 11
  • Releases: 5
Topics
influx migration time-series timescaledb
Created about 7 years ago · Last pushed over 2 years ago
Metadata Files
Readme License

README.md

Outflux - Migrate InfluxDB to TimescaleDB

Go Report Card

This repo contains code for exporting complete InfluxDB databases or selected measurements to TimescaleDB.

Table of Contents

  1. Installation
  2. How to use
  3. Connection
  4. Known limitations

Installation

Binary releases

We provide binaries for GNU/Linux, Windows and MacOS with each release, these can be found under releases. To use outflux, download the binary, extract the compressed tarball and run the executable.

bash wget https://github.com/timescale/outflux/releases/download/v0.3.0/outflux_0.3.0_Linux_x86_64.tar.gz tar xf outflux_0.3.0_Linux_x86_64.tar.gz ./outflux --help

Installing from source

Outflux is a Go project managed by go modules. You can download it in any directory and on the first build it will download it's required dependencies.

Depending on where you downloaded it and the go version you're using, you may need to set the GO111MODULE to auto, on or off. Learn about the GO111MODULE environment variable here.

```bash

Fetch the source code of Outflux in any directory

$ git clone git@github.com:timescale/outflux.git $ cd ./outflux

Install the Outflux binary (will automaticly detect and download)

dependencies.

$ cd cmd/outflux $ GO111MODULE=auto go install

Building without installing will also fetch the required dependencies

$ GO111MODULE=auto go build ./... ```

How to use

Outflux supports InfluxDB 1.x.

Outflux should support using the 1.x query APIs for InfluxDB 2.x and 3.x. You will need to enable the 1.x APIs to use them. Consult the InfluxDB documentation for more details.

Before using it

It is recommended that you have and InfluxDB database with some data. For testing purposes you can check out the TSBS Data Loader Tool, which is part of the Time Series Benchmark Suite. It can generate large amounts of data to load into influx. Data can be generated with one command, just specify the format as 'influx', and then load it in with another command.

Connection params

Detailed information about how to pass the connection parameters to Outflux can be found at the bottom of this document at the Connection section.

Schema Transfer

The Outflux CLI has two commands. The first one is schema-transfer. This command discoverx the schema of an InfluxDB database, or specific measurements in an InfluxDB database, and (depending on the strategy selected) create or verify a TimescaleDB database that could hold the data.

The possible flags for the command can be seen by running:

bash $ cd $GOPATH/bin/ $ ./outflux schema-transfer --help

Usage is outflux schema-transfer database [measure1 measure2 ...] [flags], where database is the name of the InfluxDB database you wish to export, [measure1 ...] are optional and if specified will export only those measurements from the selected database. Additionally, you can specify the retention policy with the retention-policy flag.

For example outflux schema-transfer benchmark cpu mem will discover the schema for the cpu and mem measurements from the benchmark database.

Available flags for schema-transfer are:

| flag | type | default | description | |---------------------------|---------|-----------------------|-------------| | input-server | string | http://localhost:8086 | Location of the input database, http(s)://location:port. | | input-pass | string | | Password to use when connecting to the input database | | input-user | string | | Username to use when connecting to the input database | | input-unsafe-https | bool | false | Should 'InsecureSkipVerify' be passed to the input connection | | retention-policy | string | autogen | The retention policy to select the tags and fields from | | output-conn | string | sslmode=disable | Connection string to use to connect to the output database| | output-schema | string | | The schema of the output database that the data will be inserted into | | schema-strategy | string | CreateIfMissing | Strategy to use for preparing the schema of the output database. Valid options: ValidateOnly, CreateIfMissing, DropAndCreate, DropCascadeAndCreate | | tags-as-json | bool | false | If this flag is set to true, then the Tags of the influx measures being exported will be combined into a single JSONb column in Timescale | | tags-column | string | tags | When tags-as-json is set, this column specifies the name of the JSON column for the tags | | fields-as-json | bool | false | If this flag is set to true, then the Fields of the influx measures being exported will be combined into a single JSONb column in Timescale | | fields-column | string | fields | When fields-as-json is set, this column specifies the name of the JSON column for the fields | | multishard-int-float-cast | bool | false | If a field is Int64 in one shard, and Float64 in another, with this flag it will be cast to Float64 despite possible data loss | | quiet | bool | false | If specified will suppress any log to STDOUT |

Migrate

The second command of the Outflux CLI is migrate. The possible flags for the command can be seen by running:

bash $ cd $GOPATH/bin/ $ ./outflux migrate --help

Usage is outflux migrate database [measure1 measure2 ...] [flags], where database is the name of the InfluxDB database you wish to export, [measure1 measure2 ...] are optional and if specified will export only those measurements from the selected database.

The retention policy can be specified with the retention-policy flag. By default, the 'autogen' retention policy is used.

For example outflux migrate benchmark cpu mem will export the cpu and mem measurements from the benchmark database. On the other hand outflux migrate benchmark will export all measurements in the benchmark database.

Available flags are:

| flag | type | default | description| |----------------------------|---------|-----------------------|------------| | input-server | string | http://localhost:8086 | Location of the input database, http(s)://location:port. | | input-pass | string | | Password to use when connecting to the input database | | input-user | string | | Username to use when connecting to the input database | | input-unsafe-https | bool | false | Should 'InsecureSkipVerify' be passed to the input connection | | retention-policy | string | autogen | The retention policy to select the data from | | limit | uint64 | 0 | If specified will limit the export points to its value. 0 = NO LIMIT | | from | string | | If specified will export data with a timestamp >= of its value. Accepted format: RFC3339 | | to | string | | If specified will export data with a timestamp <= of its value. Accepted format: RFC3339 | | output-conn | string | sslmode=disable | Connection string to use to connect to the output database| | output-schema | string | public | The schema of the output database that the data will be inserted into. | | schema-strategy | string | CreateIfMissing | Strategy to use for preparing the schema of the output database. Valid options: ValidateOnly, CreateIfMissing, DropAndCreate, DropCascadeAndCreate | | chunk-size | uint16 | 15000 | The export query will request data in chunks of this size. Must be > 0 | | batch-size | uint16 | 8000 | The size of the batch inserted in to the output database | | data-buffer | uint16 | 15000 | Size of the buffer holding exported data ready to be inserted in the output database | | max-parallel | uint8 | 2 | Number of parallel measure extractions. One InfluxDB measure is exported using 1 worker | | rollback-on-external-error | bool | true | If set, when an error occurs while extracting the data, the insertion will be rollbacked. Otherwise it will try to commit | | tags-as-json | bool | false | If this flag is set to true, then the Tags of the influx measures being exported will be combined into a single JSONb column in Timescale | | tags-column | string | tags | When tags-as-json is set, this column specifies the name of the JSON column for the tags | | fields-as-json | bool | false | If this flag is set to true, then the Fields of the influx measures being exported will be combined into a single JSONb column in Timescale | | fields-column | string | fields | When fields-as-json is set, this column specifies the name of the JSON column for the fields | | multishard-int-float-cast | bool | false | If a field is Int64 in one shard, and Float64 in another, with this flag it will be cast to Float64 despite possible data loss | | quiet | bool | false | If specified will suppress any log to STDOUT |

Examples

  • Use environment variables for determining output db connection bash $ PGPORT=5433 $ PGDATABASE=test $ PGUSER=test ... $ ./outflux schema-transfer benchmark

  • Export the complete 'benchmark' database on 'localhost:8086' to the 'targetdb' database on localhost:5432. Use environment variable to set InfluxDB password

```bash $ PGDATABASE=somedefaultdb $ INFLUX_PASSWORD=test ... $ outflux migrate benchmark \

--input-user=test \ --input-pass=test \ --output-conn='dbname=targetdb user=test password=test' \ ```

  • Export only measurement 'cpu' from 'twoweek' retention policy in the 'benchmark' database. Drop the existing '"twoweek.cpu"' table in 'targetdb' if exists, create if not ```bash $ outflux migrate benchmark two_week.cpu \

    --input-user=test \ --input-pass=test \ --output-conn='dbname=targetdb user=test pass=test'\ --schema-strategy=DropAndCreate ```

  • Export only the 1,000,000 rows from measurements 'cpu' and 'mem' from 'benchmark', starting from Jan 14th 2019 09:00 ```bash $ ./outflux migrate benchmark cpu mem \

    --input-user=test \ --input-pass=test \ --limit=1000000 \ --from=2019-01-01T09:00:00Z ```

Connection

TimescaleDB connection params

The connection parameters to the TimescaleDB instance can be passed to Outflux in several ways. One is through the Postgres Environment Variables. Supported environment variables are: PGHOST, PGPORT, PGDATABASE, PGUSER, PGPASSWORD, PGSSLMODE, PGSSLCERT, PGSSLKEY, PGSSLROOTCERT, PGAPPNAME, PGCONNECT_TIMEOUT. If they are not specified defaults used are: host=localhost, dbname=postgres, pguser=$USER, and sslmode=disable.

The values of the enviroment variables can be OVERRIDEN by specifying the '--output-conn' flag when executing Outflux.

The connection string can be in the format URI or DSN format: * example URI: "postgresql://username:password@host:port/dbname?connect_timeout=10" * example DSN: "user=username password=password host=1.2.3.4 port=5432 dbname=mydb sslmode=disable"

InfluxDB connection params

The connection parameters to the InfluxDB instance can be passed also through flags or environment variables. Supported/Expected environment variables are: INFLUX_USERNAME, INFLUX_PASSWORD. These are the same environment variables that the InfluxDB CLI uses.

If they are not set, or if you wish to override them, you can do so with the --input-user and --input-pass. Also you can specify to Outflux to skip HTTPS verification when communicating with the InfluxDB server by setting the --input-unsafe-https flag to true.

Known limitations

Fields with different data types across shards

Outflux doesn't support fields that have the same name but different data types across shards in InfluxDB, UNLESS the field is an integer and float in the InfluxDB shards. InfluxDB can store the fields as integer (64bit integer), float (64bit float), string, and boolean. You can specify the multishard-int-float-cast flag. This will tell Outflux to cast the integer values to float values. A 64bit float can't hold all the int64 values, so this may result in scrambled data (for values > 2^53).

If the same field is of any of the other possible InfluxDB types, an error will be thrown, since the values can't be converted.

This is also an issue even if you select a time interval in which a field has a consistent type, but exists as a different type in a shard outside of that interval. This is because the SHOW FIELD KEYS FROM measurement_name doesn't accept a time interval for which you would be asking

Owner

  • Name: Timescale
  • Login: timescale
  • Kind: organization

GitHub Events

Total
  • Issues event: 1
  • Watch event: 8
  • Fork event: 2
Last Year
  • Issues event: 1
  • Watch event: 8
  • Fork event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 30
  • Total pull requests: 76
  • Average time to close issues: 5 months
  • Average time to close pull requests: 4 months
  • Total issue authors: 26
  • Total pull request authors: 14
  • Average comments per issue: 1.83
  • Average comments per pull request: 0.54
  • Merged pull requests: 62
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ptrcarta (3)
  • mahlonsmith (2)
  • komal-lunkad (2)
  • lukas-bernert (1)
  • allanneoh (1)
  • andre-luiz-dos-santos (1)
  • oldrichsmejkal (1)
  • lazzarello (1)
  • aimtsou (1)
  • K-Iwatani (1)
  • gabrielmocan (1)
  • BubbleMilkTea (1)
  • erlfos (1)
  • morganchristiansson (1)
  • Abhastimescale (1)
Pull Request Authors
  • atanasovskib (55)
  • JamesGuthrie (9)
  • toanalien (1)
  • danielhoherd (1)
  • lazzarello (1)
  • apgiorgi (1)
  • alejandrodnm (1)
  • axxelG (1)
  • petetnt (1)
  • RobAtticus (1)
  • mfreed (1)
  • freeznet (1)
  • patriczek (1)
Top Labels
Issue Labels
bug (6) enhancement (5) fixed (4) documentation (2) question (1)
Pull Request Labels
bug (3) fixed (2) enhancement (2)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total docker downloads: 20
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
proxy.golang.org: github.com/timescale/outflux
  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Docker Downloads: 20
Rankings
Docker downloads count: 0.8%
Stargazers count: 5.0%
Forks count: 5.1%
Average: 5.7%
Dependent packages count: 8.2%
Dependent repos count: 9.3%
Last synced: 6 months ago

Dependencies

go.mod go
  • github.com/cockroachdb/apd v1.1.0
  • github.com/influxdata/influxdb v1.7.3
  • github.com/influxdata/platform v0.0.0-20190117200541-d500d3cf5589
  • github.com/jackc/fake v0.0.0-20150926172116-812a484cc733
  • github.com/jackc/pgx v3.3.0+incompatible
  • github.com/lib/pq v1.0.0
  • github.com/pkg/errors v0.8.1
  • github.com/shopspring/decimal v0.0.0-20180709203117-cd690d0c9e24
  • github.com/spf13/cobra v0.0.3
  • github.com/spf13/pflag v1.0.3
  • github.com/stretchr/testify v1.3.0
  • golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4
go.sum go
  • 287 dependencies
docker-compose-integration.yml docker
  • influxdb latest
  • timescale/timescaledb latest