https://github.com/datacite/shiba-inu

Pipeline for DOI Resolution Logs processing

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 5 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.7%) to scientific vocabulary

Last synced: 9 months ago · JSON representation

Repository

Pipeline for DOI Resolution Logs processing

Basic Info

Host: GitHub
Owner: datacite
License: mit
Language: Ruby
Default Branch: master
Size: 256 KB

Statistics

Stars: 5
Watchers: 5
Forks: 6
Open Issues: 1
Releases: 0

Created about 8 years ago · Last pushed about 3 years ago

Metadata Files

Readme License

Pipeline for DOI Resolution Logs processing


Shiba-Inu is a pipeline for processing DOI resolution logs. It processes the logs following the Code of practice for research data usage metrics and is based on Logstash.

The Shiba Inu is the smallest of the six original and distinct spitz breeds of dog from Japan.

Installation

Requirements

An Elasticsearch instance
Single line logs with DOI names.

You can run the logs processor using Docker. You will need to set the following environment variables:

```
ESHOST=http://elasticsearch:9200
ESINDEX=resolutions
INPUTDIR=/usr/share/logstash/tmp/DataCite-access.log-201805
OUTPUTDIR=/usr/share/logstash/tmp/output.json
LOGSTASH_HOST = localhost:9600

S3MERGEDLOGSBUCKET = /usr/share/logstash/monthlylogs
S3RESOLUTIONLOGSBUCKET = /usr/share/logstash/
ELASTICPASSWORD=changeme
LOGS_TAG=[Resolution Logs]

HUBTOKEN=eyJhbGciOiJSUzI1NiJ9
HUBURL=https://api.test.datacite.org
```
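Before starting the container it can help to confirm that the variables are actually set. The following is a minimal sketch, not part of shiba-inu: the variable names are copied from the list above, but treating these four as the required ones is an assumption, and `missing_vars` is a hypothetical helper.

```ruby
# Variable names are copied from the list above; which of them the
# container strictly requires is an assumption of this sketch.
REQUIRED_VARS = %w[ESHOST ESINDEX INPUTDIR OUTPUTDIR].freeze

# Returns the names in `required` that are unset or blank in `env`.
# `env` can be ENV itself or any Hash with string keys.
def missing_vars(env, required = REQUIRED_VARS)
  required.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_vars(ENV)
warn "Missing environment variables: #{missing.join(', ')}" unless missing.empty?
```

Passing a plain Hash instead of `ENV` makes the check easy to exercise without touching the real environment.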

and run the container like this:

docker run -p 8090:9200 datacite/shiba-inu

Alternatively, you can use docker-compose to run the log processor without a separate Elasticsearch instance:

docker-compose up

Usage logs

Your logs need to fulfill two requirements:

The logs must be single-line logs.
They must include the following data:
- doi => DOI name
- occurred_at => timestamp (ISO8601)
- clientip => IP address (IPV4 or IPV6)
- user_agent => user agent
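The field requirements above can be checked on a parsed event before it enters the pipeline. This is a minimal sketch under stated assumptions: the event is a Hash with string keys, and `event_problems` is a hypothetical helper that is not part of shiba-inu or Logstash. It uses Ruby's standard `Time.iso8601` and `IPAddr` for the format checks.

```ruby
require "ipaddr"
require "time"

REQUIRED_FIELDS = %w[doi occurred_at clientip user_agent].freeze

# True when `value` parses as an ISO 8601 timestamp.
def iso8601?(value)
  Time.iso8601(value.to_s)
  true
rescue ArgumentError
  false
end

# True when `value` parses as an IPv4 or IPv6 address.
def ip_address?(value)
  return false if value.to_s.empty?
  IPAddr.new(value.to_s)
  true
rescue IPAddr::Error
  false
end

# Returns a list of problems; an empty list means the event has all four
# required fields and the timestamp and address are well formed.
def event_problems(event)
  problems = REQUIRED_FIELDS.select { |f| event[f].nil? }.map { |f| "missing #{f}" }
  ts = event["occurred_at"]
  ip = event["clientip"]
  problems << "occurred_at is not ISO 8601" if ts && !iso8601?(ts)
  problems << "clientip is not an IPv4 or IPv6 address" if ip && !ip_address?(ip)
  problems
end
```

Returning a list of problems rather than a boolean makes it easier to report why a given log line was rejected.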

You will need to provide the configuration of your log lines following the grok filter documentation. You can enter the configuration in the file /vendor/docker/log_configuration.tmpl.

For example, for a log file with the following style:

```text
46.229.168.146 HTTP:HDL "2018-09-30 23:40:39.132Z" 1 1 3ms 10.5277/ppmp1850 "300:10.admin/codata" "" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8) Gecko/20051111 Firefox/1.5"
131.180.162.29 HTTP:HDL "2018-09-30 23:40:42.731Z" 1 1 71ms 10.4233/uuid:9798fb4a-9201-4efa-b324-3e50bbdc7ca5 "300:10.admin/codata" "" ""
131.180.162.29 HTTP:HDL "2018-09-30 23:40:44.846Z" 1 100 111ms 10.4233/uuid:a92fc858-da92-4339-8f80-b608aaa09741 "" "" ""
```

One would need the following configuration:

```logstash

"^%{IP:clientip} (?(HTTP:HDL)) %{QS:occurred_at} %{INT:ld} %{INT:respcode} (?((.+ms))) %{DOI:doi} %{QS:server} %{QS:something} %{QS:user_agent}"

```
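To see which part of a log line ends up in which field, the grok configuration above can be approximated with a plain Ruby regular expression. This is a hand-written sketch, not the pattern Logstash uses: the DOI expression and the capture names `protocol` and `elapsed` are assumptions, and unlike grok's `QS` it drops the surrounding quotes from quoted values.

```ruby
# Approximation of the grok pattern as a Ruby regex (extended mode).
# The DOI sub-pattern and the names "protocol" and "elapsed" are assumptions.
LOG_LINE = /
  \A(?<clientip>\S+)\s
  (?<protocol>HTTP:HDL)\s
  "(?<occurred_at>[^"]*)"\s
  (?<ld>\d+)\s
  (?<respcode>\d+)\s
  (?<elapsed>\d+ms)\s
  (?<doi>10\.\d{4,9}\/\S+)\s
  "(?<server>[^"]*)"\s
  "(?<something>[^"]*)"\s
  "(?<user_agent>[^"]*)"\z
/x

# Returns a Hash of field name => value, or nil when the line does not match.
def parse_line(line)
  match = LOG_LINE.match(line.chomp)
  match && match.named_captures
end

sample = '131.180.162.29 HTTP:HDL "2018-09-30 23:40:44.846Z" 1 100 111ms ' \
         '10.4233/uuid:a92fc858-da92-4339-8f80-b608aaa09741 "" "" ""'
fields = parse_line(sample)
puts fields["doi"] if fields
```

Trying a few real lines against such a regex is a quick way to confirm the field order before writing the grok configuration.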

How to create reports

There are three basic steps to create a report.

Copy your usage logs to /usage_logs
Trigger the logs processing.
Generate the report.

1. Copying the usage logs

The logs processor only processes logs on a monthly basis, from either a single file or a set of files named in order. You need to merge all your logs into a single file or rename them so that they sort in order. Log files must be placed in /usage_logs.
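Merging a month of log files into one file can be scripted. The sketch below is one possible way to do it and is not part of shiba-inu: `merge_logs` is a hypothetical helper, and it assumes the source filenames already sort chronologically (for example by a date suffix).

```ruby
# Concatenates every regular file in `source_dir` into `target`, in
# filename order, and returns the number of files merged.
# Assumes filenames sort chronologically; `target` should live outside
# `source_dir` so that it is not picked up as an input on a later run.
def merge_logs(source_dir, target)
  files = Dir.glob(File.join(source_dir, "*")).select { |f| File.file?(f) }.sort
  File.open(target, "w") do |out|
    files.each do |path|
      File.foreach(path) do |line|
        next if line.strip.empty?   # skip blank lines
        out.puts(line.chomp)        # guarantees one log entry per line
      end
    end
  end
  files.size
end
```

Writing each entry with `puts` after `chomp` keeps the output single-line per entry even when an input file lacks a trailing newline.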

2. Trigger the logs processing

The logs processor starts working automatically once new logs arrive in the logs folder.

3. Generate the report.

Usage reports can be generated locally, pushed, and/or streamed to the MDC Hub. The kishu client for logs processing can generate a report in any of these ways. To run the kishu client you need to be inside the logstash Docker container. The kishu client does not need parameters describing the report to be generated (e.g. the month), because it automatically generates the report from whatever is in the logs processor pipeline.

```shell
source /usr/local/rvm/scripts/rvm
rvm user gemsets
```

To generate a usage report in JSON format following the Code of Practice for Usage Metrics, you can use the following command. This will generate a usage report in the folder /reports.

```shell
bundle exec kishu sushi generate_report --created_by {YOUR DATACITE CLIENT ID}
```

To generate and push a usage report in JSON format following the Code of Practice for Usage Metrics, you can use the following command.

```shell
bundle exec kishu sushi push_report --created_by {YOUR DATACITE CLIENT ID}
```

To stream a usage report in JSON format following the Code of Practice for Usage Metrics, you can use the following command. This option should only be used for reports with more than 50,000 datasets or larger than 10MB. All reports streamed to the MDC Hub are compressed.

```shell
bundle exec kishu sushi stream --created_by {YOUR DATACITE CLIENT ID} --schema resolution --aggs_size 200 --report_size 90000
```
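The choice between pushing and streaming follows from the two thresholds stated above. A minimal sketch of that rule, assuming the dataset count and report size are already known: `kishu_subcommand` is a hypothetical helper, and reading "10MB" as 10 * 1024 * 1024 bytes is an assumption.

```ruby
DATASET_LIMIT = 50_000
SIZE_LIMIT_BYTES = 10 * 1024 * 1024 # "10MB" read as 10 MiB (assumption)

# Returns the kishu sushi subcommand suggested by the thresholds:
# "stream" for reports over either limit, "push_report" otherwise.
def kishu_subcommand(dataset_count, report_bytes)
  if dataset_count > DATASET_LIMIT || report_bytes > SIZE_LIMIT_BYTES
    "stream"
  else
    "push_report"
  end
end

puts kishu_subcommand(1_200, 350_000)
```

Either limit alone is enough to suggest streaming, which mirrors the "or" in the guidance above.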

Further information about parametrizing the streaming can be found in the kishu client.

Development

We use RSpec for unit and acceptance testing:

ruby -S bundle exec rspec

Follow along via GitHub Issues.

Note on Patches/Pull Requests

Fork the project
Write tests for your new feature or a test that reproduces a bug
Implement your feature or make a bug fix
Do not mess with Rakefile, version or history
Commit, push and make a pull request. Bonus points for topical branches.

License

shiba-inu is released under the MIT License.

Owner

Name: DataCite
Login: datacite
Kind: organization
Email: info@datacite.org

Website: https://www.datacite.org
Twitter: DataCite
Repositories: 111
Profile: https://github.com/datacite

Connecting research, identifying knowledge
