clowder

A data management system that allows users to share, annotate, organize and analyze large collections of datasets. It provides support for extensible metadata annotation using JSON-LD and a distributed analytics event bus for automatic curation of uploaded data.

https://github.com/clowder-framework/clowder

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: clowder-framework
  • License: ncsa
  • Language: JavaScript
  • Default Branch: develop
  • Homepage: https://clowderframework.org/
  • Size: 52 MB
Statistics
  • Stars: 39
  • Watchers: 12
  • Forks: 19
  • Open Issues: 165
  • Releases: 39
Created over 6 years ago · Last pushed 7 months ago
Metadata Files
  • Readme
  • Changelog
  • Contributing
  • License
  • Code of conduct
  • Citation
  • Zenodo

README.md

Clowder: Open Source Data Management for Long Tail Data

A customizable and scalable data management system you can install in the cloud or on your own hardware. More information is available at https://clowderframework.org/.

Running Clowder

There are different options for running Clowder. The easiest and quickest way to get started is with Docker, which requires both docker and docker-compose to be installed. Once they are installed, run docker-compose up -d in the folder that contains the docker-compose file; this downloads all the containers and starts them in the right order. The docker-compose file launches all required containers, a web proxy and three example extractors.

The proxy in the docker-compose file gives you access not only to Clowder itself but also to the other components of the stack. For example, if you run Clowder on your local machine, each component is reachable through its own URL on the proxy.

The proxy can also generate an SSL certificate using Let's Encrypt.

Advanced Installation of Clowder

You will have to manually install dependencies such as MongoDB, RabbitMQ and Elasticsearch. Once the dependencies are installed, you can either run a precompiled version of Clowder, which can be downloaded from NCSA (more information on this method is available on the wiki), or run Clowder from the source code using sbt dist.

Configuring Clowder

All customizations to a Clowder installation can be placed in the custom folder. This makes upgrades easier, since all you need to copy over between Clowder versions is the custom folder (and the logs folder if you want to keep all logs). The default configuration values are stored in the application.conf file in the conf folder of Clowder. You can override these by placing them in a custom.conf file inside the custom folder. Plugins that should be enabled can be listed in the play.plugins file in the custom folder.
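
As a rough illustration of the upgrade step described above, the following Python sketch copies the custom folder (and the logs folder, if present) from an old installation directory into a new one. The function name carry_over and any paths passed to it are hypothetical and not part of Clowder; only the folder names custom and logs come from the text above.

```python
import shutil
from pathlib import Path


def carry_over(old_install, new_install, folders=("custom", "logs")):
    """Copy per-site folders from an old install tree into a new one.

    Hypothetical helper, not shipped with Clowder. Returns the names
    of the folders that were actually copied.
    """
    copied = []
    for name in folders:
        src = Path(old_install) / name
        # Skip folders that do not exist, e.g. when logs were never kept.
        if src.is_dir():
            dst = Path(new_install) / name
            # dirs_exist_ok (Python 3.8+) merges into an existing folder.
            shutil.copytree(src, dst, dirs_exist_ok=True)
            copied.append(name)
    return copied
```

Calling carry_over with the paths of the old and new installation directories would be one way to script the copy. Everything else in the new tree, including application.conf, is left untouched.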

Customizing Clowder in Docker

With Docker you can override some of the values used by Clowder, as well as by all the other containers, using a .env file placed in the same folder as the docker-compose.yml file. When it starts, docker-compose uses this file to fill in environment variables referenced in docker-compose.yml. It first uses the environment variables set in the shell that starts the program, next it looks in the .env file, and finally it falls back to any default values specified in docker-compose.yml. The env.example file lists all variables that can be set, along with a short description of what each one does. You can, for example, use this file to set up Let's Encrypt, or to tell Clowder to use different security keys instead of the defaults.
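
The lookup order described above can be sketched in a few lines of Python. This is a simplified model for illustration, not docker-compose's actual implementation: the parser only handles plain KEY=VALUE lines, and the variable name EXAMPLE_KEY and its values are hypothetical, not taken from env.example.

```python
def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(name, shell_env, env_file, default=None):
    """Return a variable's value using the precedence described above."""
    if name in shell_env:   # 1. environment of the shell that starts the program
        return shell_env[name]
    if name in env_file:    # 2. the .env file next to docker-compose.yml
        return env_file[name]
    return default          # 3. default given in docker-compose.yml


if __name__ == "__main__":
    # EXAMPLE_KEY is a hypothetical variable used only for this sketch.
    env_file = parse_env_file("# site settings\nEXAMPLE_KEY=from-dotenv\n")
    print(resolve("EXAMPLE_KEY", {"EXAMPLE_KEY": "from-shell"}, env_file, "built-in"))
    print(resolve("EXAMPLE_KEY", {}, env_file, "built-in"))
    print(resolve("OTHER_KEY", {}, env_file, "built-in"))
```

In this model a value exported in the shell always wins, so a setting in .env appears to have no effect while the same variable is still set in the shell.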

Initializing Clowder

Once Clowder has started you will need to create an account. This account can be created using a Docker container, which you can start with docker run -ti --rm --network clowder_clowder clowder/mongo-init. The container will ask for an email address, name and password, as well as whether this user should be an admin (true). Once the container finishes running, you can log in to Clowder.

Extractors

To run Clowder with some example extractors (image, video, pdf and audio), start the Docker version of Clowder using docker-compose -f docker-compose.yml -f docker-compose.extractors.yml -f docker-compose.override.yml up -d. This starts the full stack with a few extractors. After a few minutes the extractors should be registered with Clowder automatically.
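
If you launch the stack from a script, the command above can be assembled from a list of compose files. The Python sketch below only builds and prints the command; the helper name compose_command and the constant EXTRACTOR_STACK are hypothetical, while the file names and the up -d action are the ones quoted above.

```python
import shlex


def compose_command(files, action=("up", "-d")):
    """Build the docker-compose argument list for a stack of compose files."""
    cmd = ["docker-compose"]
    for path in files:
        # Each file gets its own -f flag; later files extend or
        # override the definitions in earlier ones.
        cmd += ["-f", path]
    return cmd + list(action)


# The three files named in the command above, in the same order.
EXTRACTOR_STACK = [
    "docker-compose.yml",
    "docker-compose.extractors.yml",
    "docker-compose.override.yml",
]

if __name__ == "__main__":
    # Print the command rather than running it. To execute it, the
    # list could be passed to subprocess.run(..., check=True).
    print(shlex.join(compose_command(EXTRACTOR_STACK)))
```

Keeping the file list in one place makes it easy to add or drop a compose file without retyping the full command.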

For a full list of metadata extractors you can deploy to your instance, please take a look at the NCSA repositories or the Brown Dog wiki. If you have extractors available somewhere else, please get in touch with the team so we can add them to these lists.

Support

For general questions you can write to the mailing list clowder@lists.illinois.edu or join the Slack workspace.

If you have found a bug, please check that it hasn't already been filed and, if not, open an issue on GitHub or Jira.

Contributing

For contributing to Clowder see CONTRIBUTING.md. If you have new ideas and want to start developing, please check in on Slack to get feedback from other developers. If you want to contribute to the documentation, please follow the same workflow, or let the community know of external resources you want to share by advertising them on the mailing list or in Slack.

License

This software is licensed under the NCSA Open Source license, an open source license based on the MIT/X11 license and the 3-clause BSD license.

Owner

  • Name: Clowder
  • Login: clowder-framework
  • Kind: organization
  • Email: clowder@lists.illinois.edu

Research data management for long tail data.

Citation (citation.cff)

cff-version: 1.2.0
message: If you use this software, please cite it using these metadata.
title: "Clowder: Open Source Data Management for Long Tail Data"
abstract: "A customizable and scalable data management system you can install in the cloud or on your own hardware."
type: software
version: "1.22.1"
license: "NCSA"
repository-code: "https://github.com/clowder-framework/clowder"
keywords:
  - data-management
  - cyberinfrastructure
  - clowder
  - open-data
  - open-science
preferred-citation:
  type: article
  title: "Clowder: Open Source Data Management for Long Tail Data"
  abstract: "Clowder is an open source data management system to support data curation of long tail data and metadata across multiple research domains and diverse data types. Institutions and labs can install and customize their own instance of the framework on local hardware or on remote cloud computing resources to provide a shared service to distributed communities of researchers. Data can be ingested directly from instruments or manually uploaded by users and then shared with remote collaborators using a web front end. We discuss some of the challenges encountered in designing and developing a system that can be easily adapted to different scientific areas including digital preservation, geoscience, material science, medicine, social science, cultural heritage and the arts. Some of these challenges include support for large amounts of data, horizontal scaling of domain specific preprocessing algorithms, ability to provide new data visualizations in the web browser, a comprehensive Web service API for automatic data ingestion and curation, a suite of social annotation and metadata management features to support data annotation by communities of users and algorithms, and a web based front-end to interact with code running on heterogeneous clusters, including HPC resources."
  isbn: 9781450364461
  publisher:
    name: "Association for Computing Machinery"
  doi: 10.1145/3219104.3219159
  collection-title: "Proceedings of the Practice and Experience on Advanced Research Computing"
  keywords:
    - scientific gateways
    - metadata management
    - linked data
    - data management
    - data curation
  location:
    name: "Pittsburgh, PA, USA"
  year: 2018
  authors:
    - family-names: Marini
      given-names: Luigi
    - family-names: Gutierrez-Polo
      given-names: Indira
    - family-names: Kooper
      given-names: Rob
    - family-names: Satheesan
      given-names: Sandeep Puthanveetil
    - family-names: Burnette
      given-names: Maxwell
    - family-names: Lee
      given-names: Jong
    - family-names: Nicholson
      given-names: Todd
    - family-names: Zhao
      given-names: Yan
    - family-names: McHenry
      given-names: Kenton
references:
  - institution: "National Science Foundation" 
    number: "#BCS-0941268" 
  - institution: "National Science Foundation"  
    number: "#EAR- 331906"
  - institution: "National Science Foundation"  
    number: "#ACI-1261582"
  - institution: "National Science Foundation"  
    number: "#ACI-1443013"
  - institution: "National Science Foundation"  
    number: "#OCI-0940824"
  - institution: "National Science Foundation"  
    number: "#OCI-0525308"
  - institution: "National Science Foundation"  
    number: "#OAC-1835834"
  - institution: "National Institutes of Health"
    number: "#1P01AI089556-01A1"
  - institution: "Illinois - Indiana Sea Grant"
    number: "#DW92329201"
  - institution: "European Commission"
    number: "#RI-261600"
  - institution: "XSEDE"
    number: "#OCI-1053575"
  - institution: "ARPA-E"
    number: "#DE-AR0000594"
authors:
  - family-names: "Marini"
    given-names: "Luigi"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
    orcid: "0000-0002-8511-0211"
  - family-names: "Kooper"
    given-names: "Rob"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
    orcid: "0000-0002-5781-7287"
  - family-names: "Gutierrez"
    given-names: "Indira"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
    orcid: "0000-0001-5684-3419"
  - family-names: "Sophocleous"
    given-names: "Constantinos"
  - family-names: "Burnette"
    given-names: "Max"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Nicholson"
    given-names: "Todd"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Ondrejcek"
    given-names: "Michal"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Zhang"
    given-names: "Bing"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Zharnitsky"
    given-names: "Inna"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Puthanveetil Satheesan"
    given-names: "Sandeep Puthanveetil"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
    orcid: "0000-0001-9075-3740"
  - family-names: "Padhy"
    given-names: "Smruti"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Zhao"
    given-names: "Yan"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Liu"
    given-names: "Rui"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Vaidya"
    given-names: "Ashwini"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Myers"
    given-names: "Jim"
    orcid: "0000-0001-8462-650X"
  - family-names: "Felarca"
    given-names: "Mario"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Angelo"
    given-names: "Brock"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Roeder"
    given-names: "Gene"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Lee"
    given-names: "Jong"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Hennessy"
    given-names: "Will"
    affiliation: "University of Illinois at Urbana-Champaign"
  - family-names: "Issaranon"
    given-names: "Theerasit"
    affiliation: "University of Illinois at Urbana-Champaign"
  - family-names: "Guo"
    given-names: "Yibo"
    affiliation: "University of Illinois at Urbana-Champaign"
  - family-names: "Yuan"
    given-names: "Xiaocheng"
    affiliation: "University of Illinois at Urbana-Champaign"
  - family-names: "Kethineedi"
    given-names: "Varun"
    affiliation: "University of Illinois at Urbana-Champaign"
  - family-names: "Kumar"
    given-names: "Avinash"
    affiliation: "University of Illinois at Urbana-Champaign"
  - family-names: "Nayudu"
    given-names: "Nishant"
    affiliation: "University of Illinois at Urbana-Champaign"
  - family-names: "Poelmans"
    given-names: "Ward"
    affiliation: "Center for Molecular Modeling, Ghent University"
  - family-names: "Jansz"
    given-names: "Winston"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Jansen"
    given-names: "Gregory"
    affiliation: "College of Information Studies, University of Maryland"
  - family-names: "Navarro"
    given-names: "Chris"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Pitcel"
    given-names: "Michelle"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Tenczar"
    given-names: "Nicholas"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Wang"
    given-names: "Chen"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Lambert"
    given-names: "Mike"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "McHenry"
    given-names: "Kenton"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
    orcid: "0000-0003-0367-2550"
  - family-names: "Habib"
    given-names: "Aaraj"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Galewsky"
    given-names: "Ben"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Constantinou"
    given-names: "Chrysovalantis"
  - family-names: "Karimi-Asli"
    given-names: "Kaveh"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Tzima"
    given-names: "Maria-Spyridoula"
  - family-names: "Johnson"
    given-names: "Michael"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Bobak"
    given-names: "Mike"
    affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
  - family-names: "Yardley"
    given-names: "Tim"

GitHub Events

Total
  • Create event: 4
  • Release event: 1
  • Issues event: 1
  • Watch event: 3
  • Delete event: 1
  • Issue comment event: 3
  • Push event: 11
  • Pull request review event: 2
  • Pull request event: 8
  • Fork event: 3
Last Year
  • Create event: 4
  • Release event: 1
  • Issues event: 1
  • Watch event: 3
  • Delete event: 1
  • Issue comment event: 3
  • Push event: 11
  • Pull request review event: 2
  • Pull request event: 8
  • Fork event: 3

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 5
  • Average time to close issues: N/A
  • Average time to close pull requests: 5 months
  • Total issue authors: 1
  • Total pull request authors: 3
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.6
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: 17 days
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tcnichol (4)
  • max-zilla (2)
  • taywll252 (1)
  • markendr (1)
Pull Request Authors
  • dependabot[bot] (3)
  • tcnichol (3)
  • lmarini (2)
  • robkooper (2)
  • ddey2 (1)
  • atomsos (1)
Top Labels
Issue Labels
  • bug (6)
  • enhancement (2)
Pull Request Labels
  • dependencies (3)
  • enhancement (1)

Dependencies

.github/workflows/ci.yml actions
  • actions/cache v1 composite
  • actions/checkout v2 composite
  • actions/setup-java v1 composite
  • actions/setup-python v1 composite
  • actions/upload-artifact v2 composite
  • robkooper/sftp-action master composite
  • svenstaro/upload-release-action 1.1.0 composite
  • mongo 3.6 docker
.github/workflows/docker.yml actions
  • actions/checkout v3 composite
  • docker/build-push-action v2 composite
  • docker/login-action v2 composite
  • docker/setup-buildx-action v2 composite
  • docker/setup-qemu-action v2 composite
  • peter-evans/dockerhub-description v2 composite
.github/workflows/swagger.yml actions
  • actions/checkout v2 composite
  • mbowman100/swagger-validator-action master composite
Dockerfile docker
  • openjdk 8-jdk-bullseye build
  • openjdk 8-jre-bullseye build
doc/src/sphinx/Pipfile pypi
  • recommonmark *
  • sphinx *
  • sphinx-rtd-theme *
doc/src/sphinx/Pipfile.lock pypi
  • alabaster ==0.7.12
  • babel ==2.9.1
  • certifi ==2021.5.30
  • chardet ==4.0.0
  • commonmark ==0.9.1
  • docutils ==0.17.1
  • idna ==2.10
  • imagesize ==1.2.0
  • jinja2 ==3.0.1
  • markupsafe ==2.0.1
  • packaging ==20.9
  • pygments ==2.9.0
  • pyparsing ==2.4.7
  • pytz ==2021.1
  • recommonmark ==0.6.0
  • requests ==2.25.1
  • snowballstemmer ==2.1.0
  • sphinx ==3.1.2
  • sphinx-rtd-theme ==0.5.0
  • sphinxcontrib-applehelp ==1.0.2
  • sphinxcontrib-devhelp ==1.0.2
  • sphinxcontrib-htmlhelp ==2.0.0
  • sphinxcontrib-jsmath ==1.0.1
  • sphinxcontrib-qthelp ==1.0.3
  • sphinxcontrib-serializinghtml ==1.1.5
  • urllib3 ==1.26.5