clowder
A data management system that allows users to share, annotate, organize and analyze large collections of datasets. It provides support for extensible metadata annotation using JSON-LD and a distributed analytics event bus for automatic curation of uploaded data.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.8%) to scientific vocabulary
Repository
A data management system that allows users to share, annotate, organize and analyze large collections of datasets. It provides support for extensible metadata annotation using JSON-LD and a distributed analytics event bus for automatic curation of uploaded data.
Basic Info
- Host: GitHub
- Owner: clowder-framework
- License: ncsa
- Language: JavaScript
- Default Branch: develop
- Homepage: https://clowderframework.org/
- Size: 52 MB
Statistics
- Stars: 39
- Watchers: 12
- Forks: 19
- Open Issues: 165
- Releases: 39
Metadata Files
README.md
Clowder: Open Source Data Management for Long Tail Data
A customizable and scalable data management system you can install in the cloud or on your own hardware. More information is available at https://clowderframework.org/.
Running Clowder
There are different options to run Clowder; the easiest and quickest way to get started is by using Docker. To
start Clowder you will need to have docker and docker-compose installed. Once they are installed, you can start
Clowder using docker-compose and the docker-compose file. To launch Clowder, run
docker-compose up -d, which will download all the containers and start them in the right order. The
docker-compose file will launch all required containers, a web proxy, and
three example extractors.
The proxy in the docker-compose file gives you access not only to Clowder but also to the other aspects of Clowder. For example, if you run Clowder on your local machine, you can access the different aspects of Clowder using the following URLs:
The proxy can also generate an SSL certificate using Let's Encrypt.
Advanced Installation of Clowder
You will have to manually install dependencies such as MongoDB,
RabbitMQ and ElasticSearch. Once the dependencies are
installed, you can run a precompiled version of Clowder, which can be downloaded from
NCSA. More information on this method
is available on the wiki.
You can also run Clowder from the source code using sbt dist.
Configuring Clowder
All customizations to a clowder installation can be placed in the custom folder. This makes it easier for upgrades, since all you need to copy over between different clowder versions is the custom folder (and the logs folder if you want to keep all logs). The default configuration values are stored in the application.conf file in the conf folder of clowder. You can override these by placing them in a custom.conf file inside the custom folder. Plugins that should be enabled can be placed in the play.plugins file in the custom folder.
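The upgrade step described above, copying the custom folder (and optionally the logs folder) from one installation to the next, can be sketched in Python. This is a minimal illustration and not part of Clowder: the function name `carry_over_customizations` and its arguments are hypothetical, and it assumes both installations keep `custom` and `logs` directly under their root folder.

```python
import shutil
from pathlib import Path


def carry_over_customizations(old_root, new_root, keep_logs=False):
    """Copy custom/ (and optionally logs/) from an old install into a new one.

    Hypothetical helper: assumes both folders sit directly under the install root.
    Returns the names of the folders that were actually copied.
    """
    old_root, new_root = Path(old_root), Path(new_root)
    folders = ["custom"] + (["logs"] if keep_logs else [])
    copied = []
    for name in folders:
        source = old_root / name
        if not source.is_dir():
            continue  # nothing to carry over for this folder
        # dirs_exist_ok (Python 3.8+) merges into a folder the new install already has
        shutil.copytree(source, new_root / name, dirs_exist_ok=True)
        copied.append(name)
    return copied
```

A call such as `carry_over_customizations("/opt/clowder-old", "/opt/clowder-new", keep_logs=True)` would copy both folders; the two paths here are placeholders, not locations Clowder prescribes.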
Customizing Clowder in Docker
When running in Docker, you can override some of the values used by Clowder, as well as by all the other containers, using a .env file placed in the same folder as the docker-compose.yml file. On startup, docker-compose uses this file to fill in the environment variables referenced in the docker-compose.yml file. It resolves each variable in order: first the environment variables set when starting the program, next the .env file, and finally any default value specified in the docker-compose.yml file. The env.example file lists all variables that can be set, with a short description of what each one does. You can, for example, use this file to set up Let's Encrypt, or tell Clowder to use different security keys instead of the defaults.
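The lookup order described above can be modeled in a few lines of Python. This is a simplified sketch for illustration, not how docker-compose is implemented: `parse_dotenv` and `resolve` are hypothetical helpers, the parser handles only plain KEY=VALUE lines (no quoting or interpolation), and `EXAMPLE_KEY` is a made-up variable name rather than one taken from env.example.

```python
def parse_dotenv(text):
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(name, shell_env, dotenv, compose_defaults):
    """Return the first value found: shell environment, then .env, then defaults."""
    for source in (shell_env, dotenv, compose_defaults):
        if name in source:
            return source[name]
    return None


# Made-up values showing which layer wins at each step.
dotenv = parse_dotenv("# local overrides\nEXAMPLE_KEY=from-dotenv\n")
defaults = {"EXAMPLE_KEY": "from-compose-default"}
print(resolve("EXAMPLE_KEY", {"EXAMPLE_KEY": "from-shell"}, dotenv, defaults))
print(resolve("EXAMPLE_KEY", {}, dotenv, defaults))
print(resolve("EXAMPLE_KEY", {}, {}, defaults))
```

The three calls print the shell value, the .env value, and the compose default in turn, which mirrors the order in which a value set in an earlier layer hides the later ones.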
Initializing Clowder
Once Clowder has started you will need to create an account. This account can be created using a docker
container. You can start it with docker run -ti --rm --network clowder_clowder clowder/mongo-init. The
container will ask for an email address, name, and password, as well as whether this user should be an admin (true).
Once the container finishes running, you can log in to Clowder.
Extractors
To run clowder with some example extractors (image, video, pdf and audio) you can start the docker version
of clowder using docker-compose -f docker-compose.yml -f docker-compose.extractors.yml
-f docker-compose.override.yml up -d. This will start the full stack with a few extractors. After a few
minutes the extractors should automatically be registered with clowder.
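The command above passes several compose files, each with its own -f flag. If you script the launch, that argument list can be assembled programmatically. The sketch below is an illustration only: the helper names are hypothetical, the file names are the ones given in the text, and the code only builds the command without running it.

```python
import shlex

# Compose files named in the text above, in the order they are passed.
COMPOSE_FILES = (
    "docker-compose.yml",
    "docker-compose.extractors.yml",
    "docker-compose.override.yml",
)


def compose_up_command(files=COMPOSE_FILES):
    """Build the docker-compose argument list: one -f flag per file, then `up -d`."""
    command = ["docker-compose"]
    for name in files:
        command += ["-f", name]
    return command + ["up", "-d"]


def compose_up_shell(files=COMPOSE_FILES):
    """Return the same command as a single shell-quoted string (Python 3.8+)."""
    return shlex.join(compose_up_command(files))


print(compose_up_shell())
```

To actually start the stack, the list from `compose_up_command()` could be handed to `subprocess.run(..., check=True)` from the folder that holds the compose files, assuming docker-compose is on the PATH.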
For a full list of metadata extractors you can deploy to your instance, please take a look at the
NCSA repositories or
the Brown Dog wiki.
If you have extractors available somewhere else, please get in touch with the team so we can add them to these lists.
Support
For general questions you can write to the mailing list clowder@lists.illinois.edu or join the Slack workspace.
If you have found a bug, please check that it hasn't been filed already and, if not, open an issue on GitHub or Jira.
Contributing
To contribute to Clowder, see CONTRIBUTING.md. If you have new ideas and want to start developing, please check in on Slack to get feedback from other developers. If you want to contribute to the documentation, please follow the same workflow, or let the community know of external resources you want to share by posting to the mailing list or in Slack.
License
This software is licensed under the NCSA Open Source license, an open source license based on the MIT/X11 license and the 3-clause BSD license.
Owner
- Name: Clowder
- Login: clowder-framework
- Kind: organization
- Email: clowder@lists.illinois.edu
- Website: https://clowderframework.org/
- Repositories: 30
- Profile: https://github.com/clowder-framework
Research data management for long tail data.
Citation (citation.cff)
cff-version: 1.2.0
message: If you use this software, please cite it using these metadata.
title: "Clowder: Open Source Data Management for Long Tail Data"
abstract: "A customizable and scalable data management system you can install in the cloud or on your own hardware."
type: software
version: "1.22.1"
license: "NCSA"
repository-code: "https://github.com/clowder-framework/clowder"
keywords:
- data-management
- cyberinfrastructure
- clowder
- open-data
- open-science
preferred-citation:
type: article
title: "Clowder: Open Source Data Management for Long Tail Data"
abstract: "Clowder is an open source data management system to support data curation of long tail data and metadata across multiple research domains and diverse data types. Institutions and labs can install and customize their own instance of the framework on local hardware or on remote cloud computing resources to provide a shared service to distributed communities of researchers. Data can be ingested directly from instruments or manually uploaded by users and then shared with remote collaborators using a web front end. We discuss some of the challenges encountered in designing and developing a system that can be easily adapted to different scientific areas including digital preservation, geoscience, material science, medicine, social science, cultural heritage and the arts. Some of these challenges include support for large amounts of data, horizontal scaling of domain specific preprocessing algorithms, ability to provide new data visualizations in the web browser, a comprehensive Web service API for automatic data ingestion and curation, a suite of social annotation and metadata management features to support data annotation by communities of users and algorithms, and a web based front-end to interact with code running on heterogeneous clusters, including HPC resources."
isbn: 9781450364461
publisher:
name: "Association for Computing Machinery"
doi: 10.1145/3219104.3219159
collection-title: "Proceedings of the Practice and Experience on Advanced Research Computing"
keywords:
- scientific gateways
- metadata management
- linked data
- data management
- data curation
location:
name: "Pittsburgh, PA, USA"
year: 2018
authors:
- family-names: Marini
given-names: Luigi
- family-names: Gutierrez-Polo
given-names: Indira
- family-names: Kooper
given-names: Rob
- family-names: Satheesan
given-names: Sandeep Puthanveetil
- family-names: Burnette
given-names: Maxwell
- family-names: Lee
given-names: Jong
- family-names: Nicholson
given-names: Todd
- family-names: Zhao
given-names: Yan
- family-names: McHenry
given-names: Kenton
references:
- institution: "National Science Foundation"
number: "#BCS-0941268"
- institution: "National Science Foundation"
number: "#EAR- 331906"
- institution: "National Science Foundation"
number: "#ACI-1261582"
- institution: "National Science Foundation"
number: "#ACI-1443013"
- institution: "National Science Foundation"
number: "#OCI-0940824"
- institution: "National Science Foundation"
number: "#OCI-0525308"
- institution: "National Science Foundation"
number: "#OAC-1835834"
- institution: "National Institutes of Health"
number: "#1P01AI089556-01A1"
- institution: "Illinois - Indiana Sea Grant"
number: "#DW92329201"
- institution: "European Commission"
number: "#RI-261600"
- institution: "XSEDE"
number: "#OCI-1053575"
- institution: "ARPA-E"
number: "#DE-AR0000594"
authors:
- family-names: "Marini"
given-names: "Luigi"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
orcid: "0000-0002-8511-0211"
- family-names: "Kooper"
given-names: "Rob"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
orcid: "0000-0002-5781-7287"
- family-names: "Gutierrez"
given-names: "Indira"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
orcid: "0000-0001-5684-3419"
- family-names: "Sophocleous"
given-names: "Constantinos"
- family-names: "Burnette"
given-names: "Max"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Nicholson"
given-names: "Todd"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Ondrejcek"
given-names: "Michal"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Zhang"
given-names: "Bing"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Zharnitsky"
given-names: "Inna"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Puthanveetil Satheesan"
given-names: "Sandeep Puthanveetil"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
orcid: "0000-0001-9075-3740"
- family-names: "Padhy"
given-names: "Smruti"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Zhao"
given-names: "Yan"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Liu"
given-names: "Rui"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Vaidya"
given-names: "Ashwini"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Myers"
given-names: "Jim"
orcid: "0000-0001-8462-650X"
- family-names: "Felarca"
given-names: "Mario"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Angelo"
given-names: "Brock"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Roeder"
given-names: "Gene"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Lee"
given-names: "Jong"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Hennessy"
given-names: "Will"
affiliation: "University of Illinois at Urbana-Champaign"
- family-names: "Issaranon"
given-names: "Theerasit"
affiliation: "University of Illinois at Urbana-Champaign"
- family-names: "Guo"
given-names: "Yibo"
affiliation: "University of Illinois at Urbana-Champaign"
- family-names: "Yuan"
given-names: "Xiaocheng"
affiliation: "University of Illinois at Urbana-Champaign"
- family-names: "Kethineedi"
given-names: "Varun"
affiliation: "University of Illinois at Urbana-Champaign"
- family-names: "Kumar"
given-names: "Avinash"
affiliation: "University of Illinois at Urbana-Champaign"
- family-names: "Nayudu"
given-names: "Nishant"
affiliation: "University of Illinois at Urbana-Champaign"
- family-names: "Poelmans"
given-names: "Ward"
affiliation: "Center for Molecular Modeling, Ghent University"
- family-names: "Jansz"
given-names: "Winston"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Jansen"
given-names: "Gregory"
affiliation: "College of Information Studies, University of Maryland"
- family-names: "Navarro"
given-names: "Chris"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Pitcel"
given-names: "Michelle"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Tenczar"
given-names: "Nicholas"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Wang"
given-names: "Chen"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Lambert"
given-names: "Mike"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "McHenry"
given-names: "Kenton"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
orcid: "0000-0003-0367-2550"
- family-names: "Habib"
given-names: "Aaraj"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Galewsky"
given-names: "Ben"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Constantinou"
given-names: "Chrysovalantis"
- family-names: "Karimi-Asli"
given-names: "Kaveh"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Tzima"
given-names: "Maria-Spyridoula"
- family-names: "Johnson"
given-names: "Michael"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Bobak"
given-names: "Mike"
affiliation: "National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign"
- family-names: "Yardley"
given-names: "Tim"
GitHub Events
Total
- Create event: 4
- Release event: 1
- Issues event: 1
- Watch event: 3
- Delete event: 1
- Issue comment event: 3
- Push event: 11
- Pull request review event: 2
- Pull request event: 8
- Fork event: 3
Last Year
- Create event: 4
- Release event: 1
- Issues event: 1
- Watch event: 3
- Delete event: 1
- Issue comment event: 3
- Push event: 11
- Pull request review event: 2
- Pull request event: 8
- Fork event: 3
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 5 months
- Total issue authors: 1
- Total pull request authors: 3
- Average comments per issue: 0.0
- Average comments per pull request: 0.6
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: 17 days
- Issue authors: 1
- Pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tcnichol (4)
- max-zilla (2)
- taywll252 (1)
- markendr (1)
Pull Request Authors
- dependabot[bot] (3)
- tcnichol (3)
- lmarini (2)
- robkooper (2)
- ddey2 (1)
- atomsos (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/cache v1 composite
- actions/checkout v2 composite
- actions/setup-java v1 composite
- actions/setup-python v1 composite
- actions/upload-artifact v2 composite
- robkooper/sftp-action master composite
- svenstaro/upload-release-action 1.1.0 composite
- mongo 3.6 docker
- actions/checkout v3 composite
- docker/build-push-action v2 composite
- docker/login-action v2 composite
- docker/setup-buildx-action v2 composite
- docker/setup-qemu-action v2 composite
- peter-evans/dockerhub-description v2 composite
- actions/checkout v2 composite
- mbowman100/swagger-validator-action master composite
- openjdk 8-jdk-bullseye build
- openjdk 8-jre-bullseye build
- recommonmark *
- sphinx *
- sphinx-rtd-theme *
- alabaster ==0.7.12
- babel ==2.9.1
- certifi ==2021.5.30
- chardet ==4.0.0
- commonmark ==0.9.1
- docutils ==0.17.1
- idna ==2.10
- imagesize ==1.2.0
- jinja2 ==3.0.1
- markupsafe ==2.0.1
- packaging ==20.9
- pygments ==2.9.0
- pyparsing ==2.4.7
- pytz ==2021.1
- recommonmark ==0.6.0
- requests ==2.25.1
- snowballstemmer ==2.1.0
- sphinx ==3.1.2
- sphinx-rtd-theme ==0.5.0
- sphinxcontrib-applehelp ==1.0.2
- sphinxcontrib-devhelp ==1.0.2
- sphinxcontrib-htmlhelp ==2.0.0
- sphinxcontrib-jsmath ==1.0.1
- sphinxcontrib-qthelp ==1.0.3
- sphinxcontrib-serializinghtml ==1.1.5
- urllib3 ==1.26.5