https://github.com/bluebrain/bbp-atlas-data-fetch
CLI to fetch datasets from Nexus
Basic Info
Statistics
- Stars: 1
- Watchers: 4
- Forks: 1
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
Description
This module is a Python CLI that fetches datasets from Nexus, one file (or payload) at a time. It can fetch payloads and save them as JSON files, or it can fetch binaries (distributions) linked to resources.
There are two main ways of fetching a piece of data:
- using the @id of a resource
- using filters on the resource properties to narrow down the selection and eventually find the relevant dataset
When using filters (see below), a SPARQL query is dynamically generated, allowing graph traversal.
Install
pip install "bba-data-fetch"
From now on, the executable bba-data-fetch is in your PATH.
Usage
Inputs
There is no input apart from configuration and filter/id. See the CLI arguments section for more info.
Outputs
This CLI writes one of the following to disk, depending on the arguments provided:
- a JSON file that corresponds to the targeted resource payload,
- a copy of the distribution file linked in the targeted resource (can be any kind of file).
CLI arguments
- --version - [flag] Display the version
- --help - [flag] Display help
- --verbose - [flag] Enables verbose mode to print the generated SPARQL query and response
- --nexus-token-file /path/to/token.txt - [single string] The path to the text file that contains the Nexus token. Mandatory
- --nexus-env https://bbp.epfl.ch/nexus/v1/ - [single string] The URL to the Nexus environment. Mandatory
- --nexus-org bbp - [single string] The name of the Nexus organization to look for a resource. Mandatory
- --nexus-proj atlas - [single string] The name of the Nexus project to look for a resource (this argument appears in every example of the Examples section)
- --nexus-id someidprobably_uuid - [single string] The @id of the Nexus resource to fetch. Optional, but necessary if --filter is not provided
- --payload - [flag] Fetch the payload as a JSON file. Optional, the default behavior is to fetch the file linked by the distribution.contentUrl property.
- --favor - [multiple strings] Payload properties and values with the format <'property:value'> (ex: 'name:1.json'), used to determine which file to choose when retrieving a distribution from a resource with multiple distributions. Optional
- --out /some/file.json - [single string] Path to the output file to create. The extension has to be .json if the flag --payload is provided. Otherwise, the extension must be the same as the distant file. Mandatory
- --keep-meta - [flag] By default, when --payload is provided, the JSON file does not contain the Nexus/JSON-LD system properties. If this flag is provided, the system metadata are kept
- --rev n - [number] The revision argument is mainly to be used along --nexus-id to fetch a specific revision of a given resource. Optional, fetches the last revision if not provided
- --tag some_tag - [single string] The tag argument is mainly to be used along --nexus-id to fetch a specific tag of a given resource. Optional
- --filter prop1=1 prop2=20 - [multiple strings] Filters are to be used instead of --nexus-id if the @id is not known. Filters are applied on properties and can work with graph traversal. Optional, but necessary if --nexus-id is not provided
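The extension rule stated for --out can be checked before launching a fetch. The following is a minimal sketch of that rule only, not the tool's own validation code; the function name check_out_extension and the remote file name used in the usage note are hypothetical.

```python
from pathlib import Path


def check_out_extension(out: str, payload: bool, remote_name: str = "") -> bool:
    """Return True if `out` has the extension that the --out rule asks for.

    With --payload the output must end in .json; otherwise its extension
    must be the same as the remote (distribution) file's extension.
    """
    out_ext = Path(out).suffix
    if payload:
        return out_ext == ".json"
    return out_ext == Path(remote_name).suffix
```

For instance, check_out_extension("./tmp/some_payload.json", payload=True) passes, while a .raw output path for a remote file named annotation.nrrd (a made-up name) does not.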
Filters
The --filter argument is powerful and deserves its own paragraph.
Using filters keeps only the resources that match the conditions. Each filter takes this shape:
property name [ operator ] value to compare with
Where:
- property name can be a direct root level property (eg. name), a subproperty (eg. resolution.value) or even a graph traversal property (eg. atlasRelease.name when the atlasRelease property is actually just an @id to another resource that has a name property).
- value can be a number or a string
- operator can be one of the following:
  - = strictly equal for numbers and case-insensitive equal for strings
  - ~= contains, only for strings
  - != different for numbers, or does-not-contain for strings
  - >= greater than or equal to, for numbers only
  - <= lower than or equal to, for numbers only
  - > greater than, for numbers only
  - < lower than, for numbers only
Example: resolution.value=25
Then, multiple filters of this kind can be used, space separated, after a single --filter flag.
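To make the filter shape concrete, here is a minimal sketch of how a single filter expression could be split into its three parts. It only illustrates the grammar described above and is not the parser used by bba-data-fetch; the names Filter and parse_filter are hypothetical, and converting numeric-looking values to numbers is an assumption.

```python
import re
from typing import NamedTuple, Union

# Two-character operators are listed first so that ">=" is not read as ">".
_FILTER_RE = re.compile(
    r"^(?P<prop>[^=~!<>]+?)\s*(?P<op>~=|!=|>=|<=|=|>|<)\s*(?P<value>.*)$"
)


class Filter(NamedTuple):
    prop: str
    op: str
    value: Union[int, float, str]


def _coerce(raw: str) -> Union[int, float, str]:
    """Turn a numeric-looking value into a number, leave other values as text."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_filter(expression: str) -> Filter:
    """Split 'property<operator>value' into a (prop, op, value) triple."""
    match = _FILTER_RE.match(expression)
    if match is None:
        raise ValueError(f"not a valid filter expression: {expression!r}")
    return Filter(match["prop"].strip(), match["op"], _coerce(match["value"]))
```

With this sketch, parse_filter("resolution.value>=25") yields the property resolution.value, the operator >= and the number 25, while a value such as mba:1048 stays a string.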
⚠️ Warning: since this is used in a terminal, the symbol ">" is interpreted by the shell as a redirection of the output to a file. Hence, filters that use the operators > or >= must have their whole expression framed with double or single quotes.
⚠️ Warning 2: As with most CLIs, multiword strings must also be framed in double or single quotes.
Example: --filter "resolution.value>=25" name="hello world" "another.prop=bip boop"
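The effect of the quotes can be previewed with Python's standard shlex module, which follows POSIX quoting rules. This is only an illustration of how the arguments are tokenized: shlex does not perform redirection, so it does not reproduce what an unquoted ">" would do in a real shell.

```python
import shlex

# The example filter arguments, exactly as they would be typed in a terminal.
command_line = '--filter "resolution.value>=25" name="hello world" "another.prop=bip boop"'

# Quotes group words into a single argument and are then removed.
tokens = shlex.split(command_line)
print(tokens)

# Without quotes, a multiword value is split into two separate arguments.
unquoted_tokens = shlex.split("name=hello world")
print(unquoted_tokens)
```

Each quoted filter arrives as one argument (for example name=hello world), whereas the unquoted form is split into name=hello and world.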
ℹ️ Info: property names such as resolution or name should not be preceded by a context prefix. For example: "nsg:resolution.schema:value" is not valid.
ℹ️ Info: if a property is a @list, then we can address an element from the list using square brackets. Example: --filter dimension[0].name=intensity worldMatrix[0]=10
Under the hood, this is using rdf:first and rdf:rest.
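In RDF, element i of a list is reached by following rdf:rest i times and then rdf:first. As a sketch of what an index such as [0] or [2] could translate to, the hypothetical helper below builds the matching SPARQL 1.1 property path; the exact query text generated by the tool may differ.

```python
def list_item_path(index: int) -> str:
    """Build the SPARQL property path that reaches element `index` of an RDF list."""
    if index < 0:
        raise ValueError("list index must be zero or positive")
    # `index` hops along rdf:rest, then rdf:first to read the element itself.
    return "/".join(["rdf:rest"] * index + ["rdf:first"])


print(list_item_path(0))
print(list_item_path(2))
```

Index 0 maps to rdf:first alone, and index 2 maps to rdf:rest/rdf:rest/rdf:first.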
Examples
Fetch a resource payload from its @id (--payload fetches the payload, and --out needs a .json extension):
bba-data-fetch --nexus-env https://bbp.epfl.ch/nexus/v1/ \
--nexus-token-file ./token.txt \
--nexus-org bbp \
--nexus-proj atlas \
--payload \
--out ./tmp/some_payload.json \
--nexus-id 7f85cd66-d212-4799-bb4c-0732b8534442
Fetch the distribution file linked to a resource, from the resource @id (the --out extension must be the same as the remote file's):
bba-data-fetch --nexus-env https://bbp.epfl.ch/nexus/v1/ \
--nexus-token-file ./token.txt \
--nexus-org bbp \
--nexus-proj atlas \
--out ./tmp/some_distribution.nrrd \
--nexus-id 7f85cd66-d212-4799-bb4c-0732b8534442
Fetch the distribution file corresponding to the favor argument among multiple distributions from a resource:
bba-data-fetch --nexus-env https://bbp.epfl.ch/nexus/v1/ \
--nexus-token-file ./token.txt \
--nexus-org bbp \
--nexus-proj atlas \
--out ./tmp/some_payload.json \
--nexus-id http://bbp.epfl.ch/neurosciencegraph/ontologies/mba \
--favor "encodingFormat:application/ld+json" \
--verbose
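As an illustration of the --favor idea, the sketch below picks one distribution out of several from a 'property:value' string. It is not the tool's implementation: the function name pick_distribution is hypothetical, exact string matching is an assumption, and so is returning None when nothing matches.

```python
from typing import Any, Dict, List, Optional


def pick_distribution(
    distributions: List[Dict[str, Any]], favor: str
) -> Optional[Dict[str, Any]]:
    """Return the first distribution whose property equals the favored value."""
    # Split on the first ":" only, so values may contain ":" themselves.
    prop, _, wanted = favor.partition(":")
    for distribution in distributions:
        if str(distribution.get(prop)) == wanted:
            return distribution
    return None


# Hypothetical distributions of a single resource, for illustration only.
distributions = [
    {"name": "1.json", "encodingFormat": "application/ld+json"},
    {"name": "2.nrrd", "encodingFormat": "application/nrrd"},
]
chosen = pick_distribution(distributions, "encodingFormat:application/ld+json")
print(chosen["name"])
```

Splitting on the first colon keeps the property name intact even when the favored value itself contains a colon.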
Fetch a resource payload by filtering on its properties (dynamic SPARQL query building). --payload fetches the payload, and --out needs a .json extension:
bba-data-fetch --nexus-env https://bbp.epfl.ch/nexus/v1/ \
--nexus-token-file ./token.txt \
--nexus-org bbp \
--nexus-proj atlas \
--payload \
--out ./tmp/some_payload.json \
--filter \
type=BrainParcellationDataLayer \
resolution.value=10 \
atlasRelease.name="Allen Mouse CCF v2"
Some comparison operators don't play well with shell scripts and need to be framed with quote signs. Here, if "resolution.value>=10" is not framed with "...", the > symbol redirects the output:
bba-data-fetch --nexus-env https://bbp.epfl.ch/nexus/v1/ \
--nexus-token-file ./token.txt \
--nexus-org bbp \
--nexus-proj atlas \
--out ./tmp/some_distribution.nrrd \
--filter \
type=BrainParcellationDataLayer \
"resolution.value>=10" \
atlasRelease.name="Allen Mouse CCF v2"
Property and type names are case-insensitive. Here, resOlution and Bufferencoding are spelled with the wrong case and still work:
bba-data-fetch --nexus-env https://bbp.epfl.ch/nexus/v1/ \
--nexus-token-file ./token.txt \
--nexus-org bbp \
--nexus-proj atlas \
--out ./tmp/some_payload.json \
--payload \
--verbose \
--filter \
type=BrainParcellationDataLayer \
resOlution.value=10 \
Bufferencoding=gzip \
atlasRelease.name="Allen Mouse CCF v2"
Fetch the point cloud file linked to the brain region mba:1048:
bba-data-fetch --nexus-env https://bbp.epfl.ch/nexus/v1/ \
--nexus-token-file ./token.txt \
--nexus-org bbp \
--nexus-proj atlas \
--out ./tmp/some_payload.raw \
--verbose \
--filter \
type=CellPositions \
brainLocation.brainRegion="mba:1048"
Note that mba:1048 is the id of a brain region (Allen CCF 1048 is the "gigantocellular reticular nucleus") preceded by mba, which is the prefix in the graph database (mba = mouse brain atlas). This means that prefixes can be used in values if need be.
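When the CLI is driven from a Python script rather than typed in a terminal, the arguments can be passed as a list, so no shell is involved and operators such as >= need no extra quoting. The sketch below only assembles the argument list from the flags shown in the examples; build_command is a hypothetical helper, and the actual call is left commented out because it needs the tool installed and a valid token. shlex.join requires Python 3.8 or later.

```python
import shlex
from typing import List


def build_command(env: str, token_file: str, org: str, proj: str,
                  out: str, filters: List[str], payload: bool = False) -> List[str]:
    """Assemble a bba-data-fetch argument list from the flags used in the examples."""
    argv = [
        "bba-data-fetch",
        "--nexus-env", env,
        "--nexus-token-file", token_file,
        "--nexus-org", org,
        "--nexus-proj", proj,
        "--out", out,
    ]
    if payload:
        argv.append("--payload")
    # Every filter is its own list item, so spaces and ">" need no quoting here.
    argv.append("--filter")
    argv.extend(filters)
    return argv


argv = build_command(
    env="https://bbp.epfl.ch/nexus/v1/",
    token_file="./token.txt",
    org="bbp",
    proj="atlas",
    out="./tmp/some_distribution.nrrd",
    filters=[
        "type=BrainParcellationDataLayer",
        "resolution.value>=10",
        "atlasRelease.name=Allen Mouse CCF v2",
    ],
)

# Print the equivalent, safely quoted shell command for inspection.
print(shlex.join(argv))

# To actually run it (requires the tool and a valid token):
# import subprocess
# subprocess.run(argv, check=True)
```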
Funding & Acknowledgment
The development of this software was supported by funding to the Blue Brain Project, a research center of the École polytechnique fédérale de Lausanne (EPFL), from the Swiss government’s ETH Board of the Swiss Federal Institutes of Technology.
Copyright © 2020-2024 Blue Brain Project/EPFL
Owner
- Name: The Blue Brain Project
- Login: BlueBrain
- Kind: organization
- Email: bbp.opensource@epfl.ch
- Location: Geneva, Switzerland
- Website: https://portal.bluebrain.epfl.ch/
- Repositories: 226
- Profile: https://github.com/BlueBrain
Open Source Software produced and used by the Blue Brain Project
GitHub Events
Total
- Watch event: 1
- Member event: 1
- Push event: 4
- Public event: 1
- Pull request event: 2
- Fork event: 1
- Create event: 1
Last Year
- Watch event: 1
- Member event: 1
- Push event: 4
- Public event: 1
- Pull request event: 2
- Fork event: 1
- Create event: 1
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 10 minutes
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 10 minutes
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- lecriste (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- click >=7.0
- nexus-sdk >=0.3.2
- nexusforge >=0.8.1
- numpy >=1.19
- pynrrd >=0.4.0