https://github.com/braingeneers/search
Prototype search engine for experiment object store
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to nature.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: braingeneers
- License: mit
- Language: JavaScript
- Default Branch: main
- Size: 2.43 MB
Statistics
- Stars: 0
- Watchers: 6
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Braingeneers Search
Braingeneers NRP bucket crawler with an experiment and file explorer, hosted at search.braingeneers.gi.ucsc.edu
NOTE: 2023-04-02-e-hc328_unperturbed contains primary and spike-sorted NWB files
Install
pip install -r requirements.txt
Develop
First create a small crawl database
python crawl.py --count 10
Then run the server locally in debug and auto reload mode
make debug-server
Run
Build the Docker images and start the services using Docker Compose
make build
make up
make follow
NOTE: docker-compose.yml is configured to run on the braingeneers server, where it integrates with the mission control reverse proxy that exposes this service as search.braingeneers.gi.ucsc.edu
h5wasm to read NWB files directly in the browser
h5wasm enables the full HDF5 library to run natively in the browser. Using Emscripten's FS.createLazyFile, h5wasm can be given a virtual file backed by HTTP that uses range requests to incrementally access the h5 file over the wire. This paves the way for providing a presigned S3 URL so that a browser-based app can directly access an h5 file in a cloud store.

Unfortunately, a presigned URL can only be generated for a single HTTP method, and h5wasm performs a HEAD to get capabilities (like range requests) before making a GET. To work around this, the Flask server in this repo responds to the HEAD request directly and then provides a presigned URL redirect for the GET request, so that the browser pulls data directly from S3. This requires that the headers in the response to the HEAD request advertise the right capabilities. The downside of this approach is a redirect for every chunk from the proxy to the client. Another approach, taken by Flatiron's dendro, is to fork h5wasm and use an aborted fetch to just get the content length.

Here is the detailed sequence of requests and responses that h5wasm makes, which leads to this incremental reading:
h5wasm HEAD Request and Response
```
HEAD /data/aff5f64d-9a69-4ff3-a6fe-13a3f30dca50 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Host: localhost:5282
Referer: http://localhost:5282/static/worker.js
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=3600
Connection: keep-alive
Content-Length: 4966709395
Content-Type: application/octet-stream
Date: Mon, 18 Mar 2024 15:17:34 GMT
ETag: W/"11795069-4966709395-2024-03-11T19:11:17.335Z"
Keep-Alive: timeout=5
Last-Modified: Mon, 11 Mar 2024 19:11:17 GMT
```
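The HEAD response above carries the two facts a lazy reader needs: `Accept-Ranges: bytes` and the total `Content-Length`. As a rough sketch (not h5wasm's actual code), the helpers below show how a client could derive chunked `Range` header values from those headers. The function names and the 1 MiB chunk size are assumptions chosen to match the `Range: bytes=0-1048575` header seen in the first GET request.

```python
def supports_ranges(headers):
    """Return the file size if the HEAD headers allow byte-range reads, else None."""
    if headers.get("Accept-Ranges", "").lower() != "bytes":
        return None
    length = headers.get("Content-Length")
    return int(length) if length is not None else None


def chunk_range(index, chunk_size, total):
    """Return the Range header value for the zero-based chunk `index`."""
    start = index * chunk_size
    if start >= total:
        raise ValueError("chunk index is past the end of the file")
    # Both ends of an HTTP byte range are inclusive, hence the -1.
    end = min(start + chunk_size, total) - 1
    return f"bytes={start}-{end}"


# Header values copied from the HEAD response shown above.
head = {"Accept-Ranges": "bytes", "Content-Length": "4966709395"}
total = supports_ranges(head)
first = chunk_range(0, 1024 * 1024, total)
```

With these inputs `first` is the same range that appears in the first GET request, and the final chunk is shorter than 1 MiB because the file size is not a multiple of the chunk size.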
h5wasm First GET Request and Response
```
GET /data/aff5f64d-9a69-4ff3-a6fe-13a3f30dca50 HTTP/1.1
Accept: */*
Accept-Encoding: identity
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Host: localhost:5282
Range: bytes=0-1048575
Referer: http://localhost:5282/static/worker.js
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Cache-Control: max-age=3600
Connection: keep-alive
Content-Length: 1048576
Content-Range: bytes 0-1048575/4966709395
Content-Type: application/octet-stream
Date: Mon, 18 Mar 2024 15:17:34 GMT
ETag: W/"11795069-4966709395-2024-03-11T19:11:17.335Z"
Keep-Alive: timeout=5
Last-Modified: Mon, 11 Mar 2024 19:11:17 GMT
```
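The HEAD-then-redirect workaround can be sketched with a small server. This is an illustrative stand-in, not the repo's Flask code: it uses the standard library's `http.server` to stay dependency-free, and the `OBJECTS` catalogue, `presign_get_url` placeholder, `example-id` path, and choice of a 307 status are all assumptions. In a real deployment the placeholder would be replaced by an actual presigner (for example boto3's `generate_presigned_url`).

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical catalogue: object id -> (size in bytes, storage key).
OBJECTS = {"example-id": (4966709395, "ephys/example.nwb")}


def presign_get_url(key):
    """Placeholder presigner; returns a fake URL for illustration only."""
    return f"https://s3.example.invalid/{key}?X-Amz-Signature=placeholder"


class DataHandler(BaseHTTPRequestHandler):
    def _lookup(self):
        return OBJECTS.get(self.path.removeprefix("/data/"))

    def do_HEAD(self):
        # Answer HEAD locally so the client learns the size and range support.
        found = self._lookup()
        if found is None:
            self.send_error(404)
            return
        size, _ = found
        self.send_response(200)
        self.send_header("Accept-Ranges", "bytes")
        self.send_header("Content-Length", str(size))
        self.send_header("Content-Type", "application/octet-stream")
        self.end_headers()

    def do_GET(self):
        # Redirect GET so the bytes themselves come straight from the store.
        found = self._lookup()
        if found is None:
            self.send_error(404)
            return
        _, key = found
        self.send_response(307)
        self.send_header("Location", presign_get_url(key))
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def request(port, method, headers=None):
    """Issue one request without following redirects; return (status, headers)."""
    conn = HTTPConnection("127.0.0.1", port)
    try:
        conn.request(method, "/data/example-id", headers=headers or {})
        response = conn.getresponse()
        response.read()
        return response.status, dict(response.getheaders())
    finally:
        conn.close()


def demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), DataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        head = request(port, "HEAD")
        get = request(port, "GET", {"Range": "bytes=0-1048575"})
        return head, get
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the status and headers of both exchanges. The browser follows the redirect on each ranged GET and re-sends the `Range` header to the presigned URL, which is why every chunk costs one extra round trip through the proxy.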
Sample NWB Files on NRP
s3://braingeneers/ephys/2023-04-02-e-hc328_unperturbed/shared/hc3.28_hckcr1_chip16835_plated34.2_rec4.2.nwb
s3://braingeneers/ephys/2023-04-02-e-hc328_unperturbed/shared/hc3.28_hckcr1_chip16835_plated34.2_rec4.2_kilosort2_curated_s1.nwb
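Presigning a URL for one of these files needs the bucket and object key separately. A minimal sketch of splitting an `s3://` URI with the standard library follows; the helper name `split_s3_uri` is an assumption and not part of this repo.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # The path keeps its leading slash; S3 keys do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_uri(
    "s3://braingeneers/ephys/2023-04-02-e-hc328_unperturbed/shared/"
    "hc3.28_hckcr1_chip16835_plated34.2_rec4.2.nwb"
)
```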
References
Indexing
Quick full-text search using SQLite
NWB
Neurodata Without Borders (NWB)
h5wasm wrapper for h5 from http
How h5wasm accesses files over http via Emscripten lazy loading
GitHub thread on accessing h5 via range requests
Chunking and indexing note in an issue
React components to visualize and graph h5 data (uses h5wasm)
Owner
- Name: braingeneers
- Login: braingeneers
- Kind: organization
- Repositories: 15
- Profile: https://github.com/braingeneers