https://github.com/braingeneers/search

Prototype search engine for experiment object store


Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: braingeneers
  • License: mit
  • Language: JavaScript
  • Default Branch: main
  • Size: 2.43 MB
Statistics
  • Stars: 0
  • Watchers: 6
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License

README.md

Braingeneers Search

Braingeneers NRP bucket crawler with an experiment and file explorer, hosted at search.braingeneers.gi.ucsc.edu

NOTE: 2023-04-02-e-hc328_unperturbed contains primary and spike-sorted NWB files.

Install

pip install -r requirements.txt

Develop

First, create a small crawl database:

    python crawl.py --count 10

Then run the server locally in debug and auto-reload mode:

    make debug-server
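The README does not document the crawl database schema. Since the References section points at SQLite FTS5, here is a minimal sketch, assuming the crawler stores one row per object key in an FTS5 virtual table. The table name `objects`, its columns, the `<category>/<experiment>/...` key layout, and the `build_index`/`search` helpers are all hypothetical and not taken from crawl.py.

```python
import sqlite3

# Hypothetical object keys, modelled on the sample NWB paths listed in this README.
KEYS = [
    "ephys/2023-04-02-e-hc328_unperturbed/shared/hc3.28_hckcr1_chip16835_plated34.2_rec4.2.nwb",
    "ephys/2023-04-02-e-hc328_unperturbed/shared/hc3.28_hckcr1_chip16835_plated34.2_rec4.2_kilosort2_curated_s1.nwb",
]


def build_index(keys):
    """Build an in-memory FTS5 index with one (experiment, key) row per object."""
    db = sqlite3.connect(":memory:")
    # Requires an SQLite build with the FTS5 extension compiled in.
    db.execute("CREATE VIRTUAL TABLE objects USING fts5(experiment, key)")
    for key in keys:
        # Assumed key layout: <category>/<experiment>/<rest of path>
        experiment = key.split("/")[1]
        db.execute(
            "INSERT INTO objects (experiment, key) VALUES (?, ?)", (experiment, key)
        )
    db.commit()
    return db


def search(db, query):
    """Return keys matching an FTS5 query, best match first."""
    rows = db.execute(
        "SELECT key FROM objects WHERE objects MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]
```

With the default tokenizer, underscores, dots, slashes, and dashes act as separators, so a query such as `search(db, "kilosort2")` finds only the spike-sorted file, while `"unperturbed"` matches every object in the experiment.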

Run

Build the Docker images and start them using Docker Compose:

    make build
    make up
    make follow

NOTE: docker-compose.yml is configured to run on the braingeneers server so that it integrates with the mission control reverse proxy, which exposes this service as search.braingeneers.gi.ucsc.edu.

h5wasm to read NWB files directly in the browser

h5wasm enables the full HDF5 library to run natively in the browser. Using Emscripten's FS.createLazyFile, h5wasm can be given a virtual file backed by HTTP that uses range requests to read the h5 file incrementally over the wire. This paves the way to provide a presigned S3 URL so that a browser-based app can directly access an h5 file in a cloud store.

Unfortunately, a presigned URL can only be generated for a single HTTP method, and h5wasm performs a HEAD request to discover capabilities (such as range request support) before making any GET. To work around this, the Flask server in this repo responds to the HEAD request directly and then redirects each GET request to a presigned URL, so that the browser pulls the data directly from S3. This requires that the headers in the HEAD response advertise the right capabilities. The downside of this approach is that the proxy issues a redirect to the client for every chunk. Another approach, taken by Flatiron's dendro, is to fork h5wasm and use an aborted fetch to get just the content length.

Here is the detailed sequence of requests and responses that h5wasm makes, which leads to this incremental reading:

h5wasm HEAD Request and Response

```
HEAD /data/aff5f64d-9a69-4ff3-a6fe-13a3f30dca50 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Host: localhost:5282
Referer: http://localhost:5282/static/worker.js
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: max-age=3600
Connection: keep-alive
Content-Length: 4966709395
Content-Type: application/octet-stream
Date: Mon, 18 Mar 2024 15:17:34 GMT
ETag: W/"11795069-4966709395-2024-03-11T19:11:17.335Z"
Keep-Alive: timeout=5
Last-Modified: Mon, 11 Mar 2024 19:11:17 GMT
```
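The HEAD response above is what lets the reader fetch the file in pieces: `Accept-Ranges: bytes` says ranges are supported, and `Content-Length` gives the total size. Emscripten's lazy file appears to check these same headers (and to avoid range reads on gzip-encoded bodies). The Python sketch below is hypothetical (`lazy_read_plan` and `range_header` are not part of h5wasm or Emscripten). It shows that decision and the resulting Range arithmetic, assuming a fixed 1 MiB chunk size, which is the size implied by `Range: bytes=0-1048575` in the first GET request of this README. Unlike Emscripten, which may fall back to fetching the whole file, this sketch simply rejects responses that do not allow range reads.

```python
CHUNK_SIZE = 1024 * 1024  # 1 MiB, the chunk size implied by "Range: bytes=0-1048575"


def lazy_read_plan(head_headers, chunk_size=CHUNK_SIZE):
    """Return (file_size, chunk_count) for reading a file in fixed-size ranges.

    Header names are matched case-insensitively. Raises ValueError when the HEAD
    response does not allow byte-range reads.
    """
    headers = {name.lower(): value for name, value in head_headers.items()}
    if headers.get("accept-ranges") != "bytes":
        raise ValueError("server does not advertise byte-range support")
    if headers.get("content-encoding") == "gzip":
        # Byte offsets into a gzip-encoded body do not map onto the HDF5 file.
        raise ValueError("gzip-encoded responses cannot be read by byte range")
    if "content-length" not in headers:
        raise ValueError("Content-Length is required to know the file size")
    size = int(headers["content-length"])
    chunk_count = (size + chunk_size - 1) // chunk_size  # ceiling division
    return size, chunk_count


def range_header(chunk_index, file_size, chunk_size=CHUNK_SIZE):
    """Build the Range header value for one chunk (HTTP byte ranges are inclusive)."""
    start = chunk_index * chunk_size
    if not 0 <= start < file_size:
        raise IndexError("chunk index out of range")
    end = min(start + chunk_size, file_size) - 1
    return f"bytes={start}-{end}"
```

For the 4966709395-byte file in this transcript, 1 MiB chunks would take 4,737 range requests to read the whole file. The point of lazy loading is that only the chunks a read actually touches get fetched, and under the redirect approach each of those fetches also incurs one redirect.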

h5wasm First GET Request and Response

```
GET /data/aff5f64d-9a69-4ff3-a6fe-13a3f30dca50 HTTP/1.1
Accept: */*
Accept-Encoding: identity
Accept-Language: en-US,en;q=0.9
Connection: keep-alive
Host: localhost:5282
Range: bytes=0-1048575
Referer: http://localhost:5282/static/worker.js
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15

HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Cache-Control: max-age=3600
Connection: keep-alive
Content-Length: 1048576
Content-Range: bytes 0-1048575/4966709395
Content-Type: application/octet-stream
Date: Mon, 18 Mar 2024 15:17:34 GMT
ETag: W/"11795069-4966709395-2024-03-11T19:11:17.335Z"
Keep-Alive: timeout=5
Last-Modified: Mon, 11 Mar 2024 19:11:17 GMT
```
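The HEAD/GET split described in the h5wasm section can be sketched without any web framework as a pure function, which keeps the decision easy to test. This is not the Flask code from this repo: `Response`, `handle_data_request`, and the `presign_get` callback are hypothetical names. In a real server, `presign_get` might wrap boto3's `generate_presigned_url("get_object", Params={"Bucket": bucket, "Key": key})`, and `size` could come from a server-side S3 HEAD on the object.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


def handle_data_request(method: str, size: int, presign_get: Callable[[], str]) -> Response:
    """Answer HEAD locally and send GET to a URL presigned for GET only."""
    if method == "HEAD":
        # A presigned URL is valid for one method, so the proxy answers HEAD itself
        # and advertises the capabilities the lazy reader looks for.
        return Response(200, {
            "Accept-Ranges": "bytes",
            "Content-Length": str(size),
            "Content-Type": "application/octet-stream",
        })
    if method == "GET":
        # Redirect so the browser pulls the bytes directly from S3.
        return Response(302, {"Location": presign_get()})
    return Response(405, {"Allow": "GET, HEAD"})
```

Answering HEAD on the server means no second presigned URL is needed; the cost, as the README notes, is one redirect per chunk.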

Sample NWB Files on NRP

  • s3://braingeneers/ephys/2023-04-02-e-hc328_unperturbed/shared/hc3.28_hckcr1_chip16835_plated34.2_rec4.2.nwb
  • s3://braingeneers/ephys/2023-04-02-e-hc328_unperturbed/shared/hc3.28_hckcr1_chip16835_plated34.2_rec4.2_kilosort2_curated_s1.nwb

References

Indexing
  • SQLite FTS5 Extension
  • Quick full-text search using SQLite

NWB
  • Neurodata Without Borders (NWB)
  • A NWB-based dataset and processing pipeline of human single-neuron activity during a declarative memory task
  • NWB Examples

h5wasm
  • h5wasm wrapper for h5 from http
  • How h5wasm accesses files over http via Emscripten lazy loading
  • GitHub thread on accessing h5 files via range requests
  • Chunking and indexing note in an issue
  • React components to visualize and graph h5 data (uses h5wasm)

Owner

  • Name: braingeneers
  • Login: braingeneers
  • Kind: organization
