https://github.com/batmen-lab/sonata

SONATA: Disambiguated manifold alignment of single-cell data

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (8.9%) to scientific vocabulary

Last synced: 10 months ago · JSON representation

Repository

SONATA: Disambiguated manifold alignment of single-cell data

Basic Info

Host: GitHub
Owner: batmen-lab
License: apache-2.0
Language: Python
Default Branch: main
Size: 14.9 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme License

SONATA

Source code for Securing diagonal integration of multimodal single-cell data against ambiguous mapping


Requirements

Dependencies for SONATA are recorded in requirements.txt.

Data

The datasets used in this project are available for download at the following link: data.

Then organize the project as follows:

    project_root/
    ├── src/
    │   ├── examples/
    │   │   ├── baselines/
    │   │   ├── cfgs/
    │   │   ├── noise_scale.ipynb
    │   │   ├── simulation_t_branch.ipynb
    │   │   └── ...
    │   ├── run_baselines/
    │   ├── utils/
    │   └── sonata.py
    ├── examples/
    │   ├── cfgs/
    │   ├── simulation_t_branch.ipynb
    │   └── ...
    ├── data/
    │   ├── t_branch/
    │   └── ...
    ├── results/
    │   ├── sonata_pipeline
    │   ├── ├── t_branch/
    │   └── └── ...
    ├── README.md
    └── requirements.txt
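Before running the notebooks, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of the SONATA code base: the helper `missing_paths` and the `EXPECTED` list are hypothetical names, and the list only contains entries that appear explicitly in the tree.

```python
from pathlib import Path

# Entries taken from the directory tree above; extend as needed.
EXPECTED = [
    "src/examples",
    "src/run_baselines",
    "src/utils",
    "src/sonata.py",
    "examples",
    "data",
    "results",
    "README.md",
    "requirements.txt",
]


def missing_paths(project_root, expected=EXPECTED):
    """Return the expected entries that do not exist under project_root."""
    root = Path(project_root)
    return [entry for entry in expected if not (root / entry).exists()]


if __name__ == "__main__":
    # Run from the project root; prints nothing if the layout is complete.
    for entry in missing_paths("."):
        print(f"missing: {entry}")
```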

Baseline Performance

We demonstrate that artificial integrations resulting from ambiguous mapping in diagonal data integration are widespread yet surprisingly overlooked, occurring across all mainstream diagonal integration methods. The following notebooks show how baseline methods perform on various ambiguous datasets:
- t_branch: t_branch.ipynb
- scGEM: scGEM.ipynb
- SNARE: SNARE.ipynb
- scNMT: scNMT.ipynb

To quantify the ambiguity in these cases, we report label transfer accuracy and average FOSCTTM in our manuscript. All baseline method tests are implemented in the folder src/run_baselines. To run a test, use the following commands:

    cd src
    python run_baselines/run_unioncom.py --dataset t_branch

We argue that artificial integrations are more harmful than failed integrations: failed integrations can be recognized qualitatively, whereas artificial integrations are difficult to detect and can mislead users into pursuing hypotheses based on erroneous results.
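For readers unfamiliar with the average FOSCTTM metric mentioned above (fraction of samples closer than the true match), here is a minimal NumPy sketch. It is not the evaluation code used in the manuscript: the function name `average_foscttm` is hypothetical, Euclidean distance is an assumption, and the two embeddings are assumed to have rows that correspond one-to-one. It builds a full pairwise distance tensor, so it is only suitable for small inputs.

```python
import numpy as np


def average_foscttm(x_aligned, y_aligned):
    """Average FOSCTTM for two aligned embeddings with 1-to-1 matched rows.

    Lower is better; 0.0 means every sample's true match is its nearest
    neighbor in the other modality.
    """
    x = np.asarray(x_aligned, dtype=float)
    y = np.asarray(y_aligned, dtype=float)
    if x.shape != y.shape:
        raise ValueError("both embeddings must have the same shape")
    n = x.shape[0]
    if n < 2:
        raise ValueError("need at least two samples")

    # Pairwise Euclidean distances: d[i, j] = ||x_i - y_j||
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    true_d = np.diag(d)  # distance from each sample to its true match

    # For each x_i: fraction of y_j strictly closer than the true match y_i
    frac_x = (d < true_d[:, None]).sum(axis=1) / (n - 1)
    # For each y_j: fraction of x_i strictly closer than the true match x_j
    frac_y = (d < true_d[None, :]).sum(axis=0) / (n - 1)

    return float((frac_x.mean() + frac_y.mean()) / 2)
```

A perfectly aligned pair of embeddings scores 0.0, while a mirrored alignment (the kind of artificial integration discussed above) scores noticeably higher even though the two manifolds overlap visually.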

SONATA Examples

Jupyter notebooks to replicate the SONATA results from the manuscript are available under the examples folder:
- Simulation datasets
  - partially ambiguous: simulation_t_branch.ipynb, simulation_y_branch.ipynb, simulation_x_branch.ipynb
  - not ambiguous: simulation_decay_path.ipynb
- Real biology datasets
  - scGEM: scGEM.ipynb
  - SNARE: SNARE.ipynb
  - scNMT: scNMT.ipynb

Basic Use

```python
import sonata

# data: NumPy array with samples as rows and features as columns
sn = sonata.sonata(noise_scale=0.2)
DiagnoseResult = sn.diagnose(data)

# Get the indices of cells identified as ambiguous
ambiguous_idx = DiagnoseResult.ambiguous_idx

# Get the corresponding ambiguous group labels for those cells
ambiguous_labels = DiagnoseResult.ambiguous_labels
```

Input for SONATA:
- parameters:
  - noise_scale: The scale of Gaussian noise added to generate variational versions of the manifold. Default: 0.2.
  - n_neighbor: Number of neighbors when constructing the noise manifold. Default: 10.
  - mode: Mode for constructing the graph. Options: "connectivity" or "distance". Default: "connectivity".
  - metric: Metric to use for distance computation. Default: "correlation".
  - e: Coefficient of the entropic regularization term in the objective function of the OT formulation. Default: 1e-3.
  - repeat: Number of iterations for alignment. Default: 10.
  - n_cluster: Number of cell groups used in hierarchical clustering to achieve a smooth and efficient spline fit. Recommended: n_cluster <= $\sqrt{n_samples}$. Default: 20.
  - pval_thres: P-value threshold for ambiguous group pair detection. Default: 1e-2.
  - scalableOT: If True, uses the scalable version of OT. Default: False.
  - scale_sample_rate: The sample rate for the scalable version of OT. Default: 0.1.
  - verbose: If True, prints the progress of the algorithm. Default: True.
- data: A NumPy array or matrix where rows correspond to samples and columns correspond to features.
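The recommendation that n_cluster should not exceed the square root of the number of samples can be applied mechanically. The sketch below is illustrative only: `suggest_n_cluster` is a hypothetical helper, not part of SONATA, and the random matrix is a synthetic stand-in for real data.

```python
import math

import numpy as np


def suggest_n_cluster(data, default=20):
    """Cap the default n_cluster at floor(sqrt(n_samples)), with a minimum of 1."""
    n_samples = np.asarray(data).shape[0]  # rows are samples
    return max(1, min(default, math.isqrt(n_samples)))


rng = np.random.default_rng(0)
data = rng.normal(size=(300, 50))  # synthetic stand-in: 300 samples x 50 features
n_cluster = suggest_n_cluster(data)
print(n_cluster)  # 17, since floor(sqrt(300)) = 17 is below the default of 20
```

The resulting value could then be passed through the n_cluster parameter listed above when constructing the SONATA object.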

Output for SONATA:
- A SimpleNamespace object containing the following attributes:
  - ambiguous_labels: A NumPy array of ambiguous group labels for ambiguous samples.
  - ambiguous_idx: A NumPy array of indices of ambiguous samples.
  - cannot_links: A list of ambiguous sample pairs.
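As an illustration of how the output attributes fit together, the sketch below groups ambiguous cell indices by their group label. It assumes the attributes are spelled ambiguous_idx and ambiguous_labels and that the two arrays are aligned element by element. The helper `group_ambiguous_cells` is hypothetical, and `mock_result` is a hand-made stand-in with made-up values, not real SONATA output.

```python
from collections import defaultdict
from types import SimpleNamespace

import numpy as np


def group_ambiguous_cells(result):
    """Map each ambiguous group label to the sorted cell indices assigned to it."""
    groups = defaultdict(list)
    for idx, label in zip(result.ambiguous_idx, result.ambiguous_labels):
        groups[int(label)].append(int(idx))
    return {label: sorted(cells) for label, cells in groups.items()}


# Stand-in for the object returned by sn.diagnose(data); values are illustrative only.
mock_result = SimpleNamespace(
    ambiguous_idx=np.array([3, 7, 8, 12]),
    ambiguous_labels=np.array([0, 1, 0, 1]),
)

print(group_ambiguous_cells(mock_result))  # {0: [3, 8], 1: [7, 12]}
```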

Guidance on choosing the "noise_scale" parameter

Please refer to notebook: noise_scale.ipynb.

Scalable SONATA

To support large-scale datasets, we offer a more efficient yet equally effective optimal transport algorithm that significantly improves the scalability of SONATA. You can enable this scalable mode by setting scalableOT=True:

    import sonata
    sn = sonata.sonata(noise_scale=0.2, scalableOT=True)
    DiagnoseResult = sn.diagnose(data)

Major Updates

Jun. 11, 2025: Added Quantized Gromov–Wasserstein to enhance the scalability of SONATA for large datasets.
Nov. 2, 2024: We have released the source code for the new version of SONATA.
Nov. 1, 2024: We have added more comprehensive tests for 5 baseline methods, which can be found in the src/run_baselines folder. We're also working on the new version of SONATA, coming soon!

Owner

Name: BATMEN Lab @ UWaterloo
Login: batmen-lab
Kind: user
Company: UWaterloo CS

Repositories: 7
Profile: https://github.com/batmen-lab

GitHub Events

Total

Push event: 11

Last Year

Push event: 11
