rose

PyTorch implementation of Rotary Spatial Embeddings

https://github.com/rhoadesscholar/rose

Keywords

attention positional-encoding pytorch rotary-position-embedding rotary-position-encoding transformer

Last synced: 11 months ago

Repository

PyTorch implementation of Rotary Spatial Embeddings

Basic Info

Host: GitHub
Owner: rhoadesScholar
License: bsd-3-clause
Language: Python
Default Branch: main
Homepage:
Size: 61.5 KB

Statistics

Stars: 2
Watchers: 0
Forks: 1
Open Issues: 2
Releases: 14

Created about 1 year ago · Last pushed 11 months ago

Metadata Files

Readme License Citation

RoSE N-dimensional Rotary Spatial Embeddings

Original implementation of Rotary Spatial Embeddings (in PyTorch)


Rotary Spatial Embeddings (RoSE) extends the original 1D Rotary Position Embeddings (RoPE) and their 2D variants so that the embeddings carry spatial information expressed as N-dimensional real-world coordinates. This is particularly useful for tasks that require understanding spatial relationships across different scales, such as microscopy.

Explanation

1 Relative phase in 1-D RoPE

Write the 1-D RoPE positional factor for the token at position $t$ as a per-token complex phase

$$\phi(t)=e^{\,i\,t\theta},\qquad t\in\mathbb Z .$$

Attach that phase to the query $q_t$ and the key $k_t$,

$$\tilde q_t = q_t\;\phi(t),\qquad \tilde k_t = k_t\;\phi(t).$$

The dot product inside attention conjugates the key, so it becomes

$$\tilde q_n\,\tilde k_m^{*} \;=\; q_n\,k_m^{*}\, \underbrace{\phi(n)\,\phi(m)^{*}}_{=\,e^{\,i\,(n-m)\theta}} ,$$

where $^*$ denotes complex conjugation. The positional factor depends only on the relative offset $n-m$.
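
The relative-phase identity above can be checked numerically. The following is a minimal sketch in plain Python (standard library only), assuming that query and key are rotated by the same phase and that the key is conjugated inside the product. The names `phase` and `score` and the value of `theta` are illustrative and not part of the RoSE API.

```python
import cmath

def phase(t: int, theta: float) -> complex:
    """Per-token phase phi(t) = exp(i * t * theta)."""
    return cmath.exp(1j * t * theta)

theta = 0.3                      # arbitrary illustrative frequency
q, k = 0.8 - 0.2j, -0.5 + 1.1j   # one complex feature of a query and of a key

def score(n: int, m: int) -> complex:
    """Rotate q at position n and k at position m, then form q~ * conj(k~)."""
    q_rot = q * phase(n, theta)
    k_rot = k * phase(m, theta)
    return q_rot * k_rot.conjugate()

# The same offset n - m = 3 at two different absolute positions.
a = score(5, 2)
b = score(105, 102)
expected = q * k.conjugate() * cmath.exp(1j * 3 * theta)
print(abs(a - b) < 1e-9, abs(a - expected) < 1e-9)
```

Both comparisons hold because the absolute positions cancel and only the offset survives in the exponent.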

⸻

2 Extending to N dimensions

Give every token a coordinate vector $\mathbf{p}=(x,y,z,\dots)\in\mathbb R^{N}.$

Define its phase as

$$\phi(\mathbf{p}) \;=\;e^{\,i\,\langle\mathbf{p},\,\boldsymbol\theta\rangle}, \qquad \langle\mathbf{p},\boldsymbol\theta\rangle =\sum_{a=1}^{N} p_a\,\theta_a .$$

Then

$$\phi(\mathbf{p}_n)\,\phi(\mathbf{p}_m)^{*} \;=\; e^{\,i\,\langle\mathbf{p}_n-\mathbf{p}_m,\;\boldsymbol\theta\rangle},$$

which is the ND generalisation of the 1-D $e^{\,i\,(n-m)\theta}$. You still get

$$A_{nm}\;=\;\mathrm{Re} \bigl[q_n k_m^{*}\;e^{\,i\,\langle\mathbf{p}_n-\mathbf{p}_m, \boldsymbol\theta\rangle}\bigr],$$

while keeping the per-token encoding cost $O(LD)$.
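
The N-dimensional case can be checked the same way. This is a small standard-library sketch, not RoSE code: `nd_phase`, `relative`, and the sample coordinates and frequencies are hypothetical values chosen for illustration. It shows that translating both tokens by the same vector leaves the relative phase unchanged.

```python
import cmath
from typing import Sequence

def nd_phase(p: Sequence[float], theta: Sequence[float]) -> complex:
    """phi(p) = exp(i * <p, theta>) for an N-dimensional coordinate p."""
    return cmath.exp(1j * sum(pa * ta for pa, ta in zip(p, theta)))

theta = (0.7, 0.2, 0.05)          # one illustrative frequency per axis
p_n = (4.0, 1.5, 10.0)
p_m = (1.0, 0.5, 2.0)
shift = (100.0, -30.0, 7.0)       # common translation applied to both tokens

def relative(pn: Sequence[float], pm: Sequence[float]) -> complex:
    """phi(p_n) * conj(phi(p_m)), the factor that enters the attention score."""
    return nd_phase(pn, theta) * nd_phase(pm, theta).conjugate()

moved_n = tuple(a + s for a, s in zip(p_n, shift))
moved_m = tuple(a + s for a, s in zip(p_m, shift))

# Translating both tokens leaves the relative phase unchanged.
print(abs(relative(p_n, p_m) - relative(moved_n, moved_m)) < 1e-9)
```

Each token needs one inner product per frequency, which is why the per-token cost stays linear in the embedding width.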

Partial Rotation: RoSE also supports partial rotation via the `rotary_ratio` parameter, where only a fraction of the embedding dimensions are rotated while the rest are passed through unchanged. This provides a balance between spatial awareness and computational efficiency.
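
To make the idea of partial rotation concrete, here is a minimal sketch in plain Python. It is not the RoSE implementation: the helper `partial_rotate`, the interleaved pairing of features, and the rounding of the rotated width down to an even number are all assumptions made for illustration.

```python
import math
from typing import List

def partial_rotate(x: List[float], angles: List[float], rotary_ratio: float) -> List[float]:
    """Rotate the leading fraction of `x` pairwise; pass the remainder through.

    Assumes an interleaved pairing (x[0], x[1]), (x[2], x[3]), ...; this
    pairing is an illustrative choice, not necessarily the one RoSE uses.
    """
    n_rot = int(len(x) * rotary_ratio)
    n_rot -= n_rot % 2                      # rotated part must hold whole pairs
    if len(angles) < n_rot // 2:
        raise ValueError("need one angle per rotated pair")
    out = list(x)
    for j in range(n_rot // 2):
        c, s = math.cos(angles[j]), math.sin(angles[j])
        a, b = x[2 * j], x[2 * j + 1]
        out[2 * j] = a * c - b * s          # real part of (a + ib) * e^{i angle}
        out[2 * j + 1] = a * s + b * c      # imaginary part
    return out

x = [1.0, 0.0, 0.0, 1.0, 5.0, 6.0, 7.0, 8.0]
angles = [math.pi / 2, math.pi]             # one angle per rotated pair
y = partial_rotate(x, angles, rotary_ratio=0.5)
print(y[4:])                                # untouched tail: [5.0, 6.0, 7.0, 8.0]
```

With `rotary_ratio=0.5` only the first four features are rotated, so the loop runs over two pairs instead of four.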

3 Embedding real-world coordinates

In many applications, such as microscopy or 3D point clouds, the coordinates are not just indices but represent real-world positions that may contain useful spatial information. RoSE injects these positions directly into the rotary embeddings by multiplying each token's grid index by the coordinate spacing (i.e. the voxel size) before applying the rotary embedding.
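
As an illustration of the index-times-spacing step, the sketch below builds physical coordinates for every voxel of a small grid using only the standard library. The helper name `grid_coordinates` is hypothetical, and the row-major flattening order is an assumption; the library may order tokens differently.

```python
from itertools import product
from typing import List, Sequence, Tuple

def grid_coordinates(
    grid_shape: Sequence[int], spacing: Sequence[float]
) -> List[Tuple[float, ...]]:
    """Physical coordinate of every voxel, in row-major (C) flattening order."""
    if len(grid_shape) != len(spacing):
        raise ValueError("grid_shape and spacing need one entry per axis")
    return [
        tuple(i * s for i, s in zip(index, spacing))
        for index in product(*(range(n) for n in grid_shape))
    ]

# Anisotropic voxels: 0.5 x 0.5 x 2.0 units along the three axes.
coords = grid_coordinates((2, 2, 3), (0.5, 0.5, 2.0))
print(len(coords))   # 12 tokens, one per voxel
print(coords[5])     # index (0, 1, 2) -> (0.0, 0.5, 4.0)
```

Two neighbouring voxels along the last axis end up 2.0 units apart, four times further than neighbours along the first axis, which index-based positions would not capture.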

Installation

From PyPI

```bash
pip install rotary-spatial-embeddings
```

From source

```bash
pip install git+https://github.com/rhoadesScholar/RoSE.git
```

Usage

Basic Usage - Multi-Head Attention with Spatial Embeddings

```python
import torch
from RoSE import RoSEMultiHeadCrossAttention

# Create RoSE multi-head attention layer
layer = RoSEMultiHeadCrossAttention(
    dim=128,
    num_heads=8,
    spatial_dims=3,
    learnable=True,
    base_theta=1e4,
    rotary_ratio=1.0,  # Apply rotation to all dimensions (default)
)

batch_size, seq_len = 2, 1000
q = torch.randn(batch_size, seq_len, 128)
k = torch.randn(batch_size, seq_len, 128)

# Define spatial grid properties
grid_shape = (10, 10, 10)  # 3D grid dimensions
spacing = (1.0, 1.0, 1.0)  # Physical size of each voxel

# Compute attention scores with spatial embeddings
attn_scores = layer(q, k, spacing, grid_shape)  # Shape: (batch_size, num_heads, seq_len, seq_len)
```

Partial Rotation with `rotary_ratio`

The rotary_ratio parameter allows you to apply rotary embeddings to only a fraction of the embedding dimensions, which can be beneficial for performance and model capacity:

```python
import torch
from RoSE import RotarySpatialEmbedding

# Apply rotation to only 50% of the embedding dimensions
embedding = RotarySpatialEmbedding(
    dim=128,
    num_heads=8,
    spatial_dims=2,
    rotary_ratio=0.5,  # Only rotate first 50% of dimensions per head
    learnable=False,
)

batch_size, seq_len = 2, 100
x = torch.randn(batch_size, seq_len, 128)

# The first 64 dimensions (50% of 128) will be rotated
# The last 64 dimensions will be passed through unchanged
x_embedded = embedding(x, spacing=(0.5, 0.5), grid_shape=(10, 10))
```

Key benefits of partial rotation:

Performance: Reduces computational cost for large embeddings
Flexibility: Allows some dimensions to encode non-spatial information
Stability: Can improve training stability in some scenarios
Memory: Lower memory usage for frequency parameters

Using Just the Embedding Layer

```python
import torch
from RoSE import RotarySpatialEmbedding

# Create just the rotary spatial embedding layer
embedding = RotarySpatialEmbedding(
    dim=128,
    num_heads=8,
    spatial_dims=2,
    learnable=False,
    frequency_scaling="sqrt",
    rotary_ratio=1.0,  # Apply rotation to all dimensions (default)
)

batch_size, seq_len = 2, 100
x = torch.randn(batch_size, seq_len, 128)

# Define 2D grid
grid_shape = (10, 10)
spacing = (0.5, 0.5)

# Apply rotary spatial embeddings
x_embedded = embedding(x, spacing, grid_shape)  # Shape: (batch_size, seq_len, 128)
```

Parameters

Core Parameters

dim: Total embedding dimension (must be even and divisible by num_heads)
num_heads: Number of attention heads
spatial_dims: Number of spatial dimensions (2 for 2D, 3 for 3D, etc.)
rotary_ratio: Fraction of embedding dimensions to apply rotation to (0.0 to 1.0, default: 1.0)
- 1.0: Apply rotation to all dimensions (full rotation)
- 0.5: Apply rotation to 50% of dimensions per head
- 0.0: No rotation applied (passthrough)
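
The constraints listed above can be turned into a small pre-flight check. This is a sketch under stated assumptions rather than the library's own validation: the helper name `rotated_dims_per_head` is hypothetical, and rounding the rotated width down to an even number is an assumption made so that features can be rotated in pairs.

```python
def rotated_dims_per_head(dim: int, num_heads: int, rotary_ratio: float = 1.0) -> int:
    """Validate the documented constraints and return the rotated width per head.

    Rounding the rotated width down to an even number is an assumption made
    here so that features can be rotated in pairs.
    """
    if dim % 2 != 0 or dim % num_heads != 0:
        raise ValueError("dim must be even and divisible by num_heads")
    if not 0.0 <= rotary_ratio <= 1.0:
        raise ValueError("rotary_ratio must lie in [0.0, 1.0]")
    head_dim = dim // num_heads
    n_rot = int(head_dim * rotary_ratio)
    return n_rot - n_rot % 2

print(rotated_dims_per_head(128, 8, 1.0))   # 16: every feature of each head
print(rotated_dims_per_head(128, 8, 0.5))   # 8: half of each 16-wide head
print(rotated_dims_per_head(128, 8, 0.0))   # 0: pure passthrough
```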

Advanced Parameters

base_theta: Base frequency for rotary embeddings (default: 10000.0)
learnable: Whether frequencies should be learnable parameters (default: True)
init_jitter_std: Standard deviation for frequency initialization jitter (default: 0.02)
frequency_scaling: Scaling strategy for frequencies (default: "sqrt")
- "none": No frequency scaling
- "linear": Linear scaling with spatial dimensions
- "sqrt": Square root scaling with spatial dimensions
- "adaptive": Adaptive scaling based on spatial dims and embedding dim
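
For orientation, the sketch below computes the standard 1-D RoPE frequency schedule that a base such as `base_theta` usually controls. Whether RoSE uses exactly this schedule, and how `frequency_scaling` and `init_jitter_std` modify it, is not stated here, so treat the formula and the helper name `rope_frequencies` as assumptions.

```python
from typing import List

def rope_frequencies(pairs: int, base_theta: float = 10000.0) -> List[float]:
    """Standard RoPE schedule: theta_j = base_theta ** (-j / pairs), j = 0..pairs-1."""
    return [base_theta ** (-j / pairs) for j in range(pairs)]

dim, num_heads = 128, 8
head_dim = dim // num_heads               # 16 features per head
freqs = rope_frequencies(head_dim // 2)   # one frequency per rotated feature pair
print(len(freqs), freqs[0])               # 8 frequencies, the first one is 1.0
```

A larger base spreads the frequencies over a wider range, so the slowest pairs rotate more gradually with distance.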

Advanced Examples

Working with 3D Medical Imaging Data

```python
import torch
from RoSE import RotarySpatialEmbedding

# Example: 3D CT scan with anisotropic voxel spacing
batch_size, seq_len = 1, 8000  # 20x20x20 volume flattened
embedding_dim = 256
num_heads = 8

# Create embedding layer for 3D medical data
embedding = RotarySpatialEmbedding(
    dim=embedding_dim,
    num_heads=num_heads,
    spatial_dims=3,
    learnable=True,
    rotary_ratio=0.75,  # Rotate 75% of dimensions
    frequency_scaling="adaptive",
)

# Define anisotropic voxel spacing (common in medical imaging)
grid_shape = (20, 20, 20)
voxel_spacing = (0.5, 0.5, 2.0)  # 0.5mm x 0.5mm x 2mm

# Your input features (e.g., from a CNN backbone)
x = torch.randn(batch_size, seq_len, embedding_dim)

# Apply spatial embeddings
x_with_spatial = embedding(x, voxel_spacing, grid_shape)
print(f"Input shape: {x.shape}")
print(f"Output shape: {x_with_spatial.shape}")
```

Multi-Scale Microscopy Analysis

```python
import torch
from RoSE import RoSEMultiHeadCrossAttention


# Example: Multi-scale microscopy with different zoom levels
def create_multiscale_attention():
    return RoSEMultiHeadCrossAttention(
        dim=512,
        num_heads=16,
        spatial_dims=2,
        learnable=True,
        base_theta=1e4,
        rotary_ratio=1.0,  # Full rotation for spatial awareness
    )


# Different scales: 10x, 40x, 100x magnification
scales_and_spacings = [
    ((100, 100), (1.0, 1.0)),    # 10x: 1μm/pixel
    ((200, 200), (0.25, 0.25)),  # 40x: 0.25μm/pixel
    ((400, 400), (0.1, 0.1)),    # 100x: 0.1μm/pixel
]

attention_layer = create_multiscale_attention()

# Note: the score tensor holds seq_len**2 entries per head, so shrink the
# grids if memory is limited.
for i, (grid_shape, spacing) in enumerate(scales_and_spacings):
    seq_len = grid_shape[0] * grid_shape[1]

    # Simulate features from different magnifications
    q = torch.randn(1, seq_len, 512)
    k = torch.randn(1, seq_len, 512)

    # Compute attention with spatial awareness
    attn_scores = attention_layer(q, k, spacing, grid_shape)

    print(f"Scale {i+1}: {grid_shape} grid, {spacing} spacing")
    print(f"Attention shape: {attn_scores.shape}\n")
```

Custom Coordinate Systems

```python
import torch
from RoSE import RotarySpatialEmbedding


# Example: Geographic coordinate system (lat/lon/elevation)
class GeospatialEmbedding(torch.nn.Module):
    def __init__(self, dim, num_heads):
        super().__init__()
        self.spatial_embedding = RotarySpatialEmbedding(
            dim=dim,
            num_heads=num_heads,
            spatial_dims=3,  # lat, lon, elevation
            learnable=True,
            frequency_scaling="adaptive",
        )

    def forward(self, x, coordinates):
        """
        Args:
            x: Features [B, N, D]
            coordinates: [B, N, 3] tensor with [lat, lon, elevation]
        """
        # Normalize coordinates to reasonable scales
        lat_scale, lon_scale, elev_scale = 1 / 90, 1 / 180, 1 / 1000
        scales = torch.tensor(
            [lat_scale, lon_scale, elev_scale],
            dtype=coordinates.dtype,
            device=coordinates.device,
        )
        normalized_coords = coordinates * scales  # not consumed by the grid-based call below

        # Convert to grid format (this is a simplified example)
        # In practice, you'd need proper coordinate-to-grid mapping
        batch_size, seq_len, _ = coordinates.shape
        # round() guards against float error: 1000 ** (1 / 3) is 9.999...
        cube_root = round(seq_len ** (1 / 3))
        grid_size = cube_root if cube_root**3 == seq_len else 10
        grid_shape = (grid_size, grid_size, grid_size)
        spacing = (lat_scale, lon_scale, elev_scale)

        return self.spatial_embedding(x, spacing, grid_shape)


# Usage
geo_embedding = GeospatialEmbedding(dim=256, num_heads=8)
features = torch.randn(2, 1000, 256)
coordinates = torch.randn(2, 1000, 3)  # Random lat/lon/elevation
result = geo_embedding(features, coordinates)
```

Integration with Transformers

```python
import torch
import torch.nn as nn
from RoSE import RotarySpatialEmbedding


class SpatialTransformerBlock(nn.Module):
    """Transformer block with spatial awareness via RoSE."""

    def __init__(self, dim, num_heads, spatial_dims=2):
        super().__init__()
        self.spatial_embedding = RotarySpatialEmbedding(
            dim=dim,
            num_heads=num_heads,
            spatial_dims=spatial_dims,
            learnable=True,
        )

        self.attention = nn.MultiheadAttention(
            embed_dim=dim,
            num_heads=num_heads,
            batch_first=True,
        )

        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x, spacing, grid_shape):
        # Apply spatial embeddings
        x_spatial = self.spatial_embedding(x, spacing, grid_shape)

        # Self-attention with spatial embeddings
        attn_out, _ = self.attention(x_spatial, x_spatial, x_spatial)
        x = self.norm1(x + attn_out)

        # MLP
        mlp_out = self.mlp(x)
        x = self.norm2(x + mlp_out)

        return x


# Example usage
transformer = SpatialTransformerBlock(dim=256, num_heads=8, spatial_dims=2)
x = torch.randn(4, 100, 256)  # Batch of sequences
result = transformer(x, spacing=(1.0, 1.0), grid_shape=(10, 10))
print(f"Transformer output shape: {result.shape}")
```

Tips and Best Practices

Voxel Spacing: Provide real-world spacing whenever it is available, so that the rotations reflect physical distances rather than index offsets
Rotary Ratio: Start with rotary_ratio=1.0 for maximum spatial awareness, then experiment with lower values for efficiency
Learnable Frequencies: Set learnable=True for fine-tuning on your specific spatial domain
Frequency Scaling: Use "adaptive" scaling for most applications, "sqrt" for simpler cases
Grid Shape: Ensure your sequence length matches prod(grid_shape) for proper spatial mapping
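
The grid-shape tip can be enforced before calling the layer. Below is a minimal standard-library sketch; `check_grid` is a hypothetical helper, not a RoSE function.

```python
import math
from typing import Sequence

def check_grid(seq_len: int, grid_shape: Sequence[int]) -> None:
    """Raise if the flattened grid does not account for every token."""
    expected = math.prod(grid_shape)
    if seq_len != expected:
        raise ValueError(
            f"seq_len={seq_len} but prod(grid_shape)={expected} "
            f"for grid_shape={tuple(grid_shape)}"
        )

check_grid(8000, (20, 20, 20))      # passes: 20 * 20 * 20 == 8000
try:
    check_grid(1000, (10, 10))      # 100 voxels cannot hold 1000 tokens
except ValueError as err:
    print(err)
```

Running the check next to the data loader surfaces a mismatched grid early instead of as a shape error deep inside attention.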

License

BSD 3-Clause License. See LICENSE for details.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use RoSE in your research, please cite it as below."
title: "RoSE: Rotary Spatial Embeddings"
authors:
  - family-names: Rhoades
    given-names: Jeff
    orcid: https://orcid.org/0000-0001-5077-2533
version: 2025.8.26.2007
date-released: 2025-08-26
# doi: 10.5281/zenodo.1234567  # optional, if you have a Zenodo DOI
repository-code: https://github.com/rhoadesScholar/RoSE
license: BSD-3-Clause

GitHub Events

Total

Create event: 5
Issues event: 2
Release event: 4
Watch event: 1
Delete event: 1
Issue comment event: 2
Push event: 9
Pull request review event: 1
Pull request event: 3

Last Year

Create event: 5
Issues event: 2
Release event: 4
Watch event: 1
Delete event: 1
Issue comment event: 2
Push event: 9
Pull request review event: 1
Pull request event: 3

Packages

Total packages: 1
Total downloads:
- pypi 1,079 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 12
Total maintainers: 1

pypi.org: rotary-spatial-embeddings

PyTorch implementation of Rotary Spatial Embeddings

Homepage: https://github.com/rhoadesScholar/RoSE
Documentation: https://github.com/rhoadesScholar/RoSE
License: BSD 3-Clause License
Latest release: 2025.8.26.2007
published 11 months ago

Versions: 12
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 1,079 Last month

Rankings

Dependent packages count: 8.8%

Average: 29.0%

Dependent repos count: 49.3%

Maintainers (1)

rhoadesScholar

Last synced: 11 months ago

rose

Science Score: 54.0%
