glaucus

Glaucus is a PyTorch complex-valued ML autoencoder & RF estimation Python module.

https://github.com/the-aerospace-corporation/glaucus

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.7%) to scientific vocabulary

Keywords

autoencoder dsp ml module pytorch rf sigint
Last synced: 6 months ago

Repository

Glaucus is a PyTorch complex-valued ML autoencoder & RF estimation Python module.

Basic Info
  • Host: GitHub
  • Owner: the-aerospace-corporation
  • License: lgpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.2 MB
Statistics
  • Stars: 25
  • Watchers: 2
  • Forks: 2
  • Open Issues: 1
  • Releases: 5
Topics
autoencoder dsp ml module pytorch rf sigint
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md


Glaucus

[Figure: Glaucus package workflow]

The Aerospace Corporation is proud to present our complex-valued encoder, decoder, and a new loss function for radio frequency (RF) digital signal processing (DSP) in PyTorch.


Using

Install

  • via PyPI: pip install glaucus
  • via source: pip install .

Testing

  • pytest
  • coverage run
  • pylint glaucus tests

Glaucus v2.0.0

The newest version of the autoencoder can encode arbitrary-length continuous data using a combination of LSTM and residual vector quantization. Signal features are sequentially encoded and subtracted from the original, allowing a variable amount of compression from 51.2x to 819.2x.
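
The residual quantization step can be pictured with a small standalone sketch. The names, shapes, and codebook sizes below are illustrative assumptions, not the GlaucusRVQVAE internals:

```python
import torch

def rvq_encode(features, codebooks):
    """Residual VQ: each codebook quantizes what the previous ones missed."""
    residual = features
    codes = []
    for codebook in codebooks:
        idx = torch.cdist(residual, codebook).argmin(dim=1)  # nearest codeword per vector
        codes.append(idx)
        residual = residual - codebook[idx]  # subtract; pass the remainder to the next stage
    return torch.stack(codes, dim=-1)

# hypothetical: 16 codebooks of 1024 centroids in a 64-dim feature space;
# dropping trailing codebooks trades reconstruction accuracy for compression
codebooks = [torch.randn(1024, 64) for _ in range(16)]
codes = rvq_encode(torch.randn(512, 64), codebooks)  # shape (512, 16)
```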

Vector Quantized Variational Autoencoder Model

```python
import torch

from glaucus import GlaucusRVQVAE

# define model
model = GlaucusRVQVAE(quantize_dropout=True)
# get weights
state_dict = torch.hub.load_state_dict_from_url(
    "https://github.com/the-aerospace-corporation/glaucus/releases/download/v2.0.0/gvq-1024-a4baf001.pth",
    map_location="cpu",
)
model.load_state_dict(state_dict)
model.freeze()
model.eval()
```

Usage

The model accepts any (batch_size, length) complex tensor. The forward function encodes and then decodes sequentially, returning the same shape as the input.

```python
x_tensor = torch.randn(11, 11113, dtype=torch.complex64)
y_tensor = model(x_tensor)  # shape (11, 11113)
```

To get the compressed features from the input signal, run the encode step. It returns a compressed feature tensor of shape (batch_size, input_len // compression_factor, num_quantizers) and a scale parameter. The decode function returns the reconstruction RMS-normalized, or scaled if the scale parameter is given. compression_factor equals the product of the model's compression_ratios and is 256 for the pretrained model.

```python
x_tensor = torch.randn(3, 65536, dtype=torch.complex64)
y_encoded, y_scale = model.encode(x_tensor)  # shapes ((3, 512, 16), (3, 1))
y_tensor_rms = model.decode(y_encoded)  # shape (3, 65536)
y_tensor = model.decode(y_encoded, y_scale)  # shape (3, 65536)
```

The pretrained model has a base compression of 51.2x, but it can be scaled to 819.2x by discarding up to num_quantizers - 1 codebooks. This reduces reconstruction accuracy:

```python
y_encoded_truncated = y_encoded[..., :9]  # keep 9 of 16 codebooks; new shape (3, 512, 9)
y_tensor_57x = model.decode(y_encoded_truncated, y_scale)  # shape (3, 65536)
```

[Figure: codebook visualization]

y_encoded is an integer type, so to get the smallest binary representation for storage you can store the bytes. Each code in y_encoded only uses log2(num_embed) bits, so if we are very clever we can pack bits to keep even fewer bytes. Compare sizes for the first item in the batch from above:

```python
>>> from glaucus import pack_tensor, unpack_tensor
>>> len(x_tensor[0].numpy().tobytes())
524288
>>> len(pack_tensor(y_encoded[0].ravel()))  # 51x smaller
10240
>>> len(pack_tensor(y_encoded_truncated[0].ravel()))  # 91x smaller
5760
>>> recovered_y_encoded = unpack_tensor(pack_tensor(y_encoded[0].ravel())).reshape(1, -1, 16)
>>> model.decode(recovered_y_encoded).shape
torch.Size([1, 65536])
```

Note on Arbitrary Input Length

The new vector quantization model accepts arbitrary-length RF input and will utilize history between samples when reconstructing. This history does NOT extend across batch elements, e.g. for a (7, 8192) shape input the model will only have a "context length" of 8192.

If the input length is not a multiple of the product of compression_ratios (256 for the current pretrained model), the model will output extra samples in the decoding step that you will need to truncate, as in the sketch below.
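
A minimal sketch of that truncation, assuming the v2.0.0 pretrained model loaded above (the exact padded length is inferred from this note by rounding up to the next multiple of 256):

```python
import torch

x_tensor = torch.randn(1, 70000, dtype=torch.complex64)  # 70000 is not a multiple of 256
y_tensor = model(x_tensor)  # expect whole blocks out: 274 * 256 = 70144 samples
y_tensor = y_tensor[..., : x_tensor.shape[-1]]  # truncate back to shape (1, 70000)
```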

Glaucus v1.2.0

Variational autoencoder with progressive resampling and a better-defined latent space.

Variational Autoencoder Model

```python
import torch

from glaucus import blockgen, GlaucusVAE

# define model
encoder_blocks = blockgen(steps=8, spatial_in=4096, spatial_out=16, filters_in=2, filters_out=64, mode="encoder")
decoder_blocks = blockgen(steps=8, spatial_in=16, spatial_out=4096, filters_in=64, filters_out=2, mode="decoder")
model = GlaucusVAE(encoder_blocks, decoder_blocks, bottleneck_in=1024, bottleneck_out=1024, data_format='nl')
# get weights
state_dict = torch.hub.load_state_dict_from_url(
    'https://github.com/the-aerospace-corporation/glaucus/releases/download/v1.2.0/gvae-1920-2b2478a0.pth',
    map_location='cpu')
model.load_state_dict(state_dict)
model.freeze()
model.eval()
# example usage
x_tensor = torch.randn(7, 4096, dtype=torch.complex64)
y_tensor, y_encoded, _, _ = model(x_tensor)
```

Glaucus v1.0.0

Use pre-trained model with SigMF data

Load the quantized model and return the compressed signal vector & reconstruction. Our weights were trained & evaluated on a corpus of 200 GB of RF waveforms; with various added RF impairments this yielded an effective 1 PB training set.

```python
import sigmf
import torch

from glaucus import GlaucusAE

# create model
model = GlaucusAE(bottleneck_quantize=True, data_format='nl')
model = torch.quantization.prepare(model)
# get weights for quantized model
state_dict = torch.hub.load_state_dict_from_url(
    'https://github.com/the-aerospace-corporation/glaucus/releases/download/v1.1.0/glaucus-512-3275-5517642b.pth',
    map_location='cpu')
model.load_state_dict(state_dict, strict=False)
# prepare for prediction
model.freeze()
model.eval()
torch.quantization.convert(model, inplace=True)
# get samples into NL tensor
x_sigmf = sigmf.sigmffile.fromfile('example.sigmf')
x_tensor = torch.from_numpy(x_sigmf.read_samples())
# create prediction & quint8 signal vector
y_tensor, y_encoded = model(x_tensor)
# get signal vector as uint8
y_encoded_uint8 = torch.int_repr(y_encoded)
```

Higher-accuracy pre-trained model

```python
import torch

from glaucus import GlaucusAE, blockgen

# define architecture
encoder_blocks = blockgen(steps=6, spatial_in=4096, spatial_out=16, filters_in=2, filters_out=64, mode='encoder')
decoder_blocks = blockgen(steps=6, spatial_in=16, spatial_out=4096, filters_in=64, filters_out=2, mode='decoder')
# create model
model = GlaucusAE(encoder_blocks, decoder_blocks, bottleneck_in=1024, bottleneck_out=1024, bottleneck_quantize=True, data_format='nl')
model = torch.quantization.prepare(model)
# get weights for quantized model
state_dict = torch.hub.load_state_dict_from_url(
    'https://github.com/the-aerospace-corporation/glaucus/releases/download/v1.1.0/glaucus-1024-761-c49063fd.pth',
    map_location='cpu')
model.load_state_dict(state_dict, strict=False)
# see above for rest
```

Use pre-trained model & discard quantization layers

```python
# create model, but skip quantization
from glaucus.utils import adapt_glaucus_quantized_weights

model = GlaucusAE(bottleneck_quantize=False, data_format='nl')
state_dict = torch.hub.load_state_dict_from_url(
    'https://github.com/the-aerospace-corporation/glaucus/releases/download/v1.1.0/glaucus-512-3275-5517642b.pth',
    map_location='cpu')
state_dict = adapt_glaucus_quantized_weights(state_dict)
# ignore "unexpected_keys" warning
model.load_state_dict(state_dict, strict=False)
# prepare for evaluation mode
model.freeze()
model.eval()
# see above for rest
```

Get loss between two RF signals

```python
import numpy as np
import torch

import glaucus

# create criterion
loss = glaucus.RFLoss(spatial_size=128, data_format='nl')
# create some signal
xxx = torch.randn(128, dtype=torch.complex64)
# alter signal with 1% freq offset
yyy = xxx * np.exp(1j * 2 * np.pi * 0.01 * np.arange(128))
# return loss
loss(xxx, yyy)
```

Train model with TorchSig

partial implementation:

```python
import lightning as L
import numpy as np
import torch
import torchsig
from torch.utils.data import DataLoader

from glaucus import GlaucusAE

model = GlaucusAE(data_format='nl')

# this takes a very long time if no cache is available
signal_data = torchsig.datasets.Sig53(root=str(cache_path))
# 80 / 10 / 10 split
train_dataset, val_dataset, test_dataset = torch.utils.data.random_split(
    signal_data,
    (len(signal_data) * np.array([0.8, 0.1, 0.1])).astype(int),
    generator=torch.Generator().manual_seed(0xCAB005E),
)

class RFDataModule(L.LightningDataModule):
    '''
    defines the dataloaders for train, val, test and uses datasets
    '''
    def __init__(self, train_dataset=None, val_dataset=None, test_dataset=None, num_workers=16, batch_size=32):
        super().__init__()
        self.batch_size = batch_size
        self.num_workers = num_workers
        self.train_dataset = train_dataset
        self.val_dataset = val_dataset
        self.test_dataset = test_dataset

    def train_dataloader(self):
        return DataLoader(self.train_dataset, num_workers=self.num_workers, batch_size=self.batch_size, shuffle=True, pin_memory=True)

    def val_dataloader(self):
        return DataLoader(self.val_dataset, num_workers=self.num_workers, batch_size=self.batch_size, shuffle=False, pin_memory=True)

    def test_dataloader(self):
        return DataLoader(self.test_dataset, num_workers=self.num_workers, batch_size=self.batch_size, shuffle=False, pin_memory=True)

datamodule = RFDataModule(
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    test_dataset=test_dataset,
    batch_size=batch_size,
    num_workers=num_workers,
)

trainer = L.Trainer()
trainer.fit(model, datamodule=datamodule)
# test with best checkpoint
trainer.test(model, datamodule=datamodule, ckpt_path="best")
```

Pre-trained Model List

| model weights | desc | published | mem (MB) | params (M) | multiadds (M) | provenance |
|---|---|---|---|---|---|---|
| gvq-1024-a4baf001.pth | VQ-VAE | 2024-09-11 | 60.3 | 14.655 | 2370 | .016 pfs-days on general waveform Aerospace Dset |
| gvae-1920-2b2478a0.pth | VAE | 2024-03-25 | 21.6 | 3.440 | 263 | .006 pfs-days on general waveform Dset |
| glaucus-1024-sig53TLe37-2956bcb6 | AE for Sig53 | 2023-05-16 | 19.9 | 2.873 | 380 | transfer learning from glaucus-1024-761-c49063fd w/ Sig53 Dset |
| glaucus-1024-761-c49063fd | AE accurate | 2023-03-02 | 19.9 | 2.873 | 380 | .035 pfs-days on modulation & general waveform Aerospace Dset |
| glaucus-512-3275-5517642b | AE small | 2023-03-02 | 17.9 | 2.030 | 259 | .009 pfs-days on modulation-only Aerospace Dset |

Note on pfs-days

Per the OpenAI compute appendix, here is the correct math (method 1); a short worked example follows the list:

  • pfs_days = (add-multiplies per forward pass) * (2 FLOPs/add-multiply) * (3 for forward and backward pass) * (number of examples in dataset) * (number of epochs) / (flop per petaflop) / (seconds per day)
  • (number of examples in dataset) * (number of epochs) = steps * batch_size
  • 1 pfs-day ≈ (8x V100 GPUs at 100% efficiency for 1 day) ≈ (100x GTX 1080s at 100% efficiency for 1 day) ≈ (35x RTX 2080s at 100% efficiency for 1 day) ≈ 500 kWh
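
As a sanity check on the formula above, here is a tiny sketch of the arithmetic; the step and batch counts are made-up placeholders, not the actual training budget of any model in the table:

```python
# pfs-days, method 1: multiadds * 2 FLOPs * 3 (forward + backward) * examples seen
multiadds_per_forward = 2370e6  # gvq-1024-a4baf001 from the table above
steps = 100_000                 # hypothetical
batch_size = 32                 # hypothetical
examples_seen = steps * batch_size  # = (examples in dataset) * (epochs)

flops = multiadds_per_forward * 2 * 3 * examples_seen
pfs_days = flops / 1e15 / 86400  # divide by FLOP per petaflop, then seconds per day
print(f"{pfs_days:.4f} pfs-days")
```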

Papers

Code prior to v2.0.0 is documented by the following two IEEE publications.

Glaucus: A Complex-Valued Radio Signal Autoencoder

DOI: 10.1109/AERO55745.2023.10115599

A complex-valued autoencoder neural network capable of compressing & denoising radio frequency (RF) signals with arbitrary model scaling is proposed. Complex-valued time samples received with various impairments are encoded into an embedding vector, then decoded back into complex-valued time samples. The embedding and the related latent space allow search, comparison, and clustering of signals. Traditional signal processing tasks like specific emitter identification, geolocation, or ambiguity estimation can utilize multiple compressed embeddings simultaneously. This paper demonstrates an autoencoder implementation capable of 64x compression hardened against RF channel impairments. The autoencoder allows separate or compound scaling of network depth, width, and resolution to target both embedded and data center deployment with differing resources. The common building block is inspired by the Fused Inverted Residual Block (Fused-MBConv), popularized by EfficientNetV2 & MobileNetV3, with kernel sizes more appropriate for time-series signal processing.
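
As a sketch of the search and comparison idea, two signals can be scored by the similarity of their embeddings. The snippet below is hypothetical: it assumes `model` is an unquantized GlaucusAE loaded as shown earlier and that `sig_a` and `sig_b` are (1, 4096) complex tensors:

```python
import torch

# the forward pass returns (reconstruction, embedding); keep only the embeddings
_, emb_a = model(sig_a)
_, emb_b = model(sig_b)

# cosine similarity of the flattened embeddings as a crude latent-space match score
score = torch.nn.functional.cosine_similarity(emb_a.flatten(), emb_b.flatten(), dim=0)
```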

Complex-Valued Radio Signal Loss for Neural Networks

DOI: 10.1109/AERO55745.2023.10116006

A new optimized loss for training complex-valued neural networks that require reconstruction of radio signals is proposed. Given a complex-valued time series, this method incorporates loss from spectrograms with multiple aspect ratios, cross-correlation loss, and loss from amplitude envelopes in the time & frequency domains. When training a neural network, an optimizer observes batch loss and backpropagates this value through the network to determine how to update the model parameters. The proposed loss is robust to typical radio impairments and co-channel interference that would explode a naive mean-square-error approach. This robust loss enables higher-quality steps along the loss surface, which enables training of models specifically designed for impaired radio input. Loss vs. channel impairment is shown in comparison to mean-squared error for an ensemble of common channel effects.
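
To make the composition concrete, here is a toy sketch of such a multi-term loss. It illustrates the idea only and is not the actual RFLoss implementation; the weighting is arbitrary and the multi-aspect spectrogram terms are omitted:

```python
import torch

def toy_rf_loss(x, y, eps=1e-12):
    """Toy composite reconstruction loss for complex-valued 1-D signals."""
    # amplitude-envelope loss in the time domain
    t_env = (x.abs() - y.abs()).abs().mean()
    # amplitude-envelope loss in the frequency domain
    f_env = (torch.fft.fft(x).abs() - torch.fft.fft(y).abs()).abs().mean()
    # cross-correlation term: zero when the signals are perfectly correlated
    xcorr = 1 - (x * y.conj()).sum().abs() / (x.norm() * y.norm() + eps)
    return t_env + 1e-2 * f_env + xcorr
```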

Contributing

Do you have code you would like to contribute to this Aerospace project?

We are excited to work with you. We can accept small changes immediately, but require a Contributor License Agreement (CLA) for larger changesets. Generally, documentation and other minor changes of fewer than 10 lines do not require a CLA. The Aerospace Corporation CLA is based on the well-known Harmony Agreements CLA created by Canonical, and protects the rights of The Aerospace Corporation, our customers, and you as the contributor. You can find our CLA here.

Please complete the CLA and send us the executed copy. Once a CLA is on file we can accept pull requests on GitHub or GitLab. If you have any questions, please e-mail us at oss@aero.org.

Licensing

The Aerospace Corporation supports Free & Open Source Software and we publish our work with GPL-compatible licenses. If the license attached to the project is not suitable for your needs, our projects are also available under an alternative license. An alternative license can allow you to create proprietary applications around Aerospace products without being required to meet the obligations of the GPL. To inquire about an alternative license, please get in touch with us at oss@aero.org.

Owner

  • Name: The Aerospace Corporation
  • Login: the-aerospace-corporation
  • Kind: organization
  • Location: El Segundo, California

We are THE Aerospace Corporation. A trusted partner. A national resource. A leader in national security space.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Glaucus
message: >-
  The Aerospace Corporation is proud to present our
  complex-valued encoder, decoder, and a new loss function
  for RF DSP in PyTorch.
type: software
authors:
  - given-names: Kyle Logue
    email: kyle.logue@aero.org
    affiliation: The Aerospace Corporation
identifiers:
  - type: doi
    value: 10.1109/AERO55745.2023.10115599
    description: 'Glaucus: A Complex-Valued Radio Signal Autoencoder'
  - type: doi
    value: 10.1109/AERO55745.2023.10116006
    description: Complex-Valued Radio Signal Loss for Neural Networks
repository-code: 'https://github.com/the-aerospace-corporation/glaucus'
keywords:
  - sigint
  - dsp
  - autoencoder
  - pytorch
  - rf
license: LGPL-3.0+

GitHub Events

Total
  • Issues event: 1
  • Watch event: 6
  • Issue comment event: 1
Last Year
  • Issues event: 1
  • Watch event: 6
  • Issue comment event: 1

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 15
  • Total Committers: 2
  • Avg Commits per committer: 7.5
  • Development Distribution Score (DDS): 0.067
Past Year
  • Commits: 6
  • Committers: 2
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.167
Top Committers
  • Kyle A Logue (k****e@a****g): 14 commits
  • Philip K Giang (p****g@a****g): 1 commit

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • thanostriantafyllou3 (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 20 last month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
pypi.org: glaucus

Glaucus is a PyTorch complex-valued ML autoencoder & RF estimation Python module.

  • Documentation: https://glaucus.readthedocs.io/
  • License: GNU Lesser General Public License v3 or later (LGPLv3+)
  • Latest release: 2.0.0
    published over 1 year ago
  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 20 last month
Rankings
Dependent packages count: 6.6%
Downloads: 15.5%
Average: 22.3%
Stargazers count: 28.2%
Forks count: 30.5%
Dependent repos count: 30.6%
Maintainers (1)
Last synced: 6 months ago