https://github.com/csteinmetz1/fad_pytorch

Frechet Audio Distance evaluation in PyTorch

https://github.com/csteinmetz1/fad_pytorch

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Frechet Audio Distance evaluation in PyTorch

Basic Info
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of drscotthawley/fad_pytorch
Created almost 3 years ago · Last pushed almost 3 years ago

https://github.com/csteinmetz1/fad_pytorch/blob/main/

fad_pytorch
================



[Original FAD paper (PDF)](https://arxiv.org/pdf/1812.08466.pdf)

## Install

``` sh
pip install fad_pytorch
```

## Features:

- runs in parallel on multiple processors and multiple GPUs (via
  `accelerate`)
- supports multiple embedding methods:
  - VGGish and PANN, both mono @ 16kHz
  - OpenL3 and (LAION-)CLAP, stereo @ 48kHz
- uses publicly-available pretrained checkpoints for music (+other
  sources) for those models. (if you want Speech, submit a PR or an
  Issue; I dont do speech.)
- favors ops in PyTorch rather than numpy (or tensorflow)
- `fad_gen` supports local data read or WebDataset (audio data stored in
  S3 buckets)
- runs on CPU, CUDA, or MPS

## Instructions:

This is designed to be run as 3 command-line scripts in succession. The
latter 2 (`fad_embed` and `fad_score`) are probably what most people
will want:

1.  `fad_gen`: produces directories of real & fake audio (given real
    data). See `fad_gen`
    [documentation](https://drscotthawley.github.io/fad_pytorch/fad_gen.html)
    for calling sequence.
2.  `fad_embed [options]  `: produces
    directories of *embeddings* of real & fake audio
3.  `fad_score [options]  `: reads the
    embeddings & generates FAD score, for real ($r$) and fake ($f$):

$$ FAD = || \mu_r - \mu_f ||^2 + tr\left(\Sigma_r + \Sigma_f - 2 \sqrt{\Sigma_r \Sigma_f}\right)$$

## Documentation

See the [Documentation
Website](https://drscotthawley.github.io/fad_pytorch/).

## Comments / FAQ / Troubleshooting

- `RuntimeError: CUDA error: invalid device ordinal`: This happens
  when you have a bad node on an AWS cluster. [Havent yet figured out
  what causes it or how to fix
  it](https://discuss.huggingface.co/t/solved-accelerate-accelerator-cuda-error-invalid-device-ordinal/21509/1).
  Workaround: Just add the current node to your SLURM `--exclude` list,
  exit and retry. Note: it may take as many as 5 to 7 retries before you
  get a good node.
- FAD scores obtained from different embedding methods are *wildly*
  different! Yea. Its not obvious that scores from different
  embedding methods should be comparable. Rather, compare different
  groups of audio files using the same embedding method, and/or check
  that FAD scores go *down* as similarity improves.
- FAD score for the same dataset repeated (twice) is not exactly zero!
  Yea. There seems to be an uncertainty of around +/- 0.008. Id say,
  dont quote any numbers past the first decimal point.

## Contributing

This repo is still fairly bare bones and will benefit from more
documentation and features as time goes on. Note that it is written
using [nbdev](https://nbdev.fast.ai/), so the things to do are:

1.  Fork this repo
2.  Clone your fork to your (local) machine
3.  Install nbdev: `python3 -m pip install -U nbdev`
4.  Make changes by editing the notebooks in `nbs/`, not the `.py` files
    in `fad_pytorch/`.
5.  Run `nbdev_export` to export notebook changes to `.py` files
6.  For good measure, run `nbdev_install_hooks` and `nbdev_clean` -
    especially if youve *added* any notebooks.
7.  Do a `git status` to see all the `.ipynb` and `.py` files that need
    to be added & committed
8.  `git add` those files and then `git commit`, and then `git push`
9.  Take a look in your forks GitHub Actions tab, and see if the test
    and deploy CI runs finish properly (green light) or fail (red
    light)
10. Once you get green lights, send in a Pull Request!

*Feel free to ask me for tips with nbdev, it has quite a learning curve.
You can also ask on [fast.ai forums](https://forums.fast.ai/) and/or
[fast.ai
Discord](https://discord.com/channels/689892369998676007/887694559952400424)*

## Citations / Blame / Disclaimer

This repo is 2 weeks old. Im not ready for this to be cited in your
papers. Id hate for there to be some mistake I havent found yet.
Perhaps a later version will have citation info. For now, instead,
theres:

**Disclaimer:** Results from this repo are still a work in progress.
While every effort has been made to test model outputs, the author takes
no responsbility for mistakes. If you want to double-check via another
source, see Related Repos below.

## Related Repos

There are \[several\] others, but this one is mine. These repos didnt
have all the features I wanted, but I used them for inspiration:

- https://github.com/gudgud96/frechet-audio-distance
- https://github.com/google-research/google-research/tree/master/frechet_audio_distance:
  Goes with [Original FAD paper](https://arxiv.org/pdf/1812.08466.pdf)
- https://github.com/AndreevP/speech_distances

Owner

  • Name: Christian J. Steinmetz
  • Login: csteinmetz1
  • Kind: user
  • Location: London, UK
  • Company: @aim-qmul

Machine learning for Hi-Fi audio. PhD Researcher at C4DM.

GitHub Events

Total
Last Year