Recent Releases of mexca
mexca - v1.0.3
Fixes one critical and one minor bug.
Changed
- Output files in the standard pipeline recipe are saved after each video file is processed (instead of saving everything at the end)
- GitHub Actions workflows are tested on macOS 13 because FFmpeg cannot be installed automatically on the newest version
- An extra step for freeing disk space is added to the GitHub Actions Docker workflows
Fixed
- A bug in the `FaceExtractor` component where the input for the `MEFARG` model was not sent to the correct device (GPU only)
- A bug in the `SentimentExtractor` component where the tokenizer would raise a runtime error in very rare cases (probably for very long sentences). Now, padding is added to avoid the error, and exceptions are caught, returning `null` sentiment scores.
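The padding fix described above can be illustrated with a minimal, stdlib-only sketch. The function and names here are hypothetical stand-ins, not mexca's actual code: the idea is simply that token-id sequences are padded to a common length before batching, so downstream tensor creation cannot fail on ragged inputs.

```python
def pad_sequences(sequences, pad_id=0):
    """Pad token-id lists to the length of the longest one.

    Hypothetical illustration of padding before batching; mexca's
    actual fix lives inside the SentimentExtractor's tokenizer call.
    """
    max_len = max((len(s) for s in sequences), default=0)
    return [s + [pad_id] * (max_len - len(s)) for s in sequences]

# All rows now share the same length after padding.
batch = pad_sequences([[5, 7, 9], [3], [1, 2]])
```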
Published by maltelueken almost 2 years ago
mexca - v1.0.0
Contains some final fixes and adjustments for the first complete release.
Changed
- Upgrades pyannote.audio to version 3.1.1
- Downgrades gdown to version 4.6.0
- Only essential steps are logged at the INFO level (i.e., cluster confidence, average embeddings, and removal of audio files are now logged at the DEBUG level)
- The error message when the connection to the Docker daemon fails is now more informative
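The logging change can be sketched with Python's stdlib `logging` module (the logger name is hypothetical): messages below the configured level are simply filtered out, so per-step details on DEBUG stay hidden unless the user opts in.

```python
import logging

logger = logging.getLogger("mexca.sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)

# Essential progress stays on INFO; verbose per-step details move to DEBUG
# and are suppressed at the default INFO level.
logger.info("Processing video file %s", "clip.mp4")
logger.debug("Removing temporary audio file %s", "clip.wav")  # filtered out
```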
Removed
- onnx-runtime, ruamel.yaml, and torchaudio as requirements for the speaker identifier component due to pyannote.audio upgrade
Fixed
- A bug caused by pyannote.audio version 3.0.0 for short audio clips where the number of frame-wise detected speakers exceeded the maximum number of speakers (see #106)
- An issue with gdown where model files hosted on Google Drive could no longer be accessed (https://github.com/wkentaro/gdown/issues/43)
Published by maltelueken about 2 years ago
mexca - v0.7.0-beta
Adds average speaker embeddings and improved speaker diarization. Also improves the performance of data processing. Provides an advanced example notebook for extending the standard MEXCA pipeline.
Added
- The `SpeakerAnnotation` class has a new attribute `speaker_average_embeddings` containing the average embeddings for each detected speaker
- The `SpeakerIdentifier` has a new argument to explicitly set the device it runs on (CPU by default)
- The `SpeakerIdentifier.apply()` method has a new `show_progress` argument to enable progress bars for detected speech segments and embeddings
- A new notebook on customizing and extending the MEXCA pipeline (`examples/example_custom_pipeline_components.ipynb`)
- Two new recipes for applying the standard MEXCA pipeline and postprocessing the extracted features (`recipes/`)
- The `Pipeline.apply()` method has a new `merge` argument to disable merging features from different modalities; this is useful when customizing a pipeline
- A new logo (thanks to Ji Qi)
- Documentation on how to use mexca with GPU and CUDA support
- notebook has been added as a dependency for the demo installation
- scikit-learn has been added as an explicit dependency (previously dependency of py-feat)
Changed
- pyannote.audio has been upgraded to version 3.0.0; this required adding the following dependencies:
- torch >= 2.0.0
- onnxruntime-gpu on Windows and Linux
- onnxruntime on MacOS
- torchaudio on MacOS
- torch has been upgraded to version 2.0.0 for all components requiring it
- The `SpeakerIdentifier` component uses the `pyannote/speaker-diarization-3.0` model by default
- pandas has been replaced by polars; the `Multimodal.features` attribute now stores a `polars.LazyFrame` instead of a `pandas.DataFrame`; this speeds up postprocessing and merging for large data sets
Removed
- py-feat has been removed as a dependency
Published by maltelueken over 2 years ago
mexca - v0.6.0-beta
Adds support for Python 3.10. Refactors data handling and storage using pydantic data structures and validation. Replaces the audio.features module with the emvoice package.
Added
- The emvoice package as a requirement for the `VoiceExtractor` component
- Support for Python 3.10
- A `post_min_face_size` argument in the `FaceExtractor` class which allows filtering out faces after detection and before clustering
Changed
- The `BaseFeature` class in the `audio.extraction` module is now an abstract base class and its `requires` method an abstract property
- The `Pipeline.apply()` method can now also take an iterable of filepaths as the `filepath` argument, processing them sequentially
- The container test workflow is refactored and a `pytest.mark.run_env()` decorator added to allow running tests for only one component container in one job; the jobs for different components are completely decoupled
- The flowchart is updated
- Classes in the `data` module are refactored using base classes and pydantic data models
- Classes in the `data` module have methods for JSON (de-) serialization
- The CLIs for all components write output to JSON files with standardized names
- Custom attributes in the `VoiceFeatures` class store `nan` values as `None` for consistency with the facial features
- The `confidence` feature is renamed to `span_confidence` for consistency with the other text features
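The JSON (de-)serialization and the nan-to-None convention can be sketched with a stdlib analogue. mexca uses pydantic models; the dataclass and field names below are hypothetical stand-ins chosen only to show the round-trip and the `nan` → `None` substitution.

```python
import json
import math
from dataclasses import dataclass, asdict

@dataclass
class VoiceFeaturesSketch:  # hypothetical stand-in for mexca's pydantic models
    frame: list
    pitch_f0: list

    def to_json(self):
        # Store nan as None so the JSON stays valid and the representation
        # is consistent with the facial features.
        clean = {
            key: [None if isinstance(v, float) and math.isnan(v) else v for v in vals]
            for key, vals in asdict(self).items()
        }
        return json.dumps(clean)

    @classmethod
    def from_json(cls, s):
        return cls(**json.loads(s))

obj = VoiceFeaturesSketch(frame=[0, 1], pitch_f0=[220.0, float("nan")])
restored = VoiceFeaturesSketch.from_json(obj.to_json())
```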
Removed
- The `audio.features` submodule is removed and its functionality replaced by the emvoice package
- Support for Python 3.7 due to dependency conflicts with the whisper package
Fixed
- A bug in the lazy initialization of the Whisper model
- A bug in loading a voice feature configuration YAML file from the CLI
- A bug in the calculation of transcription confidence scores for zero length speech segments
- An exception is thrown if a container component fails, propagating the error message to the console
Published by maltelueken over 2 years ago
mexca - v0.5.0-beta
Replaces the methods for predicting facial landmarks and action unit activations. Landmarks are now predicted by facenet-pytorch and action units by the MEFARG model instead of py-feat.
Added
- The `FaceExtractor` component computes average face embeddings via the `compute_avg_embeddings()` method
- The `VideoAnnotation` data class has an additional field `face_average_embeddings`, containing the average face representations for each detected face cluster
- The `AudioTranscriber` component returns the confidence of the transcription (average over each word sequence)
- The `TranscriptionData` data class has an additional field `confidence` for the transcription confidence
- `ruamel.yaml` is added as an explicit dependency for the `SpeakerIdentifier` component
- `gdown` is added as a dependency for the `FaceExtractor` component
- Package code adheres to black code style
- Adds pre-commit configuration (enable with `pre-commit install`) in `.pre-commit-config.yaml`
- Adds `black` and `pre-commit` to `.[dev]` requirements
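The transcription confidence described above (an average over each word sequence) can be sketched in a few lines of stdlib Python. The function name is hypothetical, not mexca's API; the empty-segment guard mirrors the zero-length-segment fix that later landed in v0.6.0.

```python
def segment_confidence(word_confidences):
    """Average word-level confidences for one speech segment.

    Hypothetical sketch: a zero-length segment returns None instead
    of dividing by zero.
    """
    if not word_confidences:
        return None
    return sum(word_confidences) / len(word_confidences)
```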
Changed
- The `mexca.video` module is split into several submodules: `extraction`, `mefl`, `anfl`, `mefarg`, and `helper_classes`
- Facial landmarks are predicted by `facenet_pytorch.MTCNN` instead of `feat.detector.Detector`
- Facial action units are predicted by `mexca.video.mefarg.MEFARG` instead of `feat.detector.Detector`
- Word-level timestamps are obtained from the native `whisper` package instead of `stable-ts`
Removed
- `py-feat` is removed from the dependencies of the `FaceExtractor` component
- `stable-ts` is removed from the dependencies of the `AudioTranscriber` component
Published by maltelueken over 2 years ago
mexca - More voice features and full GPU support
Adds voice features, improves the documentation, and updates the example notebooks. Enables GPU support for all pipeline components.
Added
- Classes for extracting voice features in the `mexca.audio.features` module:
  - `FormantAudioSignal` for preemphasized audio signals for formant analysis
  - `AlphaRatioFrames` and `HammarIndexFrames` for calculating alpha ratio and Hammarberg index
  - `SpectralSlopeFrames` for estimating spectral slopes
  - `MelSpecFrames` and `MfccFrames` for computing Mel spectrograms and cepstral coefficients
  - `SpectralFluxFrames` and `RmsEnergyFrames` for calculating spectral flux and RMS energy
- Classes for extracting and interpolating voice features in the `mexca.audio.extraction` module: `FeatureAlphaRatio`, `FeatureHammarIndex`, `FeatureSpectralSlope`, `FeatureHarmonicDifference`, `FeatureMfcc`, `FeatureSpectralFlux`, `FeatureRmsEnergy`
- A `VoiceFeaturesConfig` class for configuring voice feature extraction in `mexca.data`
- The CLI `extract-voice` has a new `--config-filepath` argument for YAML configuration files
- The `FaceExtractor` component has a new `max_cluster_frames` argument to set the maximum number of frames for spectral clustering
- The `SentimentExtractor` component has a new `device` argument to run on GPU with 8-bit quantization
- `pyyaml` is added as a requirement for the base package
- `accelerate` and `bitsandbytes` are added as requirements for the `SentimentExtractor` component
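The frame-based feature classes listed above share one idea: slice a signal into overlapping frames and compute a value per frame. A minimal, stdlib-only sketch using RMS energy (the function names are hypothetical, not mexca's API):

```python
import math

def frame_signal(signal, frame_len, hop_len):
    """Split a signal into overlapping frames (truncating the tail)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop_len)]

def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

# A toy periodic signal: 16 samples, framed with 50% overlap.
frames = frame_signal([0.0, 1.0, 0.0, -1.0] * 4, frame_len=4, hop_len=2)
energies = [rms_energy(f) for f in frames]
```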
Changed
- The set of default voice features that are extracted with `VoiceExtractor` has been expanded
- The default window for STFT is now the Hann window
- Conversion from magnitude/energy to dB is now performed with librosa functions
- The example notebooks have been updated
- `mexca.container.VoiceExtractorContainer` can also handle `VoiceFeaturesConfig` objects
- Required version of `spectralcluster` is set to 0.2.16
- Required version of `transformers` is set to 4.25.1
- Required version of numpy is set to >=1.21.6
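For reference, the Hann window and the dB conversion mentioned above follow standard formulas; this is a hand-rolled, stdlib-only sketch, not mexca's implementation (mexca delegates to librosa, whose defaults may differ in detail, e.g. a periodic rather than symmetric window).

```python
import math

def hann_window(n):
    """Symmetric Hann window: w[k] = 0.5 * (1 - cos(2*pi*k / (n - 1)))."""
    return [0.5 * (1 - math.cos(2 * math.pi * k / (n - 1))) for k in range(n)]

def power_to_db(power, ref=1.0, floor=1e-10):
    """Convert a power value to decibels: 10 * log10(power / ref)."""
    return 10.0 * math.log10(max(power, floor) / ref)

# Zero at the endpoints, peak of 1.0 in the middle.
w = hann_window(5)
```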
Fixed
- Warnings triggered during voice feature extraction
- A bug occurring when using `FaceExtractor` with `device="cuda"` has been fixed
Published by maltelueken almost 3 years ago
mexca - mexca v0.3.0-beta
Improves the audio transcription and sentiment extraction workflows. Refactors the voice feature extraction workflow and adds several new voice features.
Added
- Docker containers are now versioned via tags and the container components automatically fetch the container matching the installed version of mexca; the container with the `:latest` tag can be fetched with the argument `get_latest_tag=True` (#65)
- Classes for extracting voice features (#66):
  - `AudioSignal`, `BaseSignal` for loading and storing signals in the `mexca.audio.features` module
  - `BaseFrames`, `FormantFrames`, `FormantAmplitudeFrames`, `HnrFrames`, `JitterFrames`, `PitchFrames`, `PitchHarmonicsFrames`, `PitchPeriodFrames`, `PitchPulseFrames`, `ShimmerFrames`, `SpecFrames` for computing and storing formant features, glottal pulse features, and pitch features in the `mexca.audio.features` module
  - `BaseFeature`, `FeaturePitchF0`, `FeatureJitter`, `FeatureShimmer`, `FeatureHnr`, `FeatureFormantFreq`, `FeatureFormantBandwidth`, `FeatureFormantAmplitude` for extracting and interpolating voice features in the `mexca.audio.extraction` module
- An `all` extra requirements group which installs the requirements for all of mexca's components (i.e., `pip install mexca[all]`, #64)
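As a hedged illustration of the pitch-period features named above: local jitter is conventionally computed as the mean absolute difference between consecutive glottal periods divided by the mean period. The sketch below uses a hypothetical function name and is not mexca's implementation.

```python
def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal periods, divided by the mean period.

    Hypothetical sketch; returns None if fewer than two periods exist.
    """
    if len(periods) < 2:
        return None
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))
```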
Changed
- The `SentimentData` class now has a `text` instead of an `index` attribute, which is used for matching sentiment to transcriptions (#63)
- The sentence sentiment is merged separately from the transcription in `Multimodal._merge_audio_text_features()` (#63)
- librosa (version 0.9) is added as a requirement for the `VoiceExtractor` component instead of parselmouth; the voice feature extraction now relies on librosa instead of Praat (#66)
- stable-ts is required to be version 1.1.5 for compatibility with Python 3.7; in a future version, we might remove stable-ts as a dependency (#67)
- transformers is added as a requirement for the `AudioTranscriber` component (#67)
- scipy is moved to the general requirements for all components (#66)
- The `VoiceExtractor` class and component are refactored with new default features (#66)
- Tests make better use of fixtures for cleaner and more reusable code (#63)
Fixed
- An error in the audio transcription that occurred for extremely short speech segments below the precision of whisper and stable-ts (#63)
Removed
- The `toml` extra requirement for the coverage requirement in the `dev` group (#67)
Published by maltelueken almost 3 years ago
mexca - mexca v0.2.1-beta
Minor patch that addresses a memory issue and includes some bug and documentation fixes.
Added
- A "Troubleshooting" sub section in the "Installation Details" section in docs.
- Exception class `AuthenticationError` for failed HuggingFace Hub authentication
- Exception class `NotEnoughFacesError` for too few face detections for clustering
Changed
- Refactored the `VideoDataset` class to only load video frames when they are queried. The previous implementation attempted to load the entire video into memory, leading to issues. Now, only frames of the current batch are loaded into memory as expected.
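The on-demand loading pattern can be sketched with a plain Python class (the class name and `load_frame` callable are hypothetical, not mexca's code): frames are decoded only when indexed, so iterating over a small batch never touches the rest of the video.

```python
class LazyVideoDataset:
    """Hypothetical sketch of per-query frame loading."""

    def __init__(self, n_frames, load_frame):
        self.n_frames = n_frames
        self._load_frame = load_frame  # callable that decodes one frame
        self.loaded = []               # tracks which frames were actually read

    def __len__(self):
        return self.n_frames

    def __getitem__(self, idx):
        self.loaded.append(idx)
        return self._load_frame(idx)

ds = LazyVideoDataset(1000, load_frame=lambda i: f"frame-{i}")
batch = [ds[i] for i in range(8)]  # only 8 frames are decoded, not all 1000
```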
Fixed
- Added missing note about HuggingFace Hub authentication to "Getting Started" section in docs.
- An exception is triggered if pypiwin32 was not properly installed when initializing a docker client
- An exception is triggered if no HuggingFace Hub token was found when initializing `SpeakerIdentifier` with `use_auth_token=True`
- Correctly passes the HuggingFace Hub token to the Docker build action for the `SpeakerIdentifier` container
Published by maltelueken about 3 years ago
mexca - mexca v0.2.0-beta
First beta release. This version is a major overhaul of the first alpha release.
Added
- A component for sentiment extraction
- Data classes as interfaces for component in- and output in the `data` module
- CLIs for all five components (removes the general CLI for the pipeline)
- Interfaces for Docker containers of all five components (removes the general Dockerfile)
- Functionality to write output to common file formats (JSON, RTTM, SRT)
- Lazy initialization for pretrained models to save memory
- Data loader functionality to the `FaceExtractor` component to allow for batch processing
- Clustering confidence metric to the output of the `FaceExtractor` class
FaceExtractorclass - Logging
- Static type annotations
- A utils module
- Flowchart to the introduction in docs
- 'Getting Started' section in docs
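The lazy initialization for pretrained models listed above can be sketched with a cached property pattern. The class, factory, and names below are hypothetical illustrations of the idea, not mexca's actual code: the expensive model load is deferred until first use and then reused.

```python
class LazyModelHolder:
    """Hypothetical sketch of lazy initialization: the (potentially large)
    pretrained model is constructed on first use, not at component creation."""

    def __init__(self, factory):
        self._factory = factory
        self._model = None

    @property
    def model(self):
        if self._model is None:
            self._model = self._factory()  # expensive load happens here, once
        return self._model

calls = []
holder = LazyModelHolder(lambda: calls.append(1) or "pretrained-model")
model_a = holder.model  # first access triggers the load
model_b = holder.model  # second access reuses the cached instance
```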
Changed
- Simplified the structure of the package
- Moved content of core module into separate modules
- Refactors the `Pipeline` class to include five components: `FaceExtractor`, `SpeakerIdentifier`, `VoiceExtractor`, `AudioTranscriber`, `SentimentExtractor`
- Separated the dependencies for all five components: they can all be installed separately from each other
- Whisper for audio transcription instead of fine-tuned wav2vec models via huggingsound
- Adapted the `FaceExtractor` component for the pretrained models used in py-feat v0.5
- Refactors feature merging using `pandas.DataFrame` and `intervaltree.IntervalTree`
- Splits the installation instructions in two parts (quick vs. detailed) in docs
- Updates docker section
- Updates command line section
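The interval-based feature merging can be illustrated with a brute-force, stdlib-only stand-in for the `intervaltree` lookup (the function and labels are hypothetical): each frame timestamp receives the value of the segment whose interval contains it.

```python
def merge_by_time(frames, segments):
    """Attach segment-level features to frame timestamps by interval overlap.

    Hypothetical brute-force sketch: each frame time t gets the value of the
    segment whose [start, end) interval contains t, or None if none matches.
    An interval tree performs the same lookup in O(log n) per query.
    """
    merged = []
    for t in frames:
        label = None
        for start, end, value in segments:
            if start <= t < end:
                label = value
                break
        merged.append((t, label))
    return merged

rows = merge_by_time(
    frames=[0.0, 0.5, 1.0, 1.5],
    segments=[(0.0, 1.0, "speaker_0"), (1.0, 2.0, "speaker_1")],
)
```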
Removed
- Removed the `AudioIntegrator` and `AudioTextIntegrator` classes; feature merging is done in the `Multimodal` class
- Removed the core module and its submodules
- Removed face-speaker matching (temporarily); might be added again in a future release
Published by maltelueken about 3 years ago
mexca - First alpha version
What's in this version
This release contains the first alpha version of mexca. This version is still in early development and may contain missing features, bugs, etc.
Contributors
- @maltelueken made their first contribution in https://github.com/mexca/mexca/pull/6
- @n400peanuts made their first contribution in https://github.com/mexca/mexca/pull/9
- @dafnevk added the `__init__.py` file
Full Changelog: https://github.com/mexca/mexca/commits/v0.1.0-alpha
Published by n400peanuts over 3 years ago