Recent Releases of mexca
mexca - v1.0.3
Fixes one critical and one minor bug.
Changed
- Output files in the standard pipeline recipe are saved after each video file is processed (instead of saving everything at the end)
- GitHub Actions workflows are tested on macOS 13 because FFmpeg cannot be installed automatically on the newest version
- An extra step for freeing disk space is added to the GitHub Actions Docker workflows
Fixed
- A bug in the `FaceExtractor` component where the input for the `MEFARG` model was not sent to the correct device (GPU only)
- A bug in the `SentimentExtractor` component where the tokenizer would raise a runtime error in very rare cases (probably for very long sentences). Now, padding is added to avoid the error, and exceptions are caught, returning `null` sentiment scores.
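The padding fix described above can be illustrated with a minimal, stdlib-only sketch. The function and names here are hypothetical stand-ins, not mexca's actual code: the idea is simply that token-id sequences are padded to a common length before batching, so downstream tensor creation cannot fail on ragged inputs.

```python
def pad_sequences(sequences, pad_id=0):
    """Pad token-id lists to the length of the longest one.

    Hypothetical illustration of padding before batching; mexca's
    actual fix lives inside the SentimentExtractor's tokenizer call.
    """
    max_len = max((len(s) for s in sequences), default=0)
    return [s + [pad_id] * (max_len - len(s)) for s in sequences]

# All rows now share the same length after padding.
batch = pad_sequences([[5, 7, 9], [3], [1, 2]])
```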
Published by maltelueken almost 2 years ago
mexca - v1.0.0
Contains some final fixes and adjustments for the first complete release.
Changed
- Upgrades pyannote.audio to version 3.1.1
- Downgrades gdown to version 4.6.0
- Only essential steps are logged at the INFO level (i.e., cluster confidence, average embeddings, and removal of audio files are now logged at the DEBUG level)
- The error message when the connection to the Docker daemon fails is now more informative
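The logging change can be sketched with Python's stdlib `logging` module (the logger name is hypothetical): messages below the configured level are simply filtered out, so per-step details on DEBUG stay hidden unless the user opts in.

```python
import logging

logger = logging.getLogger("mexca.sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)

# Essential progress stays on INFO; verbose per-step details move to DEBUG
# and are suppressed at the default INFO level.
logger.info("Processing video file %s", "clip.mp4")
logger.debug("Removing temporary audio file %s", "clip.wav")  # filtered out
```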
Removed
- onnx-runtime, ruamel.yaml, and torchaudio as requirements for the speaker identifier component due to pyannote.audio upgrade
Fixed
- A bug caused by pyannote.audio version 3.0.0 for short audio clips where the number of frame-wise detected speakers exceeded the maximum number of speakers (see #106)
- An issue with gdown where model files hosted on Google Drive could no longer be accessed (https://github.com/wkentaro/gdown/issues/43)
Published by maltelueken about 2 years ago
mexca - v0.7.0-beta
Adds average speaker embeddings and improved speaker diarization. Also improves the performance of data processing. Provides an advanced example notebook for extending the standard MEXCA pipeline.
Added
- The `SpeakerAnnotation` class has a new attribute `speaker_average_embeddings` containing the average embeddings for each detected speaker
- The `SpeakerIdentifier` has a new argument to explicitly set the device it runs on (CPU by default)
- The `SpeakerIdentifier.apply()` method has a new `show_progress` argument to enable progress bars for detected speech segments and embeddings
- A new notebook on customizing and extending the MEXCA pipeline (`examples/example_custom_pipeline_components.ipynb`)
- Two new recipes for applying the standard MEXCA pipeline and postprocessing the extracted features (`recipes/`)
- The `Pipeline.apply()` method has a new `merge` argument to disable merging features from different modalities; this is useful when customizing a pipeline
- A new logo (thanks to Ji Qi)
- Documentation on how to use mexca with GPU and CUDA support
- notebook has been added as a dependency for the demo installation
- scikit-learn has been added as an explicit dependency (previously dependency of py-feat)
Changed
- pyannote.audio has been upgraded to version 3.0.0; this required adding the following dependencies:
- torch >= 2.0.0
- onnxruntime-gpu on Windows and Linux
- onnxruntime on MacOS
- torchaudio on MacOS
- torch has been upgraded to version 2.0.0 for all components requiring it
- The `SpeakerIdentifier` component uses the `pyannote/speaker-diarization-3.0` model by default
- pandas has been replaced by polars; the `Multimodal.features` attribute now stores a `polars.LazyFrame` instead of a `pandas.DataFrame`; this speeds up postprocessing and merging for large data sets
Removed
- py-feat has been removed as a dependency
Published by maltelueken over 2 years ago
mexca - v0.6.0-beta
Adds support for Python 3.10. Refactors data handling and storage using pydantic data structures and validation. Replaces the audio.features module with the emvoice package.
Added
- The emvoice package as a requirement for the `VoiceExtractor` component
- Support for Python 3.10
- A `post_min_face_size` argument in the `FaceExtractor` class which allows filtering out faces after detection and before clustering
Changed
- The `BaseFeature` class in the `audio.extraction` module is now an abstract base class and its `requires` method an abstract property
- The `Pipeline.apply()` method can now also take an iterable of filepaths as the `filepath` argument, processing them sequentially
- The container test workflow is refactored and a `pytest.mark.run_env()` decorator added to allow running tests for only one component container in one job; the jobs for different components are completely decoupled
- The flowchart is updated
- Classes in the `data` module are refactored using base classes and pydantic data models
- Classes in the `data` module have methods for JSON (de-) serialization
- The CLIs for all components write output to JSON files with standardized names
- Custom attributes in the `VoiceFeatures` class store `nan` values as `None` for consistency with the facial features
- The `confidence` feature is renamed to `span_confidence` for consistency with the other text features
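The JSON (de-)serialization and the nan-to-None convention can be sketched with a stdlib analogue. mexca uses pydantic models; the dataclass and field names below are hypothetical stand-ins chosen only to show the round-trip and the `nan` → `None` substitution.

```python
import json
import math
from dataclasses import dataclass, asdict

@dataclass
class VoiceFeaturesSketch:  # hypothetical stand-in for mexca's pydantic models
    frame: list
    pitch_f0: list

    def to_json(self):
        # Store nan as None so the JSON stays valid and the representation
        # is consistent with the facial features.
        clean = {
            key: [None if isinstance(v, float) and math.isnan(v) else v for v in vals]
            for key, vals in asdict(self).items()
        }
        return json.dumps(clean)

    @classmethod
    def from_json(cls, s):
        return cls(**json.loads(s))

obj = VoiceFeaturesSketch(frame=[0, 1], pitch_f0=[220.0, float("nan")])
restored = VoiceFeaturesSketch.from_json(obj.to_json())
```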
Removed
- The `audio.features` submodule is removed and its functionality replaced by the emvoice package
- Support for Python 3.7 due to dependency conflicts with the whisper package
Fixed
- A bug in the lazy initialization of the Whisper model
- A bug in loading a voice feature configuration YAML file from the CLI
- A bug in the calculation of transcription confidence scores for zero length speech segments
- An exception is thrown if a container component fails, propagating the error message to the console
Published by maltelueken over 2 years ago
mexca - v0.5.0-beta
Replaces the methods for predicting facial landmarks and action unit activations. Landmarks are now predicted by facenet-pytorch and action units by the MEFARG model instead of py-feat.
Added
- The `FaceExtractor` component computes average face embeddings via the `compute_avg_embeddings()` method
- The `VideoAnnotation` data class has an additional field `face_average_embeddings`, containing the average face representations for each detected face cluster
- The `AudioTranscriber` component returns the confidence of the transcription (average over each word sequence)
- The `TranscriptionData` data class has an additional field `confidence` for the transcription confidence
- `ruamel.yaml` is added as an explicit dependency for the `SpeakerIdentifier` component
- `gdown` is added as a dependency for the `FaceExtractor` component
- Package code adheres to black code style
- Adds pre-commit configuration (enable with `pre-commit install`) in `.pre-commit-config.yaml`
- Adds `black` and `pre-commit` to `.[dev]` requirements
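The transcription confidence described above (an average over each word sequence) can be sketched in a few lines of stdlib Python. The function name is hypothetical, not mexca's API; the empty-segment guard mirrors the zero-length-segment fix that later landed in v0.6.0.

```python
def segment_confidence(word_confidences):
    """Average word-level confidences for one speech segment.

    Hypothetical sketch: a zero-length segment returns None instead
    of dividing by zero.
    """
    if not word_confidences:
        return None
    return sum(word_confidences) / len(word_confidences)
```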
Changed
- The `mexca.video` module is split into several submodules: `extraction`, `mefl`, `anfl`, `mefarg`, and `helper_classes`
- Facial landmarks are predicted by `facenet_pytorch.MTCNN` instead of `feat.detector.Detector`
- Facial action units are predicted by `mexca.video.mefarg.MEFARG` instead of `feat.detector.Detector`
- Word-level timestamps are obtained from the native `whisper` package instead of `stable-ts`
Removed
- `py-feat` is removed from the dependencies of the `FaceExtractor` component
- `stable-ts` is removed from the dependencies of the `AudioTranscriber` component
Published by maltelueken over 2 years ago
mexca - More voice features and full GPU support
Adds voice features, improves the documentation, and updates the example notebooks. Enables GPU support for all pipeline components.
Added
- Classes for extracting voice features in the `mexca.audio.features` module:
  - `FormantAudioSignal` for preemphasized audio signals for formant analysis
  - `AlphaRatioFrames` and `HammarIndexFrames` for calculating alpha ratio and Hammarberg index
  - `SpectralSlopeFrames` for estimating spectral slopes
  - `MelSpecFrames` and `MfccFrames` for computing Mel spectrograms and cepstral coefficients
  - `SpectralFluxFrames` and `RmsEnergyFrames` for calculating spectral flux and RMS energy
- Classes for extracting and interpolating voice features in the `mexca.audio.extraction` module: `FeatureAlphaRatio`, `FeatureHammarIndex`, `FeatureSpectralSlope`, `FeatureHarmonicDifference`, `FeatureMfcc`, `FeatureSpectralFlux`, `FeatureRmsEnergy`
- A `VoiceFeaturesConfig` class for configuring voice feature extraction in `mexca.data`
- The CLI `extract-voice` has a new `--config-filepath` argument for YAML configuration files
- The `FaceExtractor` component has a new `max_cluster_frames` argument to set the maximum number of frames for spectral clustering
- The `SentimentExtractor` component has a new `device` argument to run on GPU with 8-bit quantization
- `pyyaml` is added as a requirement for the base package
- `accelerate` and `bitsandbytes` are added as requirements for the `SentimentExtractor` component
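The frame-based feature classes listed above share one idea: slice a signal into overlapping frames and compute a value per frame. A minimal, stdlib-only sketch using RMS energy (the function names are hypothetical, not mexca's API):

```python
import math

def frame_signal(signal, frame_len, hop_len):
    """Split a signal into overlapping frames (truncating the tail)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop_len)]

def rms_energy(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

# A toy periodic signal: 16 samples, framed with 50% overlap.
frames = frame_signal([0.0, 1.0, 0.0, -1.0] * 4, frame_len=4, hop_len=2)
energies = [rms_energy(f) for f in frames]
```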
Changed
- The set of default voice features that are extracted with `VoiceExtractor` has been expanded
- The default window for STFT is now the Hann window
- Conversion from magnitude/energy to dB is now performed with librosa functions
- The example notebooks have been updated
- `mexca.container.VoiceExtractorContainer` can also handle `VoiceFeaturesConfig` objects
- Required version of `spectralcluster` is set to 0.2.16
- Required version of `transformers` is set to 4.25.1
- Required version of numpy is set to >=1.21.6
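For reference, the Hann window and the dB conversion mentioned above follow standard formulas; this is a hand-rolled, stdlib-only sketch, not mexca's implementation (mexca delegates to librosa, whose defaults may differ in detail, e.g. a periodic rather than symmetric window).

```python
import math

def hann_window(n):
    """Symmetric Hann window: w[k] = 0.5 * (1 - cos(2*pi*k / (n - 1)))."""
    return [0.5 * (1 - math.cos(2 * math.pi * k / (n - 1))) for k in range(n)]

def power_to_db(power, ref=1.0, floor=1e-10):
    """Convert a power value to decibels: 10 * log10(power / ref)."""
    return 10.0 * math.log10(max(power, floor) / ref)

# Zero at the endpoints, peak of 1.0 in the middle.
w = hann_window(5)
```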
Fixed
- Warnings triggered during voice feature extraction
- A bug occurring when using `FaceExtractor` with `device="cuda"` has been fixed
Published by maltelueken almost 3 years ago
mexca - mexca v0.3.0-beta
Improves the audio transcription and sentiment extraction workflows. Refactors the voice feature extraction workflow and adds several new voice features.
Added
- Docker containers are now versioned via tags and the container components automatically fetch the container matching the installed version of mexca; the container with the `:latest` tag can be fetched with the argument `get_latest_tag=True` (#65)
- Classes for extracting voice features (#66):
  - `AudioSignal`, `BaseSignal` for loading and storing signals in the `mexca.audio.features` module
  - `BaseFrames`, `FormantFrames`, `FormantAmplitudeFrames`, `HnrFrames`, `JitterFrames`, `PitchFrames`, `PitchHarmonicsFrames`, `PitchPeriodFrames`, `PitchPulseFrames`, `ShimmerFrames`, `SpecFrames` for computing and storing formant features, glottal pulse features, and pitch features in the `mexca.audio.features` module
  - `BaseFeature`, `FeaturePitchF0`, `FeatureJitter`, `FeatureShimmer`, `FeatureHnr`, `FeatureFormantFreq`, `FeatureFormantBandwidth`, `FeatureFormantAmplitude` for extracting and interpolating voice features in the `mexca.audio.extraction` module
- An `all` extra requirements group which installs the requirements for all of mexca's components (i.e., `pip install mexca[all]`, #64)
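As a hedged illustration of the pitch-period features named above: local jitter is conventionally computed as the mean absolute difference between consecutive glottal periods divided by the mean period. The sketch below uses a hypothetical function name and is not mexca's implementation.

```python
def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal periods, divided by the mean period.

    Hypothetical sketch; returns None if fewer than two periods exist.
    """
    if len(periods) < 2:
        return None
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))
```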
Changed
- The `SentimentData` class now has a `text` instead of an `index` attribute, which is used for matching sentiment to transcriptions (#63)
- The sentence sentiment is merged separately from the transcription in `Multimodal._merge_audio_text_features()` (#63)
- librosa (version 0.9) is added as a requirement for the `VoiceExtractor` component instead of parselmouth; the voice feature extraction now relies on librosa instead of Praat (#66)
- stable-ts is required to be version 1.1.5 for compatibility with Python 3.7; in a future version, we might remove stable-ts as a dependency (#67)
- transformers is added as a requirement for the `AudioTranscriber` component (#67)
- scipy is moved to the general requirements for all components (#66)
- The `VoiceExtractor` class and component are refactored with new default features (#66)
- Tests make better use of fixtures for cleaner and more reusable code (#63)
Fixed
- An error in the audio transcription that occurred for extremely short speech segments below the precision of whisper and stable-ts (#63)
Removed
- The `toml` extra requirement for the coverage requirement in the `dev` group (#67)
Published by maltelueken almost 3 years ago
mexca - mexca v0.2.1-beta
Minor patch that addresses a memory issue and includes some bug and documentation fixes.
Added
- A "Troubleshooting" sub section in the "Installation Details" section in docs.
- Exception class `AuthenticationError` for failed HuggingFace Hub authentication
- Exception class `NotEnoughFacesError` for too few face detections for clustering
Changed
- Refactored the `VideoDataset` class to only load video frames when they are queried. The previous implementation attempted to load the entire video into memory, leading to issues. Now, only frames of the current batch are loaded into memory as expected.
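The on-demand loading pattern can be sketched with a plain Python class (the class name and `load_frame` callable are hypothetical, not mexca's code): frames are decoded only when indexed, so iterating over a small batch never touches the rest of the video.

```python
class LazyVideoDataset:
    """Hypothetical sketch of per-query frame loading."""

    def __init__(self, n_frames, load_frame):
        self.n_frames = n_frames
        self._load_frame = load_frame  # callable that decodes one frame
        self.loaded = []               # tracks which frames were actually read

    def __len__(self):
        return self.n_frames

    def __getitem__(self, idx):
        self.loaded.append(idx)
        return self._load_frame(idx)

ds = LazyVideoDataset(1000, load_frame=lambda i: f"frame-{i}")
batch = [ds[i] for i in range(8)]  # only 8 frames are decoded, not all 1000
```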
Fixed
- Added missing note about HuggingFace Hub authentication to "Getting Started" section in docs.
- An exception is triggered if pypiwin32 was not properly installed when initializing a docker client
- An exception is triggered if no HuggingFace Hub token was found when initializing `SpeakerIdentifier` with `use_auth_token=True`
- Correctly passes the HuggingFace Hub token to the Docker build action for the `SpeakerIdentifier` container
Published by maltelueken about 3 years ago
mexca - mexca v0.2.0-beta
First beta release. This version is a major overhaul of the first alpha release.
Added
- A component for sentiment extraction
- Data classes as interfaces for component in- and output in the `data` module
- CLIs for all five components (removes the general CLI for the pipeline)
- Interfaces for Docker containers of all five components (removes the general Dockerfile)
- Functionality to write output to common file formats (JSON, RTTM, SRT)
- Lazy initialization for pretrained models to save memory
- Data loader functionality to the `FaceExtractor` component to allow for batch processing
- Clustering confidence metric to the output of the `FaceExtractor` class
FaceExtractorclass - Logging
- Static type annotations
- A utils module
- Flowchart to the introduction in docs
- 'Getting Started' section in docs
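The lazy initialization for pretrained models listed above can be sketched with a cached property pattern. The class, factory, and names below are hypothetical illustrations of the idea, not mexca's actual code: the expensive model load is deferred until first use and then reused.

```python
class LazyModelHolder:
    """Hypothetical sketch of lazy initialization: the (potentially large)
    pretrained model is constructed on first use, not at component creation."""

    def __init__(self, factory):
        self._factory = factory
        self._model = None

    @property
    def model(self):
        if self._model is None:
            self._model = self._factory()  # expensive load happens here, once
        return self._model

calls = []
holder = LazyModelHolder(lambda: calls.append(1) or "pretrained-model")
model_a = holder.model  # first access triggers the load
model_b = holder.model  # second access reuses the cached instance
```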
Changed
- Simplified the structure of the package
- Moved content of core module into separate modules
- Refactors the `Pipeline` class to include five components: `FaceExtractor`, `SpeakerIdentifier`, `VoiceExtractor`, `AudioTranscriber`, `SentimentExtractor`
- Separated the dependencies for all five components: they can all be installed separately from each other
- Whisper for audio transcription instead of fine-tuned wav2vec models via huggingsound
- Adapted the `FaceExtractor` component for the pretrained models used in py-feat v0.5
- Refactors feature merging using `pandas.DataFrame` and `intervaltree.IntervalTree`
- Splits the installation instructions in two parts (quick vs. detailed) in docs
- Updates docker section
- Updates command line section
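The interval-based feature merging can be illustrated with a brute-force, stdlib-only stand-in for the `intervaltree` lookup (the function and labels are hypothetical): each frame timestamp receives the value of the segment whose interval contains it.

```python
def merge_by_time(frames, segments):
    """Attach segment-level features to frame timestamps by interval overlap.

    Hypothetical brute-force sketch: each frame time t gets the value of the
    segment whose [start, end) interval contains t, or None if none matches.
    An interval tree performs the same lookup in O(log n) per query.
    """
    merged = []
    for t in frames:
        label = None
        for start, end, value in segments:
            if start <= t < end:
                label = value
                break
        merged.append((t, label))
    return merged

rows = merge_by_time(
    frames=[0.0, 0.5, 1.0, 1.5],
    segments=[(0.0, 1.0, "speaker_0"), (1.0, 2.0, "speaker_1")],
)
```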
Removed
- Removed the `AudioIntegrator` and `AudioTextIntegrator` classes; feature merging is done in the `Multimodal` class
- Removed the core module and its submodules
- Removed face-speaker matching (temporarily); might be added again in a future release
Published by maltelueken about 3 years ago
mexca - First alpha version
What's in this version
This release contains the first alpha version of mexca. This version is still in early development and may contain missing features, bugs, etc.
Contributors
- @maltelueken made their first contribution in https://github.com/mexca/mexca/pull/6
- @n400peanuts made their first contribution in https://github.com/mexca/mexca/pull/9
- @dafnevk added the `__init__.py` file
Full Changelog: https://github.com/mexca/mexca/commits/v0.1.0-alpha
Published by n400peanuts over 3 years ago