shoutout

Open transcription platform with scheduled audio processing

https://github.com/rwth-time/shoutout

Science Score: 65.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization rwth-time has institutional domain (www.time.rwth-aachen.de)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: RWTH-TIME
  • License: gpl-3.0
  • Language: TypeScript
  • Default Branch: main
  • Homepage:
  • Size: 1.07 MB
Statistics
  • Stars: 21
  • Watchers: 2
  • Forks: 3
  • Open Issues: 18
  • Releases: 1
Created over 2 years ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

A modern web application for transcribing audio files on your own server.
Powered by models like whisper-v3 and pyannote/speaker-diarization. Try it yourself!

Why Shoutout?

Shoutout is a state-of-the-art web application for transcribing audio files, including speaker diarization and timestamps. Because of its high accuracy, strong data privacy, and easy navigation, Shoutout is well suited to transcribing interviews in qualitative research or sensitive corporate recordings.

Shoutout provides:

- a simple-to-use web interface.
- highly accurate transcriptions leveraging the open source transcription model whisper-v3.
- automatic speaker detection including timestamps.
- perfect data privacy: Shoutout runs 100% local and does not share any of your data with external services.
- highly efficient and fast transcriptions using GPU acceleration and a scalable architecture.
- easy deployment due to a completely dockerized build.

Screenshots

Creating a new job

.assets/Shoutout_1.png

Web-interface with an overview of all jobs

.assets/Shoutout_2.png

Downloading transcripts after finishing a job

.assets/Shoutout_3.png

Results example:

```
SPEAKER_00 00:00:00 Sure, okay, so for documentation purposes. Are you okay with me recording the interview?

SPEAKER_01 00:00:12 Yes, I agree to the audio recording.

SPEAKER_00 00:00:17 Okay, then let's start. Could you first briefly introduce yourself, describe your background and what your doing...

SPEAKER_01 00:00:28 Well, my name is ...
```
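Because every entry starts with a speaker label and a timestamp, the transcript is easy to post-process. The following is a minimal sketch in Python, assuming each entry follows the `SPEAKER_NN HH:MM:SS text` layout shown above. The `Utterance` type and the `parse_transcript` helper are illustrative names and not part of shoutout.

```python
import re
from dataclasses import dataclass

# Assumed layout of one entry: "SPEAKER_00 00:00:12 some text"
ENTRY = re.compile(r"^(SPEAKER_\d+)\s+(\d{2}):(\d{2}):(\d{2})\s+(.*)$")


@dataclass
class Utterance:
    speaker: str
    start_seconds: int
    text: str


def parse_transcript(raw: str) -> list[Utterance]:
    """Parse transcript text into utterances, skipping lines that do not match."""
    utterances = []
    for line in raw.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # blank lines or anything outside the assumed layout
        speaker, hh, mm, ss, text = match.groups()
        start = int(hh) * 3600 + int(mm) * 60 + int(ss)
        utterances.append(Utterance(speaker, start, text))
    return utterances


sample = """SPEAKER_00 00:00:00 Sure, okay, so for documentation purposes.

SPEAKER_01 00:00:12 Yes, I agree to the audio recording.
"""
for u in parse_transcript(sample):
    print(u.speaker, u.start_seconds, u.text)
```

Converting the timestamp to seconds makes it simple to sort utterances, compute speaking time per speaker, or export to another format.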

Architecture

.assets/arch.png

Quickstart

It is recommended to use Docker and Docker Compose.

Docker

To set up all services, run the command below in the root directory.

Before running it, update the environment variable MINIO_ENDPOINT inside the docker-compose.yml to an externally reachable hostname! This container is called directly from the frontend.

If you want the worker to use your GPU, you have to install the nvidia-container-toolkit on the host.

    docker compose -f docker-compose.prod.yml up -d

It will set up 6 containers:

1. Build of the dashboard at localhost:8000
2. PostgresDB on port 5433
3. MinIO S3 Bucket at localhost:9001
4. A MinIO client initializing the S3 bucket permissions
5. RabbitMQ at localhost:15672
6. Worker container (GPU support)
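Once the stack is up, a quick way to confirm that the published ports respond is a plain TCP connection test. The sketch below uses only the Python standard library. The port numbers are taken from the list above, while the `SERVICES` mapping, the helper names, and the default host `localhost` are assumptions for illustration. A successful connection only shows that something is listening, not that the service is healthy.

```python
import socket

# Ports taken from the container list above; the host is an assumption
# and should be replaced with your server's hostname if it is not local.
SERVICES = {
    "dashboard": 8000,
    "postgres": 5433,
    "minio": 9001,
    "rabbitmq": 15672,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in SERVICES.items()}


if __name__ == "__main__":
    for name, reachable in check_services().items():
        print(f"{name}: {'up' if reachable else 'down'}")
```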

Development

Make sure that all services (postgres, minio, rabbitmq) are running

Dashboard

To start developing the dashboard, run the following commands. They install the dependencies and start the dev server.

```bash
cd dashboard

npm i

npm run dev
```

Database

When you make any changes to the database schema, remember to migrate them!

    npx prisma migrate dev --name {MigrationName}

Open http://localhost:3000

Worker

First activate your virtualenv and install all requirements into it.

```bash
cd worker

pip install -r requirements.txt
```

To develop and test the worker, run the script directly, without a container.

Make sure to stop the worker container first if it is running!

    python3 main.py

Environment variables

Dashboard

| NAME | DEFAULT VALUE | DESCRIPTION |
| --- | --- | --- |
| DATABASE_URL | postgresql://admin:admin@localhost:5433/postgres?schema=public | It is required for prisma to connect with the postgres database. |
| RABBITMQ_URL | amqp://rabbit:rabbit@localhost | URL of the rabbitmq |
| QUEUE_NAME | jobs | The name of the job-queue |
| MINIO_ENDPOINT | localhost | This is the endpoint of minio server. It will be the IP address of the server. |
| MINIO_PORT | 9000 | Minio port for communication from dashboard. |
| MINIO_ACCESS_KEY | shoutoutdevuser | Access key for minio dev user. |
| MINIO_SECRET_KEY | shoutoutdevuser | Secret key for minio dev user. |
| MINIO_JOB_BUCKET | shoutout-job-bucket | Bucket name to store all audio files. |
| DOWNLOAD_FILE_TARGET_DIR | finished-files/ | Folder on S3-Bucket containing transcribed files |
| FINISHED_FILE_FORMAT | .txt | The download format of the finished file |
| UPLOAD_FILE_TARGET_DIR | to-transcribe/ | Folder on S3-Bucket to upload mp3 files to |
| MINIO_SSL_ENABLED | false | SSL setting for S3 Bucket |
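The dashboard packs its Postgres and RabbitMQ credentials into single connection URLs, whereas the worker table below lists the same Postgres settings as separate `DATABASE_*` variables. As a rough illustration of how the two forms relate, this Python sketch splits the default URLs from the table into their parts using only the standard library. The `describe_url` helper is a hypothetical name, not something shoutout provides.

```python
from urllib.parse import urlsplit, parse_qs

# Default values copied from the table above.
DATABASE_URL = "postgresql://admin:admin@localhost:5433/postgres?schema=public"
RABBITMQ_URL = "amqp://rabbit:rabbit@localhost"


def describe_url(url: str) -> dict:
    """Break a connection URL into its parts for inspection."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not name a port
        "path": parts.path.lstrip("/"),
        "options": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


print(describe_url(DATABASE_URL))
print(describe_url(RABBITMQ_URL))
```

When changing credentials or ports, both the URL form used by the dashboard and the split form used by the worker need to be updated consistently.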

Worker

| NAME | DEFAULT VALUE | DESCRIPTION |
| --- | --- | --- |
| DATABASE_HOST | localhost | PostgresDB host |
| DATABASE_NAME | postgres | Database name |
| DATABASE_USER | admin | PostgresDB username |
| DATABASE_PASSWORD | admin | PostgresDB password |
| DATABASE_PORT | 5433 | PostgresDB port |
| RABBITMQ_HOST | localhost | Rabbitmq host |
| RABBITMQ_USER | rabbit | Username for rabbitmq |
| RABBITMQ_PASSWORD | rabbit | Password for rabbitmq |
| RABBITMQ_QUEUE | jobs | The name of the job queue |
| MINIO_JOB_BUCKET | shoutout-job-bucket | Bucket name to store all audio files. |
| MINIO_SECRET_KEY | shoutoutdevuser | Secret key for minio dev user. |
| MINIO_ACCESS_KEY | shoutoutdevuser | Access key for minio dev user. |
| MINIO_URL | http://localhost:9000 | URL of S3 Bucket |
| TMP_FILE_DIR | tmp_downloads | Local directory where all temporary files which are needed for transcription are stored |
| UPLOAD_FILE_TARGET_DIR | finished-files/ | Folder on S3-Bucket to upload finished transcription to |
| DOWNLOAD_FILE_DIR | to-transcribe/ | Folder on S3-Bucket containing mp3 files to transcribe |
| WHISPER_MODEL | large-v3 | openai whisper model size |
| FINISHED_FILE_FORMAT | .txt | File format of the transcribed file |
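To show how these defaults behave, here is a minimal sketch that reads a subset of the worker variables from the environment and falls back to the documented defaults. It uses only the standard library. The worker's requirements list pydantic-settings, so the real loader may look quite different; `WorkerSettings` and the `env` helper are hypothetical names chosen for this example.

```python
import os
from dataclasses import dataclass, field


def env(name: str, default: str):
    """Build a dataclass field that reads `name` from the environment at creation time."""
    return field(default_factory=lambda: os.environ.get(name, default))


@dataclass(frozen=True)
class WorkerSettings:
    # A subset of the variables from the table above, with the documented defaults.
    database_host: str = env("DATABASE_HOST", "localhost")
    database_port: str = env("DATABASE_PORT", "5433")
    rabbitmq_host: str = env("RABBITMQ_HOST", "localhost")
    rabbitmq_queue: str = env("RABBITMQ_QUEUE", "jobs")
    minio_url: str = env("MINIO_URL", "http://localhost:9000")
    minio_job_bucket: str = env("MINIO_JOB_BUCKET", "shoutout-job-bucket")
    download_file_dir: str = env("DOWNLOAD_FILE_DIR", "to-transcribe/")
    upload_file_target_dir: str = env("UPLOAD_FILE_TARGET_DIR", "finished-files/")
    whisper_model: str = env("WHISPER_MODEL", "large-v3")
    finished_file_format: str = env("FINISHED_FILE_FORMAT", ".txt")


settings = WorkerSettings()
print(settings.whisper_model, settings.minio_job_bucket)
```

Note that the worker's `DOWNLOAD_FILE_DIR` and `UPLOAD_FILE_TARGET_DIR` defaults mirror the dashboard's `UPLOAD_FILE_TARGET_DIR` and `DOWNLOAD_FILE_TARGET_DIR` respectively, so the two sides have to be changed together.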

Citation

BibTeX:

@software{shoutout_2024,
  title = {{shoutout: A modern web application for transcribing audio files on your own server}},
  doi = {10.5281/zenodo.14527349},
  url = {https://github.com/RWTH-TIME/shoutout},
  version = {v1.0.0},
  year = {2024},
  author = {Selzner, Paul and Evers, Felix and Kalhorn, Paul and Beckers, Lukas},
  note = {Published by RWTH Technology and Innovation Management Institute (TIME Research Area)}
}

Owner

  • Name: TIME Research Area
  • Login: RWTH-TIME
  • Kind: organization
  • Email: tech@time.rwth-aachen.de
  • Location: Germany

TIME has crafted a mission and vision statement meant to reflect our sense of purpose and our objectives in research and teaching.

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Selzner
    given-names: Paul
  - family-names: Evers
    given-names: Felix
  - family-names: Kalhorn
    given-names: Paul
  - family-names: Beckers
    given-names: Lukas
title: Shoutout
version: v1.0.0
date-released: 2024-12-19
url: https://github.com/RWTH-TIME/shoutout
repository-code: https://github.com/RWTH-TIME/shoutout
doi: 10.5281/zenodo.14527349
                          

GitHub Events

Total
  • Create event: 19
  • Issues event: 6
  • Release event: 1
  • Watch event: 1
  • Delete event: 16
  • Issue comment event: 14
  • Push event: 356
  • Pull request review event: 19
  • Pull request event: 55
  • Fork event: 1
Last Year
  • Create event: 19
  • Issues event: 6
  • Release event: 1
  • Watch event: 1
  • Delete event: 16
  • Issue comment event: 14
  • Push event: 356
  • Pull request review event: 19
  • Pull request event: 55
  • Fork event: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 3
  • Total pull requests: 24
  • Average time to close issues: 7 months
  • Average time to close pull requests: 10 days
  • Total issue authors: 3
  • Total pull request authors: 4
  • Average comments per issue: 2.33
  • Average comments per pull request: 0.29
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 18
Past Year
  • Issues: 2
  • Pull requests: 24
  • Average time to close issues: 21 days
  • Average time to close pull requests: 10 days
  • Issue authors: 2
  • Pull request authors: 4
  • Average comments per issue: 3.5
  • Average comments per pull request: 0.29
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 18
Top Authors
Issue Authors
  • PaulKalho (6)
  • use-to (4)
  • pselzner (3)
  • rob7xiv (1)
  • LukasBeckers (1)
  • boomerangv (1)
  • renovate[bot] (1)
  • philipp-leibner (1)
Pull Request Authors
  • renovate[bot] (57)
  • PaulKalho (14)
  • pselzner (3)
  • use-to (2)
  • felixevers (2)
  • LukasBeckers (1)
  • philipp-leibner (1)
Top Labels
Issue Labels
bug (3) dashboard (2) worker (2) chore (1) deps (1)
Pull Request Labels
deps (56) enhancement (1) worker (1) bug (1)

Dependencies

dashboard/Dockerfile docker
  • node lts-alpine build
docker-compose.yml docker
  • minio/mc latest
  • postgres latest
  • quay.io/minio/minio latest
  • rabbitmq management
worker/Dockerfile docker
  • pytorch/pytorch latest build
dashboard/package-lock.json npm
  • 422 dependencies
dashboard/package.json npm
  • @types/amqplib ^0.10.1 development
  • @types/node 20.1.5 development
  • @types/react 18.2.6 development
  • @types/react-dom 18.2.4 development
  • @types/uuid ^9.0.6 development
  • @typescript-eslint/eslint-plugin ^5.59.7 development
  • @typescript-eslint/parser ^5.59.7 development
  • eslint ^8.40.0 development
  • eslint-config-next 13.4.2 development
  • eslint-config-prettier ^8.8.0 development
  • eslint-plugin-prettier ^4.2.1 development
  • prettier ^2.8.8 development
  • typescript 5.0.4 development
  • @emotion/react ^11.11.0
  • @emotion/styled ^11.11.0
  • @mui/icons-material ^5.11.16
  • @mui/material ^5.13.3
  • @prisma/client ^4.14.0
  • amqplib ^0.10.3
  • minio ^7.1.2
  • next 13.4.2
  • prisma ^4.14.0
  • react 18.2.0
  • react-dom 18.2.0
  • swr ^2.1.5
  • uuid ^9.0.1
worker/requirements.txt pypi
  • boto3 ==1.28.60
  • botocore ==1.31.60
  • jmespath ==1.0.1
  • openai-whisper ==20230918
  • pika ==1.3.2
  • psycopg2-binary ==2.9.8
  • pyannote.audio ==3.0.0
  • pydantic ==2.4.2
  • pydantic-settings ==2.0.3
  • pydantic_core ==2.10.1
  • pydub ==0.25.1
  • python-dateutil ==2.8.2
  • python-dotenv ==1.0.0
  • s3transfer ==0.7.0
  • six ==1.16.0
  • torch ==2.0.0
  • torchaudio ==2.0.0
  • types-pika ==1.2.0b1
  • typing_extensions ==4.8.0
  • urllib3 ==1.26.17
.github/workflows/cicd.yaml actions
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
  • actions/setup-python v4 composite
  • docker/build-push-action v5 composite
  • docker/login-action v3 composite
  • docker/metadata-action v5 composite
  • py-actions/flake8 v2 composite
docker-compose.prod.yml docker
  • ghcr.io/rwth-time/shoutout/dashboard latest
  • ghcr.io/rwth-time/shoutout/worker latest
  • minio/mc latest
  • postgres latest
  • quay.io/minio/minio latest
  • rabbitmq management