https://github.com/awslabs/amazon-transcribe-streaming-sdk

The Amazon Transcribe Streaming SDK is an async Python SDK for converting audio into text via Amazon Transcribe.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.6%) to scientific vocabulary

Keywords

amazon-transcribe asyncio aws python

Keywords from Contributors

aws-cli cloud-management aws-sdk
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: awslabs
  • License: apache-2.0
  • Language: Python
  • Default Branch: develop
  • Homepage:
  • Size: 273 KB
Statistics
  • Stars: 177
  • Watchers: 10
  • Forks: 52
  • Open Issues: 33
  • Releases: 0
Topics
amazon-transcribe asyncio aws python
Created over 5 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct

README.md

Amazon Transcribe Streaming SDK

The Amazon Transcribe Streaming SDK allows users to interface directly with the Amazon Transcribe Streaming service from their Python programs. The goal of the project is to let users integrate with Amazon Transcribe using nothing more than a stream of audio bytes and a basic handler.

Pinning strict dependency versions is strongly advised if you use this SDK outside of local testing. Note that awscrt is a dependency shared with botocore (the core module of the AWS CLI and boto3), so you may need to keep amazon-transcribe at the latest version when both are installed in the same environment.

NOTE: This project was launched as a proof of concept and is no longer actively developed. It is not an official AWS product and is provided as-is, without a support commitment. This package can, in rare cases, suffer from high CPU issues (#109, #84).

Installation

To install from pip:

    python -m pip install amazon-transcribe

To install from GitHub:

    git clone https://github.com/awslabs/amazon-transcribe-streaming-sdk.git
    cd amazon-transcribe-streaming-sdk
    python -m pip install .

To use from your Python application, add amazon-transcribe as a dependency in your requirements.txt file.

NOTE: This SDK is built on top of the AWS Common Runtime (CRT), a collection of C libraries we interact with through bindings. The CRT is available on PyPI (awscrt) as precompiled wheels for common platforms (Linux, macOS, Windows). Non-standard operating systems may need to compile these libraries themselves.
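Because awscrt is shared with botocore, it can help to see which versions are actually installed before writing pins. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of the SDK.

```python
from importlib import metadata


def installed_versions(packages=("amazon-transcribe", "awscrt", "botocore")):
    """Return a mapping of distribution name to installed version (or None)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        # Print installed packages in requirements.txt pin syntax.
        print(f"{name}=={version}" if version else f"# {name} is not installed")
```

The printed `name==version` lines can be copied into requirements.txt as a starting point for strict pins.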

Usage

Prerequisites

If you don't already have local credentials set up for your AWS account, you can configure them with the AWS CLI by following its configuration guide.

In essence, you'll need one of these authentication configurations set up for the SDK to successfully resolve your API keys:

  1. Set the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and optionally the AWS_SESSION_TOKEN environment variables
  2. Set the AWS_PROFILE environment variable to the name of your AWS profile
  3. Configure the [default] profile in ~/.aws/credentials
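As a quick preflight check, the three options above can be probed from Python. This is only a sketch: `detect_credential_sources` is a hypothetical helper, it does not validate the credentials, and it does not reproduce the SDK's actual resolution logic.

```python
import configparser
import os
from pathlib import Path


def detect_credential_sources(env=None, credentials_path=None):
    """Report which of the three listed configurations appear to be present."""
    env = os.environ if env is None else env
    path = Path(credentials_path or Path.home() / ".aws" / "credentials")

    found = []
    # Option 1: static keys in environment variables.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        found.append("environment variables")
    # Option 2: a named profile selected through AWS_PROFILE.
    if env.get("AWS_PROFILE"):
        found.append(f"AWS_PROFILE={env['AWS_PROFILE']}")
    # Option 3: a [default] section in the shared credentials file.
    if path.is_file():
        parser = configparser.ConfigParser(interpolation=None)
        parser.read(path)
        if parser.has_section("default"):
            found.append(f"[default] profile in {path}")
    return found


if __name__ == "__main__":
    print(detect_credential_sources() or "no credential configuration detected")
```

An empty result suggests none of the three configurations is in place on the current machine.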

For more details on the AWS shared configuration file and credential provider usage, check the AWS developer guides.

Quick Start

Setup for this SDK will require either live or prerecorded audio. Full details on the audio input requirements can be found in the Amazon Transcribe Streaming documentation.
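For prerecorded audio in a PCM WAV file, the sample rate, sample width, and channel count can be read from the file header before starting a stream. The sketch below uses the standard library `wave` module; `describe_wav` is a hypothetical helper, and it assumes an uncompressed PCM WAV file.

```python
import wave


def describe_wav(path):
    """Read a PCM WAV header and return its basic audio parameters."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "bytes_per_sample": wav.getsampwidth(),
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
```

These values correspond to the sample rate, bytes-per-sample, and channel-count constants used in the example app.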

Here's an example app to get started:

```python
import asyncio

# This example uses aiofile for asynchronous file reads.
# It's not a dependency of the project but can be installed
# with `pip install aiofile`.
import aiofile

from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent
from amazon_transcribe.utils import apply_realtime_delay

SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
CHANNEL_NUMS = 1

# An example file can be found at tests/integration/assets/test.wav
AUDIO_PATH = "tests/integration/assets/test.wav"
CHUNK_SIZE = 1024 * 8
REGION = "us-west-2"


class MyEventHandler(TranscriptResultStreamHandler):
    """
    Here's an example of a custom event handler you can extend to
    process the returned transcription results as needed. This
    handler will simply print the text out to your interpreter.
    """

    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        # This handler can be implemented to handle transcriptions as needed.
        # Here's an example to get started.
        results = transcript_event.transcript.results
        for result in results:
            for alt in result.alternatives:
                print(alt.transcript)


async def basic_transcribe():
    # Set up our client with our chosen AWS region
    client = TranscribeStreamingClient(region=REGION)

    # Start transcription to generate our async stream
    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=SAMPLE_RATE,
        media_encoding="pcm",
    )

    async def write_chunks():
        # NOTE: For pre-recorded files longer than 5 minutes, the sent audio
        # chunks should be rate limited to match the realtime bitrate of the
        # audio stream to avoid signing issues.
        async with aiofile.AIOFile(AUDIO_PATH, "rb") as afp:
            reader = aiofile.Reader(afp, chunk_size=CHUNK_SIZE)
            await apply_realtime_delay(
                stream, reader, BYTES_PER_SAMPLE, SAMPLE_RATE, CHANNEL_NUMS
            )
        await stream.input_stream.end_stream()

    # Instantiate our handler and start processing events
    handler = MyEventHandler(stream.output_stream)
    await asyncio.gather(write_chunks(), handler.handle_events())


# asyncio.run creates the event loop, runs the coroutine, and closes the loop.
asyncio.run(basic_transcribe())
```
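The NOTE inside `write_chunks` says that chunks from long pre-recorded files should be sent at the real-time bitrate. The underlying arithmetic is that a chunk's playback duration equals its size divided by the number of bytes per second of audio. The sketch below illustrates that idea under the assumption of uncompressed PCM; `chunk_duration_seconds` and `paced_send` are hypothetical helpers, not the SDK's own implementation, and `send` stands in for whatever coroutine delivers a chunk.

```python
import asyncio
import time


def chunk_duration_seconds(chunk_size, sample_rate, bytes_per_sample, channels):
    """Playback time represented by one chunk of uncompressed PCM audio."""
    return chunk_size / (sample_rate * bytes_per_sample * channels)


async def paced_send(chunks, send, seconds_per_chunk):
    """Send each chunk, sleeping so the overall rate tracks real time."""
    start = time.monotonic()
    for index, chunk in enumerate(chunks, start=1):
        await send(chunk)
        # Sleep until the moment this chunk would have finished playing.
        remaining = start + index * seconds_per_chunk - time.monotonic()
        if remaining > 0:
            await asyncio.sleep(remaining)


# With the example's values, an 8 KiB chunk covers 0.256 seconds of audio.
print(chunk_duration_seconds(1024 * 8, 16000, 2, 1))  # 0.256
```

Pacing against the start time rather than sleeping a fixed amount per chunk keeps small scheduling delays from accumulating over a long file.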

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Owner

  • Name: Amazon Web Services - Labs
  • Login: awslabs
  • Kind: organization
  • Location: Seattle, WA

AWS Labs

GitHub Events

Total
  • Create event: 5
  • Issues event: 13
  • Watch event: 20
  • Delete event: 1
  • Member event: 2
  • Issue comment event: 35
  • Push event: 11
  • Pull request review event: 15
  • Pull request review comment event: 10
  • Pull request event: 13
  • Fork event: 10
Last Year
  • Create event: 5
  • Issues event: 13
  • Watch event: 20
  • Delete event: 1
  • Member event: 2
  • Issue comment event: 35
  • Push event: 11
  • Pull request review event: 15
  • Pull request review comment event: 10
  • Pull request event: 13
  • Fork event: 10

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 87
  • Total Committers: 13
  • Avg Commits per committer: 6.692
  • Development Distribution Score (DDS): 0.529
Past Year
  • Commits: 3
  • Committers: 1
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| Nate Prewitt | N****t@g****m | 41 |
| Jordan Guymon | j****d@g****m | 22 |
| Nate Prewitt | n****t@g****m | 9 |
| David Miller | 4****3 | 3 |
| Guido Scalise | g****e | 3 |
| Zidaan Dutta | z****a@a****m | 2 |
| Adrienne Breland | a****d@g****o | 1 |
| Brian Morton | r****5@g****m | 1 |
| Gonzalo Arro | g****o@h****m | 1 |
| Jon der 🇨🇳👑 | 1****n@g****m | 1 |
| Yuval Adam | _@y****l | 1 |
| Amazon GitHub Automation | 5****o | 1 |
| jonathan343 | 4****3 | 1 |
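The All Time summary figures can be reproduced from the commit counts listed for the top committers. The sketch below assumes the Development Distribution Score is defined as one minus the top committer's share of all commits; the page does not state the formula, so treat that definition as an assumption. It does match the reported values.

```python
# Commit counts per committer entry, copied from the Top Committers list.
commits = [41, 22, 9, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1]

total = sum(commits)
average = total / len(commits)  # average commits per committer
# Assumed definition: DDS = 1 - (top committer's commits / total commits).
dds = 1 - max(commits) / total

print(total, round(average, 3), round(dds, 3))  # 87 6.692 0.529
```

Under the same assumed definition, the Past Year score of 0.0 follows from a single committer making all 3 commits.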

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 63
  • Total pull requests: 63
  • Average time to close issues: 4 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 53
  • Total pull request authors: 19
  • Average comments per issue: 2.46
  • Average comments per pull request: 0.98
  • Merged pull requests: 55
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 8
  • Pull requests: 14
  • Average time to close issues: 4 months
  • Average time to close pull requests: 6 days
  • Issue authors: 8
  • Pull request authors: 3
  • Average comments per issue: 0.88
  • Average comments per pull request: 0.79
  • Merged pull requests: 10
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • sm-andrew-w (3)
  • haldernayan (3)
  • gscalise (3)
  • ajay960singh (2)
  • anshitmt (2)
  • juanmol (2)
  • marinos123 (2)
  • sids07 (1)
  • dan-woz (1)
  • subramanichetan (1)
  • aj7tesh (1)
  • Marware (1)
  • parikls (1)
  • hoanguyen401 (1)
  • zhuermu (1)
Pull Request Authors
  • nateprewitt (23)
  • mirobat (8)
  • joguSD (6)
  • sinwar (4)
  • dlm6693 (3)
  • gscalise (3)
  • patrickbradshawdallas (2)
  • mbatchkarov (2)
  • zdutta (2)
  • dependabot[bot] (2)
  • derpferd (1)
  • ricardofunke (1)
  • abreland (1)
  • jonathan343 (1)
  • daniel-balosh (1)
Top Labels
Issue Labels
feature-request (5) question (3) documentation (2) enhancement (2) service api (2) needs more info (1) duplicate (1)
Pull Request Labels
dependencies (2) hacktoberfest-accepted (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 127,753 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 12
  • Total versions: 12
  • Total maintainers: 3
pypi.org: amazon-transcribe

Async Python SDK for Amazon Transcribe Streaming

  • Versions: 12
  • Dependent Packages: 1
  • Dependent Repositories: 12
  • Downloads: 127,753 Last month
Rankings
Downloads: 3.0%
Dependent repos count: 4.2%
Dependent packages count: 4.8%
Average: 5.3%
Stargazers count: 7.1%
Forks count: 7.3%
Maintainers (3)
Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi
  • sphinx <5.3
  • sphinx-autodoc-typehints <1.12.0
  • sphinx_rtd_theme ==0.5.0
requirements-dev.txt pypi
  • mypy <1.0
  • pytest <6.3
  • pytest-asyncio <0.16.0
  • pytest-cov <2.13
requirements-release.txt pypi
  • twine >=3.4.1
  • wheel >=0.36.2
.github/workflows/ci.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v1 composite
.github/workflows/lint.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • pre-commit/action v2.0.0 composite
.github/workflows/typecheck.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
setup.py pypi