https://github.com/awslabs/amazon-transcribe-streaming-sdk

The Amazon Transcribe Streaming SDK is an async Python SDK for converting audio into text via Amazon Transcribe.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (17.6%) to scientific vocabulary

Keywords

amazon-transcribe asyncio aws python

Keywords from Contributors

aws-cli cloud-management aws-sdk

Last synced: 5 months ago

Repository


Basic Info

Host: GitHub
Owner: awslabs
License: apache-2.0
Language: Python
Default Branch: develop
Homepage:
Size: 273 KB

Statistics

Stars: 177
Watchers: 10
Forks: 52
Open Issues: 33
Releases: 0

Topics

amazon-transcribe asyncio aws python

Created over 5 years ago · Last pushed 10 months ago

Metadata Files

Readme Changelog Contributing License Code of conduct

Amazon Transcribe Streaming SDK

The Amazon Transcribe Streaming SDK allows users to connect their Python programs directly to the Amazon Transcribe Streaming service. The goal of the project is to let users integrate with Amazon Transcribe without needing anything more than a stream of audio bytes and a basic handler.

If you use this SDK outside of local testing, it's highly advised to pin strict dependency versions. Note that awscrt is a dependency shared with botocore (the core module of the AWS CLI and boto3), so you may need to keep amazon-transcribe at the latest version when both are installed in the same environment.
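Before pinning, it can help to check which versions of amazon-transcribe, awscrt and botocore are actually installed together. The following is a minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+). `installed_versions` is a hypothetical helper, not part of the SDK:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(
    packages: Iterable[str] = ("amazon-transcribe", "awscrt", "botocore"),
) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        # Print requirements.txt-style pins; flag anything that is missing.
        print(f"{name}=={ver}" if ver else f"# {name} is not installed")
```

The printed `name==version` lines could be copied into requirements.txt as strict pins. Whether a particular awscrt version satisfies both amazon-transcribe and botocore still has to be checked against each project's declared requirements.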

NOTE: This project was launched as a proof of concept and is no longer actively developed. It is not an official AWS product and is provided as-is, without a support commitment. In rare cases, this package can suffer from high CPU usage (#109, #84).

Installation

To install from pip: `python -m pip install amazon-transcribe`

To install from GitHub: `git clone https://github.com/awslabs/amazon-transcribe-streaming-sdk.git`, then `cd amazon-transcribe-streaming-sdk` and `python -m pip install .`

To use from your Python application, add amazon-transcribe as a dependency in your requirements.txt file.

NOTE: This SDK is built on top of the AWS Common Runtime (CRT), a collection of C libraries we interact with through bindings. The CRT is available on PyPI (awscrt) as precompiled wheels for common platforms (Linux, macOS, Windows). On other platforms, you may need to compile these libraries yourself.

Usage

Prerequisites

If you don't already have local credentials set up for your AWS account, you can follow this guide to configure them using the AWS CLI.

In essence, you'll need one of the following authentication configurations set up for the SDK to resolve your AWS credentials:

- Set the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and, optionally, AWS_SESSION_TOKEN environment variables
- Set the AWS_PROFILE environment variable to the name of a profile in your shared AWS config or credentials file
- Configure the [default] profile in ~/.aws/credentials
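The three setups above can be roughly checked from Python before starting a stream. This is a sketch under stated assumptions: `guess_credential_source` is a hypothetical helper that only inspects the environment variables and the `[default]` section listed above. The SDK's own credential resolution may consult other providers or files that this check ignores.

```python
import configparser
import os
from pathlib import Path
from typing import Mapping, Optional


def guess_credential_source(
    environ: Mapping[str, str] = os.environ,
    credentials_file: Optional[Path] = None,
) -> Optional[str]:
    """Report which of the three listed setups appears to be present.

    Hypothetical helper: a rough local check only. It does not resolve
    credentials the way the SDK does.
    """
    # 1. Static keys in environment variables (the session token is optional).
    if environ.get("AWS_ACCESS_KEY_ID") and environ.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # 2. A named profile selected via AWS_PROFILE.
    if environ.get("AWS_PROFILE"):
        return f"profile {environ['AWS_PROFILE']!r}"
    # 3. A [default] section in the shared credentials file.
    path = credentials_file or Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently skipped
    if parser.has_section("default"):
        return "[default] profile in shared credentials file"
    return None
```

Environment variables are checked first here because they usually take precedence over profiles in AWS tooling. Treat the result as a hint for troubleshooting, not as proof that the SDK will authenticate successfully.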

For more details on the AWS shared configuration file and credential provider usage, see the AWS developer guides.

Quick Start

To use this SDK, you'll need either live or prerecorded audio. Full details on the audio input requirements can be found in the Amazon Transcribe Streaming documentation.
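A prerecorded WAV file can be checked against the format used by the Quick Start example below (16 kHz, 16-bit, mono, uncompressed PCM) with the standard library's `wave` module. `check_wav` is a hypothetical helper. Its constants mirror that example rather than the service's full list of supported formats:

```python
import wave

# Audio parameters used by the Quick Start example (16 kHz, 16-bit, mono PCM).
SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
CHANNEL_NUMS = 1


def check_wav(path: str) -> float:
    """Raise ValueError if the WAV file doesn't match; return its duration in seconds."""
    with wave.open(path, "rb") as wav:
        params = wav.getparams()
    problems = []
    if params.framerate != SAMPLE_RATE:
        problems.append(f"sample rate {params.framerate} Hz, expected {SAMPLE_RATE}")
    if params.sampwidth != BYTES_PER_SAMPLE:
        problems.append(f"{params.sampwidth} bytes/sample, expected {BYTES_PER_SAMPLE}")
    if params.nchannels != CHANNEL_NUMS:
        problems.append(f"{params.nchannels} channels, expected {CHANNEL_NUMS}")
    if params.comptype != "NONE":
        problems.append(f"compression {params.comptype!r}, expected uncompressed PCM")
    if problems:
        raise ValueError(f"{path}: " + "; ".join(problems))
    # Duration = frames / frames-per-second.
    return params.nframes / params.framerate
```

At these settings the audio stream carries 16000 × 2 × 1 = 32,000 bytes per second, so one 8 KiB chunk (`CHUNK_SIZE = 1024 * 8` in the example) holds 8192 / 32000 = 0.256 seconds of audio.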

Here's an example app to get started:

```python
import asyncio

# This example uses aiofile for asynchronous file reads.
# It's not a dependency of the project but can be installed
# with `pip install aiofile`.
import aiofile

from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent
from amazon_transcribe.utils import apply_realtime_delay

SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
CHANNEL_NUMS = 1

# An example file can be found at tests/integration/assets/test.wav
AUDIO_PATH = "tests/integration/assets/test.wav"
CHUNK_SIZE = 1024 * 8
REGION = "us-west-2"


class MyEventHandler(TranscriptResultStreamHandler):
    """
    Here's an example of a custom event handler you can extend to
    process the returned transcription results as needed. This
    handler will simply print the text out to your interpreter.
    """

    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        # This handler can be implemented to handle transcriptions as needed.
        # Here's an example to get started.
        results = transcript_event.transcript.results
        for result in results:
            for alt in result.alternatives:
                print(alt.transcript)


async def basic_transcribe():
    # Set up our client with our chosen AWS region
    client = TranscribeStreamingClient(region=REGION)

    # Start transcription to generate our async stream
    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=SAMPLE_RATE,
        media_encoding="pcm",
    )

    async def write_chunks():
        # NOTE: For pre-recorded files longer than 5 minutes, the sent audio
        # chunks should be rate limited to match the realtime bitrate of the
        # audio stream to avoid signing issues.
        async with aiofile.AIOFile(AUDIO_PATH, "rb") as afp:
            reader = aiofile.Reader(afp, chunk_size=CHUNK_SIZE)
            await apply_realtime_delay(
                stream, reader, BYTES_PER_SAMPLE, SAMPLE_RATE, CHANNEL_NUMS
            )
        await stream.input_stream.end_stream()

    # Instantiate our handler and start processing events
    handler = MyEventHandler(stream.output_stream)
    await asyncio.gather(write_chunks(), handler.handle_events())


# asyncio.run() creates the event loop, runs the coroutine, then closes the loop
asyncio.run(basic_transcribe())
```
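The NOTE in `write_chunks` asks for pre-recorded audio to be sent no faster than real time, and the example delegates this to the SDK's `apply_realtime_delay`. The following is an independent sketch of the same idea, not that function's implementation. `paced_chunks` is a hypothetical async generator that takes a plain (synchronous) iterable of chunks for simplicity; after yielding each chunk, it sleeps until wall-clock time catches up with the amount of audio already sent.

```python
import asyncio
from typing import AsyncIterator, Iterable


async def paced_chunks(
    chunks: Iterable[bytes],
    sample_rate: int = 16000,
    bytes_per_sample: int = 2,
    channel_nums: int = 1,
) -> AsyncIterator[bytes]:
    """Yield audio chunks no faster than real time (hypothetical helper)."""
    bytes_per_second = sample_rate * bytes_per_sample * channel_nums
    loop = asyncio.get_running_loop()
    start = loop.time()
    sent = 0
    for chunk in chunks:
        yield chunk
        sent += len(chunk)
        # Sleep until the clock reaches the playback time of the audio sent so far.
        delay = start + sent / bytes_per_second - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
```

With the Quick Start settings this works out to 32,000 bytes per second. Each yielded chunk would then be sent to the service, for example with `await stream.input_stream.send_audio_event(audio_chunk=chunk)`, followed by `await stream.input_stream.end_stream()` once the source is exhausted. Pacing against the start time, rather than sleeping a fixed amount per chunk, keeps small scheduling delays from accumulating over a long file.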

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Owner

Name: Amazon Web Services - Labs
Login: awslabs
Kind: organization
Location: Seattle, WA

Website: http://amazon.com/aws/
Repositories: 914
Profile: https://github.com/awslabs

AWS Labs

GitHub Events

Total

Create event: 5
Issues event: 13
Watch event: 20
Delete event: 1
Member event: 2
Issue comment event: 35
Push event: 11
Pull request review event: 15
Pull request review comment event: 10
Pull request event: 13
Fork event: 10

Last Year

Create event: 5
Issues event: 13
Watch event: 20
Delete event: 1
Member event: 2
Issue comment event: 35
Push event: 11
Pull request review event: 15
Pull request review comment event: 10
Pull request event: 13
Fork event: 10

Committers

Last synced: over 1 year ago

All Time

Total Commits: 87
Total Committers: 13
Avg Commits per committer: 6.692
Development Distribution Score (DDS): 0.529

Past Year

Commits: 3
Committers: 1
Avg Commits per committer: 3.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Nate Prewitt	N**t@g**m	41
Jordan Guymon	j**d@g**m	22
Nate Prewitt	n**t@g**m	9
David Miller	4****3	3
Guido Scalise	g****e	3
Zidaan Dutta	z**a@a**m	2
Adrienne Breland	a**d@g**o	1
Brian Morton	r**5@g**m	1
Gonzalo Arro	g**o@h**m	1
Jon der 🇨🇳👑	1**n@g**m	1
Yuval Adam	_@y****l	1
Amazon GitHub Automation	5****o	1
jonathan343	4****3	1

Committer Domains (Top 20 + Academic)

yuv.al: 1, galt.aero: 1, amazon.com: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 63
Total pull requests: 63
Average time to close issues: 4 months
Average time to close pull requests: about 1 month
Total issue authors: 53
Total pull request authors: 19
Average comments per issue: 2.46
Average comments per pull request: 0.98
Merged pull requests: 55
Bot issues: 0
Bot pull requests: 2

Past Year

Issues: 8
Pull requests: 14
Average time to close issues: 4 months
Average time to close pull requests: 6 days
Issue authors: 8
Pull request authors: 3
Average comments per issue: 0.88
Average comments per pull request: 0.79
Merged pull requests: 10
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

sm-andrew-w (3)
haldernayan (3)
gscalise (3)
ajay960singh (2)
anshitmt (2)
juanmol (2)
marinos123 (2)
sids07 (1)
dan-woz (1)
subramanichetan (1)
aj7tesh (1)
Marware (1)
parikls (1)
hoanguyen401 (1)
zhuermu (1)

Pull Request Authors

nateprewitt (23)
mirobat (8)
joguSD (6)
sinwar (4)
dlm6693 (3)
gscalise (3)
patrickbradshawdallas (2)
mbatchkarov (2)
zdutta (2)
dependabot[bot] (2)
derpferd (1)
ricardofunke (1)
abreland (1)
jonathan343 (1)
daniel-balosh (1)

Top Labels

Issue Labels

feature-request (5) question (3) documentation (2) enhancement (2) service api (2) needs more info (1) duplicate (1)

Pull Request Labels

dependencies (2) hacktoberfest-accepted (1)

Packages

Total packages: 1
Total downloads:
- pypi 127,753 last-month

Total dependent packages: 1
Total dependent repositories: 12
Total versions: 12
Total maintainers: 3

pypi.org: amazon-transcribe

Async Python SDK for Amazon Transcribe Streaming

Homepage: https://github.com/awslabs/amazon-transcribe-streaming-sdk
Documentation: https://amazon-transcribe.readthedocs.io/
License: Apache License 2.0
Latest release: 0.6.4
published 10 months ago

Versions: 12
Dependent Packages: 1
Dependent Repositories: 12
Downloads: 127,753 Last month

Rankings

Downloads: 3.0%

Dependent repos count: 4.2%

Dependent packages count: 4.8%

Average: 5.3%

Stargazers count: 7.1%

Forks count: 7.3%

Maintainers (3)

aws mirobat srivsb

Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi

sphinx <5.3
sphinx-autodoc-typehints <1.12.0
sphinx_rtd_theme ==0.5.0

requirements-dev.txt pypi

mypy <1.0
pytest <6.3
pytest-asyncio <0.16.0
pytest-cov <2.13

requirements-release.txt pypi

twine >=3.4.1
wheel >=0.36.2

.github/workflows/ci.yml actions

actions/checkout v1 composite
actions/setup-python v2 composite
codecov/codecov-action v1 composite

.github/workflows/lint.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite
pre-commit/action v2.0.0 composite

.github/workflows/typecheck.yml actions

actions/checkout v1 composite
actions/setup-python v1 composite

setup.py pypi
