https://github.com/awslabs/amazon-transcribe-streaming-sdk
The Amazon Transcribe Streaming SDK is an async Python SDK for converting audio into text via Amazon Transcribe.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (17.6%) to scientific vocabulary
Repository
Statistics
- Stars: 177
- Watchers: 10
- Forks: 52
- Open Issues: 33
- Releases: 0
Topics
Metadata Files
README.md
Amazon Transcribe Streaming SDK
The Amazon Transcribe Streaming SDK allows users to interface directly with the Amazon Transcribe Streaming service from their Python programs. The goal of the project is to enable users to integrate with Amazon Transcribe without needing anything more than a stream of audio bytes and a basic handler.
It's highly advised to pin strict dependency versions if you use this outside of local testing. Please note that awscrt is a dependency shared with botocore (the core module of the AWS CLI and boto3), so you may need to keep amazon-transcribe at the latest version when both are installed in the same environment.
[!NOTE] This project was launched as a proof of concept and is no longer actively developed. It is not an official AWS product and is provided as-is, without a support commitment. This package can, in rare cases, suffer from high CPU issues (#109, #84).
Installation
To install from pip:
```bash
python -m pip install amazon-transcribe
```
To install from GitHub:
```bash
git clone https://github.com/awslabs/amazon-transcribe-streaming-sdk.git
cd amazon-transcribe-streaming-sdk
python -m pip install .
```
To use from your Python application, add amazon-transcribe as a dependency in your requirements.txt file.
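Since strict pins are recommended outside of local testing, one way to catch version drift early is to compare the installed versions against the ones you expect when your application starts. The sketch below is only an illustration: `check_pins` and the `PINS` mapping are hypothetical names, and the `0.6.4` pin is simply the latest release listed for the package on PyPI. Substitute whichever versions you have actually tested.

```python
from importlib import metadata

# Hypothetical pins for illustration only; replace with the versions you tested.
PINS = {"amazon-transcribe": "0.6.4"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment at all.
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every pinned package is installed at the expected version. The `get_version` parameter exists only so the check can be exercised without touching the real environment.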
NOTE: This SDK is built on top of the AWS Common Runtime (CRT), a collection of C libraries we interact with through bindings. The CRT is available on PyPI (awscrt) as precompiled wheels for common platforms (Linux, macOS, Windows). Users on non-standard operating systems may need to compile these libraries themselves.
Usage
Prerequisites
If you don't already have local credentials set up for your AWS account, you can follow this guide for configuring them using the AWS CLI.
In essence, you'll need one of these authentication configurations set up in order for the SDK to successfully resolve your API keys:
- Set the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and optionally the `AWS_SESSION_TOKEN` environment variables
- Set the `AWS_PROFILE` environment variable to the name of the AWS profile you want to use
- Configure the `[default]` profile in `~/.aws/credentials`
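To see which of these options is in place on a given machine, a quick check like the following can help. It is a minimal sketch that mirrors the list above using only the standard library; `detect_credential_source` is a hypothetical helper, and it does not reproduce the exact resolution order or every provider the SDK itself may consult.

```python
import configparser
import os
from pathlib import Path


def detect_credential_source(env=None, credentials_path=None):
    """Return a label for the first configured option from the list, or None."""
    env = os.environ if env is None else env

    # Option 1: static keys in environment variables (session token is optional).
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"

    # Option 2: a named profile selected through AWS_PROFILE.
    if env.get("AWS_PROFILE"):
        return "profile"

    # Option 3: a [default] section in the shared credentials file.
    path = Path(credentials_path) if credentials_path else Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently ignored
    if parser.has_section("default"):
        return "shared-credentials-file"

    return None


if __name__ == "__main__":
    print(detect_credential_source() or "no credentials found")
```

A return value of `None` suggests none of the three options is configured, which is worth ruling out before debugging the transcription code itself.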
For more details on the AWS shared configuration file and credential provider usage, check the following developer guides:
Quick Start
Setup for this SDK will require either live or prerecorded audio. Full details on the audio input requirements can be found in the Amazon Transcribe Streaming documentation.
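Before streaming a prerecorded file, it can be useful to confirm that its format matches the values you plan to pass to the SDK. The sketch below uses Python's standard `wave` module to read a WAV header; `describe_wav` and `matches` are hypothetical helpers, and the defaults (16000 Hz, 2 bytes per sample, 1 channel) are taken from the constants in the example app in this README, not from the service's own requirements.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bytes_per_sample, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth(), wav.getnchannels()


def matches(path, sample_rate=16000, bytes_per_sample=2, channels=1):
    """Check a WAV file against the format the example app assumes."""
    return describe_wav(path) == (sample_rate, bytes_per_sample, channels)
```

For instance, `matches("tests/integration/assets/test.wav")` checks the repository's sample file against those defaults. If the check fails, either convert the audio or adjust the sample rate passed to the client.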
Here's an example app to get started:
```python
import asyncio

# This example uses aiofile for asynchronous file reads.
# It's not a dependency of the project but can be installed
# with `pip install aiofile`.
import aiofile

from amazon_transcribe.client import TranscribeStreamingClient
from amazon_transcribe.handlers import TranscriptResultStreamHandler
from amazon_transcribe.model import TranscriptEvent
from amazon_transcribe.utils import apply_realtime_delay

"""
Here's an example of a custom event handler you can extend to
process the returned transcription results as needed. This
handler will simply print the text out to your interpreter.
"""


SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2
CHANNEL_NUMS = 1

# An example file can be found at tests/integration/assets/test.wav
AUDIO_PATH = "tests/integration/assets/test.wav"
CHUNK_SIZE = 1024 * 8
REGION = "us-west-2"


class MyEventHandler(TranscriptResultStreamHandler):
    async def handle_transcript_event(self, transcript_event: TranscriptEvent):
        # This handler can be implemented to handle transcriptions as needed.
        # Here's an example to get started.
        results = transcript_event.transcript.results
        for result in results:
            for alt in result.alternatives:
                print(alt.transcript)


async def basic_transcribe():
    # Set up our client with our chosen AWS region
    client = TranscribeStreamingClient(region=REGION)

    # Start transcription to generate our async stream
    stream = await client.start_stream_transcription(
        language_code="en-US",
        media_sample_rate_hz=SAMPLE_RATE,
        media_encoding="pcm",
    )

    async def write_chunks():
        # NOTE: For pre-recorded files longer than 5 minutes, the sent audio
        # chunks should be rate limited to match the realtime bitrate of the
        # audio stream to avoid signing issues.
        async with aiofile.AIOFile(AUDIO_PATH, "rb") as afp:
            reader = aiofile.Reader(afp, chunk_size=CHUNK_SIZE)
            await apply_realtime_delay(
                stream, reader, BYTES_PER_SAMPLE, SAMPLE_RATE, CHANNEL_NUMS
            )
        await stream.input_stream.end_stream()

    # Instantiate our handler and start processing events
    handler = MyEventHandler(stream.output_stream)
    await asyncio.gather(write_chunks(), handler.handle_events())


# asyncio.run creates the event loop, runs the coroutine, and closes the loop.
asyncio.run(basic_transcribe())
```
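The note about rate limiting in the example comes down to simple arithmetic: each chunk represents a fixed amount of audio time, and sending chunks no faster than that keeps the upload at the realtime bitrate. The sketch below works this out for the example's constants. `chunk_duration_seconds` is a hypothetical helper written for illustration; it shows the calculation implied by the comment, not necessarily how `apply_realtime_delay` is implemented internally.

```python
def chunk_duration_seconds(chunk_size, sample_rate, bytes_per_sample, channels):
    """Seconds of audio contained in one chunk of raw PCM bytes."""
    bytes_per_second = sample_rate * bytes_per_sample * channels
    return chunk_size / bytes_per_second


# Values from the example: 8 KiB chunks of 16 kHz, 16-bit, mono PCM.
duration = chunk_duration_seconds(1024 * 8, 16000, 2, 1)
print(f"{duration:.3f} s per chunk")  # 0.256 s per chunk
```

With these values the stream carries 32,000 bytes per second, so an 8,192-byte chunk should be followed by roughly a quarter of a second of delay when sending a prerecorded file.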
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.
Owner
- Name: Amazon Web Services - Labs
- Login: awslabs
- Kind: organization
- Location: Seattle, WA
- Website: http://amazon.com/aws/
- Repositories: 914
- Profile: https://github.com/awslabs
AWS Labs
GitHub Events
Total
- Create event: 5
- Issues event: 13
- Watch event: 20
- Delete event: 1
- Member event: 2
- Issue comment event: 35
- Push event: 11
- Pull request review event: 15
- Pull request review comment event: 10
- Pull request event: 13
- Fork event: 10
Last Year
- Create event: 5
- Issues event: 13
- Watch event: 20
- Delete event: 1
- Member event: 2
- Issue comment event: 35
- Push event: 11
- Pull request review event: 15
- Pull request review comment event: 10
- Pull request event: 13
- Fork event: 10
Committers
Last synced: over 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Nate Prewitt | N****t@g****m | 41 |
| Jordan Guymon | j****d@g****m | 22 |
| Nate Prewitt | n****t@g****m | 9 |
| David Miller | 4****3 | 3 |
| Guido Scalise | g****e | 3 |
| Zidaan Dutta | z****a@a****m | 2 |
| Adrienne Breland | a****d@g****o | 1 |
| Brian Morton | r****5@g****m | 1 |
| Gonzalo Arro | g****o@h****m | 1 |
| Jon der 🇨🇳👑 | 1****n@g****m | 1 |
| Yuval Adam | _@y****l | 1 |
| Amazon GitHub Automation | 5****o | 1 |
| jonathan343 | 4****3 | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 63
- Total pull requests: 63
- Average time to close issues: 4 months
- Average time to close pull requests: about 1 month
- Total issue authors: 53
- Total pull request authors: 19
- Average comments per issue: 2.46
- Average comments per pull request: 0.98
- Merged pull requests: 55
- Bot issues: 0
- Bot pull requests: 2
Past Year
- Issues: 8
- Pull requests: 14
- Average time to close issues: 4 months
- Average time to close pull requests: 6 days
- Issue authors: 8
- Pull request authors: 3
- Average comments per issue: 0.88
- Average comments per pull request: 0.79
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sm-andrew-w (3)
- haldernayan (3)
- gscalise (3)
- ajay960singh (2)
- anshitmt (2)
- juanmol (2)
- marinos123 (2)
- sids07 (1)
- dan-woz (1)
- subramanichetan (1)
- aj7tesh (1)
- Marware (1)
- parikls (1)
- hoanguyen401 (1)
- zhuermu (1)
Pull Request Authors
- nateprewitt (23)
- mirobat (8)
- joguSD (6)
- sinwar (4)
- dlm6693 (3)
- gscalise (3)
- patrickbradshawdallas (2)
- mbatchkarov (2)
- zdutta (2)
- dependabot[bot] (2)
- derpferd (1)
- ricardofunke (1)
- abreland (1)
- jonathan343 (1)
- daniel-balosh (1)
Packages
- Total packages: 1
- Total downloads:
  - PyPI: 127,753 last month
- Total dependent packages: 1
- Total dependent repositories: 12
- Total versions: 12
- Total maintainers: 3
pypi.org: amazon-transcribe
Async Python SDK for Amazon Transcribe Streaming
- Homepage: https://github.com/awslabs/amazon-transcribe-streaming-sdk
- Documentation: https://amazon-transcribe.readthedocs.io/
- License: Apache License 2.0
- Latest release: 0.6.4 (published 10 months ago)
Rankings
Dependencies
- sphinx <5.3
- sphinx-autodoc-typehints <1.12.0
- sphinx_rtd_theme ==0.5.0
- mypy <1.0
- pytest <6.3
- pytest-asyncio <0.16.0
- pytest-cov <2.13
- twine >=3.4.1
- wheel >=0.36.2
- actions/checkout v1 composite
- actions/setup-python v2 composite
- codecov/codecov-action v1 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- pre-commit/action v2.0.0 composite
- actions/checkout v1 composite
- actions/setup-python v1 composite