kafka-python

Python client for Apache Kafka

https://github.com/dpkp/kafka-python

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    3 of 225 committers (1.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.9%) to scientific vocabulary

Keywords

kafka python

Keywords from Contributors

distributed data-mining unit-testing transformer closember agents parallel cryptocurrencies large-language-models shellcode
Last synced: 6 months ago · JSON representation

Repository

Python client for Apache Kafka

Basic Info
Statistics
  • Stars: 5,794
  • Watchers: 142
  • Forks: 1,441
  • Open Issues: 43
  • Releases: 61
Topics
kafka python
Created over 13 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License Support Authors

README.rst

Kafka Python client
------------------------

.. image:: https://img.shields.io/badge/kafka-4.0--0.8-brightgreen.svg
    :target: https://kafka-python.readthedocs.io/en/master/compatibility.html
.. image:: https://img.shields.io/pypi/pyversions/kafka-python.svg
    :target: https://pypi.python.org/pypi/kafka-python
.. image:: https://coveralls.io/repos/dpkp/kafka-python/badge.svg?branch=master&service=github
    :target: https://coveralls.io/github/dpkp/kafka-python?branch=master
.. image:: https://img.shields.io/badge/license-Apache%202-blue.svg
    :target: https://github.com/dpkp/kafka-python/blob/master/LICENSE
.. image:: https://img.shields.io/pypi/dw/kafka-python.svg
    :target: https://pypistats.org/packages/kafka-python
.. image:: https://img.shields.io/pypi/v/kafka-python.svg
    :target: https://pypi.org/project/kafka-python
.. image:: https://img.shields.io/pypi/implementation/kafka-python
    :target: https://github.com/dpkp/kafka-python/blob/master/setup.py



Python client for the Apache Kafka distributed stream processing system.
kafka-python is designed to function much like the official java client, with a
sprinkling of pythonic interfaces (e.g., consumer iterators).

kafka-python is best used with newer brokers (0.9+), but is backwards-compatible with
older versions (to 0.8.0). Some features will only be enabled on newer brokers.
For example, fully coordinated consumer groups -- i.e., dynamic partition
assignment to multiple consumers in the same group -- requires use of 0.9+ kafka
brokers. Supporting this feature for earlier broker releases would require
writing and maintaining custom leadership election and membership / health
check code (perhaps using zookeeper or consul). For older brokers, you can
achieve something similar by manually assigning different partitions to each
consumer instance with config management tools like chef, ansible, etc. This
approach will work fine, though it does not support rebalancing on failures.
See https://kafka-python.readthedocs.io/en/master/compatibility.html
for more details.

Please note that the master branch may contain unreleased features. For release
documentation, please see readthedocs and/or python's inline help.

.. code-block:: bash

    $ pip install kafka-python


KafkaConsumer
*************

KafkaConsumer is a high-level message consumer, intended to operate as similarly
as possible to the official java client. Full support for coordinated
consumer groups requires use of kafka brokers that support the Group APIs: kafka v0.9+.

See https://kafka-python.readthedocs.io/en/master/apidoc/KafkaConsumer.html
for API and configuration details.

The consumer iterator returns ConsumerRecords, which are simple namedtuples
that expose basic message attributes: topic, partition, offset, key, and value:

.. code-block:: python

    from kafka import KafkaConsumer
    consumer = KafkaConsumer('my_favorite_topic')
    for msg in consumer:
        print (msg)

.. code-block:: python

    # join a consumer group for dynamic partition assignment and offset commits
    from kafka import KafkaConsumer
    consumer = KafkaConsumer('my_favorite_topic', group_id='my_favorite_group')
    for msg in consumer:
        print (msg)

.. code-block:: python

    # manually assign the partition list for the consumer
    from kafka import TopicPartition
    consumer = KafkaConsumer(bootstrap_servers='localhost:1234')
    consumer.assign([TopicPartition('foobar', 2)])
    msg = next(consumer)

.. code-block:: python

    # Deserialize msgpack-encoded values
    consumer = KafkaConsumer(value_deserializer=msgpack.loads)
    consumer.subscribe(['msgpackfoo'])
    for msg in consumer:
        assert isinstance(msg.value, dict)

.. code-block:: python

    # Access record headers. The returned value is a list of tuples
    # with str, bytes for key and value
    for msg in consumer:
        print (msg.headers)

.. code-block:: python

    # Read only committed messages from transactional topic
    consumer = KafkaConsumer(isolation_level='read_committed')
    consumer.subscribe(['txn_topic'])
    for msg in consumer:
        print(msg)

.. code-block:: python

    # Get consumer metrics
    metrics = consumer.metrics()


KafkaProducer
*************

KafkaProducer is a high-level, asynchronous message producer. The class is
intended to operate as similarly as possible to the official java client.
See https://kafka-python.readthedocs.io/en/master/apidoc/KafkaProducer.html
for more details.

.. code-block:: python

    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers='localhost:1234')
    for _ in range(100):
        producer.send('foobar', b'some_message_bytes')

.. code-block:: python

    # Block until a single message is sent (or timeout)
    future = producer.send('foobar', b'another_message')
    result = future.get(timeout=60)

.. code-block:: python

    # Block until all pending messages are at least put on the network
    # NOTE: This does not guarantee delivery or success! It is really
    # only useful if you configure internal batching using linger_ms
    producer.flush()

.. code-block:: python

    # Use a key for hashed-partitioning
    producer.send('foobar', key=b'foo', value=b'bar')

.. code-block:: python

    # Serialize json messages
    import json
    producer = KafkaProducer(value_serializer=lambda v: json.dumps(v).encode('utf-8'))
    producer.send('fizzbuzz', {'foo': 'bar'})

.. code-block:: python

    # Serialize string keys
    producer = KafkaProducer(key_serializer=str.encode)
    producer.send('flipflap', key='ping', value=b'1234')

.. code-block:: python

    # Compress messages
    producer = KafkaProducer(compression_type='gzip')
    for i in range(1000):
        producer.send('foobar', b'msg %d' % i)

.. code-block:: python

    # Use transactions
    producer = KafkaProducer(transactional_id='fizzbuzz')
    producer.init_transactions()
    producer.begin_transaction()
    future = producer.send('txn_topic', value=b'yes')
    future.get() # wait for successful produce
    producer.commit_transaction() # commit the transaction

    producer.begin_transaction()
    future = producer.send('txn_topic', value=b'no')
    future.get() # wait for successful produce
    producer.abort_transaction() # abort the transaction

.. code-block:: python

    # Include record headers. The format is list of tuples with string key
    # and bytes value.
    producer.send('foobar', value=b'c29tZSB2YWx1ZQ==', headers=[('content-encoding', b'base64')])

.. code-block:: python

    # Get producer performance metrics
    metrics = producer.metrics()


Thread safety
*************

The KafkaProducer can be used across threads without issue, unlike the
KafkaConsumer which cannot.

While it is possible to use the KafkaConsumer in a thread-local manner,
multiprocessing is recommended.


Compression
***********

kafka-python supports the following compression formats:

- gzip
- LZ4
- Snappy
- Zstandard (zstd)

gzip is supported natively, the others require installing additional libraries.
See https://kafka-python.readthedocs.io/en/master/install.html for more information.


Optimized CRC32 Validation
**************************

Kafka uses CRC32 checksums to validate messages. kafka-python includes a pure
python implementation for compatibility. To improve performance for high-throughput
applications, kafka-python will use `crc32c` for optimized native code if installed.
See https://kafka-python.readthedocs.io/en/master/install.html for installation instructions.
See https://pypi.org/project/crc32c/ for details on the underlying crc32c lib.


Protocol
********

A secondary goal of kafka-python is to provide an easy-to-use protocol layer
for interacting with kafka brokers via the python repl. This is useful for
testing, probing, and general experimentation. The protocol support is
leveraged to enable a KafkaClient.check_version() method that
probes a kafka broker and attempts to identify which version it is running
(0.8.0 to 2.6+).

Owner

  • Name: Dana Powers
  • Login: dpkp
  • Kind: user
  • Location: San Francisco, CA

GitHub Events

Total
  • Create event: 165
  • Release event: 21
  • Issues event: 296
  • Watch event: 216
  • Delete event: 145
  • Issue comment event: 337
  • Push event: 427
  • Pull request review event: 13
  • Pull request review comment event: 12
  • Pull request event: 352
  • Fork event: 49
Last Year
  • Create event: 165
  • Release event: 21
  • Issues event: 296
  • Watch event: 216
  • Delete event: 145
  • Issue comment event: 337
  • Push event: 427
  • Pull request review event: 13
  • Pull request review comment event: 12
  • Pull request event: 352
  • Fork event: 49

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 2,206
  • Total Committers: 225
  • Avg Commits per committer: 9.804
  • Development Distribution Score (DDS): 0.626
Past Year
  • Commits: 279
  • Committers: 10
  • Avg Commits per committer: 27.9
  • Development Distribution Score (DDS): 0.036
Top Committers
Name Email Commits
Dana Powers d****s@g****m 824
Dana Powers d****s@r****o 527
Jeff Widman j****f@j****m 120
David Arthur m****h@g****m 95
Mahendra M m****m@g****m 48
Mark Roberts w****t@g****m 47
Viktor Shlapakov v****v@g****m 42
Omar Ghishan o****n@r****o 40
Zack Dever z****r@p****m 29
Taras v****1@g****m 27
Bruno Renié b****e@g****m 26
mrtheb m****e@g****m 13
John Anderson s****k@g****m 13
William Barnhart w****t@g****m 11
Vetoshkin Nikita n****n@y****u 10
Enrico Canzonieri e****i@g****m 9
Andre Araujo a****o@g****m 8
Ivan Pouzyrevsky s****o@y****u 8
Mark Roberts m****s@k****m 8
Alex Couture-Beil a****x@m****a 7
Thomas Dimson t****n@g****m 7
Tincu Gabriel g****i@a****o 6
Lou Marvin Caraig l****g@g****m 5
Matthew L Daniel m****l@g****m 5
Tyler Lubeck t****r@c****m 5
Mark Roberts w****t@f****m 4
Jim Lim j****m@q****m 4
Carson Ip c****5@g****m 4
Brian Sang s****i@g****m 4
Heikki Nousiainen h****n@a****o 4
and 195 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 357
  • Total pull requests: 425
  • Average time to close issues: about 3 years
  • Average time to close pull requests: 7 months
  • Total issue authors: 315
  • Total pull request authors: 74
  • Average comments per issue: 3.0
  • Average comments per pull request: 0.88
  • Merged pull requests: 275
  • Bot issues: 0
  • Bot pull requests: 10
Past Year
  • Issues: 46
  • Pull requests: 302
  • Average time to close issues: 4 days
  • Average time to close pull requests: about 17 hours
  • Issue authors: 41
  • Pull request authors: 17
  • Average comments per issue: 1.78
  • Average comments per pull request: 0.1
  • Merged pull requests: 250
  • Bot issues: 0
  • Bot pull requests: 3
Top Authors
Issue Authors
  • jeffwidman (20)
  • dpkp (5)
  • berrfred (3)
  • ghost (2)
  • hackaugusto (2)
  • millerdev (2)
  • indywidualny (2)
  • Pyrrha (2)
  • jacopofar (2)
  • lv123123long (2)
  • f-r-kuznetsov (2)
  • tvoinarovskyi (2)
  • braedon (2)
  • isamaru (2)
  • wbarnha (2)
Pull Request Authors
  • dpkp (275)
  • wbarnha (18)
  • dependabot[bot] (10)
  • Romain-Geissler-1A (5)
  • jeffwidman (5)
  • emmanuel-ferdman (4)
  • hiwakaba (4)
  • mattoberle (3)
  • hackaugusto (3)
  • sunnyakaxd (2)
  • degagne (2)
  • sibiryakov (2)
  • moshez (2)
  • ljluestc (2)
  • bmassemin (2)
Top Labels
Issue Labels
consumer (22) producer (10) bug (6) admin-client (6) test coverage (4) packaging (4) network protocol (3) enhancement (3) help request (3) critical/stability (2) needs investigation (2) help wanted (2) wontfix (1) dependencies (1) duplicate (1) invalid (1) sasl (1)
Pull Request Labels
network protocol (14) dependencies (11) enhancement (5) needs investigation (4) admin-client (4) documentation (3) do-not-merge (3) github_actions (3) test coverage (2) producer (2) wontfix (2) consumer (2) bug (1) packaging (1)

Packages

  • Total packages: 7
  • Total downloads:
    • pypi 19,581,380 last-month
  • Total docker downloads: 2,032,956,638
  • Total dependent packages: 234
    (may contain duplicates)
  • Total dependent repositories: 4,005
    (may contain duplicates)
  • Total versions: 95
  • Total maintainers: 5
pypi.org: kafka-python

Pure Python client for Apache Kafka

  • Versions: 61
  • Dependent Packages: 219
  • Dependent Repositories: 3,616
  • Downloads: 19,084,118 Last month
  • Docker Downloads: 2,031,412,306
Rankings
Docker downloads count: 0.0%
Downloads: 0.1%
Dependent packages count: 0.1%
Dependent repos count: 0.2%
Average: 0.3%
Stargazers count: 0.4%
Forks count: 1.1%
Maintainers (2)
Last synced: 6 months ago
pypi.org: kafka

Pure Python client for Apache Kafka

  • Versions: 17
  • Dependent Packages: 12
  • Dependent Repositories: 348
  • Downloads: 479,837 Last month
  • Docker Downloads: 1,544,189
Rankings
Stargazers count: 0.4%
Downloads: 0.6%
Docker downloads count: 0.8%
Dependent repos count: 0.8%
Average: 0.8%
Forks count: 1.1%
Dependent packages count: 1.3%
Maintainers (1)
Last synced: 6 months ago
pypi.org: kafka-python3

Pure Python client for Apache Kafka

  • Versions: 1
  • Dependent Packages: 1
  • Dependent Repositories: 32
  • Downloads: 17,243 Last month
  • Docker Downloads: 143
Rankings
Stargazers count: 0.4%
Forks count: 1.1%
Downloads: 1.6%
Average: 2.0%
Dependent repos count: 2.6%
Docker downloads count: 2.8%
Dependent packages count: 3.3%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/dpkp/kafka-python
  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 1
Rankings
Forks count: 0.7%
Stargazers count: 0.9%
Average: 3.7%
Dependent repos count: 4.8%
Dependent packages count: 8.5%
Last synced: 6 months ago
pypi.org: gc-kafka-python

Pure Python client for Apache Kafka

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 10 Last month
Rankings
Stargazers count: 0.4%
Forks count: 1.1%
Dependent packages count: 7.4%
Average: 8.6%
Dependent repos count: 11.9%
Downloads: 22.4%
Maintainers (1)
Last synced: 6 months ago
pypi.org: ns-kafka-python

Pure Python client for Apache Kafka

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 172 Last month
Rankings
Stargazers count: 0.4%
Forks count: 1.1%
Dependent packages count: 7.4%
Average: 9.3%
Downloads: 15.5%
Dependent repos count: 22.2%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: kafka-python
  • Versions: 4
  • Dependent Packages: 2
  • Dependent Repositories: 5
Rankings
Forks count: 4.1%
Stargazers count: 4.8%
Average: 10.8%
Dependent repos count: 14.7%
Dependent packages count: 19.6%
Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi
  • sphinx *
  • sphinx_rtd_theme *
requirements-dev.txt pypi
  • Sphinx ==3.2.1
  • coveralls ==2.1.2
  • crc32c ==2.1
  • docker-py ==1.10.6
  • flake8 ==3.8.3
  • lz4 ==3.1.0
  • mock ==4.0.2
  • py ==1.9.0
  • pylint ==2.6.0
  • pytest ==6.0.2
  • pytest-cov ==2.10.1
  • pytest-mock ==3.3.1
  • pytest-pylint ==0.17.0
  • python-snappy ==0.5.4
  • sphinx-rtd-theme ==0.5.0
  • tox ==3.20.0
  • xxhash ==2.0.0