Kafka Çorbası: Using Kafka Streams

Introduction

The explanation is as follows:

Kafka Streams, like Kafka Connect, is part of open-source Apache Kafka, so the Java library is included when you download Kafka from the Apache website. It is therefore already covered in the data streaming landscape under the Kafka logo. You should always ask yourself whether you need another framework besides Kafka Streams for stream processing. The significant benefit: one technology, one vendor, one infrastructure.

Many vendors exclude or do not focus on Kafka Streams and Kafka Connect and offer only an incomplete Kafka; they want to sell their own integration and processing products instead.

Programming Language

The explanation is as follows:

The Kafka Streams API supports JVM languages, including Java and Scala, so you can only import the library into Java and Scala applications. Although user communities have developed several Kafka and Kafka Streams client APIs in other programming languages, including Python and C/C++, these solutions are not Kafka-native. Compared to other stream processing technologies, the language support for Kafka Streams is therefore quite limited.

Data Parallelism

The explanation is as follows. It is supported.

Kafka Streams has inherent data parallelism: it assigns the partitions of the input topics to different tasks created from the application's processor topology. Kafka Streams runs wherever a Kafka Streams application instance runs, and it lets you scale for high-volume workloads by running extra instances on many machines. That’s a key advantage that Kafka Streams has over a lot of other stream processing frameworks; it doesn’t need a dedicated compute cluster, which makes it faster to set up and simpler to use.
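
The relationship between partitions, tasks, and instances can be sketched in plain Java. This is a simplified model, not the Kafka Streams API: the class name `TaskAssignmentSketch`, the instance names, and the round-robin rule are all assumptions made for illustration, and the real assignment logic is more sophisticated. The point it shows is that one task exists per input partition, so instances beyond the partition count receive no work.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: NOT the real Kafka Streams assignor.
final class TaskAssignmentSketch {

    // Creates one task per input partition and spreads the tasks
    // across the running application instances round-robin.
    static Map<String, List<Integer>> assign(int partitionCount, List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String instance : instances) {
            assignment.put(instance, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = instances.get(partition % instances.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Four partitions, two instances: each instance gets two tasks.
        System.out.println(assign(4, List.of("app-1", "app-2")));
        // Two partitions, three instances: the third instance stays idle.
        System.out.println(assign(2, List.of("app-1", "app-2", "app-3")));
    }
}
```

In this model, adding instances only helps up to the number of input partitions, which is why the partition count is usually chosen with the expected degree of parallelism in mind.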

Fault Tolerance

The explanation is as follows. It is supported.

Whenever a Kafka Streams application instance fails, another instance can automatically pick up its data and restart the task. This is possible because the stream data is persisted in Kafka.
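
A minimal plain-Java sketch of this idea follows. It is a model only and does not use the Kafka Streams API: the class `FailoverSketch`, its `Checkpoint` type, and the in-memory list standing in for a persisted topic are assumptions for illustration. In real Kafka Streams, consumed offsets are committed to Kafka and state store contents are backed by changelog topics; here both are collapsed into a single checkpoint object.

```java
import java.util.List;

// Hypothetical model of resuming work from a persisted log after a failure.
final class FailoverSketch {

    // Progress that survives a crash: where to resume and the state so far.
    static final class Checkpoint {
        final int nextOffset;
        final long total;

        Checkpoint(int nextOffset, long total) {
            this.nextOffset = nextOffset;
            this.total = total;
        }
    }

    // Processes at most maxRecords records starting from the checkpoint
    // and returns the new checkpoint. The log itself is never modified.
    static Checkpoint process(List<Integer> log, Checkpoint from, int maxRecords) {
        int offset = from.nextOffset;
        long total = from.total;
        int processed = 0;
        while (offset < log.size() && processed < maxRecords) {
            total += log.get(offset);
            offset++;
            processed++;
        }
        return new Checkpoint(offset, total);
    }

    public static void main(String[] args) {
        List<Integer> log = List.of(5, 7, 1, 4, 3);
        // Instance A handles three records and then "fails".
        Checkpoint afterA = process(log, new Checkpoint(0, 0L), 3);
        // Instance B resumes from the last checkpoint and finishes the log.
        Checkpoint afterB = process(log, afterA, Integer.MAX_VALUE);
        System.out.println(afterB.nextOffset + " records, total " + afterB.total);
    }
}
```

Because the log is durable and the checkpoint records the resume position, the second instance reaches the same total that a single uninterrupted run would have produced.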

SQL support

The explanation is as follows. It is not supported.

Sadly, Kafka Streams does not natively provide SQL support. Again, different communities and developers have built several solutions on top of Kafka and Kafka Streams that address this, for example ksqlDB.

ML library support

The explanation is as follows. It is not supported.

A limitation of Kafka Streams for machine learning is that the Kafka ecosystem has no built-in ML library that connects to it easily. Building an ML library on top of Kafka Streams is not straightforward either; while Java and Scala dominate data engineering and streaming, Python is the major language in machine learning.

Windowing support

The explanation is as follows. It is supported.

Windowing allows you to group stream records based on time for stateful operations. Each window gives you a snapshot of the stream aggregate within a timeframe. Without windowing, a stream aggregation keeps accumulating for as long as data comes in.

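Kafka Streams supports the following types of windowing:

1. Hopping. A fixed-size window that advances by a fixed interval; when the advance interval is smaller than the window size, consecutive windows overlap.
2. Tumbling. A hopping window whose advance interval equals its size, so windows never overlap.
3. Session. Has no fixed size; a session stays open while records keep arriving and closes after a period of inactivity.
4. Sliding. A fixed-size window, but its boundaries are based on the time difference between two records rather than on a fixed schedule.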
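
The arithmetic behind the first three window types can be sketched in plain Java. This is an illustration only and does not call the Kafka Streams API: the class `WindowSketch` and its method names are assumptions, timestamps are non-negative epoch milliseconds, and the exact boundary handling of real session windows is not modeled. Sliding windows are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing how records map to windows.
final class WindowSketch {

    // Start of the single tumbling window [start, start + sizeMs) holding the timestamp.
    static long tumblingStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Starts of every hopping window [start, start + sizeMs) holding the timestamp.
    static List<Long> hoppingStarts(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before the timestamp, aligned to the advance interval.
        long start = timestampMs - (timestampMs % advanceMs);
        while (start >= 0 && start + sizeMs > timestampMs) {
            starts.add(0, start); // prepend so the result is in ascending order
            start -= advanceMs;
        }
        return starts;
    }

    // Splits ascending timestamps into sessions; a new session begins
    // when the gap to the previous record exceeds gapMs.
    static List<List<Long>> sessions(List<Long> sortedTimestampsMs, long gapMs) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestampsMs) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > gapMs) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tumblingStart(12_000L, 10_000L));         // 10000
        System.out.println(hoppingStarts(7_000L, 10_000L, 5_000L));  // [0, 5000]
        System.out.println(sessions(List.of(0L, 1_000L, 2_000L, 10_000L, 11_000L), 5_000L));
    }
}
```

With a 10-second size and a 5-second advance, the record at 7 seconds belongs to two overlapping hopping windows, while a tumbling window of the same size places every record in exactly one window.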

State stores

The explanation is as follows. It is supported.

Maintaining state in stream processing opens up a lot of possibilities that Kafka Streams exploits really well. Kafka Streams has state stores that your stream processing application can use to implement stateful operations such as joins and aggregations. Stateless transformations like filtering and mapping are also provided.
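
The difference between stateless and stateful operations can be sketched in plain Java. This does not use the Kafka Streams API: the class `StateSketch` is hypothetical, and an ordinary `Map` stands in for a state store as an assumption for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model: a Map plays the role of a state store.
final class StateSketch {

    // Stateless: each record is filtered and mapped on its own,
    // with no memory of earlier records.
    static List<String> normalize(List<String> words) {
        return words.stream()
                .filter(word -> !word.trim().isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }

    // Stateful: the count for a key depends on every earlier record
    // with that key, so the running counts must be kept in a store.
    static Map<String, Long> countByKey(List<String> words, Map<String, Long> store) {
        for (String word : words) {
            store.merge(word, 1L, Long::sum);
        }
        return store;
    }

    public static void main(String[] args) {
        List<String> cleaned = normalize(List.of("Kafka", "", "streams", "KAFKA"));
        System.out.println(countByKey(cleaned, new LinkedHashMap<>()));
    }
}
```

The stateless step can run on any record in isolation, whereas the count only stays correct if the store survives between records, which is the job a state store does for a stateful operation.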


Wednesday, April 26, 2023
