Development Language
Kafka was developed at LinkedIn using Java and Scala. The explanation is as follows:
Apache Kafka is an open-source stream-processing software platform developed by LinkedIn and donated to Apache Software Foundation. It is written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency streaming platform for handling and processing real-time data feeds.
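The client libraries that ship with Kafka are Java APIs. As a rough sketch of what "handling real-time data feeds" looks like in practice, the snippet below sends a single event with the Java producer client; the broker address (localhost:9092) and the topic name (events) are placeholder assumptions, not values from the quoted text.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HelloProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; adjust for your cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send one event to the hypothetical "events" topic.
            producer.send(new ProducerRecord<>("events", "user-1", "page_view"));
            producer.flush();
        }
    }
}
```

A matching KafkaConsumer would subscribe to the same topic and poll records in a loop; that producer/consumer pair is the core of the streaming model described above.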
History
Development of Kafka started in 2010. The explanation is as follows:
In 2010, LinkedIn engineers faced the problem of integrating huge amounts of data from their infrastructure into a Lambda architecture. It also included Hadoop and real-time event processing systems. As for traditional message brokers, they didn't satisfy LinkedIn's needs. These solutions were too heavy and slow. So, the engineering team developed a scalable and fault-tolerant messaging system without lots of bells and whistles. The new queue manager has quickly transformed into a full-fledged event streaming platform.
Becoming Open Source
Kafka became open source in 2011 and was later donated to the Apache Software Foundation.
Relationship With Confluent
In 2014, Kafka's creators left LinkedIn and founded Confluent. Confluent went public in 2021.
Kafka's Pain Points
1. Diverging Latency Requirements
The explanation is as follows. In other words, a single Kafka setup does not serve every latency requirement equally well, or at the same cost.
The latency expectations for modern systems have become more polarized. While financial services demand microsecond-level latency for stock trading, other use cases — such as logging or syncing data between operational databases and analytical systems — are fine with second-level latency. A one-size-fits-all solution doesn’t work anymore. Why should a company using Kafka for simple logging pay the same costs as one building mission-critical low-latency applications?
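One concrete way this tension shows up is in producer tuning: the same client is configured very differently for low latency than for high throughput. The sketch below contrasts the two using standard producer settings (linger.ms, batch.size, acks, compression.type); the specific numbers are illustrative assumptions, not recommendations from the quoted text.

```java
import java.util.Properties;

public class LatencyVsThroughputConfigs {
    // Tuned for low latency: push each record out as soon as possible.
    static Properties lowLatency() {
        Properties p = new Properties();
        p.put("linger.ms", "0");      // do not wait to fill a batch
        p.put("batch.size", "16384"); // default-sized batches
        p.put("acks", "1");           // leader-only ack keeps round trips short
        return p;
    }

    // Tuned for throughput and cost: batch and compress aggressively, accept higher latency.
    static Properties highThroughput() {
        Properties p = new Properties();
        p.put("linger.ms", "100");                   // wait up to 100 ms to build larger batches
        p.put("batch.size", String.valueOf(512 * 1024));
        p.put("compression.type", "lz4");            // fewer, smaller requests on the wire
        p.put("acks", "all");                        // durability over latency
        return p;
    }
}
```

On a shared cluster, both kinds of workloads run on the same brokers and pay for the same infrastructure, which is the cost mismatch the quote points at.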
2. Batch systems are building their own ingestion tools
The explanation is as follows. In other words, there are now other options for moving data into analytical systems.
Platforms like Snowflake with Snowpipe, Amazon Redshift with its zero-ETL integrations and ClickHouse, which recently acquired PeerDB, now offer built-in streaming data ingestion. These developments reduce the need for Kafka as the go-to system for moving data between environments. Kafka is no longer the only option for feeding data into analytical systems, leading to natural fragmentation in its traditional use cases.
3. Cloud infrastructure has made storage cheaper
The explanation is as follows. In other words, object storage has become far cheaper than broker disks, and Kafka has to take advantage of that.
Object storage solutions like Amazon S3 have become significantly more affordable than compute nodes such as EC2. This makes it increasingly hard to justify using more expensive storage options, especially in a world where companies are constantly optimizing their cloud costs. As a result, Kafka needs to embrace architectures that take advantage of cheaper storage options or risk becoming an overly expensive component in data pipelines.
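One direction the Kafka community has taken here is tiered storage (KIP-405), which offloads older log segments to object storage such as S3 while keeping only recent data on broker disks. The sketch below creates a topic with tiered storage enabled through the Java AdminClient; it assumes a Kafka 3.6+ cluster whose brokers already have a remote storage plugin configured, and the topic name, partition count, and retention values are illustrative.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class TieredTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, String> configs = new HashMap<>();
            // Assumes brokers run with remote.log.storage.system.enable=true
            // and an S3-backed RemoteStorageManager plugin already configured.
            configs.put("remote.storage.enable", "true");
            configs.put("local.retention.ms", "3600000"); // keep roughly 1 hour on local disk
            configs.put("retention.ms", "604800000");     // 7 days total, older segments in object storage

            NewTopic topic = new NewTopic("clickstream", 6, (short) 3).configs(configs);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```

With a layout like this, only the most recent data sits on expensive broker disks; older segments are served from cheaper object storage when consumers read far back in the log.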