Kafka is what is known as a message queue: a system for moving large amounts of data from place to place efficiently and reliably.
Imagine for a second that you have an office desk with an inbox and an outbox. Work orders come in and go on top of the pile in your inbox. When you’re ready for more work, you take the top order, do the work, and put the result in your outbox. Periodically, someone takes things from your outbox and puts them where they need to go. You have three coworkers doing the same thing. The problem is that, for whatever reason, you’re getting three times as many work orders as your coworkers. Your inbox keeps growing, you never reach the bottom of the pile, and the oldest orders just languish there.
Now replace the inbox with a queue. You always start with the oldest entry and work your way forward: queues are first in, first out, while the inbox pile was a stack, which is last in, first out. That solves the problem of old orders languishing. Also, instead of a separate inbox for each person, there is a single queue, and everyone in your working group takes the oldest item from it whenever they need more work, so nobody ends up with three times the load.
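To make the stack versus queue difference concrete, here is a tiny Python sketch. It has nothing to do with Kafka's actual API; the order names are made up, and it only shows which order gets picked up first in each setup.

```python
from collections import deque

# Arrival order, oldest first (made-up example data).
orders = ["order-1", "order-2", "order-3"]

# Inbox pile (stack): the newest order sits on top and is taken first.
pile = list(orders)
stack_order = [pile.pop() for _ in range(len(pile))]

# Queue: the oldest order is always taken first.
queue = deque(orders)
queue_order = [queue.popleft() for _ in range(len(queue))]

print("pile: ", stack_order)
print("queue:", queue_order)
```

With the pile, the oldest order is handled last, and never if new orders keep arriving. With the queue, it is handled first.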
One other thing Kafka does is allow you to have different working groups, each with its own pointer into the queue. So even though you and your coworkers are taking things from the front of the queue, the orders aren’t actually removed, and some other group, like an auditing office, can step through the exact same series of orders that your team did without either group stepping on the other’s toes. There are also tools to make sure every order gets fully processed before it’s marked as done. So if your coworker pulls down an order, does the first three steps, but then gets fired before completing the work, the order goes back to the queue and someone else picks it up to make sure it actually gets processed correctly.
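Here is a rough sketch of the pointer idea in Python. It is a toy in-memory model, not the Kafka client API: `OrderLog`, `poll`, and `commit` are names invented for this illustration, and it ignores how real Kafka splits work among several people in the same group. The point is just that reading never removes anything, each group has its own position, and a position only moves forward once the work is finished.

```python
class OrderLog:
    """Toy in-memory stand-in for a Kafka topic (hypothetical, not the Kafka API)."""

    def __init__(self):
        self.entries = []   # append-only; reading never removes anything
        self.offsets = {}   # group name -> index of the next unfinished entry

    def append(self, order):
        self.entries.append(order)

    def poll(self, group):
        """Return the next entry for this group, or None if it is caught up."""
        pos = self.offsets.get(group, 0)
        return self.entries[pos] if pos < len(self.entries) else None

    def commit(self, group):
        """Advance the group's pointer; call only after the work is finished."""
        self.offsets[group] = self.offsets.get(group, 0) + 1


log = OrderLog()
for order in ["order-1", "order-2", "order-3"]:
    log.append(order)

# The work team finishes the first order and commits it.
first = log.poll("workers")
log.commit("workers")

# A coworker picks up the second order but leaves before committing...
abandoned = log.poll("workers")
# ...so the next poll hands the same order to whoever asks next.
retried = log.poll("workers")
log.commit("workers")

# The auditors have their own pointer and still start from the beginning.
audit_first = log.poll("auditors")

print(first, abandoned, retried, audit_first)
```

Because the pointer only moves on `commit`, an order that was picked up but never finished is simply handed out again, which is the "goes back to the queue" behavior described above.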
So how is this used in the real world? Constantly. A lot of banking software is built with Kafka. As people submit withdrawals, deposits, debit card transactions, etc., those actions get added to a Kafka queue and processed. Rather than, say, bombarding an internet endpoint with requests that might get lost if too many come in at once, these queues can be set up so that every single transaction gets processed end to end exactly once and nothing gets lost, while multiple groups step through the same data to audit it, create logs, or monitor for fraudulent transactions.