Backing up the Kafka cluster data

Hi Kafkadmins!

I have a simple question: what options are out there to back up and restore the data in our Kafka clusters?

Is creating a replica cluster the recommended way (as mentioned in the official docs), and is it the only way to back up the data in the topics?

Or are there other options, perhaps third-party ones, that you have already used and tested?

Or is it perhaps okay (though I doubt it) to back up the cluster data simply by copying the broker log directories with something like rsync and then compressing the copy with zip or gzip?
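A minimal sketch of that storage-level idea, using Python's `tarfile` module in place of rsync + gzip. It assumes the broker has been stopped first (a copy taken while it runs is a hot copy and may be inconsistent) and that `log_dir` is one of the directories listed in the broker's `log.dirs` setting; the function name and any example paths are hypothetical.

```python
import tarfile
from pathlib import Path
from typing import List


def archive_log_dir(log_dir: str, archive_path: str) -> List[str]:
    """Pack one broker's log directory into a gzip-compressed tarball.

    Assumption: the broker has been shut down cleanly before this runs.
    archive_path must live outside log_dir, or the tarball would try to
    include itself.
    """
    src = Path(log_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"log dir not found: {src}")
    # tarfile.add() recurses into the partition directories (<topic>-<partition>)
    # and stores everything under the log dir's own name, e.g. "kafka-logs/...".
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(str(src), arcname=src.name)
    # Re-open the archive and list its members as a basic sanity check.
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

Note that this captures one broker only; the archive holds the on-disk partition files, not a cluster-wide snapshot.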

Thanks for sharing your experience! :hugs:

At the moment I am using the latter approach: bind volumes in a container map to a host-mounted network share, which is backed up on a regular basis, runs on RAID, etc.

The one downside I see is that, in a catastrophic failure, there is no automatic failover to another replica, and any restore will only be as good as the last backup.

I’d be interested to see what other folks are doing.

Edit: Thinking about this, if something takes out the entire storage array, then Kafka will be the least of our problems!

Hi @whatsupbros,
using a dedicated replica/backup cluster is certainly an option, but it also needs to be operated and monitored, which adds quite some work, and you need additional resources to run that cluster.
Another approach could be to use Kafka Connect to dump the data into an object store (S3 or S3-compatible) or into an RDBMS; maybe you already have one of those available.
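A minimal sketch of the Kafka Connect route, assuming Confluent's S3 sink connector is installed on a Connect worker. The option names follow that connector's documentation, but verify them against the version you deploy; the worker URL, connector name, bucket, region, and topics are hypothetical. `store.url` points the connector at an S3-compatible endpoint such as MinIO. The code only builds the request for the Connect REST API's `POST /connectors` endpoint; it does not send it.

```python
import json
import urllib.request
from typing import Dict, List, Optional

# Hypothetical address of a Kafka Connect worker's REST API.
CONNECT_URL = "http://localhost:8083"


def s3_sink_payload(name: str, topics: List[str], bucket: str,
                    store_url: Optional[str] = None) -> Dict:
    """Build the JSON body for creating an S3 sink connector (sketch)."""
    config = {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "tasks.max": "1",
        "topics": ",".join(topics),
        "s3.bucket.name": bucket,
        "s3.region": "us-east-1",  # assumption: adjust to your bucket
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "1000",  # records written before a file is committed
    }
    if store_url is not None:
        # Endpoint override for S3-compatible stores such as MinIO.
        config["store.url"] = store_url
    return {"name": name, "config": config}


def create_connector_request(payload: Dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST /connectors call to the worker."""
    return urllib.request.Request(
        f"{CONNECT_URL}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a running worker, passing the request to `urllib.request.urlopen` would create the connector. This covers only the backup direction; getting the data back into topics is a separate process.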
And always make sure you have a tested and reliable restore process!!!
I wouldn’t recommend low-level tools like rsync/gzip for copying the data, because you are only copying the on-disk binary format of the partitions hosted by one particular broker. That can be a solution if you want to replace a single failed broker: spin up a new machine, copy the data back, and start the broker with the same broker id as the failed one.
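For the replace-a-failed-broker case, each log directory carries a `meta.properties` file recording the id of the broker that owns it, so a quick check can confirm that copied data matches the id configured on the replacement before it is started. A minimal sketch: the parser is simplified (plain `key=value` lines, not full Java `.properties` syntax), and the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict


def read_meta_properties(log_dir: str) -> Dict[str, str]:
    """Parse key=value lines from <log_dir>/meta.properties (simplified)."""
    props: Dict[str, str] = {}
    text = (Path(log_dir) / "meta.properties").read_text(encoding="utf-8")
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def stored_broker_id(log_dir: str) -> int:
    props = read_meta_properties(log_dir)
    # ZooKeeper-mode brokers record broker.id; KRaft nodes record node.id.
    raw = props.get("broker.id", props.get("node.id"))
    if raw is None:
        raise ValueError(f"no broker.id/node.id in {log_dir}/meta.properties")
    return int(raw)


def check_restored_dir(log_dir: str, configured_id: int) -> None:
    """Fail early if the copied data belongs to a different broker id."""
    found = stored_broker_id(log_dir)
    if found != configured_id:
        raise RuntimeError(
            f"{log_dir} holds data of broker {found}, "
            f"but the replacement is configured as broker {configured_id}"
        )
```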
In the end, as so often, it also depends on your use case: e.g. if you want to be able to restore only certain topic(s), you need a backup at the “data” level (see the first two examples), not at the “storage” level (like rsync/gzip).

HTH

@roadSurfer, do you mean the rsync approach? Have you already tried to restore the cluster on a different machine after that?

I mean, probably one of the most interesting questions here is whether the cluster ends up in a consistent state, and whether one can successfully start the restored cluster from such a hot copy of the broker log files.

Totally agree with this point, and this is why I am also looking for alternatives.

Hmm, this is actually a really interesting approach. The complexity here would probably be implementing the reverse process, the restore, because I think that is going to be even harder than backing up the data…
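Restoring from a data-level backup essentially means producing the records back into topics. A minimal sketch, assuming the backup was stored as JSON lines with `topic`, `partition`, `offset`, `key` and `value` fields (a hypothetical format, not what any particular connector writes) and that `send` is a hypothetical callback wrapping a real producer. It also shows why a data-level backup allows restoring only selected topics. Caveats: the target topics need at least as many partitions as the originals, and the replayed records get new offsets, so committed consumer-group offsets from the old cluster will not line up with them.

```python
import json
from typing import Callable, Iterable, Optional, Set

# send(topic, partition, key, value) is a hypothetical callback; in practice it
# would wrap a Kafka producer client.
SendFn = Callable[[str, int, Optional[str], Optional[str]], None]


def restore_records(lines: Iterable[str], send: SendFn,
                    topics: Optional[Set[str]] = None) -> int:
    """Replay a JSON-lines backup, optionally only for selected topics.

    Assumed (hypothetical) line format:
    {"topic": ..., "partition": ..., "offset": ..., "key": ..., "value": ...}
    """
    records = []
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        if topics is None or rec["topic"] in topics:
            records.append(rec)
    # Keep the original order within each partition by sorting on offset.
    records.sort(key=lambda r: (r["topic"], r["partition"], r["offset"]))
    for rec in records:
        send(rec["topic"], rec["partition"], rec.get("key"), rec.get("value"))
    return len(records)
```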

I mentioned it just as an option, because this is a common way to back things up on Unix systems. I agree that this only backs up the data of one broker, but the same process could be repeated on the other brokers as well, couldn’t it?
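Repeating the copy for every broker could look like the sketch below, which only builds one `rsync` command per broker. It assumes SSH access to each broker host; the inventory (broker id to host and log dir), the hostnames, and the destination layout are hypothetical. `-a` is rsync's archive mode, and `--delete` mirrors deletions so segments removed by retention also disappear from the copy. Brokers copied one after another are not captured at the same instant, so the result is not a cluster-wide point-in-time snapshot.

```python
from typing import Dict, List, Tuple


def rsync_commands(brokers: Dict[int, Tuple[str, str]],
                   dest_root: str) -> List[List[str]]:
    """Build one rsync invocation per broker (sketch, not executed here).

    brokers maps a broker id to (ssh host, log dir) and is hypothetical.
    """
    cmds = []
    for broker_id, (host, log_dir) in sorted(brokers.items()):
        cmds.append([
            "rsync", "-a", "--delete",          # archive mode; mirror deletions
            f"{host}:{log_dir.rstrip('/')}/",  # trailing slash: copy contents
            f"{dest_root.rstrip('/')}/broker-{broker_id}/",
        ])
    return cmds
```

Each command could then be run with `subprocess.run(cmd, check=True)`, ideally while that broker is stopped.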

This is a good point. When I talked about a “backup solution”, I meant something where you don’t have to think about the contents: you just back up and restore the data, and the cluster ends up restored to a consistent state.

An example of what I mean, in the Oracle Database world, would be the RMAN utility, which can create backup sets and can also restore them and recover the data to a point after the backup set was created, using the archived and online redo logs.

Yes @whatsupbros, I mean the rsync approach, although I don’t know what backup strategy is used for the network share itself. I would imagine it’s using Z-send or something similar.
It works in a simple sense, but it may not fit all use cases.

We could, for example, push data into our MinIO cluster and use that as a backup. This is early days for us at the moment.

Okay, thanks for your input. Yes, same here: we are currently evaluating everything. It has already been decided that we will give it a try, but we are only at the very beginning of our journey.

That is why I have so many questions :sweat_smile:

I came across the below as an example of how to push data into MinIO:

I wonder: would Kubernetes-native solutions like Velero work for Kafka clusters deployed on Kubernetes? Velero backs up persistent volumes as well as the cluster’s resources (the state kept in the etcd control-plane database), to support use cases such as backup and restore or data migration.
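If one tried this, the Velero CLI calls might look like the sketch below, wrapped in Python only so the argument lists can be checked. `velero backup create` with `--include-namespaces`, `velero restore create --from-backup`, and `--wait` are Velero CLI options; the backup name and the `kafka` namespace are hypothetical. Whether volume data is actually captured depends on how Velero's volume snapshots or file-system backup are set up, and volumes of running brokers are snapshotted one by one, so the hot-copy consistency question raised earlier in the thread applies here too.

```python
from typing import List


def velero_backup_cmd(backup_name: str, namespace: str) -> List[str]:
    # Back up every resource in the namespace (and, depending on how Velero is
    # configured, its persistent volumes); --wait blocks until it finishes.
    return ["velero", "backup", "create", backup_name,
            "--include-namespaces", namespace, "--wait"]


def velero_restore_cmd(backup_name: str) -> List[str]:
    # Create a restore from an existing backup and wait for it to complete.
    return ["velero", "restore", "create", "--from-backup", backup_name, "--wait"]
```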

Hi @whatsupbros,

some time ago I’ve stumbled over

I’ve never used it in production, but a short test was promising.

HTH
