I have two datasets of events that happen throughout the day. One is a database that is updated in real time. The other is a set of daily dump files, uploaded to S3 once a day, containing all the events for the previous day. Currently I have a cron job that runs once a day. It transforms both datasets, merges them into a single stream sorted by timestamp, and then processes the merged data and writes the results to a couple of places.
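For reference, the merge step of the current cron job can be sketched roughly like this. It is a minimal illustration, assuming each source can be read as an iterable of events that is already sorted by timestamp. The `Event` class, its fields and `merge_by_timestamp` are hypothetical names, since the actual schema isn't shown here.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real schema isn't described.
    timestamp: datetime
    source: str   # e.g. "db" for the real-time database, "s3" for the daily dump
    payload: str


def merge_by_timestamp(db_events: Iterable[Event],
                       dump_events: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge two timestamp-sorted event streams into one sorted stream.

    Assumes each input is already sorted by timestamp; heapq.merge does not
    sort its inputs, it only interleaves them.
    """
    return heapq.merge(db_events, dump_events, key=lambda e: e.timestamp)


if __name__ == "__main__":
    db = [Event(datetime(2024, 1, 1, 9, 0), "db", "a"),
          Event(datetime(2024, 1, 1, 12, 0), "db", "c")]
    dump = [Event(datetime(2024, 1, 1, 10, 30), "s3", "b"),
            Event(datetime(2024, 1, 1, 13, 0), "s3", "d")]
    for e in merge_by_timestamp(db, dump):
        print(e.timestamp.isoformat(), e.source, e.payload)
```

This works in batch because, by the time the job runs, both sides of the merge are complete for the day.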
I am looking at Kafka with Kafka Connect for this, but I'm not sure it fits. I understand how to connect the data sources, but I'm unsure how to merge the two datasets by timestamp when they are populated at different times (one in real time, one once a day). Because of the daily S3 dump, I'm also not sure this is a good use case for Kafka. Any advice?
Thanks in advance.