Brief background: we are planning to use Kafka as the core component of a data-distribution exchange, through which domain data from one division of the org will be shared internally with the other divisions. The solution will capture and distribute a stream of database updates using log-based CDC. The authoring databases will not go away or be replaced; they will stay as they are, because all our content-ingestion applications will continue to write to the databases first, as they always have.
We want to retain all the data in the Kafka topics and enable log compaction. The data size will be approximately 1 TB to begin with, growing by double-digit GBs per month.
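For concreteness, a compacted topic of the kind described above might be created along these lines. This is only a sketch: the topic name, partition count, and broker address are placeholders, though the config keys are standard Kafka topic configs.

```shell
# Sketch: create a compacted topic intended for indefinite retention.
# Topic name, partitions, and bootstrap address are hypothetical.
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic orders-cdc \
  --partitions 12 \
  --replication-factor 3 \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.1 \
  --config delete.retention.ms=86400000
```

With `cleanup.policy=compact` there is no time- or size-based deletion; the log keeps at least the latest value per key forever. `delete.retention.ms` controls how long tombstones (delete markers from the CDC stream) remain readable before the cleaner removes them, which matters for consumers that bootstrap from the beginning of the topic.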
Is it an anti-pattern, or simply inadvisable, to retain data in Kafka forever? If not, and we choose to do it, what pitfalls in design and operation does one need to be aware of?
I acknowledge that this might be a bit of a loaded question, which is why I have added some contextual background. If more information is needed, please let me know.
Regarding Kafka Tiered Storage (KIP-405): will this get to the point where we can use an S3-backed topic and read that topic from the beginning, as if it were stored entirely in Kafka? When, and with which version, can this be expected to be available?
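For reference, the configuration surface KIP-405 proposes looks roughly like the fragment below. The property names are taken from the KIP and may change before the feature ships; the S3 plugin class is a placeholder, since KIP-405 defines a pluggable `RemoteStorageManager` interface rather than a bundled S3 implementation.

```properties
# Broker-side (per KIP-405): turn the feature on and plug in the
# remote storage and metadata managers. Class names are placeholders.
remote.log.storage.system.enable=true
remote.log.storage.manager.class.name=com.example.S3RemoteStorageManager
remote.log.metadata.manager.class.name=com.example.S3RemoteLogMetadataManager

# Topic-level: opt the topic in and bound how much data stays on
# local broker disk; older segments would be served from S3.
remote.storage.enable=true
local.retention.ms=172800000
```

Under this model a consumer reading from the earliest offset would transparently fetch old segments from remote storage, which is what the "read from the beginning" question above is asking about.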
As an extension to the above point: we found that tiered storage cannot be enabled for compacted topics. I can roughly understand why, but if possible, could someone clarify whether support for this can be expected in some future version?
Thanks & regards,