Hi, I'm Mike and I have an opening question.

… I work for myself publishing websites, currently mostly UK train times (e.g., traintime.uk), and have longer-term plans not just for rail but for public transport systems in general. Key to my plans are the ever-increasing availability of open data and the efficient processing thereof. To this purpose, I guess Kafka could be a useful tool towards that goal.

I do have an opening question, and it is this: would Kafka be a useful tool towards that goal?

I currently develop on Windows and host on CentOS 7 with Apache HTTP Server. I'm transitioning from predominantly PHP/MySQL to Java/Postgres, and have also decided on WildFly as my first-choice application server and Hibernate for ORM. I'll be ready for my first live project on the new platform in a few months, and the main first task will be to populate the database with feeds of UK rail data (a couple of open data streaming feeds provided by National Rail and Network Rail).

Having been kindly pointed to this forum by the host organiser of a recent meetup event, following a great talk on the subject by Robin Moffatt, I'm hoping at least one person here knows what I'm talking about 🙂 I'm looking for the easiest, simplest, most efficient, etc. (best!) way to get that data into my database. Is Kafka the tool for the job?
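From what I understand, those feeds are published from an ActiveMQ broker, so my naive starting point would be a plain JMS subscriber, something like the sketch below. The broker URL, credentials and topic name are placeholders rather than the real values from a data feeds account:

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class RailFeedListener {
    public static void main(String[] args) throws JMSException {
        // Placeholder broker URL, credentials and topic -- substitute the
        // details issued with your National Rail / Network Rail account.
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://datafeeds.example.co.uk:61619");
        Connection connection = factory.createConnection("user", "password");
        connection.setClientID("traintime-uk"); // must be set before starting, for durability
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("TRAIN_MVT_EXAMPLE"); // placeholder topic name

        // Durable subscriber, so the broker holds messages while we're offline
        MessageConsumer consumer = session.createDurableSubscriber(topic, "traintime-uk-sub");
        consumer.setMessageListener(message -> {
            if (message instanceof TextMessage) {
                try {
                    System.out.println(((TextMessage) message).getText()); // raw feed payload
                } catch (JMSException e) {
                    e.printStackTrace();
                }
            }
        });
    }
}
```

Does that look like a sensible starting point?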

Mike

Hi Mike,
Welcome to the forum, and thanks for your question - I know we spoke briefly during my talk recently.

"…looking for the easiest, simplest, most efficient, etc. (best!) way to get that data into my database. Is Kafka the tool for the job?"

The solution (ActiveMQ → Kafka → Database) that I outlined in my talk is a valid one, and has benefits including:

  • Scale
  • Resilience
  • Replay of messages
  • Consumption of the data elsewhere without building dependencies on the database

But it’s just one way of doing it. If you only want the data in a database, then you could argue that Kafka introduces unnecessary complexity. Some decisions in IT systems are right/wrong ones; I wouldn’t frame this as one of them. It’s just different ways to solve the same problem, and you’d choose one approach over the other based on your overall requirements and direction.
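To make the first hop concrete, here's a very rough hand-rolled sketch of the ActiveMQ → Kafka leg: a JMS listener that forwards each feed message onto a Kafka topic. All the connection details and topic names here are made up, and in practice you'd more likely use Kafka Connect for this rather than writing your own code:

```java
import java.util.Properties;
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ActiveMqToKafkaBridge {
    public static void main(String[] args) throws JMSException {
        // Kafka producer -- broker address and topic name below are made up
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // JMS subscription on the source feed (placeholder connection details)
        Connection connection = new ActiveMQConnectionFactory("tcp://broker.example:61619")
                .createConnection("user", "password");
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer =
                session.createConsumer(session.createTopic("RAIL_FEED_EXAMPLE"));

        // Forward each text message from the feed onto a Kafka topic
        consumer.setMessageListener(message -> {
            try {
                if (message instanceof TextMessage) {
                    producer.send(new ProducerRecord<>("rail-movements",
                            ((TextMessage) message).getText()));
                }
            } catch (JMSException e) {
                e.printStackTrace();
            }
        });
    }
}
```

Once the data is on a Kafka topic, the Kafka → database leg is typically just a JDBC sink connector configured against Postgres, with no custom code needed.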

Hope that helps!
thanks, Robin.

Hi Robin,

"different ways to solve the same problem"

And sometimes too many; which way to go?! I think Kafka could be a good fit for my requirements, though. As I said, I was looking at an ActiveMQ bridge solution, but that's maybe more trouble than it's worth. I'm also considering message-driven beans (actually, I think that would be the straightforward solution), but for added value, Kafka, as I'm learning, is not only event-driven but, more than that, its raison d'être, I think I'm right in saying, is event streaming: capturing data in real time, which is exactly what I want, for other projects of mine too. The key for me will be whether, within a Java application server environment, it's then easy to send the data on to the DB. I always want to avoid adding extra layers without good reason, but I'm guessing Kafka can add value and is certainly worth trying out.
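To convince myself that last step is manageable, I sketched roughly what I have in mind for the Kafka → DB leg: a consumer that polls a topic and inserts each record into Postgres. Topic, table and connection details are all made up, and I've used plain JDBC rather than Hibernate just to keep it short:

```java
import java.sql.*;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;

public class KafkaToPostgres {
    public static void main(String[] args) throws SQLException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "rail-db-writer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection db = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/rail", "user", "password");
             PreparedStatement insert = db.prepareStatement(
                     "INSERT INTO movements (payload) VALUES (?::jsonb)")) {

            consumer.subscribe(Collections.singletonList("rail-movements"));
            while (true) {
                // Poll the topic and write each message straight into Postgres
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    insert.setString(1, record.value());
                    insert.executeUpdate();
                }
            }
        }
    }
}
```

If that's about the size of it, then the extra layer doesn't scare me.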

BTW, I thought the guy with the question/statement suggesting that just sending raw data to the DB and filtering it there afterwards is the 'modern style' was simply wrong. As they say, though, there are indeed many ways to skin a cat!

Cheers,
Mike