I am working on a system that will load files from various sources (S3, Azure Blob, SFTP, etc.) into Kafka topics, and I am evaluating Kafka Connect as the underlying framework.
The number of sources and their configurations is dynamic and will change at runtime: new configurations will be added and old ones removed.
Each configuration will have its own polling interval, number of tasks, target topic, etc.
The number of active configurations at any one time is at least 10,000.
Here is my question: assuming I have 10,000 connectors, each running 1 task, will there be a total of 10,000 threads running across all JVMs in my distributed Connect worker pool? Does Kafka Connect create a dedicated thread per task, or does each worker have a shared thread pool?
My concern is resource usage: a static assignment of one thread per task is wasteful if the polling interval is long (which is likely).
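
To make the concern concrete, here is a rough sketch of the shared-pool model I am comparing against. It is not Kafka Connect code and makes no claim about how Connect schedules tasks; it only uses `java.util.concurrent`, and the class name, method name, pool size and interval are hypothetical values chosen for illustration. Many pollers with long intervals are multiplexed onto a small fixed pool, so the thread count is bounded by the pool size rather than by the number of configurations.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only: not part of Kafka Connect.
class SharedPoolSketch {

    // Schedules `pollers` periodic jobs on a pool of `poolSize` threads,
    // waits until every job has run once, and returns how many distinct
    // threads executed them.
    static int distinctThreadsUsed(int pollers, int poolSize, long intervalMillis) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        CountDownLatch firstPolls = new CountDownLatch(pollers);
        try {
            for (int i = 0; i < pollers; i++) {
                AtomicBoolean firstRun = new AtomicBoolean(true);
                // Stand-in for "poll one source (S3, SFTP, ...) for new files".
                pool.scheduleWithFixedDelay(() -> {
                    threadNames.add(Thread.currentThread().getName());
                    if (firstRun.compareAndSet(true, false)) {
                        firstPolls.countDown();
                    }
                }, 0, intervalMillis, TimeUnit.MILLISECONDS);
            }
            if (!firstPolls.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pollers did not all run in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        // 10,000 pollers, 8 pool threads, 60 s polling interval (all assumed values).
        int used = distinctThreadsUsed(10_000, 8, 60_000);
        System.out.println("10,000 pollers ran on " + used + " pool threads");
    }
}
```

With a dedicated thread per task, the same workload would instead hold 10,000 threads that are mostly idle between polls, which is the cost I am asking about.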