Greetings,
I am relatively new to both Kafka and Java, I am reading Mastering Kafka Streams and ksqlDB provided by Confluent as a starting point.
In the crypto sentiment app developed throughout chapter 3, one of the steps in the stream processing topology uses KStream.flatMapValues to perform sentiment analysis on the input records, like so in my implementation of the book’s project:
// Merge the English and Translated streams
KStream<byte[], Tweet> merged = ...
// Create a new Enriched stream of sentiment analyzed tweets (EntitySentiment records)
KStream<byte[], EntitySentiment> enriched = merged.flatMapValues(
(tweet) -> {
// Get the sentiment analyzed records
return sentimentClient.getEntitiesSentiment(tweet);
}
);
Both in my implementation and in the sample project from the book a new instance of sentimentClient
is created in the Topology’s build
method (in the book it’s called languageClient
and also provides translation). The GcpClient
implementation of this language/sentiment client from the book uses a ThreadLocal variable of type LanguageServiceClient
(from Google’s NLP client library) which is used to perform sentiment analysis within the getEntitiesSentiment
method called in flatMapValues
(above).
My question is, why does the LanguageServiceClient
variable (named nlpClients
in the book’s implementation) have to be thread-local? I understand there may be multiple instances of the Topology running in parallel, but since a new sentiment/language client is instantiated in the Topology’s build method, isn’t there a separate client in each thread already? Does the variable actually have to be thread-local?