Deserialisation and Chunk error - Exhausted standby hosts to try -

Hi folks,

We’ve got a very small and quiet development 3x server Zookeeper/Kafka/ksqlDB configuration, and in attempting to perform a pull query against a materialised table we’re running into an intermittent “Exhausted standby” error - either through the ksql CLI or direct to the REST interface with the same query (nothing complicated, simply ‘select * from table;’).

When I say intermittent, I mean that if I run the query against the node that doesn’t have the activePartition for the data contained within the response, the log file of that local node gets a series of errors (Multiple “Error while handling chunk”; “com.fasterxml.jackson.databind.JsonMappingException: Unexpected end-of-input” and ultimately a “WARN Error forwarding query to node” and “KsqlRestClientException: Error issuing POST to KSQL server. path:/query”). Stack traces are below.

Now, I can mitigate this issue by shutting down two of the three nodes, and also (to an extent at least) set ksql.lag.reporting.enable to true, which means I no longer receive the error in the client… but the errors are still happening in the log file, and I’m concerned that it may eventually lead to sufficiently out of date information being returned.

I checked with a packet capture and the full payload is delivered and received OK, so there’s no underlying network issue. Looking at the logs (see below), I think what’s happening is that something is going wrong when chunking the data from the remote node. It’s worth noting that the data in this instance isn’t huge… each row is a maximum of about 100 characters, and there’s only 23 rows in the data set, yet something is snipping off the returned JSON object some way through.

The following (related, I think) properties are set:

  ksql.streams.num.stream.threads: 16
  ksql.query.pull.table.scan.enabled: true
  ksql.query.pull.enable.standby.reads: true
  ksql.lag.reporting.enable: true
  ksql.streams.num.standby.replicas: 3
  ksql.heartbeat.enable: true

Though it’s worth noting that we’ve tried it with replicas set to 2, heartbeat disabled and enabled, num.stream.threads set to all kinds of things, and it’s not changed the outcome. The only thing that did change what happened was the lag.reporting.enable, which simply hides the error from the CLI while leaving issues in the logs.

Can anyone think of what’s going wrong here or point me in the right direction? We’ve tried stripping everything back as simply as possible, but so far to no avail.

Many thanks in advance,

Russ

Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: [2021-11-17 16:39:55,979] ERROR Error while handling chunk (io.confluent.ksql.rest.client.KsqlTarget)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: io.confluent.ksql.rest.client.KsqlRestClientException: Failed to deserialise object
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlClientUtil.deserialize(KsqlClientUtil.java:53)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlTarget.toRows(KsqlTarget.java:454)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlTarget.lambda$postQueryRequest$3(KsqlTarget.java:197)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlTarget.lambda$null$11(KsqlTarget.java:310)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.HttpClientResponseImpl.handleChunk(HttpClientResponseImpl.java:232)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.lambda$beginResponse$0(Http1xClientConnection.java:486)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.streams.impl.InboundBuffer.handleEvent(InboundBuffer.java:237)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.streams.impl.InboundBuffer.write(InboundBuffer.java:127)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.handleChunk(Http1xClientConnection.java:322)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.access$1900(Http1xClientConnection.java:242)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection.handleResponseChunk(Http1xClientConnection.java:631)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection.handleHttpMessage(Http1xClientConnection.java:601)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection.handleMessage(Http1xClientConnection.java:577)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:366)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.impl.EventLoopContext.execute(EventLoopContext.java:43)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:229)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.net.impl.VertxHandler.channelRead(VertxHandler.java:164)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at java.base/java.lang.Thread.run(Thread.java:834)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'STRING': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at [Source: (byte[])"STRING, `cmdbUri` STRING, `action` STRING"}}"; line: 1, column: 8]
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2337)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:720)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidToken(UTF8StreamJsonParser.java:3593)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._handleUnexpectedValue(UTF8StreamJsonParser.java:2688)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:870)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:762)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4684)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4586)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3609)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlClientUtil.deserialize(KsqlClientUtil.java:51)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: ... 39 more
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: [2021-11-17 16:39:55,980] WARN Error forwarding query to node http://ksqldb-node2:8088/. Falling back to standby state which may return stale results (io.confluent.ksql.physical.pull.HARouting)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: io.confluent.ksql.rest.client.KsqlRestClientException: Error issuing POST to KSQL server. path:/query
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlTarget.executeSync(KsqlTarget.java:388)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlTarget.executeRequestSync(KsqlTarget.java:305)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlTarget.post(KsqlTarget.java:271)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlTarget.postQueryRequest(KsqlTarget.java:192)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.server.services.DefaultKsqlClient.makeQueryRequest(DefaultKsqlClient.java:129)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.physical.pull.HARouting.forwardTo(HARouting.java:349)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.physical.pull.HARouting.executeOrRouteQuery(HARouting.java:308)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.physical.pull.HARouting.lambda$executeRounds$5(HARouting.java:191)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at java.base/java.lang.Thread.run(Thread.java:834)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: Caused by: java.util.concurrent.ExecutionException: io.confluent.ksql.rest.client.KsqlRestClientException: Failed to deserialise object
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlTarget.executeSync(KsqlTarget.java:386)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: ... 11 more
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: Caused by: io.confluent.ksql.rest.client.KsqlRestClientException: Failed to deserialise object
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlClientUtil.deserialize(KsqlClientUtil.java:53)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlTarget.toRows(KsqlTarget.java:454)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlTarget.lambda$postQueryRequest$3(KsqlTarget.java:197)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.confluent.ksql.rest.client.KsqlTarget.lambda$null$11(KsqlTarget.java:310)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.HttpClientResponseImpl.handleChunk(HttpClientResponseImpl.java:232)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.lambda$beginResponse$0(Http1xClientConnection.java:486)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.streams.impl.InboundBuffer.handleEvent(InboundBuffer.java:237)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.streams.impl.InboundBuffer.write(InboundBuffer.java:127)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.handleChunk(Http1xClientConnection.java:322)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.access$1900(Http1xClientConnection.java:242)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection.handleResponseChunk(Http1xClientConnection.java:631)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection.handleHttpMessage(Http1xClientConnection.java:601)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.http.impl.Http1xClientConnection.handleMessage(Http1xClientConnection.java:577)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.impl.ContextImpl.executeTask(ContextImpl.java:366)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.impl.EventLoopContext.execute(EventLoopContext.java:43)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.impl.ContextImpl.executeFromIO(ContextImpl.java:229)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.vertx.core.net.impl.VertxHandler.channelRead(VertxHandler.java:164)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
Nov 17 16:39:56 ksqldb-node1 ksql-server-start[29885]: ... 1 more