CTAS with group by not producing data as expected

Hi all

Strange thing

I generate messages with the following structure

{
  "INVNUMBER": "7f1108a1-6d25-4776-a815-0d7d7e332e14",
  "SALEDATETIME": "2024-07-02T16:25:25.269+02:00",
  "SALETIMESTAMP": "1719930325269",
  "TERMINALPOINT": "13",
  "NETT": 1721.94,
  "VAT": 241.07,
  "TOTAL": 1963.01,
  "STORE": {
    "ID": "324213441",
    "NAME": "Milnerton"
  },
  "CLERK": {
    "ID": "10014",
    "NAME": "Winston"
  },
  "BASKETITEMS": [
    {
      "ID": "000000014",
      "NAME": "Coca-Cola Soft Drink 1.5L",
      "BRAND": "Coka Coke",
      "CATEGORY": "Beverage",
      "PRICE": 18.49,
      "QUANTITY": 2
    },
    {
      "ID": "000000038",
      "NAME": "PnP Full Cream Fresh Milk 2L",
      "BRAND": "PnP",
      "CATEGORY": "Food Cupboard",
      "PRICE": 34.99,
      "QUANTITY": 3
    }
  ],
  "FINTRANSACTIONID": "577d5cd8-9536-4989-88b0-89c41aa45998",
  "PAYDATETIME": "2024-07-02T16:25:43.269+02:00",
  "PAYTIMESTAMP": "1719930343269",
  "PAID": 1963.01
}

I then executed the below CTAS, to simplify the data/testing I’ve added the where clause

CREATE TABLE avro_sales_per_terminal_point WITH (KAFKA_TOPIC='avro_sales_per_terminal_point',
       FORMAT='AVRO',
       PARTITIONS=1)
       as  
SELECT 
	store->id as store_id,
	TerminalPoint as terminal_point,
    count(1) as sales_per_terminal
FROM pb_salescompleted
WINDOW TUMBLING (SIZE 1 MINUTE)
WHERE store->Id = '324213441'
group by store->id , TerminalPoint	
EMIT CHANGES;

Expecting to get each storeid/terminalpoint to only appear once per minute with a total count for that minute (window). (with the where added expect to only have one record per minute per terminalpoint)
I’m getting the store_id/terminapoint appearing more than once per minute.
I would expect a record to be added to the table at the end of the window periods…
what i seem to be seeing is a new record on every increment per terminalpoint

where is the gap in my understanding…

G

Try using the EMIT FINAL output refinement. EMIT CHANGES will output incremental / intermediate results while the window is open.

1 Like

Looking better…
thanks.
G

Question

The windowstart and windowend times, I don’t see it as part of the ktable, how can i can it to a date/time. tried a couple of methods,

1
SELECT FROM_UNIXTIME(windowstart,'%Y-%m-%d %H:%i:%s') from avro_sales_per_terminal_point;

2
SELECT to_char(to_timestamp(windowstart), 'YYYY-MM-DD HH24:MI:SS') as date FROM avro_sales_per_terminal_point;

I clearly assumed windowstart is a epoc value, apparently not :wink:

Inspection of ktable

ksql> describe avro_sales_per_terminal_point;

Name                 : AVRO_SALES_PER_TERMINAL_POINT
 Field              | Type
-----------------------------------------------------------------------------
 STORE_ID           | VARCHAR(STRING)  (primary key) (Window type: TUMBLING)
 TERMINAL_POINT     | VARCHAR(STRING)  (primary key) (Window type: TUMBLING)
 SALES_PER_TERMINAL | BIGINT
-----------------------------------------------------------------------------
For runtime statistics and query details run: DESCRIBE <Stream,Table> EXTENDED;
ksql>

Might be answering own question, or not… looking at above it includes windowstart and windowend which is added by the ktable create. Why would i group by again if the CTAS added them… as this does not look like it is selecting from source stream/topic.
It does show how to convert the windowstart… but causes more questions… :wink:

SELECT order_zipcode, 
  from_unixtime(WINDOWSTART) as Window_Start,
  from_unixtime(WINDOWEND) as Window_End,
  from_unixtime(max(ROWTIME)) as Window_Emit,
  count(orderId) as number_of_orders
FROM orders
  WINDOW TUMBLING (SIZE 5 minute, GRACE PERIOD 1 minute)
  GROUP BY order_zipcode
  EMIT CHANGES;

SOLVED… and checked output… working as desired.

Another step in right direction :slight_smile:
G

next… how to do the count, group by items in basket… per item type/make/brand…

G

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.