JdbcSourceConnector query definition and timestamp columns

Hi!

We are having some issues with performance in the query, and i wanted to tune it. We are using mode “timestamp”. So reading through the documentation it states:

Different query modes may still be used for incremental updates, but in order to properly construct the incremental query, it must be possible to append a WHERE clause to this query (i.e. no WHERE clauses may be used). If you use a WHERE clause, it must handle incremental queries itself.

I really don’t get what does it exactly mean with “If you use a WHERE clause, it must handle incremental queries itself” so shall i write the full query? Is there some placeholder i shall use for timestamp?

Just for sake of sample (our query is way more complex), if I would like so set a query like:

select * from myTable where date_1 between %1 and %2 or date_2 between %1 and %2

Would something like that be a possibility? So i would like the connector to just provide the timestamps, nothing else. Is something like that supported?

IMHO that would also help avoiding things like: select * from ( another query ) where COALESCE(field1, field2)

Maybe i’m missing something?

Hi @polaco,

You are right that the docs are currently not totally clear on this, and we have a ticket underway to try and improve them. You will probably find this issue useful in exploring how to use the WHERE clause.

Hi @rmoff ,
You mean attempting to do that pseudo sql injection by using “–” at the end of the query?
We are already using a nested query as mentioned in that thread. Am I missing something?
IMO that thread doesn’t still clarify for me the usage of WHERE clause, using “–” sounds more like a hack, I could give it a try, but before that I will ask a refactor on the tables the connector is consuming, so we can avoid thing like COALESCE and simplify the query in order to not need nested queries at all.
Anyway I think it would be great if connectors would allow WHERE clause and placeholders for the params. I have been reading through the connectors code, but so far haven’t realized if there is a better alternative already supported.
Thanks for your help!

My understanding of it is that as a general pattern you write a query to be used in incremental/timestamp mode like this:

SELECT TIMESTAMP_COL, COL1, COL2 FROM 
    (SELECT TIMESTAMP_COL, 
            A.COL1 AS COL1, 
            B.COL2 AS COL2
       FROM TABLE_A 
            INNER JOIN TABLE_B 
            ON PK=FK 
      WHERE A.COL1='FOO');

This exposes the TIMESTAMP_COL that you can then use in your config, thus

query=SELECT TIMESTAMP_COL, COL1, COL2 FROM (SELECT TIMESTAMP_COL, A.COL1 AS COL1, B.COL2 AS COL2 FROM TABLE_A INNER JOIN TABLE_B ON PK=FK WHERE A.COL1='FOO')
mode=timestamp
timestamp.column.name=TIMESTAMP_COL

In your example I don’t think you can have an upper-bound applied by the connector to a date predicate, only lower. So you could do something like this with a hardcoded upper bound (per your BETWEEN example):

query=select * from (select date1,* from myTable where date_1 < "2021-03-25 13:50:47")
mode=timestamp
timestamp.column.name=date1

I’d expect the connector to then execute the query as

select * from (select date1,* from myTable where date_1 < "2021-03-25 13:50:47") 
  where date_1 > {last_poll_timestamp}

@rmoff thanks, yes basically that’s what we are currently doing.
thanks for the clarification and help!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.