LATEST ON keyword
For scenarios where multiple time series are stored in the same table, it is
relatively difficult to identify the latest items of these time series with
standard SQL syntax. QuestDB introduces LATEST ON clause for a
SELECT statement to remove boilerplate
clutter and splice the table with relative ease.
Syntax#
where:
columnNameused in theLATEST ONpart of the clause is aTIMESTAMPcolumn.columnNamelist used in thePARTITION BYpart of the clause is a list of columns of one of the following types:SYMBOL,STRING,BOOLEAN,SHORT,INT,LONG,LONG256,CHAR.
Description#
LATEST ON is used as part of a SELECT statement
for returning the most recent records per unique time series identified by the
PARTITION BY column values.
To illustrate how LATEST ON is intended to be used, consider the
trips table in the QuestDB demo instance. This
table has a payment_type column as SYMBOL type which specifies the method of
payment per trip. We can find the most recent trip for each unique method of
payment with the following query:
| payment_type | pickup_datetime | trip_distance |
|---|---|---|
| Dispute | 2014-12-31T23:55:27.000000Z | 1.2 |
| Voided | 2019-06-27T17:56:45.000000Z | 1.9 |
| Unknown | 2019-06-30T23:57:42.000000Z | 3.9 |
| No Charge | 2019-06-30T23:59:30.000000Z | 5.2 |
| Cash | 2019-06-30T23:59:54.000000Z | 2 |
| Card | 2019-06-30T23:59:56.000000Z | 1 |
The above query returns the latest value within each time series stored in the
table. Those time series are determined based on the values in the column(s)
specified in the PARTITION BY part of the LATEST ON clause. In our example
those time series are represented by different payment types. Then the column
used in the LATEST ON part of the clause stands for the designated timestamp
column for the table. This allows the database to find the latest value within
each time series.
The below sections will demonstrate other ways to use the LATEST ON clause.
You can also write this query with the old syntax:
The old LATEST BY syntax is considered deprecated. While it's still supported
by the database, you should use the new LATEST ON PARTITION BY syntax in your
applications. There are two key requirements when using the new syntax:
- A timestamp column must always be specified
LATEST ONhas to follow theWHEREclause. In the old syntax, it was vice versa.
note
To use LATEST ON, a timestamp column used in the LATEST ON part needs to be
specified as a designated timestamp. More information can be found in the
designated timestamp page for specifying
this at table creation or at query time.
Examples#
For the next examples, we can create a table called balances with the
following SQL:
This provides us with a table with the following content:
| cust_id | balance_ccy | balance | ts |
|---|---|---|---|
| 1 | USD | 600.5 | 2020-04-21T16:01:22.104234Z |
| 2 | USD | 950 | 2020-04-21T16:03:43.504432Z |
| 2 | EUR | 780.2 | 2020-04-21T16:08:34.404665Z |
| 1 | USD | 1500 | 2020-04-21T16:11:22.704665Z |
| 1 | EUR | 650.5 | 2020-04-22T16:11:32.904234Z |
| 2 | USD | 900.75 | 2020-04-22T16:12:43.504432Z |
| 2 | EUR | 880.2 | 2020-04-22T16:18:34.404665Z |
| 1 | USD | 330.5 | 2020-04-22T16:20:14.404997Z |
Single column#
When LATEST ON is provided a single column of the type SYMBOL, the query will
end after all distinct symbol values are found.
The query returns two rows with the most recent records per unique cust_id
value:
| cust_id | balance_ccy | balance | ts |
|---|---|---|---|
| 2 | EUR | 880.2 | 2020-04-22T16:18:34.404665Z |
| 1 | USD | 330.5 | 2020-04-22T16:20:14.404997Z |
Multiple columns#
When multiple columns are specified in LATEST ON queries, the returned results
are the most recent unique combinations of the column values. This example
query returns LATEST ON customer ID and balance currency:
The results return the most recent records for each unique combination of
cust_id and balance_ccy.
| cust_id | balance_ccy | balance | inactive | ts |
|---|---|---|---|---|
| 1 | EUR | 650.5 | FALSE | 2020-04-22T16:11:32.904234Z |
| 2 | USD | 900.75 | FALSE | 2020-04-22T16:12:43.504432Z |
| 2 | EUR | 880.2 | FALSE | 2020-04-22T16:18:34.404665Z |
| 1 | USD | 330.5 | FALSE | 2020-04-22T16:20:14.404997Z |
info
For a single SYMBOL column, QuestDB will know all distinct values upfront and
stop scanning table contents once the latest entry has been found for each
distinct symbol value. When LATEST ON is provided multiple columns, QuestDB
has to scan the entire table to find distinct combinations of column values.
Although scanning is fast, performance will degrade on hundreds of millions of
records. If there are multiple columns in the LATEST ON clause, this will
result in a full table scan.
LATEST ON over sub-query#
For this example, we can create another table called unordered_balances with
the following SQL:
Note that this table doesn't have a designated timestamp column and also
contains time series that are unordered by ts column.
Due to the absent designated timestamp column, we can't use LATEST ON directly
on this table, but it's possible to use LATEST ON over a sub-query:
Just like with the balances table, the query returns two rows with the most
recent records per unique cust_id value:
| cust_id | balance_ccy | balance | ts |
|---|---|---|---|
| 2 | EUR | 880.2 | 2020-04-22T16:18:34.404665Z |
| 1 | USD | 330.5 | 2020-04-22T16:20:14.404997Z |
Execution order#
The following queries illustrate how to change the execution order in a query by using brackets.
WHERE first#
This query executes WHERE before LATEST ON and returns the most recent
balance which is above 800. The execution order is as follows:
- filter out all balances below 800
- find the latest balance by
cust_id
| cust_id | balance_ccy | balance | ts |
|---|---|---|---|
| 1 | USD | 1500 | 2020-04-22T16:11:22.704665Z |
| 2 | EUR | 880.2 | 2020-04-22T16:18:34.404665Z |
LATEST ON first#
This query executes LATEST ON before WHERE and returns the most recent
records, then filters out those below 800. The steps are:
- Find the latest balances by customer ID.
- Filter out balances below 800. Since the latest balance for customer 1 is equal to 330.5, it is filtered out in this step.
| cust_id | balance_ccy | balance | inactive | ts |
|---|---|---|---|---|
| 2 | EUR | 880.2 | FALSE | 2020-04-22T16:18:34.404665Z |
Combination#
It's possible to combine a time-based filter with the balance filter from the
previous example to query the latest values for the 2020-04-21 date and filter
out those below 800.
Since QuestDB allows you to omit the SELECT * FROM part of the query,
we omitted it to keep the query compact.
Such a combination is very powerful since it allows you to find the latest values for a time-slice of the data and then apply a filter to them in a single query.