oracle aide

September 1, 2011

Made simpler: a select from a table with a multi-column composite primary key

Filed under: sql — Tags: — oracleaide @ 4:46 pm

Let’s say we have a table full of smurfs with a 3 – column primary key

create table smurfs (smurf_id int, action_date date, action_sequence int, action varchar2(30));
create index pk_smurfs on smurfs (smurf_id, action_date, action_sequence);
alter table smurfs add constraint pk_smurfs primary key (smurf_id, action_date, action_sequence);

Sample data:


SMURF_ID ACTION_DATE ACTION_SEQUENCE ACTION
1 1/6/2009 5 smurf
1 1/6/2009 11 walk
1 1/6/2009 27 run
1 1/6/2009 30 run
1 1/6/2009 91 skip
1 2/15/2009 25 smurf
1 2/15/2009 53 walk
...
1 5/30/2014 98 run
-------------------------------
2 7/24/2009 14 smurf
2 7/24/2009 20 skip

What is the most economic way to find the latest action_date and action_id for each smurf?
Probably – a single pass select.

Let us pretend that we have not 3, but only 2 key columns!


select smurf_id, to_date(substr(date_seq, 1, 8), 'yyyymmdd') as action_date,
to_number(substr(date_seq, 10), '0999999999') as action_sequence
from (
select smurf_id, max(to_char(action_date, 'yyyymmdd') || to_char(action_sequence, '0999999999')) as date_seq
from smurfs group by smurf_id
)
order by 1


Plan
SELECT STATEMENT ALL_ROWSCost: 61 Bytes: 59,865 Cardinality: 3,991
2 SORT GROUP BY NOSORT Cost: 61 Bytes: 59,865 Cardinality: 3,991
1 INDEX FULL SCAN INDEX PK_SMURFS Cost: 61 Bytes: 2,949,945 Cardinality: 196,663

Sample data:

SMURF_ID ACTION_DATE ACTION_SEQUENCE
1 5/30/2014 98
2 4/5/2018 75
3 10/23/2016 94
4 10/15/2015 93
5 8/21/2014 67
6 2/8/2016 95

The trick is to represent date and sequence id as a sorted list of numbers:

SMURF_ID MAX(TO_CHAR(ACTION_DATE,'YYYYMMDD')||TO_CHAR(ACTION_SEQUENCE,'0999999999'))
1 20140530 0000000098
2 20180405 0000000075
3 20161023 0000000094
4 20151015 0000000093
5 20140821 0000000067

So – just one full index scan and cost (in my case – 61) – is a half of any solution with a self-join.
Because – a self join would do two full index scans.
For example:

select a.smurf_id, action_date, max(action_sequence)
from smurfs a
join (select smurf_id, max(action_date) max_date from smurfs group by smurf_id) b on a.smurf_id = b.smurf_id and a.action_date = b.max_date
group by a.smurf_id, a.action_date
order by 1,2;


Plan
SELECT STATEMENT ALL_ROWSCost: 122 Bytes: 79,044 Cardinality: 2,823
7 SORT GROUP BY NOSORT Cost: 122 Bytes: 79,044 Cardinality: 2,823
6 MERGE JOIN Cost: 122 Bytes: 111,748 Cardinality: 3,991
1 INDEX FULL SCAN INDEX PK_SMURFS Cost: 61 Bytes: 2,949,945 Cardinality: 196,663
5 SORT JOIN Cost: 62 Bytes: 51,883 Cardinality: 3,991
4 VIEW Cost: 61 Bytes: 51,883 Cardinality: 3,991
3 HASH GROUP BY Cost: 61 Bytes: 47,892 Cardinality: 3,991
2 INDEX FULL SCAN INDEX PK_SMURFS Cost: 61 Bytes: 2,359,956 Cardinality: 196,663

Sample data:

SMURF_ID ACTION_DATE MAX(ACTION_SEQUENCE)
1 5/30/2014 98
2 4/5/2018 75
3 10/23/2016 94
4 10/15/2015 93

Moral of the story is obvious: less key columns make grouping easier.

Advertisements

2 Comments »

  1. Andrey,
    While the trick with a merged column values is a very nice one, I have few concerns –
    1) Performance: Usage of aggregated function on calculated value shifts performance accent from table/index scanning to data calculation (in this example data conversion+concatenation+substrs+again data conversion). In most cases the cost of calculation is still less than slef-join, but not always.
    2) Generalization: the solution doesn’t require action_date and action_sequence to be a part of PK. Being part of PK they allow the query execute only PK index scan. Otherwise it still works, just involve a table scan.
    3) Limitation: Unfortunately, the solution is limited to a very special case when desired result includes only columns we group by. If extra column values are required (i.e. “action”), we still have to use self-join.
    With consideration of 2) and 3) the “moral of the story” is questionable.

    Comment by Alexander Zemerov — October 10, 2011 @ 2:16 pm

  2. Alex thanks for the comment. I thought I am in a “write only” mode.

    I still insist that “

      less columns make grouping easier

    ” – no less, no more.

    The trick with concatenating key columns allows to reduce grouping from N columns to a 1 column group by.
    An example with 6-column key gets hairy pretty quickly:

    select smurf_id, action_date, action_sequence1, action_sequence2, action_sequence3, max(action_sequence4)
    from smurfs where (smurf_id, action_date, action_sequence1, action_sequence2, action_sequence3) in (
    select smurf_id, action_date, action_sequence1, action_sequence2, max(action_sequence3)
    from smurfs where (smurf_id, action_date, action_sequence1, action_sequence2) in (
    select smurf_id, action_date, action_sequence1, max(action_sequence2)
    from smurfs where (smurf_id, action_date, action_sequence1) in (
    select smurf_id, action_date, max(action_sequence1)
    from smurfs where (smurf_id, action_date) in (
    select smurf_id, max(action_date) max_date from smurfs group by smurf_id
    )
    group by smurf_id, action_date
    )
    group by smurf_id, action_date, action_sequence1
    )
    group by smurf_id, action_date, action_sequence1, action_sequence2
    )
    group by smurf_id, action_date, action_sequence1, action_sequence2, action_sequence3
    order by 1, 2, 3, 4, 5

    SMURF_ID ACTION_DATE ACTION_SEQUENCE1 ACTION_SEQUENCE2 ACTION_SEQUENCE3 MAX(ACTION_SEQUENCE4)

    1 1/7/2017 97 197 297 397
    2 12/19/2016 53 153 253 353
    3 4/22/2016 41 141 241 341
    4 6/16/2016 87 187 287 387
    5 9/29/2015 82 182 282 382
    6 2/22/2018 72 172 272 372

    Plan
    SELECT STATEMENT ALL_ROWSCost: 875 Bytes: 1,239 Cardinality: 21
    19 SORT GROUP BY Cost: 875 Bytes: 1,239 Cardinality: 21
    18 NESTED LOOPS Cost: 874 Bytes: 1,239 Cardinality: 21
    16 VIEW VIEW SYS.VW_NSO_4 Cost: 852 Bytes: 672 Cardinality: 21
    15 HASH GROUP BY Cost: 852 Bytes: 1,071 Cardinality: 21
    14 NESTED LOOPS Cost: 851 Bytes: 1,071 Cardinality: 21
    12 VIEW VIEW SYS.VW_NSO_3 Cost: 829 Bytes: 588 Cardinality: 21
    11 HASH GROUP BY Cost: 829 Bytes: 588 Cardinality: 21
    10 VIEW SYS. Cost: 829 Bytes: 1,148 Cardinality: 41
    9 FILTER
    8 HASH GROUP BY Cost: 829 Bytes: 2,419 Cardinality: 41
    7 HASH JOIN Cost: 828 Bytes: 242,844 Cardinality: 4,116
    5 HASH JOIN Cost: 554 Bytes: 115,248 Cardinality: 4,116
    3 VIEW VIEW SYS.VW_NSO_1 Cost: 279 Bytes: 53,508 Cardinality: 4,116
    2 HASH GROUP BY Cost: 279 Bytes: 49,392 Cardinality: 4,116
    1 TABLE ACCESS FULL TABLE ABATISHC.SMURFS Cost: 273 Bytes: 2,386,188 Cardinality: 198,849
    4 TABLE ACCESS FULL TABLE ABATISHC.SMURFS Cost: 273 Bytes: 2,982,735 Cardinality: 198,849
    6 TABLE ACCESS FULL TABLE ABATISHC.SMURFS Cost: 273 Bytes: 6,164,319 Cardinality: 198,849
    13 INDEX RANGE SCAN INDEX ABATISHC.PK_SMURFS Cost: 2 Bytes: 23 Cardinality: 1
    17 INDEX RANGE SCAN INDEX ABATISHC.PK_SMURFS Cost: 2 Bytes: 27 Cardinality: 1

    Where a concatenated query looks still better:

    select smurf_id, to_date(substr(date_seq, 1, 8), ‘yyyymmdd’) as action_date,
    date_seq as action_sequence
    from (
    select smurf_id, max(to_char(action_date, ‘yyyymmdd’) ||
    to_char(action_sequence1, ‘FM0999999999’) ||
    to_char(action_sequence2, ‘FM0999999999’) ||
    to_char(action_sequence3, ‘FM0999999999’) ||
    to_char(action_sequence4, ‘FM0999999999’)) as date_seq
    from smurfs group by smurf_id
    )
    order by 1

    Plan
    SELECT STATEMENT ALL_ROWSCost: 280 Bytes: 111,132 Cardinality: 4,116
    2 SORT GROUP BY Cost: 280 Bytes: 111,132 Cardinality: 4,116
    1 TABLE ACCESS FULL TABLE ABATISHC.SMURFS Cost: 274 Bytes: 5,368,923 Cardinality: 198,849

    Sample data:

    SMURF_ID ACTION_DATE ACTION_SEQUENCE

    1 1/7/2017 201701070000000097000000019700000002970000000397
    2 12/19/2016 201612190000000053000000015300000002530000000353
    3 4/22/2016 201604220000000041000000014100000002410000000341
    4 6/16/2016 201606160000000087000000018700000002870000000387
    5 9/29/2015 201509290000000082000000018200000002820000000382
    6 2/22/2018 201802220000000072000000017200000002720000000372

    Of course, the same could be done with analytics.
    Something like

    select smurf_id, max(action_date), max(action_sequence1), max(action_sequence2), max(action_sequence3), max(action_sequence4)
    from (
    select smurf_id,
    first_value(action_date) over (partition by smurf_id) as action_date,
    first_value(action_sequence1) over (partition by smurf_id) as action_sequence1,
    first_value(action_sequence2) over (partition by smurf_id) as action_sequence2,
    first_value(action_sequence3) over (partition by smurf_id) as action_sequence3,
    first_value(action_sequence4) over (partition by smurf_id) as action_sequence4
    from ( select smurf_id, action_date, action_sequence1, action_sequence2, action_sequence3, action_sequence4 from smurfs order by 1, 2 desc, 3 desc, 4 desc, 5 desc)
    )
    group by smurf_id
    order by 1

    SMURF_ID MAX(ACTION_DATE) MAX(ACTION_SEQUENCE1) MAX(ACTION_SEQUENCE2) MAX(ACTION_SEQUENCE3) MAX(ACTION_SEQUENCE4)

    1 1/7/2017 97 197 297 397
    2 12/19/2016 53 153 253 353
    3 4/22/2016 41 141 241 341
    4 6/16/2016 87 187 287 387
    5 9/29/2015 82 182 282 382
    6 2/22/2018 72 172 272 372

    Plan
    SELECT STATEMENT ALL_ROWSCost: 1,784 Bytes: 267,540 Cardinality: 4,116
    6 SORT GROUP BY NOSORT Cost: 1,784 Bytes: 267,540 Cardinality: 4,116
    5 VIEW ABATISHC. Cost: 1,784 Bytes: 12,925,185 Cardinality: 198,849
    4 WINDOW BUFFER Cost: 1,784 Bytes: 14,714,826 Cardinality: 198,849
    3 VIEW ABATISHC. Cost: 1,784 Bytes: 14,714,826 Cardinality: 198,849
    2 SORT ORDER BY Cost: 1,784 Bytes: 5,368,923 Cardinality: 198,849
    1 TABLE ACCESS FULL TABLE ABATISHC.SMURFS Cost: 274 Bytes: 5,368,923 Cardinality: 198,849

    Still costlier though.

    The easiest grouping, of course, is with 0 columns 🙂

    Comment by oracleaide — October 10, 2011 @ 10:35 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: