My Works: Indexing Strategy and Column Order

We discussed a neat scenario where the skip scan be used because of the data distribution in the columns concerned.Let us now try with a different Distribution to create 256 distinct values for the first column and expanding the on the explanation we did on the closing of the previous post.

begin
for i in 1 .. 30000
loop
insert into SKIP values(chr(mod(i,256)), i);
end loop;
commit;
end;
/

begin
for i in 30001 .. 60000
loop
insert into SKIP values(chr(mod(i,256)), i);
end loop;
commit;
end;
/

begin
for i in 60001 .. 90000
loop
insert into SKIP values(chr(mod(i,256)), i);
end loop;
commit;
end;
/

begin
for i in 90001 .. 100000
loop
insert into SKIP values(chr(mod(i,256)), i);
end loop;
commit;
end;
/

TEST>select * from SKIP where b=60002;

Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=18 Card=1 Bytes=6)

1 0
TABLE ACCESS (FULL) OF 'SKIP' (Cost=18 Card=1 Bytes=6)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
172 consistent gets
0 physical reads
0 redo size
233 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

This went for a full scan as opposed to a skip because it would have to manage and play with the 256 sub indexes if a skip scan was considered.

Let us now check the consistent gets if it goes for a skip scan

TEST>select /*+INDEX_SS(SKIP IDX_SKIP) */ * from SKIP where b=60002
;

Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=257 Card=1 Bytes=6)

1 0
INDEX (SKIP SCAN) OF 'IDX_SKIP' (NON-UNIQUE) (Cost=257 Card=1 Bytes=6)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
268 consistent gets
0 physical reads
0 redo size
234 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

We can see that COST as well as the Consistent gets is on the higher side.

Given the above data distribution and it would certainly benifit in going for an additional index on B.This demonstrates what we discussed in the previous post.

TEST>create index idx_new on skip(b);

Index created.

TEST>exec dbms_stats.gather_table_stats('SKIP_TEST','SKIP',CASCADE=
>TRUE);

PL/SQL procedure successfully completed.

TEST>SET AUTOTRACE TRACEONLY EXPLAIN statistics

TEST>select * from SKIP where b=60001;

Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=6)

1 0
TABLE ACCESS (BY INDEX ROWID) OF 'SKIP' (Cost=2 Card=1 Bytes=6)

2 1
INDEX (RANGE SCAN) OF 'IDX_NEW' (NON-UNIQUE) (Cost=1 Card=1)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
233 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Now what if we change the index order

TEST>drop index IDX_SKIP;

Index dropped.

TEST>drop index IDX_NEW;

Index dropped.

TEST>CREATE INDEX IDX_REV ON SKIP (B,A);

Index created.

TEST>exec dbms_stats.gather_table_stats('SKIP_TEST','SKIP',CASCADE=
>TRUE);

PL/SQL procedure successfully completed.

TEST>SET AUTOTRACE TRACEONLY EXPLAIN statistics
TEST>select * from SKIP where b=60001;

Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=6)

1 0
INDEX (RANGE SCAN) OF 'IDX_REV' (NON-UNIQUE) (Cost=2 Card=1 Bytes=6)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
233 bytes sent via SQL*Net to client
276 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

TEST>select b from skip where a='z' ;

391 rows selected.

Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=18 Card=391 Bytes=2346)

1 0
TABLE ACCESS (FULL) OF 'SKIP' (Cost=18 Card=391 Bytes=2346)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
198 consistent gets
0 physical reads
0 redo size
3836 bytes sent via SQL*Net to client
458 bytes received via SQL*Net from client
28 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
391 rows processed

TEST>select /*+ index_ffs(SKIP IDX_REV)*/ b from skip where a='z' ;

391 rows selected.

Execution Plan
----------------------------------------------------------
0
SELECT STATEMENT Optimizer=CHOOSE (Cost=26 Card=391 Bytes=2346)

1 0
INDEX (FAST FULL SCAN) OF 'IDX_REV' (NON-UNIQUE) (Cost=26 Card=391 Bytes=2346)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
284 consistent gets
0 physical reads
0 redo size
3837 bytes sent via SQL*Net to client
458 bytes received via SQL*Net from client
28 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
391 rows processed

This shows that the column order of the index has defenitly an impact on the query.To be certain about what sort of indexing strategy we need, we would need to revisit the queries executed, number of executions and their predicate values.
If the above query (where b=60001) is executed very less frequently , we dont need to realy care about the additional index or about reversing the order.But if the query pattern was (where b=:2 and a=:2 , where b=:2 , where a=:1) we would need the index in the order (B,A).

I am closing the discussion with an excellent explanation from tom kyte for a typical scenario which was discussed in ask tom.

For a query that references all of A, B, C the above three indexes will perform more or less the same -- regardless of the selectivity of A, B, or C.

Does it mean than queries:

a) select /*+ index (x index_abc) */ * from x
where a = 1
and c between 1 and 7
and b > 2

b) select /*+ index (x index_cba) */ * from x
where a = 1
and c between 1 and 7
and b > 2

will have the same performance regardless of the selictivity of A, B and C fields?

-----------------------------------------------------------------------------------
It'll depend -- but in general, for most cases, they would be the same.

In your case -- if the following was true:

there is
one row for b such that b > 2 (and hence only one index entry)
and c was equal to 2 for that row
and a = 1

and at the same time there is
1,000,000 rows such that a=1
and c was equal to 2 for all of those rows
and one of those rows had b > 2

an index on B,C,A or B,A,C would probably be better then one on A,C,B as the index on B,C,A would be able to stop "faster"

(but did that REALLY have to do with the selectivity of B over C??? NO because this is also true:

there are 5 million other rows with b = 1
each of those 5 million rows have unique values for A
each of those 5 million rows have unique values for C (and all of them could be between 1 and 7 still

So here -- B is not selective at all but A and C are.

Blog Archive

My Works

Monday, 12 November 2007

Indexing Strategy and Column Order - III

No comments:

About Me