Monday, 18 July 2016

When is an index access path better than a full table scan?

The cost based optimizer makes decisions that can be hard to understand. One of the hardest may be why it chooses indexed or scan access paths: a burning question for many DBAs.

You will often hear a rule-of-thumb regarding what percentage of a table needs to be selected before a full table scan is more efficient thatn an index range scan. 2% is frequently suggested. However, you cannot place any reliance on such a figure. The cross over point will be dependent on many factors, including (but not limited to):

Number of rows in the table
Block size
Average row length
Key length
Speed of the disc
Multiblock read count
Direct or indirect read
Memory configuration
Partitioning and parallelism

All of this comes down to statistics. Object statistics, and system statistics. One of the most critical statistics is the clustering factor of the index: how closely the physical order of the rows conforms to the order of the keys.

This (not very scientific) test shows that for a perfectly clustered index, the CBO switches from index range scan to table full scan when the percentage of rows selected is 14%, but for an unclustered index it switches at just 0.06%.

Set up the test:

create table t1(c1 number,c2 number);
create index c1i on t1(c1);
create index c2i on t1(c2);
insert into t1 select rownum,round(dbms_random.value(1,1000000)) from dual connect by level < 1000000;
exec dbms_stats.gather_table_stats(user,'t1');

Results:

orclz> set autot trace exp
orclz> select count(c2) from t1 where c1 between 1 and 145000;

Execution Plan
----------------------------------------------------------------------------------
Plan hash value: 3724264953

----------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time    |
----------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |      |    1 |    10 |  586  (2)| 00:00:01 |
|  1 |  SORT AGGREGATE    |      |    1 |    10 |            |          |
|*  2 |  TABLE ACCESS FULL| T1  |  145K|  1416K|  586  (2)| 00:00:01 |
----------------------------------------------------------------------------------


Predicate Information (identified by operation id):
-----------------------------------------------------------------------------------

  2 - filter("C1"<=145000 AND "C1">=1)

orclz> select count(c2) from t1 where c1 between 1 and 135000;


Execution Plan
-------------------------------------------------------------------------------------
Plan hash value: 359387059

---------------------------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes | Cost (%CPU)| Time    |
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |      |    1 |    10 |  552  (1)| 00:00:01 |
|  1 |  SORT AGGREGATE                      |      |    1 |    10 |            |          |
|  2 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1  |  135K|  1318K|  552  (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | C1I  |  135K|      |  273  (1)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------------------------------------------

  3 - access("C1">=1 AND "C1"<=135000)

orclz>
orclz>
orclz> select count(c1) from t1 where c2 between 1 and 550;


Execution Plan
--------------------------------------------------------------------------------------
Plan hash value: 632084968

---------------------------------------------------------------------------------------
| Id  | Operation                            | Name | Rows  | Bytes | Cost (%CPU)| Time    |
---------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |      |    1 |    10 |  555  (0)| 00:00:01 |
|  1 |  SORT AGGREGATE                      |      |    1 |    10 |            |          |
|  2 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1  |  551 |  5510 |  555  (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | C2I  |  551 |      |    4  (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
-------------------------------------------------------------------------------------

  3 - access("C2">=1 AND "C2"<=550)

orclz> select count(c1) from t1 where c2 between 1 and 650;


Execution Plan
-------------------------------------------------------------------------------------
Plan hash value: 3724264953

------------------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time    |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |      |    1 |    10 |  586  (2)| 00:00:01 |
|  1 |  SORT AGGREGATE    |      |    1 |    10 |            |          |
|*  2 |  TABLE ACCESS FULL| T1  |  651 |  6510 |  586  (2)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
------------------------------------------------------------------------------------

  2 - filter("C2"<=650 AND "C2">=1)

orclz>

Tests done with DB 12.1.0.2 on my laptop: Windows 10, SSD disk, no system stats.