Tuesday, 25 October 2016

Online Statistics Gathering for Bulk Loads in Oracle Database 12c Release 1

Oracle is now able to gather statistics during some bulk operations, making it unnecessary to gather statistics subsequently. This means extra table scans for statistics collection are unnecessary and there is less chance that you will forget to gather statistics after bulk loads.
  • CREATE TABLE ... AS SELECT (CTAS)
Oracle is able to gather statistics during a CREATE TABLE ... AS SELECT (CTAS) operation for most regular heap organized table, as shown below.
CONN test/test@pdb1

DROP TABLE tab1 PURGE;

CREATE TABLE tab1 AS
SELECT level AS id,
       'Description of ' || level AS description
FROM   dual
CONNECT BY level <= 1000;

COLUMN table_name FORMAT A20

SELECT table_name,
       num_rows
FROM   user_tables
WHERE  table_name = 'TAB1';

TABLE_NAME      NUM_ROWS
------------------   ----------
TAB1                       1000


SQL>
  • INSERT INTO ... SELECT
In a similar fashion, INSERT INTO ... SELECT operations on empty segments using direct path can also benefit from online statistics gathering. These operations are performed using direct path if they are running in parallel, or if the APPEND hint is specified.

TRUNCATE TABLE tab1;

INSERT /*+ APPEND */ INTO tab1
SELECT level AS id,
       'Description of ' || level AS description
FROM   dual
CONNECT BY level <= 500;
COMMIT;

COLUMN table_name FORMAT A20

SELECT table_name,
       num_rows
FROM   user_tables
WHERE  table_name = 'TAB1';

TABLE_NAME     NUM_ROWS
-----------------    ----------
TAB1                       500

SQL>

The object must have an empty segment (missing segment due to delayed segment creation) for this to work. A regular delete will not leave the object in the correct state to allow the online gathering of statistics to kick in.

DELETE FROM tab1;
COMMIT;

INSERT /*+ APPEND */ INTO tab1
SELECT level AS id,
       'Description of ' || level AS description
FROM   dual
CONNECT BY level <= 700;
COMMIT;

COLUMN table_name FORMAT A20

SELECT table_name,
       num_rows
FROM   user_tables
WHERE  table_name = 'TAB1';

TABLE_NAME    NUM_ROWS
------------------  ----------
TAB1                    500

SQL>

The NO_GATHER_OPTIMIZER_STATISTICS hint can be used to explicitly prevent the online gathering of statistics.

TRUNCATE TABLE tab1;

INSERT /*+ APPEND */ INTO tab1
SELECT /*+ NO_GATHER_OPTIMIZER_STATISTICS */
       level AS id,
       'Description of ' || level AS description
FROM   dual
CONNECT BY level <= 700;
COMMIT;

COLUMN table_name FORMAT A20

SELECT table_name,
       num_rows
FROM   user_tables
WHERE  table_name = 'TAB1';

TABLE_NAME      NUM_ROWS
------------------    ----------
TAB1                        500

SQL>
  • Partitioned Tables
A bulk load into an empty partitioned table will trigger the gathering of global statistics, but no partition-level statistics.

DROP TABLE tab1 PURGE;

CREATE TABLE tab1
(id            NUMBER,
 created_date  DATE)
PARTITION BY RANGE (created_date)
(PARTITION tab1_2015 VALUES LESS THAN (TO_DATE('01/01/2016', 'DD/MM/YYYY')) TABLESPACE users,
 PARTITION tab1_2016 VALUES LESS THAN (TO_DATE('01/01/2017', 'DD/MM/YYYY')) TABLESPACE users);

INSERT /*+ APPEND */ INTO tab1
SELECT level,
       TO_DATE('01/01/2016', 'DD/MM/YYYY')
FROM   dual
CONNECT BY level <= 100;
COMMIT;

COLUMN table_name FORMAT A20

SELECT table_name,
       num_rows
FROM   user_tables
WHERE  table_name = 'TAB1';

TABLE_NAME      NUM_ROWS
------------------  ----------
TAB1                        100

SQL>

COLUMN table_name FORMAT A20
COLUMN partition_name FORMAT A20

SELECT table_name,
       partition_name,
       num_rows
FROM   user_tab_partitions
WHERE  table_name = 'TAB1'
ORDER BY partition_name;

TABLE_NAME     PARTITION_NAME     NUM_ROWS
----------------- -------------------- ----------
TAB1                   TAB1_2015
TAB1                   TAB1_2016

SQL>

A bulk load that explicitly references an empty partition will result in partition-level statistics, but no global statistics.

EXEC DBMS_STATS.delete_table_stats(USER, 'TAB1');
TRUNCATE TABLE tab1;

INSERT /*+ APPEND */ INTO tab1 PARTITION (TAB1_2016)
SELECT level,
       TO_DATE('01/01/2016', 'DD/MM/YYYY')
FROM   dual
CONNECT BY level <= 100;
COMMIT;

COLUMN table_name FORMAT A20

SELECT table_name,
       num_rows
FROM   user_tables
WHERE  table_name = 'TAB1';

TABLE_NAME       NUM_ROWS
------------------   ----------
TAB1

SQL>

COLUMN table_name FORMAT A20
COLUMN partition_name FORMAT A20

SELECT table_name,
       partition_name,
       num_rows
FROM   user_tab_partitions
WHERE  table_name = 'TAB1'
ORDER BY partition_name;

TABLE_NAME    PARTITION_NAME   NUM_ROWS
----------------  --------------------    ----------
TAB1                     TAB1_2015
TAB1                     TAB1_2016                   100

SQL>
  • Restrictions
There are a number of restrictions associated with online statistics gathering. They are not gathered for:

Index statistics or histograms. If those are required the must be gathered in a separate operation. The following call will gather missing statistics, but will not re-gather table or column statistics unless the existing statistics are already stale.

EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'TAB1', options => 'GATHER AUTO');
  • Non-empty segments, as described above.
  • Tables in built-in schemas. Only those in user-defined schemas.
  • Nested, index-organized or external tables.
  • Global temporary tables using the ON COMMIT DELETE ROWS clause.
  • Table with virtual columns.
  • Tables if the PUBLISH preference is set to FALSE for DBMS_STATS.
  • Tables with locked statistics.
  • Partitioned tables using incremental statistics, where the insert is not explicitly referencing a partition using the PARTITION clause.
  • Tables loaded using multitable inserts.