Thursday, 8 September 2016

SQL*Loader Enhancements in Oracle Database 12c Release 1

This article presents an overview of the enhancements to SQL*Loader in Oracle Database 12c Release 1.
  • Setup
  • Direct NFS (DNFS) Support
  • Extended Data Type Support
  • SQL*Loader Express
  • SQL*Loader Command Line Changes
  • SQL*Loader Control File Changes
  • Audit SQL*Loader Direct Path Loads

1. Setup


The examples in this article require the SCOTT schema. If you are working in a multitenant environment, edit the "$ORACLE_HOME/rdbms/admin/utlsampl.sql" script, amending the connection to the SCOTT schema as necessary, then run the script when connected to the relevant PDB as a privileged user.

conn sys@pdb1 as sysdba
@?/rdbms/admin/utlsampl.sql

Create a CSV file to load later. This should be placed in a suitable NFS mount point. In this case I'm also placing it into the "/tmp" directory so as not to confuse when discussing non-NFS related functionality.

CONN scott/tiger@pdb1

ALTER SESSION SET nls_date_format='DD-MON-YYYY HH24:MI:SS';

-- Create data file.
SET PAGESIZE 0 LINESIZE 100 TRIMSPOOL ON FEEDBACK OFF
SPOOL /nfs/EMP.dat.tmp

SELECT empno || ',' || ename || ',' || job || ',' || mgr || ',' ||
       hiredate || ',' || sal || ',' || comm || ',' || deptno
FROM   emp;

SPOOL OFF
SET PAGESIZE 14 FEEDBACK ON

-- Clean up the file.
HOST cat /nfs/EMP.dat.tmp | grep '[0-9]\{4\}' > /nfs/EMP.dat
HOST cp /nfs/EMP.dat /tmp/EMP.dat

Create directory objects to allow the SCOTT user to load the data file from both locations.

CONN sys@pdb1 AS SYSDBA

-- Create a directory pointing to an NFS location.
CREATE OR REPLACE DIRECTORY nfs_dir AS '/nfs/';
GRANT READ,WRITE ON DIRECTORY nfs_dir TO scott;

-- Create a directory pointing to an non-NFS location.
CREATE OR REPLACE DIRECTORY tmp_dir AS '/tmp/';
GRANT READ,WRITE ON DIRECTORY tmp_dir TO scott;

We also need a table to load into, so we will create a copy of the EMP table called EMP2 without any data.

CONN scott/tiger@pdb1

CREATE TABLE emp2 AS
SELECT *
FROM   emp
WHERE  1=2;

2. Direct NFS (DNFS) Support


SQL*Loader now supports the use of Direct NFS (DNFS) to improve performance when the data being loaded resides on NFS filers. DNFS is used by default for files over 1G in size. Explicit control is possible using the DNFS_ENABLE and DNFS_READBUFFERS parameters, with the DNFS_READBUFFERS parameter defaulting to 4.

Create a file called "/nfs/EMP.ctl" with the following contents.

OPTIONS (
  BADFILE '/nfs/EMP.bad'
  LOGFILE '/nfs/EMP.log'
  DNFS_ENABLE TRUE
  DNFS_READBUFFERS 10
)
LOAD DATA
INFILE '/nfs/EMP.dat'
INTO TABLE emp2
TRUNCATE
FIELDS TERMINATED BY ','
   (empno,
    ename,
    job,
    mgr,
    hiredate   DATE "DD-MON-YYYY HH24:MI:SS" ":hiredate",
    sal,
    comm,
    deptno)

Run the following SQL*Loader command to load the data into the EMP table, replacing the existing data.

$ cd /nfs
$ sqlldr userid=scott/tiger@pdb1 control=EMP.ctl

Extended Data Type Support


When a database is configured to use Extended Data Types, SQL*Loader operations against the database will support the extended data types also.

SQL*Loader Express


SQL*Loader Express is really just a simple extension of the command line interface, such that all the default values are targeted at handling basic CSV files that match the structure of a table. For example, if we want to load data into the EMP2 table, the default settings will look for a dump file called "EMP2.dat" that internally matches the structure of EMP2 table. As a result, we only need to specify command line arguments if the derived and default settings are not acceptable.

$ cd /tmp
$ cp EMP.dat EMP2.dat
$ sqlldr scott/tiger@pdb1 TABLE=EMP2 DATE_FORMAT=\"DD-MON-YYYY HH24:MI:SS\"
SQL*Loader: Release 12.1.0.2.0 - Production on Sat Sep 13 22:18:46 2014

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Express Mode Load, Table: EMP2
Path used:      External Table, DEGREE_OF_PARALLELISM=AUTO

Table EMP2:
  14 Rows successfully loaded.

Check the log files:
  EMP2.log
  EMP2_%p.log_xt
for more information about the load.
$

The SQL*Loader Express documentation lists all the derived and default settings.

SQL*Loader Command Line Changes


SQL*Loader Express is the main change for the command line, but there are some additional changes.

The TRIM parameter allows you to override the default LDRTRIM functionality with one of LRTRIM, NOTRIM, LTRIM, RTRIM.

$ sqlldr scott/tiger@pdb1 TABLE=EMP2 DATE_FORMAT=\"DD-MON-YYYY HH24:MI:SS\" \
  TRIM=LRTRIM

The DEGREE_OF_PARALLELISM parameter can be set to an integer value, DEFAULT, AUTO or NONE, with AUTO being the default value.

$ sqlldr scott/tiger@pdb1 TABLE=EMP2 DATE_FORMAT=\"DD-MON-YYYY HH24:MI:SS\" \
  DEGREE_OF_PARALLELISM=4

The BAD, DISCARD, and LOG parameters can be specified using just a directory object name.

$ sqlldr scott/tiger@pdb1 TABLE=EMP2 DATE_FORMAT=\"DD-MON-YYYY HH24:MI:SS\" \
  BAD=TMP_DIR DISCARD=TMP_DIR LOG=TMP_DIR

SQL*Loader Control File Changes


Not surprisingly, most of the SQL*Loader control file changes are similar to those introduced for ORACLE_LOADER access driver for external tables.

The INFILE clause now accepts wildcards, where "*" matches multiple characters, while a "?" matches a single character.

INFILE '/nfs/EMP*.dat'

INFILE '/nfs/EMP?.dat'

Files using CSV (comma-separated-values) format can use the simplified FIELDS CSV clause. The default settings for this can be modified as described here.

LOAD DATA
INFILE '/tmp/EMP.dat'
INTO TABLE emp2
TRUNCATE
FIELDS CSV WITH EMBEDDED
   (empno,
    ename,
    job,
    mgr,
    hiredate    DATE "DD-MON-YYYY HH24:MI:SS" ":hiredate",
    sal,
    comm,
    deptno)

$ sqlldr userid=scott/tiger@pdb1 control=EMP.ctl

A default datetime format can be specified for all datetime fields using the DATE FORMAT clause.

LOAD DATA
INFILE '/tmp/EMP.dat'
INTO TABLE emp2
TRUNCATE
FIELDS TERMINATED BY ','
DATE FORMAT "DD-MON-YYYY HH24:MI:SS"
   (empno,
    ename,
    job,
    mgr,
    hiredate DATE,
    sal,
    comm,
    deptno)

$ sqlldr userid=scott/tiger@pdb1 control=EMP.ctl

A default NULLIF can be specified that applies to all character fields. A field-specific NULLIF overrides the default NULLIF. The NO NULLIF clause can be used against a field to prevent the default NULLIF applying to it. The full syntax is presented here.

LOAD DATA
INFILE '/tmp/EMP.dat'
INTO TABLE emp2
TRUNCATE
FIELDS TERMINATED BY ','
NULLIF = "NONE"
DATE FORMAT "DD-MON-YYYY HH24:MI:SS"
   (empno,
    ename,
    job,
    mgr NO NULLIF,
    hiredate    DATE "DD-MON-YYYY HH24:MI:SS" ":hiredate",
    sal NULLIF job="PRESIDENT",
    comm,
    deptno)

$ sqlldr userid=scott/tiger@pdb1 control=EMP.ctl

The FIELD NAMES clause allows you to specify the order of the fields in the data file, saying they are either in the first line of the file, or in a separate file. I can't really see myself using this feature, but it is discussed here.

Audit SQL*Loader Direct Path Loads


Oracle 12c allows SQL*Loader direct path loads to be audited by creating an audit policy.

CREATE AUDIT POLICY policy_name
  ACTIONS COMPONENT=DIRECT_LOAD [LOAD | ALL];

I'm having some trouble getting this auditing to work using the "ALL" option. Thanks to Ronan for pointing out the "LOAD" method was working as expected.

When this policy is applied to a user, their SQL*Loader direct path operations will appear in the audit trail. The following policy audits all SQL*Loader operations. The policy is applied to the SCOTT user.

CONN sys@pdb1 AS SYSDBA
CREATE AUDIT POLICY audit_sl_load_policy ACTIONS COMPONENT=DIRECT_LOAD LOAD;
AUDIT POLICY audit_sl_load_policy BY scott;

Use the following control file to run SQL*Loader.

OPTIONS (DIRECT=TRUE)
LOAD DATA
INFILE '/tmp/EMP.dat'
INTO TABLE emp2
TRUNCATE
FIELDS TERMINATED BY ','
   (empno,
    ename,
    job,
    mgr,
    hiredate   DATE "DD-MON-YYYY HH24:MI:SS" ":hiredate",
    sal,
    comm,
    deptno)

$ sqlldr userid=scott/tiger@pdb1 control=EMP.ctl

Checking the audit trail shows the SQL*Loader run was audited.

CONN sys@pdb1 AS SYSDBA

-- Flush audit information to disk.
EXEC DBMS_AUDIT_MGMT.FLUSH_UNIFIED_AUDIT_TRAIL;

SET LINESIZE 200
COLUMN event_timestamp FORMAT A30
COLUMN object_schema FORMAT A10
COLUMN object_name FORMAT A30
COLUMN action_name FORMAT A20

SELECT event_timestamp,
       object_schema,
       object_name,
       action_name,
       direct_path_num_columns_loaded
FROM   unified_audit_trail
WHERE  audit_type = 'Direct path API'
ORDER BY 1;

EVENT_TIMESTAMP  OBJECT_SCH  OBJECT_NAME  ACTION_NAME          DIRECT_PATH_NUM_COLUMNS_LOADED
------------------- ---------- -----------------------------------
19-OCT-2015 07:50:08.335507   SCOTT  EMP2  LOAD  8

2 rows selected.

SQL>