Friday, 23 March 2018

Inline External Tables in Oracle Database 18c

Oracle Database 18c allows you to access data in flat files using an inline external table defined in a SELECT statement.

Oracle Database 18c, Oracle Database Certifications, Oracle Database Tutorials and Materials

This is a step further than the ability to override external table parameters from a query introduced in Oracle Database 12c Release 2 (12.2).


◈ Setup


In order to demonstrate an inline external table we need some data in flat files. The following code spools out four CSV files with 1000 rows each.

CONN / AS SYSDBA
ALTER SESSION SET CONTAINER = pdb1;

SET MARKUP CSV ON QUOTE ON
SET TRIMSPOOL ON LINESIZE 1000 FEEDBACK OFF PAGESIZE 0

SPOOL /tmp/gbr1.txt
SELECT 'GBR',
       object_id,
       owner,
       object_name
FROM   dba_objects
WHERE  object_id <= 2000
AND    rownum <= 1000;
SPOOL OFF

SPOOL /tmp/gbr2.txt
SELECT 'GBR',
       object_id,
       owner,
       object_name
FROM   dba_objects
WHERE  object_id BETWEEN 2000 AND 3999
AND    rownum <= 1000;
SPOOL OFF

SPOOL /tmp/ire1.txt
SELECT 'IRE',
       object_id,
       owner,
       object_name
FROM   dba_objects
WHERE  object_id <= 2000
AND    rownum <= 1000;
SPOOL OFF

SPOOL /tmp/ire2.txt
SELECT 'IRE',
       object_id,
       owner,
       object_name
FROM   dba_objects
WHERE  object_id BETWEEN 2000 AND 3999
AND    rownum <= 1000;
SPOOL OFF

SET MARKUP CSV OFF
SET TRIMSPOOL ON LINESIZE 1000 FEEDBACK OFF PAGESIZE 14

Create two directory objects to access these files. In this case both are pointing to the same directory, but it will still allow us to see the syntax.

CONN / AS SYSDBA
ALTER SESSION SET CONTAINER = pdb1;

CREATE OR REPLACE DIRECTORY tmp_dir1 AS '/tmp/';
GRANT READ, WRITE ON DIRECTORY tmp_dir1 TO test;

CREATE OR REPLACE DIRECTORY tmp_dir2 AS '/tmp/';
GRANT READ, WRITE ON DIRECTORY tmp_dir2 TO test;

CONN test/test@pdb1

◈ Inline External Table


An inline external table allows the external table definition to be placed in the FROM clause of a SQL statement using the EXTERNAL clause, so there is no need for an external table to be explicitly created.

SELECT country_code, COUNT(*) AS amount
FROM   EXTERNAL (
         (
           country_code  VARCHAR2(3),
           object_id     NUMBER,
           owner         VARCHAR2(128),
           object_name   VARCHAR2(128)
         )
         TYPE oracle_loader
         DEFAULT DIRECTORY tmp_dir1
         ACCESS PARAMETERS (
           RECORDS DELIMITED BY NEWLINE
           BADFILE tmp_dir1
           LOGFILE tmp_dir1:'inline_ext_tab_%a_%p.log'
           DISCARDFILE tmp_dir1
           FIELDS CSV WITH EMBEDDED TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
           MISSING FIELD VALUES ARE NULL (
             country_code,
             object_id,
             owner,
             object_name 
           )
        )
        LOCATION ('gbr1.txt', 'gbr2.txt', 'ire1.txt', 'ire2.txt')
        REJECT LIMIT UNLIMITED
      ) inline_ext_tab
GROUP BY country_code
ORDER BY 1;

COUNTRY_CODE  AMOUNT
------------          ----------
GBR                    2000
IRE                     2000

SQL>

In the following example we use a different directory object, and specify a different list of files in the LOCATION clause. This, not surprisingly gives us a different result.

SELECT country_code, COUNT(*) AS amount
FROM   EXTERNAL (
         (
           country_code  VARCHAR2(3),
           object_id     NUMBER,
           owner         VARCHAR2(128),
           object_name   VARCHAR2(128)
         )
         TYPE oracle_loader
         DEFAULT DIRECTORY tmp_dir2
         ACCESS PARAMETERS (
           RECORDS DELIMITED BY NEWLINE
           BADFILE tmp_dir2
           LOGFILE tmp_dir2:'inline_ext_tab_%a_%p.log'
           DISCARDFILE tmp_dir2
           FIELDS CSV WITH EMBEDDED TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
           MISSING FIELD VALUES ARE NULL (
             country_code,
             object_id,
             owner,
             object_name 
           )
        )
        LOCATION ('gbr1.txt', 'gbr2.txt')
        REJECT LIMIT UNLIMITED
      ) inline_ext_tab
GROUP BY country_code
ORDER BY 1;

COUNTRY_CODE   AMOUNT
------------          ----------
GBR                      2000

SQL>

The inline external table definition is a little ugly, so you may prefer to put it into a WITH clause if you are planning to join it to other tables.

WITH inline_ext_tab AS (
  SELECT *
  FROM   EXTERNAL (
           (
             country_code  VARCHAR2(3),
             object_id     NUMBER,
             owner         VARCHAR2(128),
             object_name   VARCHAR2(128)
           )
           TYPE oracle_loader
           DEFAULT DIRECTORY tmp_dir2
           ACCESS PARAMETERS (
             RECORDS DELIMITED BY NEWLINE
             BADFILE tmp_dir2
             LOGFILE tmp_dir2:'inline_ext_tab_%a_%p.log'
             DISCARDFILE tmp_dir2
             FIELDS CSV WITH EMBEDDED TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
             MISSING FIELD VALUES ARE NULL (
               country_code,
               object_id,
               owner,
               object_name 
             )
          )
          LOCATION ('gbr1.txt', 'gbr2.txt')
          REJECT LIMIT UNLIMITED
        )
)
SELECT country_code, COUNT(*) AS amount
FROM   inline_ext_tab 
GROUP BY country_code
ORDER BY 1;

COUNTRY_CODE   AMOUNT
------------          ----------
GBR                    2000

SQL>

◈ Security Implications


As pointed out by Pete Finnigan, you need to consider the security implications of this functionality. At the time you define the external table you have made a conscious decision about the directory objects and file locations you are going to use. The ability to alter these parameters at runtime mean you need to pay special attention to the directory objects that are available to the user, or risk a security issue.

Create a second test user.

CONN / AS SYSDBA
ALTER SESSION SET CONTAINER = pdb1;

CREATE USER test2 IDENTIFIED BY test2;
GRANT CREATE SESSION TO test2;
Create the basic external table from the first example in the initial test user, and grant access to the new test user.

CONN test/test@pdb1

DROP TABLE tab_ext;

CREATE TABLE tab_ext (
  country_code  VARCHAR2(3),
  object_id     NUMBER,
  owner         VARCHAR2(128),
  object_name   VARCHAR2(128)
)
ORGANIZATION EXTERNAL (
  TYPE oracle_loader
  DEFAULT DIRECTORY tmp_dir1
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    BADFILE tmp_dir1
    LOGFILE tmp_dir1:'part_tab_ext_%a_%p.log'
    DISCARDFILE tmp_dir1
    FIELDS CSV WITH EMBEDDED TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    MISSING FIELD VALUES ARE NULL (
      country_code,
      object_id,
      owner,
      object_name 
    )
  )
  LOCATION ('gbr1.txt', 'gbr2.txt', 'ire1.txt', 'ire2.txt')
)
REJECT LIMIT UNLIMITED;

GRANT SELECT ON tab_ext To test2;

Now attempt to use the external table from the new test user.

CONN test2/test2@pdb1

COLUMN country_code FORMAT A12

SELECT country_code, COUNT(*) AS amount
FROM   test.tab_ext
GROUP BY country_code
ORDER BY 1;
            *
ERROR at line 2:
ORA-06564: object TMP_DIR1 does not exist

SQL>

As we can see, the external table is run in the context of the current user, which doesn't have access to the directory objects. Let's grant access and try again.

CONN / AS SYSDBA
ALTER SESSION SET CONTAINER = pdb1;

GRANT READ, WRITE ON DIRECTORY tmp_dir1 TO test2;


CONN test2/test2@pdb1

COLUMN country_code FORMAT A12

SELECT country_code, COUNT(*) AS amount
FROM   test.tab_ext
GROUP BY country_code
ORDER BY 1;

COUNTRY_CODE  AMOUNT
------------          ----------
GBR           2000
IRE           2000

SQL>

This demonstrates the basic external table has now worked. We can see below we are able to alter the runtime parameters from the new test user.

SELECT country_code, COUNT(*) AS amount
FROM   test.tab_ext EXTERNAL MODIFY (
                      ACCESS PARAMETERS (
                        BADFILE tmp_dir1:'part_tab_ext_%a_%p.bad'
                        LOGFILE tmp_dir1
                        NODISCARDFILE
                      )
                      LOCATION ('gbr1.txt', 'ire1.txt', 'ire2.txt')
                      REJECT LIMIT 5
                    )
GROUP BY country_code
ORDER BY 1;

COUNTRY_CODE   AMOUNT
------------          ----------
GBR                       1000
IRE                        2000

SQL>

If another directory object is available to the new test user, we can also alter the directory references.

CONN / AS SYSDBA
ALTER SESSION SET CONTAINER = pdb1;

GRANT READ, WRITE ON DIRECTORY tmp_dir2 TO test2;


CONN test2/test2@pdb1

COLUMN country_code FORMAT A12

SELECT country_code, COUNT(*) AS amount
FROM   test.tab_ext EXTERNAL MODIFY (
                      DEFAULT DIRECTORY tmp_dir2
                      ACCESS PARAMETERS (
                        BADFILE tmp_dir2:'part_tab_ext_%a_%p.bad'
                        LOGFILE tmp_dir2
                        NODISCARDFILE
                      )
                      LOCATION ('gbr1.txt', 'ire1.txt', 'ire2.txt')
                      REJECT LIMIT 5
                    )
GROUP BY country_code
ORDER BY 1;

COUNTRY_CODE   AMOUNT
------------      ----------
GBR                1000
IRE                 2000

SQL>

◈ Miscellaneous


Some thoughts and comments related to this functionality.
  • As stated in the documentation, this functionality doesn't support partitioned external tables, but that is irrelevant as you have full control of the files being accessed, so there is no need to consider partitioning.
  • As mentioned above, there are security considerations related to this functionality.
  • It results in really ugly SQL.
  • It could be useful in a situation where you don't have privilege to create a metadata object, for example a read-only database.