Thursday, 22 September 2016

JSON Support in Oracle Database 12c Release 1

This article describes the server side support for JSON in Oracle Database 12c Release 1 (12.1.0.2). There is a lot of variety in the syntax of the JSON functionality, which for simplicity is avoided here.
  • Introduction to JSON
1. JSON stands for JavaScript Object Notation
2. JSON is a lightweight data-interchange format
3. JSON is language independent *
4. JSON is "self-describing" and easy to understand

The following JSON example defines an employees object, with an array of 3 employee records:
JSON Example

{"employees":[
    {"firstName":"John", "lastName":"Doe"},
    {"firstName":"Anna", "lastName":"Smith"},
    {"firstName":"Peter", "lastName":"Jones"}
]}
  • Creating Tables to Hold JSON
No new data type has been added to support JSON. Instead, it is stored in regular VARCHAR2 or CLOB columns. It can also be stored in NVARCHAR, NCLOB and BLOB columns, but it's less likely you will want to do this. The addition of the IS JSON constraint indicates the column contains valid JSON data. In the following example, a table is created to hold JSON documents.

DROP TABLE json_documents PURGE;

CREATE TABLE json_documents (
  id    RAW(16) NOT NULL,
  data  CLOB,
  CONSTRAINT json_documents_pk PRIMARY KEY (id),
  CONSTRAINT json_documents_json_chk CHECK (data IS JSON)
);

The IS JSON check constraint enforces lax JSON syntax by default. If you want to make it enforce strict JSON syntax, you need to add the (STRICT) qualifier.

DROP TABLE json_documents PURGE;

CREATE TABLE json_documents (
  id    RAW(16) NOT NULL,
  data  CLOB,
  CONSTRAINT json_documents_pk PRIMARY KEY (id),
  CONSTRAINT json_documents_json CHECK (data IS JSON (STRICT))
);

The rest of the article will assume the lax JSON syntax enforcement is used.

With the table in place, it is possible to insert data into the JSON column like any other VARCHAR2 or CLOB column.

INSERT INTO json_documents (id, data)
VALUES (SYS_GUID(),
        '{
          "FirstName"      : "John",
          "LastName"       : "Doe",
          "Job"            : "Clerk",
          "Address"        : {
                              "Street"   : "99 My Street",
                              "City"     : "My City",
                              "Country"  : "UK",
                              "Postcode" : "A12 34B"
                             },
          "ContactDetails" : {
                              "Email"    : "john.doe@example.com",
                              "Phone"    : "44 123 123456",
                              "Twitter"  : "@johndoe"
                             },
          "DateOfBirth"    : "01-JAN-1980",
          "Active"         : true
         }');

INSERT INTO json_documents (id, data)
VALUES (SYS_GUID(),
        '{
          "FirstName"      : "Jayne",
          "LastName"       : "Doe",
          "Job"            : "Manager",
          "Address"        : {
                              "Street"   : "100 My Street",
                              "City"     : "My City",
                              "Country"  : "UK",
                              "Postcode" : "A12 34B"
                             },
          "ContactDetails" : {
                              "Email"    : "jayne.doe@example.com",
                              "Phone"    : ""
                             },
          "DateOfBirth"    : "01-JAN-1982",
          "Active"         : false
         }');

COMMIT;

The presence of the IS JSON constraint means that invalid JSON will cause an error during insert or update.

UPDATE json_documents a
SET    a.data = '{"FirstName" : "Invalid Document"'
WHERE  a.data.FirstName = 'Jayne';
*
ERROR at line 1:
ORA-02290: check constraint (TEST.DOCUMENT_JSON) violated

SQL>
  • Querying JSON Data
    • Dot Notation
Provided the associated column has an IS JSON check constraint, the individual elements of a JSON document can be referenced directly from SQL using dot notation.

COLUMN FirstName FORMAT A15
COLUMN LastName FORMAT A15
COLUMN Postcode FORMAT A10
COLUMN Email FORMAT A25

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.Address.Postcode AS Postcode,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
ORDER BY a.data.FirstName,
         a.data.LastName;

FIRSTNAME  LASTNAME   POSTCODE   EMAIL
--------------- ------------- ---------- -------------------------
Jayne               Doe             A12 34B    jayne.doe@example.com
John                Doe             A12 34B    john.doe@example.com

2 rows selected.

SQL>

If you use the dot notation to drill down into an element, the column name of the returned value remains set to the top-level element. In the example above, without the aliases, "Postcode" would return as "Address" and "Email" would return as "ContactDetails".

SELECT a.data.Address.Postcode
FROM   json_documents a;

ADDRESS
----------
A12 34B
A12 34B

2 rows selected.

SQL>

If a non-scalar value is a referenced, the result is returned as a JSON fragment.

SELECT a.data.ContactDetails
FROM   json_documents a;

CONTACTDETAILS
-----------------------------------------------------------------------------
{"Email":"john.doe@example.com","Phone":"44 123 123456","Twitter":"@johndoe"}
{"Email":"jayne.doe@example.com","Phone":""}

2 rows selected.

SQL>
    • IS JSON
We have already seen the IS JSON condition used as part of a check constraint when creating a table to hold JSON data. The IS JSON condition can be used to test if a column contains JSON data. The following example creates a test table with a CLOB column, but no IS JSON check constraint to control the contents. The subsequent queries show the use of the IS JSON condition to test the contents of the column.

DROP TABLE json_documents_no_constraint PURGE;

CREATE TABLE json_documents_no_constraint (
  id    RAW(16) NOT NULL,
  data  CLOB,
  CONSTRAINT json_documents_nocon_pk PRIMARY KEY (id)
);

INSERT INTO json_documents_no_constraint VALUES (SYS_GUID(), '{"FirstName" : "John"}');
INSERT INTO json_documents_no_constraint VALUES (SYS_GUID(), 'John');
COMMIT;

-- JSON_VALUE using NULL ON ERROR returns NULL for non-JSON data.
SELECT JSON_VALUE(a.data, '$.FirstName') AS first_name
FROM   json_documents_no_constraint a;

FIRST_NAME
---------------
John


2 rows selected.

SQL>

-- Only rows containing JSON are returned.
SELECT JSON_VALUE(a.data, '$.FirstName') AS first_name
FROM   json_documents_no_constraint a
WHERE  a.data IS JSON;

FIRST_NAME
---------------
John

1 row selected.

SQL>

Not surprisingly, you can use IS NOT JSON to target those rows that do not contain valid JSON in the column.
    • JSON_EXISTS
When you query JSON using the dot notation it isn't possible to tell the difference between an element that is missing and one that is present, but has a null value. We know from the data we inserted that Jayne Doe has a phone element with a null value and a missing Twitter element. The query below shows these are equivalent using dot notation.

COLUMN FirstName FORMAT A15
COLUMN LastName FORMAT A15
COLUMN Email FORMAT A25
COLUMN Phone FORMAT A15
COLUMN Twitter FORMAT A10

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.ContactDetails.Email AS Email,
       a.data.ContactDetails.Phone AS Phone,
       a.data.ContactDetails.Twitter AS Twitter
FROM   json_documents a
WHERE  a.data.ContactDetails.Phone IS NULL
AND    a.data.ContactDetails.Twitter IS NULL;

FIRSTNAME   LASTNAME  EMAIL                     PHONE           TWITTER
--------------- --------------- ------------------------- --------------- ----------
Jayne                Doe              jayne.doe@example.com

1 row selected.

SQL>
The JSON_EXISTS condition allows you to make the distinction between empty and missing elements.

-- Check for records where a Phone element is present, but has a null value.
COLUMN FirstName FORMAT A15
COLUMN LastName FORMAT A15
COLUMN Email FORMAT A25

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
WHERE  JSON_EXISTS(a.data.ContactDetails, '$.Phone' FALSE ON ERROR)
AND    a.data.ContactDetails.Phone IS NULL;

FIRSTNAME   LASTNAME        EMAIL
--------------- --------------- -------------------------
Jayne                Doe             jayne.doe@example.com

1 row selected.

SQL>


-- Check for records where Twitter element is missing.
COLUMN FirstName FORMAT A15
COLUMN LastName FORMAT A15
COLUMN Email FORMAT A25

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
WHERE  NOT JSON_EXISTS(a.data.ContactDetails, '$.Twitter' FALSE ON ERROR);

FIRSTNAME  LASTNAME        EMAIL
--------------- --------------- -------------------------
Jayne               Doe             jayne.doe@example.com

1 row selected.

SQL>

The default error handling is FALSE ON ERROR. Alternatives include TRUE ON ERROR and ERROR ON ERROR. The ERROR ON ERROR option, as the name implies, does not trap any errors produced by the JSON_EXISTS function.
    • JSON_VALUE
The JSON_VALUE function returns an element from the JSON document, based on the specified JSON path.

COLUMN FirstName FORMAT A15
COLUMN LastName FORMAT A15

SELECT JSON_VALUE(a.data, '$.FirstName') AS first_name,
       JSON_VALUE(a.data, '$.LastName') AS last_name
FROM   json_documents a
ORDER BY 1, 2;

FIRST_NAME   LAST_NAME
--------------- ---------------
Jayne             Doe
John              Doe

2 rows selected.

SQL>

It will only return scalar values, not complex values like nested records or arrays. The default error handling is NULL OR ERROR, so an attempt to return a non-scalar value results in NULL. The follow example attempt to return a non-scalar value, using both the default error handling an the ERROR ON ERROR option.

SELECT JSON_VALUE(a.data, '$.ContactDetails') AS contact_details
FROM   json_documents a
ORDER BY 1;

CONTACT_DETAILS
----------------------------------------

2 rows selected.

SQL>

SELECT JSON_VALUE(a.data, '$.ContactDetails' ERROR ON ERROR) AS contact_details
FROM   json_documents a
ORDER BY 1;
       *
ERROR at line 2:
ORA-40456: JSON_VALUE evaluated to non-scalar value


SQL>

JSON supports boolean values true and false, which are not supported by SQL. The JSON_VALUE function converts boolean values to the strings true/false or the numbers 1/0.

COLUMN FirstName FORMAT A15
COLUMN LastName FORMAT A15
COLUMN Active FORMAT A10

SELECT a.data.FirstName,
       a.data.LastName,
       JSON_VALUE(a.data, '$.Active') AS Active,
       JSON_VALUE(a.data, '$.Active' RETURNING NUMBER) AS ActiveNum
FROM   json_documents a
ORDER BY a.data.FirstName,
         a.data.LastName;

FIRSTNAME    LASTNAME        ACTIVE      ACTIVENUM
--------------- --------------- ---------- ----------
Jayne                 Doe             false               0
John                  Doe             true                1

2 rows selected.

SQL>

As mentioned previously, the default error handling is NULL ON ERROR, which means if the value found is non-scalar, no error will be raised. The ERROR ON ERROR option will return an error if the value found is a non-scalar, like an array.
    • JSON_QUERY
The JSON_QUERY function returns a JSON fragment representing one or more values. In the following example, JSON_QUERY is used to return a JSON fragment representing all the contact details for each person. The WITH WRAPPER option surrounds the fragment with square brackets.

COLUMN FirstName FORMAT A15
COLUMN LastName FORMAT A15
COLUMN contact_details FORMAT A40

SELECT a.data.FirstName,
       a.data.LastName,
       JSON_QUERY(a.data, '$.ContactDetails' WITH WRAPPER) AS contact_details
FROM   json_documents a
ORDER BY a.data.FirstName,
         a.data.Last_name;

FIRSTNAME     LASTNAME        CONTACT_DETAILS
--------------- --------------- ----------------------------------------
Jayne               Doe             [{"Email":"jayne.doe@example.com","Phone
                                ":""}]

John                Doe             [{"Email":"john.doe@example.com","Phone"
                                :"44 123 123456","Twitter":"@johndoe"}]


2 rows selected.

SQL>
    • JSON_TABLE

The JSON_TABLE function incorporates all the functionality of JSON_VALUE, JSON_EXISTS and JSON_QUERY. The syntax is a little more complicated that using the individual JSON functions, but it is more efficient to use a single JSON_TABLE call than combining multiple calls to the other individual functions into a single query.

JSON_TABLE is used for making JSON data look like relational data, which is especially useful when creating relational views over JSON data, as show below.

CREATE OR REPLACE VIEW json_documents_v AS
SELECT jt.first_name,
       jt.last_name,
       jt.job,
       jt.addr_street,
       jt.addr_city,
       jt.addr_country,
       jt.addr_postcode,
       jt.email,
       jt.phone,
       jt.twitter,
       TO_DATE(jt.dob, 'DD-MON-YYYY') AS dob,
       jt.active
FROM   json_documents,
       JSON_TABLE(data, '$'
         COLUMNS (first_name    VARCHAR2(50 CHAR) PATH '$.FirstName',
                  last_name     VARCHAR2(50 CHAR) PATH '$.LastName',
                  job           VARCHAR2(10 CHAR) PATH '$.Job',
                  addr_street   VARCHAR2(50 CHAR) PATH '$.Address.Street',
                  addr_city     VARCHAR2(50 CHAR) PATH '$.Address.City',
                  addr_country  VARCHAR2(50 CHAR) PATH '$.Address.Country',
                  addr_postcode VARCHAR2(50 CHAR) PATH '$.Address.Postcode',
                  email         VARCHAR2(100 CHAR) PATH '$.ContactDetails.Email',
                  phone         VARCHAR2(50 CHAR) PATH '$.ContactDetails.Phone',
                  twitter       VARCHAR2(50 CHAR) PATH '$.ContactDetails.Twitter',
                  dob           VARCHAR2(11 CHAR) PATH '$.DateOfBirth',
                  active        VARCHAR2(5 CHAR) PATH '$.Active')) jt;


COLUMN first_name FORMAT A15
COLUMN last_name FORMAT A15

SELECT first_name, last_name, dob
FROM   json_documents_v
ORDER BY first_name, last_name;

FIRST_NAME    LAST_NAME       DOB
--------------- --------------- --------------------
Jayne              Doe             01-JAN-1982 00:00:00
John               Doe             01-JAN-1980 00:00:00

2 rows selected.

SQL>

The COLUMNS clause defines the how the data for each column is identified and presented (column projection).

There are variety of options to traverse the JSON documents, including processing arrays using the NESTED clause, and to control how data is presented. In the following example, contact details are presented in JSON format.

COLUMN first_name FORMAT A15
COLUMN last_name FORMAT A15
COLUMN contact_details FORMAT A40

SELECT jt.first_name,
       jt.last_name,
       jt.contact_details
FROM   json_documents,
       JSON_TABLE(data, '$'
         COLUMNS (first_name    VARCHAR2(50 CHAR) PATH '$.FirstName',
                  last_name     VARCHAR2(50 CHAR) PATH '$.LastName',
                  contact_details  VARCHAR2(4000 CHAR)
                    FORMAT JSON WITH WRAPPER PATH '$.ContactDetails')) jt;

FIRST_NAME    LAST_NAME       CONTACT_DETAILS
--------------- --------------- ----------------------------------------
John                Doe             [{"Email":"john.doe@example.com","Phone"
                                :"44 123 123456","Twitter":"@johndoe"}]

Jayne              Doe             [{"Email":"jayne.doe@example.com","Phone
                                ":""}]


2 rows selected.

SQL>
    • JSON_TEXTCONTAINS
The JSON_TEXTCONTAINS condition is only available once a JSON search index has been created
  • Identifying Columns Containing JSON
The [USER|ALL|DBA]_JSON_COLUMNS views can be used to identify tables and columns containing JSON data.

COLUMN table_name FORMAT A15
COLUMN column_name FORMAT A15

SELECT table_name,
       column_name,
       format,
       data_type
FROM   user_json_columns;

TABLE_NAME    COLUMN_NAME   FORMAT    DATA_TYPE
--------------- --------------- --------- -------------
JSON_DOCUMENTS  DATA            TEXT      CLOB

1 row selected.

SQL>
  • Loading JSON Files Using External Tables
Oracle provide an example JSON dump file from a popular NoSQL database, which you can use to test the external table functionality.

Create the directory objects for use with the external table.

CONN sys@pdb1 AS SYSDBA

CREATE OR REPLACE DIRECTORY order_entry_dir
  AS '/u01/app/oracle/product/12.1.0.2/db_1/demo/schema/order_entry';
GRANT READ, WRITE ON DIRECTORY order_entry_dir TO test;

CREATE OR REPLACE DIRECTORY loader_output_dir AS '/tmp';
GRANT READ, WRITE ON DIRECTORY loader_output_dir TO test;

Create the external table and query it to check it is working.

CONN test/test@pdb1

CREATE TABLE json_dump_file_contents (json_document CLOB)
  ORGANIZATION EXTERNAL (TYPE ORACLE_LOADER DEFAULT DIRECTORY order_entry_dir
                         ACCESS PARAMETERS
                           (RECORDS DELIMITED BY 0x'0A'
                            DISABLE_DIRECTORY_LINK_CHECK
                            BADFILE loader_output_dir: 'JSONDumpFile.bad'
                            LOGFILE order_entry_dir: 'JSONDumpFile.log'
                            FIELDS (json_document CHAR(5000)))
                         LOCATION (order_entry_dir:'PurchaseOrders.dmp'))
  PARALLEL
  REJECT LIMIT UNLIMITED;


SELECT COUNT(*) FROM json_dump_file_contents;

  COUNT(*)
----------
     10000

1 row selected.

SQL>

You can now load the database table with the contents of the external table.

TRUNCATE TABLE json_documents;

INSERT /*+ APPEND */ INTO json_documents
  SELECT SYS_GUID(), json_document
  FROM   json_dump_file_contents
  WHERE  json_document IS JSON;

COMMIT;
  • Dot Notation Query Tranformation
It's worth keeping in mind the dot notation syntax is a query transformation. If we run a query using dot notation and perform a 10053 trace we can see this.

Check the trace file for the session.

SELECT value FROM v$diag_info WHERE  name = 'Default Trace File';

VALUE
----------------------------------------------------------------
/u01/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_2861.trc

1 row selected.

SQL>

Perform a 10053 trace of the statement.

ALTER SESSION SET EVENTS '10053 trace name context forever';

SELECT a.data.FirstName,
       a.data.LastName
FROM   json_documents a;

ALTER SESSION SET EVENTS '10053 trace name context off';

The section beginning with "Final query after transformations" shows the statement that was actually processed, after the query transformation.

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT JSON_QUERY("A"."DATA" FORMAT JSON , '$.FirstName' RETURNING VARCHAR2(4000)
         ASIS  WITHOUT ARRAY WRAPPER NULL ON ERROR) "FIRSTNAME",
       JSON_QUERY("A"."DATA" FORMAT JSON , '$.LastName' RETURNING VARCHAR2(4000)
         ASIS  WITHOUT ARRAY WRAPPER NULL ON ERROR) "LASTNAME"
FROM "TEST"."JSON_DOCUMENTS" "A"
As you can see, the statement has been rewritten to use JSON_QUERY calls. That is important for two reasons.
  1. Performance : It may be more efficient to use a single JSON_TABLE call to retrieve the data, rather than have a separate JSON_QUERY call for each dot notation reference.
  2. Indexing : When you start to index JSON data, you have to understand how the index usage will be affected when using combinations of direct function calls and dot notation. Depending on the circumstances, the query transformation can vary, causing your expensive indexes not to be used.
It may be a good idea to avoid dot notation entirely, as it will probably make the process of deciding on an indexing strategy much simpler.