Wednesday, 17 August 2016

Oracle's ANSI Join Syntax Can Result In Wrong Results

Ever since Oracle provided their version of ANSI syntax it's been improving with each release. Unfortunately some steps forward also take steps backwards. A case in point is the following example, brought to my attention by Jonathan Lewis. Let's look at the queries and see where Oracle has gone afoul of things.

The queries seem simple enough:

with
    table1 as ( select 1 my_number from dual ),
    table2 as ( select 1 my_number from dual )
select *
    from (
        select sum(table3.table2.my_number) the_answer
            from table1
            left join table2 on table1.my_number = table2.my_number
            group by table1.my_number
        );

with
    table1 as ( select 1 my_number from dual ),
    table2 as ( select 1 my_number from dual )
select sum(table3.table2.my_number) the_answer
    from table1
    left join table2 on table1.my_number = table2.my_number
    group by table1.my_number;

Looking very carefully, notice that there's a 'table3' that is unaccounted for in the setup. Interestingly enough 11.2.0.4 executes these queries with nary a whimper or complaint:

SQL> with
        table1 as ( select 1 my_number from dual ),
        table2 as ( select 1 my_number from dual )
    select *
        from (
            select sum(table3.table2.my_number) the_answer
                from table1
                left join table2 on table1.my_number = table2.my_number
                group by table1.my_number
         );

THE_ANSWER
----------
         1

SQL>
SQL>
SQL> with
        table1 as ( select 1 my_number from dual ),
        table2 as ( select 1 my_number from dual )
    select sum(table3.table2.my_number) the_answer
        from table1
        left join table2 on table1.my_number = table2.my_number
        group by table1.my_number;

THE_ANSWER
----------
         1

SQL>
Try to run them on 12.1.0.2 and you see a different result:

SQL> with
        table1 as ( select 1 my_number from dual ),
        table2 as ( select 1 my_number from dual )
    select *
        from (
            select sum(table3.table2.my_number) the_answer
                from table1
                left join table2 on table1.my_number = table2.my_number
                group by table1.my_number
         );
        select sum(table3.table2.my_number) the_answer
                   *
ERROR at line 6:
ORA-00904: "TABLE3"."TABLE2"."MY_NUMBER": invalid identifier

SQL>
SQL>
SQL> with
        table1 as ( select 1 my_number from dual ),
        table2 as ( select 1 my_number from dual )
    select sum(table3.table2.my_number) the_answer
        from table1
        left join table2 on table1.my_number = table2.my_number
       group by table1.my_number;

THE_ANSWER
----------
         1
SQL>

Running a 10053 trace in 11.2.0.4 reveals this:

SELECT SUM(CASE  WHEN "DUAL".ROWID IS NOT NULL THEN 1 ELSE NULL END ) "THE_ANSWER"
FROM "SYS"."DUAL" "DUAL","SYS"."DUAL" "DUAL"
WHERE CASE  WHEN "DUAL".ROWID IS NOT NULL THEN 1 ELSE 1 END =CASE
WHEN ("DUAL".ROWID(+) IS NOT NULL) THEN 1 ELSE NULL END
GROUP BY 1

The errant 'table3' has disappeared! Oracle has done a merge and in the process eliminated the problem table reference, as shown in this excerpt from the 10053 trace file:

CVM:   Merging complex view SEL$5E38693D (#0) into SEL$3 (#0).

qbcp:******* UNPARSED QUERY IS *******

SELECT "from$_subquery$_003"."THE_ANSWER" "THE_ANSWER" FROM  (SELECT SUM("TABLE2"."MY_NUMBER") "THE_ANSWER" FROM "SYS"."DUAL" "DUAL", (SELECT 1 "MY_NUMBER" FROM "SYS"."DUAL" "DUAL") "TABLE2" WHERE CASE  WHEN "DUAL".ROWID IS NOT NULL THEN 1 ELSE 1 END ="TABLE2"."MY_NUMBER"(+) GROUP BY 1) "from$_subquery$_003"

vqbcp:******* UNPARSED QUERY IS *******

SELECT SUM("TABLE2"."MY_NUMBER") "THE_ANSWER" FROM "SYS"."DUAL" "DUAL", (SELECT 1 "MY_NUMBER" FROM "SYS"."DUAL" "DUAL") "TABLE2" WHERE CASE  WHEN "DUAL".ROWID IS NOT NULL THEN 1 ELSE 1 END ="TABLE2"."MY_NUMBER"(+) GROUP BY 1
CVM: result SEL$3 (#0)

******* UNPARSED QUERY IS *******

SELECT SUM("TABLE2"."MY_NUMBER") "THE_ANSWER" FROM "SYS"."DUAL" "DUAL", (SELECT 1 "MY_NUMBER" FROM "SYS"."DUAL" "DUAL") "TABLE2" WHERE CASE  WHEN "DUAL".ROWID IS NOT NULL THEN 1 ELSE 1 END ="TABLE2"."MY_NUMBER"(+) GROUP BY 1

Registered qb: SEL$0A2D7B19 0xf9641f8 (VIEW MERGE SEL$3; SEL$5E38693D)

Oracle 12.1.0.2 doesn't do this on the first query, so it returns the error that would be expected. The second query in both systems executes, even though it also has the errant 'table3' specified. The trace file reports:

FPD: Considering simple filter push in query block SEL$F146EED8 (#1) CASE  WHEN "DUAL".ROWID IS NOT NULL THEN 1 ELSE 1 END =CASE  WHEN ("DUAL".ROWID(+) IS NOT NULL) THEN 1 ELSE NULL END try to generate transitive predicate from check constraints for query block SEL$F146EED8 (#1) finally: CASE  WHEN "DUAL".ROWID IS NOT NULL THEN 1 ELSE 1 END =CASE  WHEN ("DUAL".ROWID(+) IS NOT NULL) THEN 1 ELSE NULL END

Final query after transformations:******* UNPARSED QUERY IS *******

SELECT SUM(CASE  WHEN "DUAL".ROWID IS NOT NULL THEN 1 ELSE NULL END ) "THE_ANSWER" FROM "SYS"."DUAL" "DUAL","SYS"."DUAL" "DUAL" WHERE CASE WHEN "DUAL".ROWID IS NOT NULL THEN 1 ELSE 1 END =CASE  WHEN ("DUAL".ROWID(+) IS NOT NULL) THEN 1 ELSE NULL END  GROUP BY 1 kkoqbc: optimizing query block SEL$F146EED8 (#1)

It appears that the second query is transformed using a simple filter push, which in both releases eliminates the troublesome 'table3' from the query text, allowing Oracle to successfully execute the query. This would indicate to me that there are two code paths involved and that only one has been modified between 11.2.0.4 and 12.1.0.2. To me, both queries should produce the same error, but at least Oracle is making progress since the first query, as written, throws the expected error in the latest release of the database.