The APPROX_RANK, APPROX_SUM and APPROX_COUNT functions were introduced in Oracle 18c to allow approximate top-n query processing. This extends the approximate query processing that was introduced in the previous two releases (12cR1 and 12cR2).
The examples in this article require the following table. The table is populated with random data, so your results may look a little different than those presented here.
DROP TABLE t1 PURGE;
CREATE TABLE t1 AS
SELECT level AS id,
TRUNC(DBMS_RANDOM.value(1, 5))*10 department,
UPPER(TO_CHAR(TO_DATE(TRUNC(DBMS_RANDOM.value(1, 6)),'j'), 'jsp')) AS record_type,
ROUND(DBMS_RANDOM.value(1,51),2) AS record_value
FROM dual
CONNECT BY level <= 100000;
You can get an idea of your data set using the following query.
COLUMN record_type FORMAT A12
SELECT department,
record_type,
COUNT(*) AS record_count,
SUM(record_value) AS sum_record_val
FROM t1
GROUP BY department, record_type
ORDER BY department, record_type;
SQL>
In this section I'll keep showing the same query, highlighting different parts of the statement to illustrate the point being made. It's not important what the query actually does at this point. Just focus on the syntax. The examples are all based around the APPROX_SUM function, but the points apply equally to the APPROX_COUNT function.
The approximate top-n query processing currently focuses on scenarios where you are ranking groups of data based on a SUM of values or a COUNT of the number of rows in the group, hence the APPROX_SUM and APPROX_COUNT functions. These functions can only be used in combination with the APPROX_RANK function, which must be present in the HAVING clause of the query, but can also appear in the SELECT list, if you want to display the rank of the data.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
The same approximate function used in the SELECT list must also appear in the ORDER BY clause of the APPROX_RANK function.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
The ORDER BY clause is mandatory and must always include the DESC keyword, so this can't be used to perform bottom-n queries like most other methods.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
The PARTITION BY clause must include all or a subset of the columns in the GROUP BY clause. If there is more than one column in the GROUP BY clause the PARTITION BY clause is mandatory. The PARTITION BY clause can be omitted if there is only a single column in the GROUP BY clause, as this will automatically be the partition.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
The HAVING clause limits the ranks returned by the query. This is the top-n.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
The APPROX_SUM and APPROX_COUNT functions can accept a second parameter of 'MAX_ERROR', which causes them to display the maximum error between the actual and approximate sum or count respectively. I've not seen this return anything but "0" in the data sets I've tried.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 5;
The following sections will provide some examples of the usage.
In this section we will show some examples of using the APPROX_SUM and APPROX_RANK functions together.
The following example displays the top-2 record types, based on the approximate sum of the values per record type across all departments.
SELECT record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY record_type
HAVING APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
The following example displays the top department, based on the approximate sum of the values per department for all record types.
SELECT department,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department
HAVING APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) <= 1
ORDER BY 1, 4;
SQL>
The following example displays the top department for each record type, based on the approximate sum of the values per department for each record type.
SELECT record_type,
department,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY record_type, department
HAVING APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) <= 1
ORDER BY 1, 5;
The following example displays the top department for each record type, based on the approximate sum of the values per department for each record type.
SELECT record_type,
department,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY record_type, department
HAVING APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) <= 1
ORDER BY 1, 5;
The following example displays the top-2 record types in each department, based on the approximate number of rows per record type in each department.
The following example displays the top department, based on the approximate number of rows of all record types in the department
The following example displays the top department for each record type, based on the approximate number of rows of that record type in the department
SQL>
◈ Setup
The examples in this article require the following table. The table is populated with random data, so your results may look a little different than those presented here.
DROP TABLE t1 PURGE;
CREATE TABLE t1 AS
SELECT level AS id,
TRUNC(DBMS_RANDOM.value(1, 5))*10 department,
UPPER(TO_CHAR(TO_DATE(TRUNC(DBMS_RANDOM.value(1, 6)),'j'), 'jsp')) AS record_type,
ROUND(DBMS_RANDOM.value(1,51),2) AS record_value
FROM dual
CONNECT BY level <= 100000;
You can get an idea of your data set using the following query.
COLUMN record_type FORMAT A12
SELECT department,
record_type,
COUNT(*) AS record_count,
SUM(record_value) AS sum_record_val
FROM t1
GROUP BY department, record_type
ORDER BY department, record_type;
DEPARTMENT | RECORD_TYPE | RECORD_COUNT | SUM_RECORD_VAL |
10 10 10 10 10 20 20 20 20 20 30 30 30 30 30 40 40 40 40 40 |
FIVE FOUR ONE THREE TWO FIVE FOUR ONE THREE TWO FIVE FOUR ONE THREE TWO FIVE FOUR ONE THREE TWO |
5012 5042 4912 4922 5160 4816 4983 4961 5049 5018 5059 4969 5042 5115 4918 4965 5007 5017 4944 5089 |
130044.4 131431.59 128534.48 126185.08 133586.67 124278.87 130158.06 128156.44 131219.07 129587.6 131168.33 128337.92 131607.88 131455.11 127673.23 128567.84 128968.23 130735.96 130084.71 131310.33 |
SQL>
Introduction
In this section I'll keep showing the same query, highlighting different parts of the statement to illustrate the point being made. It's not important what the query actually does at this point. Just focus on the syntax. The examples are all based around the APPROX_SUM function, but the points apply equally to the APPROX_COUNT function.
The approximate top-n query processing currently focuses on scenarios where you are ranking groups of data based on a SUM of values or a COUNT of the number of rows in the group, hence the APPROX_SUM and APPROX_COUNT functions. These functions can only be used in combination with the APPROX_RANK function, which must be present in the HAVING clause of the query, but can also appear in the SELECT list, if you want to display the rank of the data.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
The same approximate function used in the SELECT list must also appear in the ORDER BY clause of the APPROX_RANK function.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
The ORDER BY clause is mandatory and must always include the DESC keyword, so this can't be used to perform bottom-n queries like most other methods.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
The PARTITION BY clause must include all or a subset of the columns in the GROUP BY clause. If there is more than one column in the GROUP BY clause the PARTITION BY clause is mandatory. The PARTITION BY clause can be omitted if there is only a single column in the GROUP BY clause, as this will automatically be the partition.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
The HAVING clause limits the ranks returned by the query. This is the top-n.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
The APPROX_SUM and APPROX_COUNT functions can accept a second parameter of 'MAX_ERROR', which causes them to display the maximum error between the actual and approximate sum or count respectively. I've not seen this return anything but "0" in the data sets I've tried.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 5;
The following sections will provide some examples of the usage.
APPROX_SUM and APPROX_RANK
In this section we will show some examples of using the APPROX_SUM and APPROX_RANK functions together.
The following example displays the top-2 record types, based on the approximate sum of the values per record type across all departments.
SELECT record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY record_type
HAVING APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;
RECORD_TYPE | APPROX_SUM_VAL | APPROX_SUM_VAL_ME | APPROX_RANK_VAL |
ONE TWO |
519034.76 522157.83 |
0 0 |
2 1 |
The following example displays the top-2 record types in each department, based on the approximate sum of the values per record type for each department.
SELECT department,
record_type,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 5;
DEPARTMENT | RECORD_TYPE | APPROX_SUM_VAL | PPROX_SUM_VAL_ME | APPROX_RANK_VAL |
10 10 20 20 30 30 40 40 |
TWO FOUR THREE FOUR ONE THREE TWO ONE |
133631 131452 131247 130162 131646 131482 131315 130750 |
0 0 0 0 0 0 0 0 |
1 2 1 2 1 2 1 2 |
The following example displays the top department, based on the approximate sum of the values per department for all record types.
SELECT department,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY department
HAVING APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) <= 1
ORDER BY 1, 4;
DEPARTMENT | APPROX_SUM_VAL | PPROX_SUM_VAL_ME | APPROX_RANK_VAL |
30 | 650242.47 | 0 | 1 |
SQL>
The following example displays the top department for each record type, based on the approximate sum of the values per department for each record type.
SELECT record_type,
department,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY record_type, department
HAVING APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) <= 1
ORDER BY 1, 5;
SELECT record_type,
department,
APPROX_SUM(record_value) AS approx_sum_val,
APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM t1
GROUP BY record_type, department
HAVING APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) <= 1
ORDER BY 1, 5;
RECORD_TYPE | DEPARTMENT | APPROX_SUM_VAL | APPROX_SUM_VAL_ME | APPROX_RANK_VAL |
FIVE FOUR ONE THREE TWO |
30 10 30 30 10 |
131171 131452 131646 131482 133631 |
0 0 0 0 0 |
1 1 1 1 1 |
APPROX_COUNT and APPROX_RANK
In this section we will show some examples of using the APPROX_COUNT and APPROX_RANK functions together.
The following example displays the top-2 record types, based on the approximate number of rows per record type across all departments.
SELECT record_type,
APPROX_COUNT(*) AS approx_count_val,
APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val
FROM t1
GROUP BY record_type
HAVING APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) <= 2
ORDER BY 1, 4;
RECORD_TYPE | APPROX_SUM_VAL | APPROX_SUM_VAL_ME | APPROX_RANK_VAL |
THREE TWO | 20030 20185 | 0 0 | 2 1 |
The following example displays the top-2 record types in each department, based on the approximate number of rows per record type in each department.
SELECT department,
record_type,
APPROX_COUNT(*) AS approx_count_val,
APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
APPROX_RANK(PARTITION BY department ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val
FROM t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_COUNT(*) DESC) <= 2
ORDER BY 1, 5;
DEPARTMENT | RECORD_TYPE | APPROX_SUM_VAL | PPROX_SUM_VAL_ME | APPROX_RANK_VAL |
10 10 20 20 30 30 40 40 | TWO FOUR THREE TWO THREE FIVE TWO ONE | 5160 5042 5049 5018 5115 5059 5089 5017 | 0 0 0 0 0 0 0 0 | 1 2 1 2 1 2 1 2 |
The following example displays the top department, based on the approximate number of rows of all record types in the department
SELECT department,
APPROX_COUNT(*) AS approx_count_val,
APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val
FROM t1
GROUP BY department
HAVING APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) <= 1
ORDER BY 1, 4;
DEPARTMENT | APPROX_SUM_VAL | PPROX_SUM_VAL_ME | APPROX_RANK_VAL |
30 | 25103 | 0 | 1 |
The following example displays the top department for each record type, based on the approximate number of rows of that record type in the department
SELECT record_type,
department,
APPROX_COUNT(*) AS approx_count_val,
APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val
FROM t1
GROUP BY record_type, department
HAVING APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_COUNT(*) DESC) <= 1
ORDER BY 1, 5;
RECORD_TYPE | DEPARTMENT | APPROX_SUM_VAL | APPROX_SUM_VAL_ME | APPROX_RANK_VAL |
FIVE FOUR ONE THREE TWO | 30 10 30 30 10 | 5059 5042 5042 5115 5160 | 0 0 0 0 0 | 1 1 1 1 1 |
SQL>