t1, t2 -- < join tables.
t1.join1 = t2.join1 --< join condition.
t1.v1 = 1; --< join filter.
And you we trying to understand how oracle calculate that hash join cardinality.
conn scott/password
create table t2 as select trunc(dbms_random.value(0,5000)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
create table t1 as select trunc(dbms_random.value(0,4)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T2',method_opt=>'for all columns size 1');
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T1',method_opt=>'for all columns size 1');
col TABLE_NAME format a10
col COLUMN_NAME format a10
select num_distinct, density, column_name, table_name from user_tab_col_statistics where table_name in ('T1','T2') and column_name in ('JOIN1','V1');
NUM_DISTINCT DENSITY COLUMN_NAM TABLE_NAME
------------ ---------- ---------- ----------
4 .25 JOIN1 T1
40 .025 V1 T1
4334 .000230734 JOIN1 T2
40 .025 V1 T2
set autotrace on
select count(*) from t1, t2 where t1.join1 = t2.join1 and t1.v1 = 1;
Execution Plan
----------------------------------------------------------
Plan hash value: 906334482
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 97 (2)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | HASH JOIN | | 577 | 5770 | 97 (2)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| T1 | 250 | 1500 | 48 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T2 | 10000 | 40000 | 48 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."JOIN1"="T2"."JOIN1")
3 - filter("T1"."V1"=1)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
322 consistent gets
0 physical reads
0 redo size
523 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Manual Calculations
Number of total rows under process = 10000*250
Selectivity = 1/max[4334,4] = 1/4334
-- cardinality = 250*10000/4334 = 576.834333 ~ 577
Test2:
drop table t1;
drop table t2;
create table t1 as select trunc(dbms_random.value(0, 100 )) filter,trunc(dbms_random.value(0,30 )) join1,lpad(rownum,10) v1, rpad('x',100) padding from all_objects where rownum <= 1000;
create table t2 as select trunc(dbms_random.value(0, 100 )) filter,trunc(dbms_random.value(0, 40 )) join1,lpad(rownum,10) v1, rpad('x',100) padding from all_objects where rownum <= 1000;
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T2',method_opt=>'for all columns size 1');
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T1',method_opt=>'for all columns size 1');
col TABLE_NAME format 10
col COLUMN_NAME format a10
select num_distinct, density, column_name, table_name from user_tab_col_statistics where table_name in ('T1','T2') and column_name in ('JOIN1','FILTER');
NUM_DISTINCT DENSITY COLUMN_NAM TABLE_NAME
------------ ---------- ---------- ----------
100 .01 FILTER T1
30 .033333333 JOIN1 T1
100 .01 FILTER T2
40 .025 JOIN1 T2
set autotrace on
select t1.v1, t2.v1 from t1,t2 where t2.join1 = t1.join1 and t2.filter = 1;
165 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 333 | 10323 | 17 (6)| 00:00:01 |
|* 1 | HASH JOIN | | 333| 10323 | 17 (6)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T2 | 10| 170 | 8 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T1 | 1000| 14000 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T2"."JOIN1"="T1"."JOIN1")
2 - filter("T2"."FILTER"=1)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
52 consistent gets
0 physical reads
0 redo size
6323 bytes sent via SQL*Net to client
633 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
165 rows processed
Manual Calculations
Number of total rows under process = 10*1000
Selectivity = 1/max[40 with filter=1,30] = 1/30
-- Cardinality is 10*1000/30 = 333.333333333 ~ 333
Note:
ORACLE chose the selectivity as 1/NDV of the table that not filtered, in our case T1(NVD of join column)=30, Oracle expects max NDV in the real result set with this table, since the filter should eliminate/reduce the NDV of that filtered rows, which should make sense, the testcase had choose single value in predicate "t2.filter = 1"
Test3:
drop table t1;
drop table t2;
create table t2 as select trunc(dbms_random.value(0,50)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
create table t1 as select trunc(dbms_random.value(0,40)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T2',method_opt=>'for all columns size 1');
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T1',method_opt=>'for all columns size 1');
select num_distinct, density, column_name, table_name from user_tab_col_statistics where table_name in ('T1','T2') and column_name in ('JOIN1','V1');
NUM_DISTINCT DENSITY COLUMN_NAM TABLE_NAME
------------ ---------- ---------- ----------
40 .025 JOIN1 T1
40 .025 V1 T1
50 .02 JOIN1 T2
40 .025 V1 T2
set autot on
select count(*) from t1, t2 where t1.join1 = t2.join1 and t1.v1 = 1;
COUNT(*)
----------
50233
Execution Plan
----------------------------------------------------------
Plan hash value: 906334482
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 9 | 97 (2)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 9 | | |
|* 2 | HASH JOIN | | 50000 | 439K| 97 (2)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| T1 | 250 | 1500 | 48 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T2 | 10000 | 30000 | 48 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."JOIN1"="T2"."JOIN1")
3 - filter("T1"."V1"=1)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
322 consistent gets
0 physical reads
0 redo size
524 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Manual Calculations
Number of total rows under process = 250*10000
Selectivity = 1/NDV of join column of table has no filter (T2)= 1/50
-- So Cardinality is 250*10000/50 = 50000
Test4:
drop table t1;
drop table t2;
create table t1 as select trunc(dbms_random.value(0,4)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
create table t2 as select trunc(dbms_random.value(0,5000)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T2',method_opt=>'for all columns size 1');
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T1',method_opt=>'for all columns size 1');
select num_distinct, density, column_name, table_name from user_tab_col_statistics where table_name in ('T1','T2') and column_name in ('JOIN1','V1');
NUM_DISTINCT DENSITY COLUMN_NAM TABLE_NAME
------------ ---------- ---------- ----------
4 .25 JOIN1 T1
40 .025 V1 T1
4325 .000231214 JOIN1 T2
40 .025 V1 T2
select count(*) from t1, t2 where t1.join1 = t2.join1 and t2.v1 = 1;
COUNT(*)
----------
0
Execution Plan
----------------------------------------------------------
Plan hash value: 4274056747
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 97 (2)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 20 | | |
|* 2 | HASH JOIN | | 9596 | 187K| 97 (2)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| T2 | 250 | 1750 | 48 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T1 | 9443 | 119K| 48 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."JOIN1"="T2"."JOIN1")
3 - filter("T2"."V1"=1)
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
322 consistent gets
0 physical reads
0 redo size
521 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Now in this case there is 9443 row of very low NDV and 250 row of high NDV that indicates to hi selectivity opportunity via that 250 row than that 9443 row, so that the CBO will tends to calculate the expected cardinality as following:
New_num_distinct= nd*(1-((nr-s)/nr)^(nr/nd))
where:
nd: NDV for join column
nr: table cardinality
s: number of filtered (selected) rows
Manual Calculations
New (expected) NDV for card 250 record set of original NDV 4325
New_num_distinct=4325*(1-((9443-250)/9443)^(9443/4325)) = 246.0925 ~ 246 > (4 NDV of T1)
so the expected cardinality should be 9443*250/246 = 9596.5447~ 9596
Conclusion
----------------
ORACLE calculate the cardinality as following:
- Cardinality = total processed rows * selectivity
- Selectivity = 1/max[NDV(T1.JOIN),NDV(T2.JOIN)]
- oracle consider the max ever NDV in the result set, considering the join filter as NDV eliminator
- oracle always apply join filter before join itself to reduce the I/O over heads
- "total processed rows" almost calculated as cartesian product of rows after filtering.