How ORACLE CBO Calculate Join Cardinality with Join Elimination Filter?This is a featured page

If we have staement:
select
count(*)
from
t1, t2 -- < join tables.
where
t1.join1 = t2.join1 --< join condition.
and
t1.v1 = 1; --< join filter.
We may expect execution plan like:
Execution Plan
----------------------------------------------------------
Plan hash value: 906334482
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 17 | 17 (6)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 17 | | |
|* 2 | HASH JOIN | | 25 | 425 | 17 (6)| 00:00:01 |<- looking for rows calculations.
|* 3 | TABLE ACCESS FULL| T1 | 1 | 14 | 8 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T2 | 1000 | 3000 | 8 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."JOIN1"="T2"."JOIN1")
3 - filter(TO_NUMBER("T1"."V1")=1)
And you we trying to understand how oracle calculate that hash join cardinality.
When We read on metalink We will find:
CBO Theoretical Basic Formulas:
the join selectivity for simple two tables join couple with additional filter predicates on both tables looks like this:

Sel = 1/max[NDV(t1.c1),NDV(t2.c2)] *
( (Card t1 - # t1.c1 NULLs) / Card t1) *
( (Card t2 - # t2.c2 NULLs) / Card t2)

for the below test case it looks like
Sel = 1/max[NDV(t1.join1),NDV(t2.join1)] *
( (Card t1 - # t1..join1 NULLs) / Card t1) *
( (Card t2 - # t2..join1 NULLs) / Card t2)

According to absence any null values in join1 on both tables I can rewrite this formula in following way:

Expected Sel = 1/max[NDV(t1.join1),NDV(t2.join1)] (1)
Expected Card = number of all processed rows (Cartesian product) * expected Selectivity
Test 1:
conn scott/password
create table t2 as select trunc(dbms_random.value(0,5000)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
create table t1 as select trunc(dbms_random.value(0,4)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T2',method_opt=>'for all columns size 1');
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T1',method_opt=>'for all columns size 1');
col TABLE_NAME format a10
col COLUMN_NAME format a10
select num_distinct, density, column_name, table_name from user_tab_col_statistics where table_name in ('T1','T2') and column_name in ('JOIN1','V1');
NUM_DISTINCT DENSITY COLUMN_NAM TABLE_NAME
------------ ---------- ---------- ----------
4 .25 JOIN1 T1
40 .025 V1 T1
4334 .000230734 JOIN1 T2
40 .025 V1 T2
set autotrace on
select count(*) from t1, t2 where t1.join1 = t2.join1 and t1.v1 = 1;
Execution Plan
----------------------------------------------------------
Plan hash value: 906334482
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 10 | 97 (2)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 10 | | |
|* 2 | HASH JOIN | | 577 | 5770 | 97 (2)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| T1 | 250 | 1500 | 48 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T2 | 10000 | 40000 | 48 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."JOIN1"="T2"."JOIN1")
3 - filter("T1"."V1"=1)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
322 consistent gets
0 physical reads
0 redo size
523 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Manual Calculations
Number of total rows under process = 10000*250
Selectivity = 1/max[4334,4] = 1/4334
-- cardinality = 250*10000/4334 = 576.834333 ~ 577
Test2:
drop table t1;
drop table t2;
create table t1 as select trunc(dbms_random.value(0, 100 )) filter,trunc(dbms_random.value(0,30 )) join1,lpad(rownum,10) v1, rpad('x',100) padding from all_objects where rownum <= 1000;
create table t2 as select trunc(dbms_random.value(0, 100 )) filter,trunc(dbms_random.value(0, 40 )) join1,lpad(rownum,10) v1, rpad('x',100) padding from all_objects where rownum <= 1000;
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T2',method_opt=>'for all columns size 1');
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T1',method_opt=>'for all columns size 1');
col TABLE_NAME format 10
col COLUMN_NAME format a10
select num_distinct, density, column_name, table_name from user_tab_col_statistics where table_name in ('T1','T2') and column_name in ('JOIN1','FILTER');
NUM_DISTINCT DENSITY COLUMN_NAM TABLE_NAME
------------ ---------- ---------- ----------
100 .01 FILTER T1
30 .033333333 JOIN1 T1
100 .01 FILTER T2
40 .025 JOIN1 T2
set autotrace on
select t1.v1, t2.v1 from t1,t2 where t2.join1 = t1.join1 and t2.filter = 1;
165 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 333 | 10323 | 17 (6)| 00:00:01 |
|* 1 | HASH JOIN | | 333| 10323 | 17 (6)| 00:00:01 |
|* 2 | TABLE ACCESS FULL| T2 | 10| 170 | 8 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T1 | 1000| 14000 | 8 (0)| 00:00:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T2"."JOIN1"="T1"."JOIN1")
2 - filter("T2"."FILTER"=1)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
52 consistent gets
0 physical reads
0 redo size
6323 bytes sent via SQL*Net to client
633 bytes received via SQL*Net from client
12 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
165 rows processed
Manual Calculations
Number of total rows under process = 10*1000
Selectivity = 1/max[40 with filter=1,30] = 1/30
-- Cardinality is 10*1000/30 = 333.333333333 ~ 333


Note:
ORACLE chose the selectivity as 1/NDV of the table that not filtered, in our case T1(NVD of join column)=30, Oracle expects max NDV in the real result set with this table, since the filter should eliminate/reduce the NDV of that filtered rows, which should make sense, the testcase had choose single value in predicate "t2.filter = 1"


Test3:
drop table t1;
drop table t2;
create table t2 as select trunc(dbms_random.value(0,50)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
create table t1 as select trunc(dbms_random.value(0,40)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T2',method_opt=>'for all columns size 1');
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T1',method_opt=>'for all columns size 1');
select num_distinct, density, column_name, table_name from user_tab_col_statistics where table_name in ('T1','T2') and column_name in ('JOIN1','V1');
NUM_DISTINCT DENSITY COLUMN_NAM TABLE_NAME
------------ ---------- ---------- ----------
40 .025 JOIN1 T1
40 .025 V1 T1
50 .02 JOIN1 T2
40 .025 V1 T2
set autot on
select count(*) from t1, t2 where t1.join1 = t2.join1 and t1.v1 = 1;
COUNT(*)
----------
50233
Execution Plan
----------------------------------------------------------
Plan hash value: 906334482
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 9 | 97 (2)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 9 | | |
|* 2 | HASH JOIN | | 50000 | 439K| 97 (2)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| T1 | 250 | 1500 | 48 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T2 | 10000 | 30000 | 48 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."JOIN1"="T2"."JOIN1")
3 - filter("T1"."V1"=1)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
322 consistent gets
0 physical reads
0 redo size
524 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Manual Calculations
Number of total rows under process = 250*10000
Selectivity = 1/NDV of join column of table has no filter (T2)= 1/50
-- So Cardinality is 250*10000/50 = 50000
Test4:
drop table t1;
drop table t2;
create table t1 as select trunc(dbms_random.value(0,4)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
create table t2 as select trunc(dbms_random.value(0,5000)) join1, trunc(dbms_random.value(0, 40 )) v1, rpad('x',100) padding from all_objects where rownum <= 10000;
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T2',method_opt=>'for all columns size 1');
execute dbms_stats.gather_table_stats(ownname=>'SCOTT',tabname=>'T1',method_opt=>'for all columns size 1');
select num_distinct, density, column_name, table_name from user_tab_col_statistics where table_name in ('T1','T2') and column_name in ('JOIN1','V1');
NUM_DISTINCT DENSITY COLUMN_NAM TABLE_NAME
------------ ---------- ---------- ----------
4 .25 JOIN1 T1
40 .025 V1 T1
4325 .000231214 JOIN1 T2
40 .025 V1 T2
select count(*) from t1, t2 where t1.join1 = t2.join1 and t2.v1 = 1;
COUNT(*)
----------
0
Execution Plan
----------------------------------------------------------
Plan hash value: 4274056747
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20 | 97 (2)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | 20 | | |
|* 2 | HASH JOIN | | 9596 | 187K| 97 (2)| 00:00:02 |
|* 3 | TABLE ACCESS FULL| T2 | 250 | 1750 | 48 (0)| 00:00:01 |
| 4 | TABLE ACCESS FULL| T1 | 9443 | 119K| 48 (0)| 00:00:01 |
----------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."JOIN1"="T2"."JOIN1")
3 - filter("T2"."V1"=1)
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
322 consistent gets
0 physical reads
0 redo size
521 bytes sent via SQL*Net to client
523 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Now in this case there is 9443 row of very low NDV and 250 row of high NDV that indicates to hi selectivity opportunity via that 250 row than that 9443 row, so that the CBO will tends to calculate the expected cardinality as following:

New_num_distinct= nd*(1-((nr-s)/nr)^(nr/nd))
where:
nd: NDV for join column
nr: table cardinality
s: number of filtered (selected) rows

Manual Calculations
New (expected) NDV for card 250 record set of original NDV 4325
New_num_distinct=4325*(1-((9443-250)/9443)^(9443/4325)) = 246.0925 ~ 246 > (4 NDV of T1)

so the expected cardinality should be 9443*250/246 = 9596.5447~ 9596


Conclusion
----------------
ORACLE calculate the cardinality as following:
- Cardinality = total processed rows * selectivity
- Selectivity = 1/max[NDV(T1.JOIN),NDV(T2.JOIN)]
- oracle consider the max ever NDV in the result set, considering the join filter as NDV eliminator
- oracle always apply join filter before join itself to reduce the I/O over heads
- "total processed rows" almost calculated as cartesian product of rows after filtering.


surachart
surachart
Latest page update: made by surachart , Oct 8 2008, 12:10 PM EDT (about this update About This Update surachart Edited by surachart

1687 words added

view changes

- complete history)
Keyword tags: cbo
More Info: links to this page
There are no threads for this page.  Be the first to start a new thread.

Related Content

  (what's this?Related ContentThanks to keyword tags, links to related pages and threads are added to the bottom of your pages. Up to 15 links are shown, determined by matching tags and by how recently the content was updated; keeping the most current at the top. Share your feedback on Wetpaint Central.)