DBAs often pre-create partitions, anticipating their need in the near
(or sometimes not-so-near) future. Mostly driven out of fear of an
end-user receiving the dreaded "ORA-14400: inserted partition key does
not map to any partition" error, this practice appears proactive and
conscientious. But when a pre-created partition gets suddenly
populated, Oracle can develop seriously flawed execution plans based on
the expectation that the partition is still empty.
Gathering optimizer statistics after large numbers of inserts,
deletes, or updates is always the best practice. Doing so gives the
cost-based optimizer (CBO) the best chance for developing the right
execution plan given the recent DML activity. But many IT shops rely
solely on scheduled jobs to gather optimizer statistics. While
generally adequate, this practice leaves databases vulnerable to
developing inappropriate execution plans after large changes to the data
but before the stats-gathering job runs.
This effect is especially problematic for tables with pre-created
partitions since they're identified as empty in the data
dictionary. Compare this with newly created partitions or tables:
default dynamic sampling kicks in for objects like these and gives the
optimizer a fighting chance to come up with the right plan. But
pre-created partitions already have statistics on them, statistics that
indicate the partition is empty.
This exempts pre-created partitions from the benefits of dynamic
sampling. And the partition's low (that is, non-existent) volume means
that access paths for that partition typically favor nested loop joins
and index-driven access paths. For low-volume segments, this sort of
plan makes sense since the optimizer expects very few records to be
returned. But using this sort of plan for tables with high volumes is
actually very costly. This can occur during those periods after the
empty partition is loaded but before its analyzed.
As an example of how situations like these can go wrong, consider the
plight of a former client. They had a weekly partitioning scheme, with
partitions beginning at 12:00 AM each Friday morning. What's more,
they decided to pre-create all the partitions they'd need for following
year. Their daily stats gathering job kicked off at 11:00 AM but their
highest volume loads occurred between 10:00 PM and 2:00 AM. In fact,
because of the nature of their business, about 60% of the data contained
in each partition was inserted during this window on Friday and
Saturday nights.
Executives logged into the system every Saturday morning to check
reports, only to find their client applications unresponsive. The poor
performance continued until roughly 11:15 AM, when the stats gathering
job completed. What's more, the executives all logged in around 8:00 AM
and ran the same report. Individually, each session performed badly.
But en masse the latch contention and consequent load on the system
skyrocketed, starving the already struggling sessions for their
desperately needed CPUs.
In this case, the optimizer treated the newly populated partition as
if it were empty, which it was when stats were most recently gathered.
Without realizing the new volume in the recently loaded partition, the
CBO ignored the full-scan/hash join option and went down the more costly
index scan/nested loop path. Consider the example below, which
simulates a pre-created partition getting populated and queried before
gathering fresh stats. The example begins by creating a partitioned
data table along with a table contataining reference data.
create.sql
whenever sqlerror exit failure
set echo on
-- clean up from previous runs
DROP TABLE part_data;
DROP TABLE owner_lookup;
-- create reference table
CREATE TABLE owner_lookup AS
SELECT object_id, owner
FROM dba_objects
WHERE 1 = 2;
ALTER TABLE owner_lookup
ADD CONSTRAINT owner_lookup_pk
PRIMARY KEY (object_id);
-- create partitioned data table
CREATE TABLE part_data
PARTITION BY RANGE (part_key) (
PARTITION p0 VALUES LESS THAN (1),
PARTITION p1 VALUES LESS THAN (2),
PARTITION p2 VALUES LESS THAN (3)) AS
SELECT
MOD(ROWNUM, 1) part_key, t.*
FROM dba_objects t
WHERE 1 = 2;
CREATE INDEX part_data_idx01
ON part_data(object_id);
ALTER TABLE part_data
ADD CONSTRAINT part_data_fk01
FOREIGN KEY (object_id) REFERENCES owner_lookup(object_id);
BEGIN
-- populate reference table
INSERT INTO owner_lookup
SELECT object_id, owner
FROM dba_objects
WHERE object_id IS NOT NULL;
-- populate p1 and p2, but leave p3 empty
-- p3 serves as the pre-created partition
FOR i IN 1 .. 10 LOOP
INSERT INTO part_data
SELECT
MOD(ROWNUM, 2) part_key, t.*
FROM dba_objects t
WHERE object_id IS NOT NULL;
COMMIT;
END LOOP;
END;
/
-- gather stats
BEGIN
dbms_stats.gather_table_stats(
user,
'part_data',
cascade => TRUE,
estimate_percent => 100,
no_invalidate => FALSE,
method_opt => 'FOR ALL COLUMNS SIZE 1');
dbms_stats.gather_table_stats(
user,
'owner_lookup',
cascade => TRUE,
estimate_percent => 100,
no_invalidate => FALSE,
method_opt => 'FOR ALL COLUMNS SIZE 1');
END;
/
set echo off
Note that the script gathers stats for both tables at the end, so the
optimizer "knows" p2 is empty. When we query against p1 and p2 by
filtering for their partition keys we see very different execution
plans. The query script below accepts the partition key as its first
argument:
query.sql
set lines 128 pages 128
set autotrace on
SELECT COUNT(*)
FROM part_data p, owner_lookup o
WHERE p.object_id = o.object_id
AND p.part_key = &1
AND o.owner = 'SYSTEM';
set autotrace off
With autotrace enabled, we can see the different execution plans and
execution statistics from querying p1 and p2:
Querying p1:
SQL> @query 1
SQL>
SQL> SELECT COUNT(*)
2 FROM part_data p, owner_lookup o
3 WHERE p.object_id = o.object_id
4 AND p.part_key = &1
5 AND o.owner = 'SYSTEM';
old 4: AND p.part_key = &1
new 4: AND p.part_key = 1
COUNT(*)
----------
2260
Execution Plan
----------------------------------------------------------
Plan hash value: 554393664
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 15 | 148 (3)| 00:00:02 | | |
| 1 | SORT AGGREGATE | | 1 | 15 | | | | |
|* 2 | HASH JOIN | | 13451 | 197K| 148 (3)| 00:00:02 | | |
|* 3 | TABLE ACCESS FULL | OWNER_LOOKUP | 1345 | 10760 | 6 (0)| 00:00:01 | | |
| 4 | PARTITION RANGE SINGLE| | 47080 | 321K| 141 (3)| 00:00:02 | 2 | 2 |
|* 5 | TABLE ACCESS FULL | PART_DATA | 47080 | 321K| 141 (3)| 00:00:02 | 2 | 2 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("P"."OBJECT_ID"="O"."OBJECT_ID")
3 - filter("O"."OWNER"='SYSTEM')
5 - filter("P"."PART_KEY"=1)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
649 consistent gets
0 physical reads
0 redo size
412 bytes sent via SQL*Net to client
384 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
Querying p2:
SQL> @query 2
SQL> set lines 128 pages 128
SQL> set autotrace on
SQL> set echo on
SQL>
SQL> SELECT COUNT(*)
2 FROM part_data p, owner_lookup o
3 WHERE p.object_id = o.object_id
4 AND p.part_key = &1
5 AND o.owner = 'SYSTEM';
old 4: AND p.part_key = &1
new 4: AND p.part_key = 2
COUNT(*)
----------
0
Execution Plan
----------------------------------------------------------
Plan hash value: 1085628044
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 2 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 34 | | | | |
| 2 | NESTED LOOPS | | 1 | 34 | 2 (0)| 00:00:01 | | |
| 3 | PARTITION RANGE SINGLE | | 1 | 26 | 2 (0)| 00:00:01 | 3 | 3 |
|* 4 | TABLE ACCESS FULL | PART_DATA | 1 | 26 | 2 (0)| 00:00:01 | 3 | 3 |
|* 5 | TABLE ACCESS BY INDEX ROWID| OWNER_LOOKUP | 1 | 8 | 0 (0)| 00:00:01 | | |
|* 6 | INDEX UNIQUE SCAN | OWNER_LOOKUP_PK | 1 | | 0 (0)| 00:00:01 | | |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("P"."PART_KEY"=2)
5 - filter("O"."OWNER"='SYSTEM')
6 - access("P"."OBJECT_ID"="O"."OBJECT_ID")
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
410 bytes sent via SQL*Net to client
384 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
As shown, Oracle took the nested loop approach for the empty
partition but decided to full scan and hash join the populated one. The
populated partition required 649 consistent gets. But when we populate
p2 with the same data in p1 using the insert.sql script below,
look what happens to the plan and consistent gets when re-querying p2.
insert.sql
INSERT INTO part_data
SELECT
2, /* remaps data to p2 */
owner,
object_name,
subobject_name,
object_id,
data_object_id,
object_type,
created,
last_ddl_time,
timestamp,
status,
temporary,
generated,
secondary
FROM part_data
WHERE part_key = 1 /* limits data to contents of p1 */
/
COMMIT;
Re-querying p2 with stale statistics:
SQL> @insert
47080 rows created.
Commit complete.
SQL> @query 2
old 4: AND p.part_key = &1
new 4: AND p.part_key = 2
COUNT(*)
----------
2260
Execution Plan
----------------------------------------------------------
Plan hash value: 1085628044
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 34 | 2 (0)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 34 | | | | |
| 2 | NESTED LOOPS | | 1 | 34 | 2 (0)| 00:00:01 | | |
| 3 | PARTITION RANGE SINGLE | | 1 | 26 | 2 (0)| 00:00:01 | 3 | 3 |
|* 4 | TABLE ACCESS FULL | PART_DATA | 1 | 26 | 2 (0)| 00:00:01 | 3 | 3 |
|* 5 | TABLE ACCESS BY INDEX ROWID| OWNER_LOOKUP | 1 | 8 | 0 (0)| 00:00:01 | | |
|* 6 | INDEX UNIQUE SCAN | OWNER_LOOKUP_PK | 1 | | 0 (0)| 00:00:01 | | |
-----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("P"."PART_KEY"=2)
5 - filter("O"."OWNER"='SYSTEM')
6 - access("P"."OBJECT_ID"="O"."OBJECT_ID")
Statistics
----------------------------------------------------------
2 recursive calls
1 db block gets
95338 consistent gets
0 physical reads
39056 redo size
412 bytes sent via SQL*Net to client
384 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
The nested loop plan gets reused but generates 144 times more
consistent gets (95338 vs. 649) than the full scan/hash join plan! No
wonder the executives were mad! On top of the excessive CPU required to
access the buffer cache, the users also queued up behind each other
while waiting for the cache buffer chain latch, the latch that allows a
user session to examine a data block in memory. All this led to
absurdly high load averages on the machine and abysmal response times.
Remember that that the stats-gathering job ran after users were already
querying the data? Rather than taking my recommendation to gather stats
immediately after the load, the client elected to simply move stats job
from 11 AM to 2 AM. This narrowed the window but did not close it.
Nevertheless, the reports ran well when they needed to because the new
stats let Oracle make the right decision.
All this points to the vulnerabilty that precreated partitions
present. The risk of bad execution plans always exists when optimizer
statistics fail to represent the data that tables contain. This
condition prevents the CBO from making the right choice. But queries
against pre-created partitions are especially sensitive to this effect
since their stats represent them as empty. Once loaded, the partition
statistics are wildly wrong (as opposed to a just little off) and the
example above demonstrates the potential consequences. Large DML
changes demand new optimizer statistics in all cases, but especially for
pre-created partitions.
David Lipowitz
Principal Consultant
Tuning Knife Consulting
December 23, 2008