Automatic Degree of Parallism (Auto DOP)

Parallel Query

Parallel query works the same on Exadata as it does on non-Exadata 11gR2.  With Exadata, which you’ve cost-justified based on your database’s performance requirements, business-criticality, or some other reason, you want to get the most performance out of your investment, so exploiting the benefits of parallel query is important.

Depending on which size rack you’ve got, the Exadata compute grid has 24 cores, 48 cores, or 96 CPU cores, and your storage grid has even more.  When SQL statements are executed that qualify to be parallelized, Oracle will establish parallel query slaves based on the degree of parallelism capable, specified, or calculated. These parallel query slaves will work to “divide and conquer” the work (IO), typically returning results faster.

Some of the historical challenges with parallel query have been:

  • How many parallel query slaves does my query need?
  • How many concurrent parallel operations will saturate my system?
  • How can I ensure that parallelism will scale up to the point of saturation but not exceed it?
  • Can I ensure that my query will only run if it can establish a minimum amount of parallel query slaves?

Prior to 11gR2, finding this balance has been difficult.  With 11gR2, Oracle has introduced “automatic degree of parallelism”, or Auto DOP.  Auto DOP automatically calculates the degree of parallelism on a per SQL statement basis, and is the focus of this post.

Parallel Query Initialization Parameters

Parameter Old/New in 11gR2 Default Description
parallel_adaptive_multiuser Old True Method for reducing PQ slaves under system load – PQ “governor”
parallel_automatic_tuning Old False Used to enable an automatic DOP calculation based on objects defined with parallel parameters
parallel_degree_limit New CPU Sets upper limit on the DOP for a single statement – “CPU” means that this limit will be set based on system CPU information
parallel_degree_policy New MANUAL Controls several parallel features including Auto DOP, in-memory parallelism, parallel statement queuing, etc.
parallel_execution_message_size Old 16384 Size of parallel message buffers, in bytes
parallel_force_local New FALSE Determines whether to force PQ slaves on local connected instance or span RAC services
parallel_instance_group Old Used to restrict PQ process to specific instance or group of instances in RAC cluster
parallel_max_servers Old Maximum number of PQ slaves that can be created on an instance. Default is calculated as CPU_COUNT * PARALLEL_THREADS_PER_CPU
parallel_min_percent Old 0 This is a pre-11gR2 low-water mark that specifies the minimum number of PQ slaves required to run the query
parallel_min_servers Old 0 Minimum number of PQ slaves that should be running, system-wide
parallel_min_time_threshold

 

 

New AUTO Minimum estimated serial execution time that will trigger Auto DOP – Auto translates to 10 seconds.  When parallel_degree_policy=AUTO or limited, any statement estimated to take longer than parallel_min_time_threshold will be considered a candidate for Auto AOP
parallel_servers_target New Upper limit on the number of PQ slaves that may be in use in an instance
parallel_threads_per_cpu Old 2 Used to represent the number of concurrent processes a CPU can support

Auto DOP

Again, Oracle introduced Auto DOP in 11gR2 to make the determination of a query’s DOP “automatic”, instead of based on trial-and-error and often attempted without an understanding of the underlying hardware configuration.  Many times, I’ve seen bits of custom and Oracle packaged software code with hints like the below:

SELECT /*+ parallel (x,4) */ …

SELECT /*+ parallel (x,20) */ …

In these, the degree of parallelism seems to have been sized more out of guesswork than anything else.  And sometimes, I’m just as guilty as developers sometimes and often put in a PQ hint in based on a gut feel for how much horsepower I think it needs.  Auto DOP takes care of this guesswork and automatically determines your DOP.

You can enable Auto DOP by:

  • Setting parallel_degree_policy to AUTO or LIMITED.
  • Setting or adjusting parallel_min_time_threshold.
  • Set parallel_adaptive_multiuser = FALSE .  Note that this isn’t required to enabled Auto DOP, but setting to false will bypass potential issues.

Auto DOP In Action

Test without Automatic Parallelism

First let’s look at a full-scan of a large table without automatic parallelism.  Let’s check our environment:

As we can see our large table has a parallel degree of 1 and parallel_degree_policy is set to MANUAL.  Now I’m going to run a full-scan against DWB_RTL_SLS_RETURN_LINE_ITEM and also turn on SQL Compiler tracing so we can see what CBO does.  From another session, I’ll run a script pq_chk.sql, which is provided in Appendix A:

As you can see, the query took over 4 minutes and based on the execution plan, was not parallelized.

When the script was running, we checked our parallel execution and it showed this – no PQ operations for user D3:

Also, if we check our SQL Compiler trace file, we can see the following sections relating to parallelism, which show us that automatic parallelism is disabled.

 

We can also run a SQL Monitoring report for the given SQL_ID by pulling the SQL_ID out of the SQL Compiler trace file, and it looks like this:

 

Test with Automatic Parallelism

Now let’s turn on automatic parallelism, bounce our database, and re-run the test:

Now when we run our next test, we see the following:

As we can see, the query was not parallelized.

While running, if we check our parallel query statistics, we see that parallelism is not enabled:

Let’s check our SQL compiler trace:

 

 

From the above, we can see that automatic DOP was disabled because no parallel objects were referenced.  We know that we have a degree set to 1 for our large table because we did an “alter table <table> noparallel” to it.  This shouldn’t be a surprise; according to Oracle documentation, with parallel_degree_policy=LIMITED, only tables whose parallelism are set to the default (96 in this case) will be eligible for automatic parallelism.  So let’s test this:

 

And now let’s re-run our test:

 

Let’s check out parallel statistics during the run – as you can see (output truncated), the query was parallelized:

 

 

If we look at our SQL Compiler trace, we can learn the following:

 

 

An interesting thing to note here is this – according to the trace file, Auto DOP was disabled with a message “limited mode but no parallel objects referenced”.

Finally, let’s check our SQL Monitoring report, and as we can see, the query was parallelized.

 

 

 

The most recent test case was relatively inconclusive in terms of Auto DOP based on our trace file.  So let’s set our parallelism to DEFAULT for our table and re-run the test:

 

 

This test looks like this – note the “automatic DOP” section at the bottom.

 

 

While running, we see the following parallel query behavior (output truncated):

 

 

Our SQL Compiler trace now shows this:

 

 

So now we can see that Auto DOP is in effect for our query.

And our SQL Monitoring report looks like this:

 

 

As you can see from the bottom screen print above, we’ve effectively parallelized our operations against both instances in the RAC cluster.

Now that we’ve established how to enabled Auto DOP, let’s run a few more test cases.

Test Auto DOP with Shorter-Running Query

In this test, we’ll run a full-scan against a smaller table and see what we can learn.  We’ll use a table called SOE.ORDERS and make all indexes on it invisible so we can force a full-scan.  Knowing what we know from the p

 

revious tests with respect to setting the parallelism to the default for the table, we’ll do this:

 

 

 

Now let’s run a full-scan against it with parallel_degree_policy=LIMITED:

 

As we can see, automatic DOP was disabled because of the “parallel threshold”.  This, we know from the documentation, is set to 10 seconds and is established by the parallel_min_time_threshold initialization parameter.

Let’s check out SQL Compiler trace file to be sure:

 

 

This confirms our theory.

But let’s try to change parallel_min_time_threshold and see if we can get Auto DOP to kick in:

 

Now we’ll run our test again:

 

As we can see, Auto DOP set the computed degree of parallelism to 1.  Let’s look inside our SQL Compiler trace:

 

As we can see, Auto DOP calculated a degree of parallelism to 1 and it did execute in parallel, but it was similar to running in serial because the calculated scan cost was low.  What this tells us is that automatic DOP is doing its job – it’s not over-allocating PQ resources if it doesn’t need to.  Before continuing on to the next sections, we’ll set parallel_min_time_threshold back to DEFAULT.

Note that we could get different behavior (i.e., different calculations of maximum parallel query slaves spawned) by adjusting parallel_degree_limit and parallel_threads_per_cpu, but we’ll refrain from testing this and go with the defaults seeded by Exadata, as they’re field-tested to give the best results.  If you want to cap parallel query usage in environments where you’ve got databases consolidated or different resource needs for different workloads, use Database Resource Manager (and refer to the section in this document).

Parallel Statement Queuing

Parallel statement queuing and in-memory parallel execution are enabled by setting parallel_degree_policy=AUTO.  You can also set “_parallel_statement_queueing”=TRUE, but you should allow the parallel_degree_policy to control this behavior.

The parallel statement queuing is to establish queues for PQ operations when multiple parallel operations spawn PQ slaves up to parallel_servers_target.  In other words, it’s a way to control “runaway” PQ executions.

Let’s first start by showing what our value for parallel_servers_target looks like:

 

As you can see, it’s set to 384, which is a very high number.  For sake of showing this in our test cases, let’s set it to 4:

 

We’ll also set parallel_degree_limit=8 for sake of testing:

 

 

Before running tests, let’s show a couple of scripts:

 

 

Now let’s set parallel_degree_policy=AUTO:

 

 

Now we’ll launch 6 executions of a long-running query against our large table and see what our queuing looks like.  When it was running, we queried queuing information and saw this:

 

 

We can see below that the run-time of the full table scan went from about 30 seconds to nearly a minute on some executions:

 

 

And some took longer:

 

 

In Memory Parallel Execution

In-memory parallel execution and parallel statement queuing are enabled by setting parallel_degree_policy=AUTO. You can also set “_parallel_cluster_cache_policy”=CACHED, but you should allow the parallel_degree_policy to control this behavior.

The goal of in-memory parallel execution is to avoid disk IO.  Historically, parallel query behavior performed direct reads and bypassed the buffer cache, the intention being to avoid buffer cache flooding and shared resource contention.

The downside to in-memory parallel execution on Exadata is that it bypassed Smart Scan and cell offload functions.  In order to read into the buffer cache, serial direct reads must be disabled, and when this is true, Smart Scan is not in play.

It’s kind of difficult to get in-memory parallel execution to take place on Exadata.  The following needs to be considered:

  • The table needs to be small enough to fit in a local buffer cache or the aggregate buffer caches across both instances in our quarter rack.
  • The query needs to run in a long enough time to be over the parallel_min_time_threshold setting.

For our test, we disabled indexes on SOE.ORDER_ITEMS, set the parallel degree DEFAULT, and ran a select against it:

 

As you can see, it was NOT offload eligible, too 1 minute and 32 seconds, and executed with 2 PQ slaves.  Let’s run the same query again and see if we can benefit from in-memory parallel execution:

 

 

As we can see, the query time went from 1:30 to a little over 2 seconds, a pretty significant reduction, and this was due to buffer cache access.

Let’s set parallel_degree_limit to LIMITED and re-test, twice:

 

 

As we can see, with parallel_degree_limit=LIMITED, in-memory parallel execution is disabled and Smart Scan operations can continue.

This is something to think about with Exadata.  If we enabled in-memory parallel execution, in some situations, we may be eliminating the potential for Smart Scan.

Controlling PQ Operations using Database Resource Manager

If you want granular control of your parallel query behavior, DBRM is a sure-fire way to go.  It’s difficult to setup and needs careful planning.  Below are some of the things you need to think about and set in your resource consumer groups:

Consumer Group Attribute What it Does
PARALLEL_QUEUE_TIMEOUT Specifies maximum queue time for a consumer group for PQ slaves
PARALLEL_DEGREE_LIMIT_P1 Sets maximum parallel degree limit for queries belonging to sessions mapped to a consumer group
MGMT_P1, MGMT_P2, MGMT_P3, … Specifies attributes to modify the normal FIFO processing and allows for dequeue prioritization.
PARALLEL_TARGET_PERCENTAGE Used to limit a consumer group to a percentage of parallel servers
BEGIN_SQL_BLOCK and END_SQL_BLOCK Provides a way to queue individual SQL statements as if they were processed simultaneously

A detailed example is beyond the scope of this paper.

IO Calibration

In order for automatic DOP to work, in 11.2.0.2 Oracle implemented a requirement that the IO system be calibrated. This is done using the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure.  Below is a message you’ll get when IO is not calibrated and you run an explain plan or AutoTrace or look in a SQL Compiler trace:

automatic DOP: skipped because of IO calibrate statistics are missing

So the question is – do you calibrate IO or not?  Exadata is pre-calibrated, so the answer is no.  You follow MOS note 1269321.1 and do this:

delete from resource_io_calibrate$;

insert into resource_io_calibrate$

values(current_timestamp, current_timestamp, 0, 0, 200, 0, 0);

commit;

Summary

  • Auto DOP is a good thing and generally does what you need it to.
  • Set the parallel degree on your tables to DEFAULT to take advantage of it.
  • Parallel statement queuing works as advertised, but to maximize resources and minimize impact, impose ceilings on parallel servers via either DBRM or adjust initialization parameters.
  • In memory parallel execution also works as advertised, but note that it will disable Smart Scan for qualifying queries.  This could be good, could be bad – it all depends on the workload.
  • Recommendation:
    • Set parallel_degree_policy=LIMITED to enabled automatic DOP.
    • Control PQ queuing by implemented DBRM.
    • Think carefully about using in-memory parallel execution – test, and if you want queuing but no in-memory parallel execution, set parallel_degree_policy=LIMITED and “_parallel_statement_queueing”=TRUE.
  • Follow MOS note 1269321.1 to fix IO calibration issue.

Appendix A: Scripts

Script: pq_chk.sql

The purpose of this script is to check parallel query statistics for the system:

Script: queued.sql

Purpose: Show parallel query queuing and servers busy information