- About Us
- Events and Webinars
- Contact Us
Parallel query works the same on Exadata as it does on non-Exadata 11gR2. With Exadata, which you’ve cost-justified based on your database’s performance requirements, business-criticality, or some other reason, you want to get the most performance out of your investment, so exploiting the benefits of parallel query is important.
Depending on which size rack you’ve got, the Exadata compute grid has 24 cores, 48 cores, or 96 CPU cores, and your storage grid has even more. When SQL statements are executed that qualify to be parallelized, Oracle will establish parallel query slaves based on the degree of parallelism capable, specified, or calculated. These parallel query slaves will work to “divide and conquer” the work (IO), typically returning results faster.
Some of the historical challenges with parallel query have been:
Prior to 11gR2, finding this balance has been difficult. With 11gR2, Oracle has introduced “automatic degree of parallelism”, or Auto DOP. Auto DOP automatically calculates the degree of parallelism on a per SQL statement basis, and is the focus of this post.
Parallel Query Initialization Parameters
|Parameter||Old/New in 11gR2||Default||Description|
|parallel_adaptive_multiuser||Old||True||Method for reducing PQ slaves under system load – PQ “governor”|
|parallel_automatic_tuning||Old||False||Used to enable an automatic DOP calculation based on objects defined with parallel parameters|
|parallel_degree_limit||New||CPU||Sets upper limit on the DOP for a single statement – “CPU” means that this limit will be set based on system CPU information|
|parallel_degree_policy||New||MANUAL||Controls several parallel features including Auto DOP, in-memory parallelism, parallel statement queuing, etc.|
|parallel_execution_message_size||Old||16384||Size of parallel message buffers, in bytes|
|parallel_force_local||New||FALSE||Determines whether to force PQ slaves on local connected instance or span RAC services|
|parallel_instance_group||Old||Used to restrict PQ process to specific instance or group of instances in RAC cluster|
|parallel_max_servers||Old||Maximum number of PQ slaves that can be created on an instance. Default is calculated as CPU_COUNT * PARALLEL_THREADS_PER_CPU|
|parallel_min_percent||Old||0||This is a pre-11gR2 low-water mark that specifies the minimum number of PQ slaves required to run the query|
|parallel_min_servers||Old||0||Minimum number of PQ slaves that should be running, system-wide|
|New||AUTO||Minimum estimated serial execution time that will trigger Auto DOP – Auto translates to 10 seconds. When parallel_degree_policy=AUTO or limited, any statement estimated to take longer than parallel_min_time_threshold will be considered a candidate for Auto AOP|
|parallel_servers_target||New||Upper limit on the number of PQ slaves that may be in use in an instance|
|parallel_threads_per_cpu||Old||2||Used to represent the number of concurrent processes a CPU can support|
Again, Oracle introduced Auto DOP in 11gR2 to make the determination of a query’s DOP “automatic”, instead of based on trial-and-error and often attempted without an understanding of the underlying hardware configuration. Many times, I’ve seen bits of custom and Oracle packaged software code with hints like the below:
SELECT /*+ parallel (x,4) */ …
SELECT /*+ parallel (x,20) */ …
In these, the degree of parallelism seems to have been sized more out of guesswork than anything else. And sometimes, I’m just as guilty as developers sometimes and often put in a PQ hint in based on a gut feel for how much horsepower I think it needs. Auto DOP takes care of this guesswork and automatically determines your DOP.
You can enable Auto DOP by:
Auto DOP In Action
Test without Automatic Parallelism
First let’s look at a full-scan of a large table without automatic parallelism. Let’s check our environment:
As we can see our large table has a parallel degree of 1 and parallel_degree_policy is set to MANUAL. Now I’m going to run a full-scan against DWB_RTL_SLS_RETURN_LINE_ITEM and also turn on SQL Compiler tracing so we can see what CBO does. From another session, I’ll run a script pq_chk.sql, which is provided in Appendix A:
As you can see, the query took over 4 minutes and based on the execution plan, was not parallelized.
When the script was running, we checked our parallel execution and it showed this – no PQ operations for user D3:
Also, if we check our SQL Compiler trace file, we can see the following sections relating to parallelism, which show us that automatic parallelism is disabled.
We can also run a SQL Monitoring report for the given SQL_ID by pulling the SQL_ID out of the SQL Compiler trace file, and it looks like this:
Test with Automatic Parallelism
Now let’s turn on automatic parallelism, bounce our database, and re-run the test:
Now when we run our next test, we see the following:
As we can see, the query was not parallelized.
While running, if we check our parallel query statistics, we see that parallelism is not enabled:
Let’s check our SQL compiler trace:
From the above, we can see that automatic DOP was disabled because no parallel objects were referenced. We know that we have a degree set to 1 for our large table because we did an “alter table <table> noparallel” to it. This shouldn’t be a surprise; according to Oracle documentation, with parallel_degree_policy=LIMITED, only tables whose parallelism are set to the default (96 in this case) will be eligible for automatic parallelism. So let’s test this:
And now let’s re-run our test:
Let’s check out parallel statistics during the run – as you can see (output truncated), the query was parallelized:
If we look at our SQL Compiler trace, we can learn the following:
An interesting thing to note here is this – according to the trace file, Auto DOP was disabled with a message “limited mode but no parallel objects referenced”.
Finally, let’s check our SQL Monitoring report, and as we can see, the query was parallelized.
The most recent test case was relatively inconclusive in terms of Auto DOP based on our trace file. So let’s set our parallelism to DEFAULT for our table and re-run the test:
This test looks like this – note the “automatic DOP” section at the bottom.
While running, we see the following parallel query behavior (output truncated):
Our SQL Compiler trace now shows this:
So now we can see that Auto DOP is in effect for our query.
And our SQL Monitoring report looks like this:
As you can see from the bottom screen print above, we’ve effectively parallelized our operations against both instances in the RAC cluster.
Now that we’ve established how to enabled Auto DOP, let’s run a few more test cases.
Test Auto DOP with Shorter-Running Query
In this test, we’ll run a full-scan against a smaller table and see what we can learn. We’ll use a table called SOE.ORDERS and make all indexes on it invisible so we can force a full-scan. Knowing what we know from the p
revious tests with respect to setting the parallelism to the default for the table, we’ll do this:
Now let’s run a full-scan against it with parallel_degree_policy=LIMITED:
As we can see, automatic DOP was disabled because of the “parallel threshold”. This, we know from the documentation, is set to 10 seconds and is established by the parallel_min_time_threshold initialization parameter.
Let’s check out SQL Compiler trace file to be sure:
This confirms our theory.
But let’s try to change parallel_min_time_threshold and see if we can get Auto DOP to kick in:
Now we’ll run our test again:
As we can see, Auto DOP set the computed degree of parallelism to 1. Let’s look inside our SQL Compiler trace:
As we can see, Auto DOP calculated a degree of parallelism to 1 and it did execute in parallel, but it was similar to running in serial because the calculated scan cost was low. What this tells us is that automatic DOP is doing its job – it’s not over-allocating PQ resources if it doesn’t need to. Before continuing on to the next sections, we’ll set parallel_min_time_threshold back to DEFAULT.
Note that we could get different behavior (i.e., different calculations of maximum parallel query slaves spawned) by adjusting parallel_degree_limit and parallel_threads_per_cpu, but we’ll refrain from testing this and go with the defaults seeded by Exadata, as they’re field-tested to give the best results. If you want to cap parallel query usage in environments where you’ve got databases consolidated or different resource needs for different workloads, use Database Resource Manager (and refer to the section in this document).
Parallel Statement Queuing
Parallel statement queuing and in-memory parallel execution are enabled by setting parallel_degree_policy=AUTO. You can also set “_parallel_statement_queueing”=TRUE, but you should allow the parallel_degree_policy to control this behavior.
The parallel statement queuing is to establish queues for PQ operations when multiple parallel operations spawn PQ slaves up to parallel_servers_target. In other words, it’s a way to control “runaway” PQ executions.
Let’s first start by showing what our value for parallel_servers_target looks like:
As you can see, it’s set to 384, which is a very high number. For sake of showing this in our test cases, let’s set it to 4:
We’ll also set parallel_degree_limit=8 for sake of testing:
Before running tests, let’s show a couple of scripts:
Now let’s set parallel_degree_policy=AUTO:
Now we’ll launch 6 executions of a long-running query against our large table and see what our queuing looks like. When it was running, we queried queuing information and saw this:
We can see below that the run-time of the full table scan went from about 30 seconds to nearly a minute on some executions:
And some took longer:
In Memory Parallel Execution
In-memory parallel execution and parallel statement queuing are enabled by setting parallel_degree_policy=AUTO. You can also set “_parallel_cluster_cache_policy”=CACHED, but you should allow the parallel_degree_policy to control this behavior.
The goal of in-memory parallel execution is to avoid disk IO. Historically, parallel query behavior performed direct reads and bypassed the buffer cache, the intention being to avoid buffer cache flooding and shared resource contention.
The downside to in-memory parallel execution on Exadata is that it bypassed Smart Scan and cell offload functions. In order to read into the buffer cache, serial direct reads must be disabled, and when this is true, Smart Scan is not in play.
It’s kind of difficult to get in-memory parallel execution to take place on Exadata. The following needs to be considered:
For our test, we disabled indexes on SOE.ORDER_ITEMS, set the parallel degree DEFAULT, and ran a select against it:
As you can see, it was NOT offload eligible, too 1 minute and 32 seconds, and executed with 2 PQ slaves. Let’s run the same query again and see if we can benefit from in-memory parallel execution:
As we can see, the query time went from 1:30 to a little over 2 seconds, a pretty significant reduction, and this was due to buffer cache access.
Let’s set parallel_degree_limit to LIMITED and re-test, twice:
As we can see, with parallel_degree_limit=LIMITED, in-memory parallel execution is disabled and Smart Scan operations can continue.
This is something to think about with Exadata. If we enabled in-memory parallel execution, in some situations, we may be eliminating the potential for Smart Scan.
Controlling PQ Operations using Database Resource Manager
If you want granular control of your parallel query behavior, DBRM is a sure-fire way to go. It’s difficult to setup and needs careful planning. Below are some of the things you need to think about and set in your resource consumer groups:
|Consumer Group Attribute||What it Does|
|PARALLEL_QUEUE_TIMEOUT||Specifies maximum queue time for a consumer group for PQ slaves|
|PARALLEL_DEGREE_LIMIT_P1||Sets maximum parallel degree limit for queries belonging to sessions mapped to a consumer group|
|MGMT_P1, MGMT_P2, MGMT_P3, …||Specifies attributes to modify the normal FIFO processing and allows for dequeue prioritization.|
|PARALLEL_TARGET_PERCENTAGE||Used to limit a consumer group to a percentage of parallel servers|
|BEGIN_SQL_BLOCK and END_SQL_BLOCK||Provides a way to queue individual SQL statements as if they were processed simultaneously|
A detailed example is beyond the scope of this paper.
In order for automatic DOP to work, in 126.96.36.199 Oracle implemented a requirement that the IO system be calibrated. This is done using the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure. Below is a message you’ll get when IO is not calibrated and you run an explain plan or AutoTrace or look in a SQL Compiler trace:
automatic DOP: skipped because of IO calibrate statistics are missing
So the question is – do you calibrate IO or not? Exadata is pre-calibrated, so the answer is no. You follow MOS note 1269321.1 and do this:
delete from resource_io_calibrate$;
insert into resource_io_calibrate$
values(current_timestamp, current_timestamp, 0, 0, 200, 0, 0);
Appendix A: Scripts
The purpose of this script is to check parallel query statistics for the system:
Purpose: Show parallel query queuing and servers busy information
Centroid is a cloud services and technology company that provides Oracle enterprise workload consulting and managed services across Oracle, Azure, Amazon, Google, and private cloud. From applications to technology to infrastructure, Centroid’s depth of Oracle expertise and breadth of cloud capabilities helps clients modernize, transform, and grow their business to the next level.