Oracle 11gR2 SQL Monitoring

What is SQL Monitoring?

Starting with 11g, Oracle introduced a new tool for the performance tuner’s toolkit called “Real Time SQL Monitoring”.  Real Time SQL Monitoring, or just “SQL Monitoring”, enables you to measure the performance of SQL statements while they are executing.  SQL Monitoring kicks in automatically any time a statement runs in parallel or consumes more than 5 seconds of CPU or IO time.

Oracle populates the V$SQL_MONITOR and V$SQL_PLAN_MONITOR views with performance characteristics of running SQL statements, and keeps these statistics as long as the underlying cursor remains in the library cache.  You can use these two views along with V$SQL, V$SQL_PLAN, V$ACTIVE_SESSION_HISTORY, V$SESSION, and V$SESSION_LONGOPS to gather more detail about the SQL statement being run.

V$SQL_MONITOR tracks key performance statistics for each qualifying SQL statement (a subset of what’s in V$SQL), and V$SQL_PLAN_MONITOR provides statistics for each step of the statement’s execution plan.  This allows tools such as Enterprise Manager Grid Control to provide a graphical, intuitive representation of the time taken at each step in the plan, as well as IO consumption, CPU consumption, etc.

Unlike V$SQL, V$SQL_MONITOR tracks performance statistics separately for each execution of a SQL statement, not a summary of accumulated statistics.
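
As a quick illustration, here’s a minimal query against V$SQL_MONITOR that lists recently monitored executions (a sketch; the column list is just a small, commonly used subset of what the view exposes):

    -- One row per monitored execution, most recent first
    select sql_id,
           status,
           sql_exec_id,
           sql_exec_start,
           elapsed_time/1e6 as elapsed_secs,
           cpu_time/1e6     as cpu_secs,
           buffer_gets,
           disk_reads
    from   v$sql_monitor
    order  by sql_exec_start desc;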

You can use DBMS_SQLTUNE.REPORT_SQL_MONITOR, Grid Control, or EM Database Control to report on V$SQL_MONITOR and V$SQL_PLAN_MONITOR.

There are a number of nice features of SQL Monitoring, whether you use the EM UI or SQL scripts:

  • SQL Monitoring provides intuitive visual feedback in the HTML report or EM UI that shows you exactly how much time each operation in an execution plan is taking.  This is very helpful for the SQL tuner – it shows you where to start.
  • SQL Monitoring shows not only estimated rows but also actual rows for qualifying operations.  This helps in determining whether the optimizer is estimating cardinality accurately – another great tool for a SQL tuner.
  • SQL Monitoring provides the best and most complete information for parallel query operations, period.  All SQL statements with a DOP > 1 are captured in V$SQL_MONITOR, and with SQL Monitoring you see a great deal of detail about parallel execution.
  • In the EM UI, the SQL Monitoring details allow for intuitive drill-downs into the query execution plan, query statistics, etc.
  • When running DBMS_SQLTUNE.REPORT_SQL_MONITOR, the report output displays activity detail for each operation.  This activity detail shows statistics including wait event information, CPU information, and so forth, so you can determine which events or statistics contributed time to each operation.

Using Enterprise Manager
Enterprise Manager Grid Control or Database Control has a SQL Monitoring link on the Performance page that can be used to show SQL Monitoring details.

When you click on it, it will show SQL Monitoring statistics for anything that has run for more than 5 seconds or is running in parallel:

[Screenshot: the SQL Monitoring page in Enterprise Manager]

You can see the performance-related timings and details from the screen, and if you click on the SQL_ID link, it will show various details of the SQL statement’s performance profile.

[Screenshots: the SQL Monitoring detail tabs for the selected SQL_ID]

As you can see, there is a tremendous amount of information contained in the various tabs for each monitored SQL_ID.  Also, note the graphical representation of time-per-step – this should give the SQL tuner a very good idea of where to look when optimizing.
Let’s take a look at a slightly more complicated query that does a handful of joins:

[Screenshots: SQL Monitoring details for the multi-join query]

From the above and below, we can clearly see the step that’s taking the most time.

[Screenshots: execution plan step timings for the multi-join query]

You can also click on “View Report” to view the SQL Monitoring report.  The output looks like the following (I’ve broken it into several screen shots, but it all shows up on one page in EM):

[Screenshots: the SQL Monitoring report output]

The screen shot below shows the SQL Monitoring display after several long-running jobs have run:

[Screenshot: the SQL Monitoring display with several completed long-running jobs]

Using DBMS_SQLTUNE.REPORT_SQL_MONITOR

What if you don’t have EM Grid Control or DB Control?  Oracle provides a means to report on SQL Monitoring details using DBMS_SQLTUNE.REPORT_SQL_MONITOR.  Below, I’ll show this after an execution of a long-running piece of SQL:
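
The screen shots from the original aren’t reproduced here, but the basic recipe is simple.  A minimal sketch, run right after the long-running statement finishes, looks like this:

    -- With no sql_id argument, REPORT_SQL_MONITOR reports on the most
    -- recently monitored SQL statement
    set long 1000000 longchunksize 1000000 linesize 250 pagesize 0
    select dbms_sqltune.report_sql_monitor(type => 'TEXT') as report
    from   dual;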

[Screenshots: REPORT_SQL_MONITOR output for the long-running statement]

As you can see from the above, DBMS_SQLTUNE.REPORT_SQL_MONITOR picked up the last SQL statement that ran and qualified for SQL monitoring, since we didn’t provide a SQL_ID.


Run SQL Monitor Report for Last SQL
Let’s run a shorter-running SQL statement so that other monitored SQL statements don’t slip in behind it and become the most recently monitored statement:

[Screenshots: REPORT_SQL_MONITOR output for the shorter-running statement]

As you can see, there is a great deal of information in the report, but it’s missing a nice graphical time representation of how long each step took.
You’ll also likely think to yourself – how can I get a SQL Monitoring report for a given SQL_ID?  The next section goes into this.
Run SQL Monitor Report for Given SQL_ID
If you know the SQL_ID ahead of time, you can enter it as a parameter to the call to DBMS_SQLTUNE.REPORT_SQL_MONITOR:

Let’s run a SQL statement and then get the SQL_ID in question from V$SESSION.PREV_SQL_ID:
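
A minimal sketch of that flow (the SQL_ID literal below is just a placeholder; substitute the value returned by the first query):

    -- Grab the SQL_ID of the statement this session just finished running
    select prev_sql_id
    from   v$session
    where  sid = sys_context('userenv', 'sid');

    -- Feed that SQL_ID to REPORT_SQL_MONITOR ('0gduhjx7m1g2f' is a
    -- placeholder value)
    select dbms_sqltune.report_sql_monitor(sql_id => '0gduhjx7m1g2f',
                                           type   => 'TEXT') as report
    from   dual;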

[Screenshot: REPORT_SQL_MONITOR output for the given SQL_ID]

SQL Monitor for Partition-wise, Parallel Queries

First, our query:
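
The statement itself only appears in the original screen shots; based on the plan discussed below (a DOP-8 parallel full scan of MYOBJ feeding a count), it was something along these lines (an assumed reconstruction, not the exact original):

    -- Assumed reconstruction: parallel count(*) against MYOBJ at DOP 8
    select /*+ parallel(m, 8) */ count(*)
    from   myobj m;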


The first section of the SQL Monitor report shows the Global Information.  The output is relatively self-explanatory:

[Screenshot: the Global Information section of the SQL Monitor report]

Next, we see the global statistics from the query:

You can see that it took 3.69 seconds of elapsed time, of which 3.24 seconds were IO-related.  We read 44MB of data.  Next, it shows parallel execution details:

From the above we can see that the DOP selected was 8, and we can also see the relatively smooth distribution of IO across the PQ servers.  Three of our PQ servers waited on direct path read waits, which is what you’d expect for parallel query slaves because they read blocks directly into the process’s PGA, bypassing the buffer cache.  Next, we see an instance-level breakdown that shows which PQ servers ran on each instance, along with the query coordinator (QC).  You can see that instance 1 (XRAC1) spent slightly more time.

Finally, we see the execution plan:

[Screenshot: the execution plan section of the SQL Monitor report]

One of the great things about SQL Monitor is that it reports both estimated rows and actual rows.  From the above, we can see that the CBO estimated 446k rows on the full scan of MYOBJ, but the scan actually retrieved 452k rows.  These are pretty close to the same thing, but if they weren’t, it would probably mean that we had bad statistics on the table.  Let’s examine these execution plan details and talk through what each line means.
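
If you want the same estimated-versus-actual comparison outside of EM, you can pull it straight from V$SQL_PLAN_MONITOR; a sketch (the SQL_ID value is a placeholder):

    -- Estimated vs. actual rows per plan line for a monitored statement.
    -- The view has one row per plan line per process, so aggregate
    -- across the PQ servers.
    select plan_line_id,
           plan_operation,
           plan_options,
           max(plan_cardinality) as estimated_rows,
           sum(output_rows)      as actual_rows
    from   v$sql_plan_monitor
    where  sql_id = '0gduhjx7m1g2f'
    group  by plan_line_id, plan_operation, plan_options
    order  by plan_line_id;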


1.    First, it does a parallel full scan on MYOBJ.  We know that this is a parallel operation because the operation above it is a “PX BLOCK ITERATOR” operation.


2.    As mentioned, the operation associated with the full-scan is a “PX BLOCK ITERATOR” step.  PX BLOCK ITERATOR means that the parallel server processes will iterate over the generated block range granules to complete the table scan


3.    The next operation is a SORT AGGREGATE operation.  Typically, a SORT AGGREGATE operation will return one row (and you can see later in the plan that it does) – in the context of a parallel query, however, each PQ server performs its own partial aggregation at this step.  The SQL Monitor report shows a row estimate of “1” and actual rows of 8, which means one row was returned by each parallel server.


4.    Next we see the “PX SEND QC (RANDOM)” operation.  This is where the PQ servers send the rows they produced up to the query coordinator.  The lines below a “PX SEND%” operation are the producers of those rows, and the receiving side of the send is the consumer – in our case the QC, which in turn produces the input for the next set of operations.  Because our query did an aggregate function (count(*)), each PQ server’s partial result is a producer to the QC’s final sort/aggregate operation.


5.    The next operation is the PX COORDINATOR operation – this is the query coordinator itself, which consumes the rows sent by the PQ servers and feeds them to the final sort operation.  You’ll note 9 “Execs” – one for each PQ server and one for the QC.


6.    Finally, a SORT AGGREGATE operation aggregates the partial results from the PQ servers and produces the single row returned to the user.

Now let’s do a 2-table join and explain the execution plan:
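
The query only appears in the original screen shots; based on the plan walked through below (a parallel hash join between MYOBJ and SYS.OBJ$ feeding an aggregate), it was something along these lines (an assumed reconstruction, including the join column):

    -- Assumed reconstruction of the 2-table join
    select /*+ parallel(m, 8) */ count(*)
    from   myobj m,
           sys.obj$ o
    where  m.object_id = o.obj#;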

[Screenshots: the SQL Monitor report for the 2-table join]

Here’s how to read the plan:

1.    The first thing done is at line 9 – an index fast full scan of the I_OBJ1 index on SYS.OBJ$.  This is done in parallel, as indicated by the “PX SEND” line above it.
2.    In line 8, we’re doing a “PX SEND BROADCAST” operation.  When joining tables in parallel, Oracle can choose to either broadcast the results (rows) from one operation to the PQ servers scanning the other table, or it can choose a PX SEND HASH.  In this case, the CBO determined that a BROADCAST was appropriate because the row count coming from the OBJ$ table was much lower than that of the MYOBJ table.
3.    Line 7, the PX RECEIVE step, is basically the consumer of the rows broadcast in line 8.
4.    Line 6 is an in-memory BUFFER SORT of the rows returned from the index scan on OBJ$.
5.    Lines 11 and 10, respectively, indicate the full scan and the PX BLOCK ITERATOR operation over the block-range granules processed by the 8 PQ servers.
6.    In line 5, Oracle is doing a hash join on the resulting rows from the parallel scans of MYOBJ and OBJ$.
7.    Line 4 is a per-PQ-server sort of the joined data.
8.    Line 3 is the consumer step that collects each PQ server’s results for the query coordinator.
9.    Line 2 is the PX Coordinator (QC) collecting, or consuming, the rows of the joined data.
10.    Line 1 is the final SORT AGGREGATE line that performs the final aggregation.

Let’s do another test joining MYOBJ to itself to produce a larger result and see what changes in our execution plan:
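
Again, the statement is only in the original screen shots; an assumed reconstruction of the self-join (the join column is a guess) looks like this:

    -- Assumed reconstruction: MYOBJ joined to itself at DOP 8
    select /*+ parallel(a, 8) parallel(b, 8) */ count(*)
    from   myobj a,
           myobj b
    where  a.object_id = b.object_id;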

[Screenshots: the SQL Monitor report for the self-join]

Above, there are a few key differences between this test and the previous test, in addition to the obvious fact that we’re joining to a different table:

  • The PX SEND BROADCAST changed to a PX SEND HASH because the optimizer knew that the result sets of each full parallel scan were roughly (or exactly) the same size.
  • The plan shows that we basically did two full parallel scans and hash joined the results together.  Specifically, each full scan operated on its own set of granules (each had a PX BLOCK ITERATOR operation), and each had a PX RECEIVE consumer to consume the rows of the parallel scan.

If we display the parallel execution statistics of the query, we can see 8 PQ servers established for each table scan in the join:
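
If you want to confirm the PQ server allocation outside of the SQL Monitor report, one way (a sketch; :qc_sid is a bind for the query coordinator’s SID) is to count slaves per instance and server set in GV$PX_SESSION while the query is running:

    -- PQ servers per instance and server set for a given coordinator
    select inst_id,
           server_group,
           server_set,
           count(*)        as pq_servers,
           max(req_degree) as requested_dop,
           max(degree)     as actual_dop
    from   gv$px_session
    where  qcsid = :qc_sid
    and    sid  <> qcsid
    group  by inst_id, server_group, server_set
    order  by inst_id, server_group, server_set;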

[Screenshot: Parallel Execution Details showing 16 PQ servers]

This points out something that’s worth noting – if we run a parallel-enabled join, Oracle will spawn PQ servers for each table that has a parallel DOP, so in this test we had 16 PQ servers accessing data.  The more tables you join in parallel with a non-serial DOP, the more work your system will do. In this instance, we’ve got parallel_max_servers=80, so we could possibly see up to 80 servers established if we had a query that needed it.  Let’s show what it would look like if we reduced parallel_max_servers to 10.
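
Lowering the limit is just an ALTER SYSTEM (a sketch; assumes you have the privilege and, on RAC, want the change on every instance):

    -- parallel_max_servers is dynamic; sid='*' applies to all instances
    alter system set parallel_max_servers = 10 scope = memory sid = '*';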

[Screenshots: Parallel Execution Details with parallel_max_servers=10]

We still see 16 PQ servers established, although no more than 10 on a single instance, since parallel_max_servers works at the instance level.  So let’s set parallel_max_servers to 4 and see what we can see:

[Screenshots: Parallel Execution Details with parallel_max_servers=4]

Above we can see the DOP calculated at 4 and only 4 PQ servers established per node.   Let’s force our DOP to 8 when parallel_max_servers is only 4 and see what our plan looks like:
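
One way to force the requested DOP of 8 is at the table level (a /*+ parallel(a, 8) */ hint would do the same for a single statement):

    -- Request a DOP of 8 for subsequent parallel scans of MYOBJ
    alter table myobj parallel (degree 8);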

[Screenshots: Parallel Execution Details with a requested DOP of 8 and parallel_max_servers=4]

It looks similar to the previous test – we requested 16 servers due to the DOP of 8 on the table, but only 8 were allocated, 4 on each instance.


Generate SQL Monitor Output to HTML File

You can also generate the SQL Monitor output as an HTML file.  This is helpful because it shows a graphical representation of how much time was spent in each step of the execution plan.
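
A sketch of one way to do it from SQL*Plus (the SQL_ID value is a placeholder; substitute your own):

    -- Spool an HTML SQL Monitor report to a file
    set long 10000000 longchunksize 10000000 linesize 32767 pagesize 0
    set trimspool on termout off feedback off echo off
    spool sqlmon_report.html
    select dbms_sqltune.report_sql_monitor(sql_id => '0gduhjx7m1g2f',
                                           type   => 'HTML') as report
    from   dual;
    spool off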