Oracle E-Business Suite, Exadata, and Smart Scan, Part II

In this post, I’m going to do a little follow-up to the blog at:
I received quite a bit of feedback on that post, both online and offline, and wanted a chance to re-position my messaging a bit.
First and foremost, the intent of my initial blog on the topic was *not* to recommend hacks to make smart scan work on Exadata.  Above all, the intent was to essentially communicate:
– Why smart scans may not happen very often on Exadata,
– How to understand why smart scans aren’t happening, and
– Some quick tips to “force” direct reads in order to get smart scans to happen
These tips were definitely not meant to be applied in production and/or without testing; they were simply listed as methods to get smart scan to work, even if that’s not the best way to get the data you need.
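To make that concrete, the kind of “force direct reads” hack I’m talking about looks something like this test-only sketch (“_serial_direct_read” is a hidden parameter, so don’t touch it outside of a test environment):

-- Test-only: ask Oracle to use serial direct path reads for this session,
-- which makes qualifying full scans eligible for smart scan
alter session set "_serial_direct_read" = always;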
I also made multiple references to serial direct reads and excluded parallel direct reads from the conversation.  Smart scan can happen whenever direct reads happen, whether serially or in parallel.  There are a number of considerations Oracle evidently weighs when deciding whether to do direct reads or buffered reads, and one of the key factors is the setting of “_small_table_threshold”.  By default, this is 2% of your buffer cache size, expressed in blocks.  When the optimizer decides to do a full scan operation, and the segment being scanned has more blocks than “_small_table_threshold”, direct reads can take place.  Other blogs say that the true threshold is “_small_table_threshold” times 5, but this doesn’t always seem to be the case, and I’ve blogged several test cases and details on the topic at https://centroid.com/knowledgebase/blog/direct-reads-and-smart-scan-with-exadata.
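If you’re curious what “_small_table_threshold” is set to in your own environment, a query along these lines will show the current value (it reads X$ views, so it has to be run as SYS):

-- Current value of the hidden parameter _small_table_threshold, in blocks
select i.ksppinm parameter, v.ksppstvl value
from   x$ksppi i, x$ksppcv v
where  i.indx = v.indx
and    i.ksppinm = '_small_table_threshold';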
So what happens when a full scan is done on a segment below the threshold?  The storage cells will still use the metadata in the iDB message, but they will send regular blocks over the interconnect to the compute node’s buffer cache.  In other words, a potential smart scan is “downgraded” to a normal, buffered scan.
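As an aside, one rough way to see whether your own session’s scans are actually being offloaded or downgraded is to look at the cell-related session statistics, something like the sketch below (statistic names as they appear in 11.2; values are cumulative for the session):

-- Rough check of smart scan activity for the current session
select sn.name, st.value
from   v$mystat st, v$statname sn
where  st.statistic# = sn.statistic#
and    sn.name in ('cell scans',
                   'cell physical IO bytes eligible for predicate offload',
                   'cell physical IO interconnect bytes returned by smart scan');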
One thing I’m unclear on is whether the database instance alone is responsible for determining direct reads, or whether the storage cells are involved.  The way I look at it, it could work one of two ways, though I suppose neither may be entirely correct:
1) The instance’s foreground process is responsible for generating the execution plan, identifying the list of extents required, and also determining whether to do a buffered read or a direct read.  All of this information (and more) is sent to the storage cells, and the cells decide to act as “block servers” or “row/column servers” based on whether direct-read directives come over in the iDB message
2) The instance’s foreground process is responsible for simply sending the plan and extent list to the storage cells, and the cells determine, based on this information, whether to do direct or buffered reads.  This would imply that things like table/index block counts are bundled in the message so that the cell can make the right decision
I suppose the difference between the two theories is that in #1, Oracle is sending data plus directives to the cell, and in #2, Oracle is sending only data and the cell is responsible for making the decisions to downgrade from smart scan to buffered reads.  In any event, if any readers want to comment I’d appreciate it.
Is Exadata suitable for OLTP Workloads?
Much of the feedback on the original blog carried the theme “Exadata isn’t designed for OLTP workloads, so you shouldn’t put OLTP applications on Exadata”.  And while I agree with this statement, there are a couple of caveats I’d place on that position:
– If you’re using Exadata for a database with a “smart scan friendly” workload and also want to consolidate databases on Exadata, an OLTP-ish environment should work just as well on Exadata as it does on any other comparable hardware, if not better.  Things like Smart Flash Cache can make certain OLTP workloads perform better than they would on hardware without the same volume of flash storage
– There is some merit to Hybrid Columnar Compression for “OLTP” applications like Oracle E-Business Suite.  Whether or not the benefits justify the cost is a topic open for debate, but I’ll get into the HCC arguments in a bit
In my eyes, what it really boils down to is your workload and the business impact of specific queries/programs in that workload.  At the end of the day, if you have a single program that’s designed to full-scan a large amount of data, and its performance can be helped by Exadata smart scan, then the cost savings/risk avoidance of improving that program via smart scan *could* justify putting this mixed-workload environment on Exadata, even if every single other program is *truly* OLTP and is optimized via single-row lookups and buffered reads.
Is Oracle EBS an OLTP Application?
Or, is an Oracle EBS database workload an OLTP workload?  This is really the key point of the discussion.
Traditionally, Oracle EBS and other enterprise applications are designed, written, and optimized for OLTP.  As outlined in the previous blog, Oracle’s gone to great lengths to ensure indexed reads, using over-indexing, hinting, and so forth.
But in the real world, EBS workloads can almost always be characterized as mixed-workloads, and more specifically, mixed-workloads against a relational data model.  Every customer is different of course, but most customer workloads on EBS can be characterized in a couple of bullets:
– Most transactional activity is heavily OLTP in nature.  Modules like Order Management, Inventory, Procurement, Payables, Receivables, and so forth all essentially insert, update, and/or validate single entities (i.e., an order line, an inventory transaction, a payment, a receipt, etc.)
– The transactional activity generally takes place all the time, via front-end forms/JSP pages or background concurrent processes
– At the end of each month, period, or year, transactions from the various subledgers are summarized and posted to General Ledger.  These transactions are multi-row problems, not single-row problems, and the SQL that makes it all happen typically uses full scan access methods.  Processes like Create Accounting in R12 are common performance pain points at nearly all EBS implementations that do any respectable amount of transaction activity
– Other “batch” types of programs exist, including Supply Chain Planning or MRP processing, Payroll processing, things like Demantra for forecasting, and so forth.  These programs introduce non-OLTP workload into your EBS environment, typically at the same time all your OLTP transactional activity is taking place.  This is one of the reasons why Oracle has distributed architecture solutions for many of these products – to segregate the batch IO problems from the OLTP IO
– Operational and even analytical reporting sometimes uses a different database or data store, but often the reporting discipline and business requirements dictate that reporting “runs out of production”.  A good number of companies develop custom reports or reporting solutions to handle a wide range of reporting business needs, and often these do full scans and demand a great deal of IO bandwidth
I like to say things like “EBS is an OLTP workload for 27 days of the month, at which point it turns into a mixed-workload environment”.
Full Scans, Mid-Sized Oracle EBS Site
So to recap, for smart scan to work, we need:
– A workload that does full scans
– A workload that does full scans above the “_small_table_threshold” block count
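To put numbers to that second condition, with an 8K block size the 2% rule of thumb works out roughly as follows; these are the same expressions used in the queries below:

-- Approximate direct-read block thresholds, assuming an 8K block size
-- and the "2% of buffer cache" rule of thumb
select round((.02*10*1024*1024*1024)/8192) threshold_10g_bc,  -- roughly 26,214 blocks
       round((.02*4*1024*1024*1024)/8192)  threshold_4g_bc    -- roughly 10,486 blocks
from   dual;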
How likely are these two conditions to be met in a mid-sized Oracle EBS environment?  I decided to try to find out by looking at some AWR data for segments that have undergone full-table scans.
select owner, object_name, scans, blocks,
       (case
          when blocks > ((.02*10*1024*1024*1024)/8192) then 'Y'
          else 'N'
        end) lbc,
       (case
          when blocks > ((.02*4*1024*1024*1024)/8192) then 'Y'
          else 'N'
        end) sbc
from
(
  select obj.owner, obj.object_name,
         max(ss.table_scans_total) scans,
         tab.blocks
  from   dba_hist_seg_stat ss,
         dba_hist_seg_stat_obj obj,
         dba_tables tab
  where  ss.obj# = obj.obj#
  and    ss.ts# = obj.ts#
  and    ss.dataobj# = obj.dataobj#
  and    obj.object_type = 'TABLE'
  and    obj.object_name not like '%UNAVA%'
  and    tab.owner = obj.owner
  and    tab.table_name = obj.object_name
  group by obj.owner, obj.object_name, tab.blocks
  having max(ss.table_scans_total) > 0
  union
  select obj.owner, obj.object_name,
         max(ss.table_scans_total) scans,
         (ind.leaf_blocks + ind.blevel) blocks
  from   dba_hist_seg_stat ss,
         dba_hist_seg_stat_obj obj,
         dba_indexes ind
  where  ss.obj# = obj.obj#
  and    ss.ts# = obj.ts#
  and    ss.dataobj# = obj.dataobj#
  and    obj.object_type = 'INDEX'
  and    obj.object_name not like '%UNAVA%'
  and    ind.owner = obj.owner
  and    ind.index_name = obj.object_name
  group by obj.owner, obj.object_name, ind.leaf_blocks + ind.blevel
  having max(ss.table_scans_total) > 0
)
order by 3 desc;
                                                                              Direct Read  Direct Read
Owner           Segment                               Full Scans       Blocks 10G BC       4G BC
--------------- ----------------------------------- ------------ ------------ ------------ ------------
OKS             OKS_BILL_CONT_LINES                      491,621          166 N            N
AR              RA_CUSTOMER_TRX_LINES_ALL                149,783       13,585 N            Y
INV             MTL_SYSTEM_ITEMS_B                        72,488      231,423 Y            Y
AR              HZ_PARTIES                                59,437        1,775 N            N
CSI             CSI_TXN_ERRORS                            35,486           43 N            N
APPLSYS         WF_ITEM_ACTIVITY_STATUSES                 20,380       26,445 Y            Y
ONT             OE_ORDER_LINES_ALL                        19,776       40,883 Y            Y
AP              AP_AE_HEADERS_ALL                         17,082       10,127 N            N
<< lines deleted >>
AR              AR_DISTRIBUTIONS_ALL                           1        3,678 N            N
INV             MTL_TRANSACTION_ACCOUNTS                       1       19,099 N            Y
98 rows selected.
From the query results, we can see that:
– 98 tables or indexes had full scans since the beginning of our AWR retention period
– 19 of these segments would have qualified for direct reads with a 10GB buffer cache
– 35 would have qualified for direct reads with a 4GB buffer cache
Of course, conditions like cached blocks, dirty blocks in the cache, and other adaptive direct read decisions may get in the way of this.  If you look at the tables on the list, they’re all very common Oracle EBS tables, so the likelihood of cached or dirty blocks influencing the decision could be relatively high.
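For what it’s worth, you can get a rough sense of how much of a given segment is already buffered with something like the following sketch (the owner and table name are just examples pulled from the listing above):

-- Rough count of buffered blocks for one segment; a heavily-cached segment
-- is a better candidate for a buffered (non-direct) read
select count(*) cached_blocks
from   v$bh bh, dba_objects o
where  bh.objd = o.data_object_id
and    o.owner = 'AR'
and    o.object_name = 'RA_CUSTOMER_TRX_LINES_ALL'
and    bh.status != 'free';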
Full Scans, Large-Sized Oracle EBS Site
Now we’ll run the same type of query in a larger environment.  Here we had 760 segments with full scans, and rather than show all of the rows, I’ll only show the ones that would qualify for direct reads with either a 10GB or a 4GB buffer cache:
select owner, object_name, scans, blocks,
       (case
          when blocks > ((.02*10*1024*1024*1024)/8192) then 'Y'
          else 'N'
        end) lbc,
       (case
          when blocks > ((.02*4*1024*1024*1024)/8192) then 'Y'
          else 'N'
        end) sbc
from
(
  select obj.owner, obj.object_name,
         max(ss.table_scans_total) scans,
         tab.blocks
  from   dba_hist_seg_stat ss,
         dba_hist_seg_stat_obj obj,
         dba_tables tab
  where  ss.obj# = obj.obj#
  and    ss.ts# = obj.ts#
  and    ss.dataobj# = obj.dataobj#
  and    obj.object_type = 'TABLE'
  and    obj.object_name not like '%UNAVA%'
  and    tab.owner = obj.owner
  and    tab.table_name = obj.object_name
  group by obj.owner, obj.object_name, tab.blocks
  having max(ss.table_scans_total) > 0
  union
  select obj.owner, obj.object_name,
         max(ss.table_scans_total) scans,
         (ind.leaf_blocks + ind.blevel) blocks
  from   dba_hist_seg_stat ss,
         dba_hist_seg_stat_obj obj,
         dba_indexes ind
  where  ss.obj# = obj.obj#
  and    ss.ts# = obj.ts#
  and    ss.dataobj# = obj.dataobj#
  and    obj.object_type = 'INDEX'
  and    obj.object_name not like '%UNAVA%'
  and    ind.owner = obj.owner
  and    ind.index_name = obj.object_name
  group by obj.owner, obj.object_name, ind.leaf_blocks + ind.blevel
  having max(ss.table_scans_total) > 0
)
where (blocks > ((.02*10*1024*1024*1024)/8192)
   or  blocks > ((.02*4*1024*1024*1024)/8192))
order by 3 desc;
                                                                              Direct Read  Direct Read
Owner           Segment                               Full Scans       Blocks 10G BC       4G BC
--------------- ----------------------------------- ------------ ------------ ------------ ------------
AR              RA_INTERFACE_LINES_ALL                    32,867       42,492 Y            Y
INV             MTL_SYSTEM_ITEMS_INTERFACE                29,199       20,406 N            Y
MSDEM           SALES_DATA                                22,032    1,867,214 Y            Y
MSDEM           MDP_MATRIX                                15,374      391,689 Y            Y
MSC             MSC_TP_SITE_ID_LID                         7,311      183,433 Y            Y
<< lines deleted >>
XXXXX           XXDPI_MTL_TXN_REQUEST_LINES_C1                 1      118,382 Y            Y
XXXXX           XXDPI_OE_ORDER_HEADERS_N11                     1       11,912 N            Y
For this customer:
– 329 of our full-scanned segments would qualify for direct reads with a 4GB buffer cache
– 227 would qualify for direct reads with a 10GB buffer cache
Of course, the same caveat about cached and dirty blocks applies in this test case, but what this example shows is that in a larger Oracle EBS environment, we have a large number of potentially important tables that would qualify for direct reads based on their size and the fact that they’ve been full-scanned.
One thing that neither this example nor the previous one shows, obviously, is the actual SQL statements, programs, modules, etc. that performed the full scans.  It’s possible these came from statistics collection, which, while important, perhaps isn’t something that should drive you toward Exadata on its own.
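If you wanted to dig further, a sketch along these lines ties full scans of a particular table back to SQL IDs and modules, assuming the plans are still in AWR (again, the owner and table name are just examples from the listing above):

-- Which SQL statements and modules full-scanned a given table, per AWR
select p.sql_id, st.module, sum(st.executions_delta) execs
from   dba_hist_sql_plan p, dba_hist_sqlstat st
where  st.sql_id = p.sql_id
and    st.plan_hash_value = p.plan_hash_value
and    p.operation = 'TABLE ACCESS'
and    p.options = 'FULL'
and    p.object_owner = 'AR'
and    p.object_name = 'RA_INTERFACE_LINES_ALL'
group by p.sql_id, st.module
order by execs desc;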
What about Hybrid Columnar Compression?
In a response to my other blog on the topic, I mentioned Hybrid Columnar Compression as a valuable feature for Oracle EBS.  Granted, HCC is not a good choice for OLTP tables and workloads, but building on one of my earlier points, here’s a scenario where HCC can save a tremendous amount of disk space and potentially improve performance for historical reporting.  Consider this:
– Transactional data is inserted every day, for all subledgers
– During month-end close, this data is summarized and transferred to GL (smart scan can help here)
– After the period close, this transactional data will never be transacted on again.  Yet it remains in your Oracle EBS environment unless and until you have a solution to purge it
Why is this a problem?  Aside from the capacity issues that many customers face, reporting performance suffers as folks run subledger reports against this historical data.  Whether or not they *should* be doing that, and whether or not customers should implement OBIEE or something else, the fact is people *do* report on this old data.  And since it’ll never change, and since Exadata offers HCC as a way to shrink it, in my experience it makes a good deal of sense.
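As a rough sketch of what that could look like, assume a subledger table that’s been range-partitioned by period, with a partition that’s closed and no longer updated (the table, partition, and index names here are purely illustrative):

-- Compress a closed, read-mostly historical partition with warehouse-grade HCC
alter table xla.xla_ae_lines move partition fy2010_q1 compress for query high;
-- Local index partitions go unusable after the move and need to be rebuilt
alter index xla.xla_ae_lines_n1 rebuild partition fy2010_q1;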
Could customers be more on the ball and take care of these reporting and data growth issues more proactively?  Of course, but again, in the real world it’s very common for these challenges to be an afterthought – getting EBS deployed is difficult and expensive enough, and most folks postpone retention, purge, archive, and reporting strategy decisions until down the road (and typically, by the time they get around to it, the standard, packaged EBS utilities to help them just won’t do the job because, like everything else, they’re designed to purge a row at a time.  A topic for another day and another audience …).
Summary
So is Exadata right for your Oracle E-Business Suite database?
It depends on your workload.
If your workload has pieces that smart scan can help with, then maybe.
If your workload has pieces on which smart scan can shave 3 days off a period close cycle, allow you to forecast and plan daily instead of weekly, or allow your next payroll to run, and if these business scenarios can save you more money than Exadata costs, then the answer could be “more than maybe”.