Smart Scan: What Impact Does Row Filtering Have on Bytes Returned from Storage Cells

In one of our recent tests demonstrating Smart Scan processing, we did a full scan on a 4 GB table and returned all columns and all rows, blanketed by a “SELECT COUNT(*)”. Over 4 GB were eligible for predicate offload, but only about 2 GB were actually returned over the InfiniBand interconnect. In this section I’ll attempt to show why.

Here’s the query and the Smart Scan statistics:

[Screenshot of the query and its Smart Scan statistics omitted]
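
Since the screenshot isn’t reproduced here, the sketch below shows the general shape of the test; the exact SQL is an assumption, based on the table and statistic names used later in this section:

    -- Full scan returning all rows and columns, blanketed by a COUNT(*)
    SELECT COUNT(*)
    FROM   (SELECT * FROM soe.order_items_test2);

    -- Check the session-level Smart Scan statistics
    SELECT sn.name, ms.value
    FROM   v$mystat ms,
           v$statname sn
    WHERE  ms.statistic# = sn.statistic#
    AND    sn.name IN ('cell physical IO bytes eligible for predicate offload',
                       'cell physical IO interconnect bytes returned by smart scan');
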
So the disconnect in the statistics is this: 4.6 GB was eligible for predicate offload, we selected ALL rows, yet only 2.1 GB was actually returned from the storage cells.  How can this be?

During research, we went down the path of trying to determine the actual space consumption of the table (and its partitions) using DBMS_SPACE, but stopped in the interest of time.  DBMS_SPACE wouldn’t work because the tablespace involved, SOE, used automatic segment space management.  So to get this to work, I created a tablespace with manual segment space management and will move SOE.ORDER_ITEMS_TEST2 into it.

[Screenshots of the tablespace creation, table move, and DBMS_SPACE output omitted]
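
Here’s a minimal sketch of the steps just described; the tablespace name and datafile location are hypothetical, the table is assumed non-partitioned for simplicity, and DBMS_SPACE.UNUSED_SPACE is just one way to report the total bytes:

    -- Hypothetical tablespace with manual segment space management
    CREATE TABLESPACE soe_manual
      DATAFILE '+DATA' SIZE 8G
      SEGMENT SPACE MANAGEMENT MANUAL;

    -- Move the test table into it (any indexes would need rebuilding afterward)
    ALTER TABLE soe.order_items_test2 MOVE TABLESPACE soe_manual;

    -- Report the space consumed by the segment
    SET SERVEROUTPUT ON
    DECLARE
      l_total_blocks  NUMBER;
      l_total_bytes   NUMBER;
      l_unused_blocks NUMBER;
      l_unused_bytes  NUMBER;
      l_file_id       NUMBER;
      l_block_id      NUMBER;
      l_last_block    NUMBER;
    BEGIN
      DBMS_SPACE.UNUSED_SPACE(
        segment_owner             => 'SOE',
        segment_name              => 'ORDER_ITEMS_TEST2',
        segment_type              => 'TABLE',
        total_blocks              => l_total_blocks,
        total_bytes               => l_total_bytes,
        unused_blocks             => l_unused_blocks,
        unused_bytes              => l_unused_bytes,
        last_used_extent_file_id  => l_file_id,
        last_used_extent_block_id => l_block_id,
        last_used_block           => l_last_block);
      DBMS_OUTPUT.PUT_LINE('Total bytes: ' || l_total_bytes);
    END;
    /
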
From this, we can see that the table occupies 4,426,039,296 bytes, or about 4.12 GB.  Let’s do a scan against it.

[Screenshot of the scan and its statistics omitted]
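
A sketch of the scan, reusing the statistics query from the first example (again, the exact SQL is an assumption):

    SELECT COUNT(*)
    FROM   (SELECT * FROM soe.order_items_test2);

    -- Re-check the cell IO statistics for the session
    SELECT sn.name, ms.value
    FROM   v$mystat ms,
           v$statname sn
    WHERE  ms.statistic# = sn.statistic#
    AND    sn.name LIKE 'cell physical IO%';
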
We still see the same behavior.

If we disable Smart Scan by setting “_serial_direct_read” = never, we get this, which is what we’d expect:

[Screenshot of the statistics with Smart Scan disabled omitted]
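
A sketch of that test; “_serial_direct_read” is an undocumented parameter, so treat this as illustrative only:

    ALTER SESSION SET "_serial_direct_read" = never;

    SELECT COUNT(*)
    FROM   (SELECT * FROM soe.order_items_test2);

    -- With serial direct reads disabled the scan can't be offloaded, so the
    -- bytes read should roughly match the 4+ GB segment size
    SELECT sn.name, ms.value
    FROM   v$mystat ms,
           v$statname sn
    WHERE  ms.statistic# = sn.statistic#
    AND    sn.name IN ('physical read total bytes',
                       'cell physical IO bytes eligible for predicate offload');
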
This output shows us that there are 4+ GB of data to return from disk.

The next thing I want to check is whether the simple fact that we’re doing a “SELECT COUNT(*)”, instead of selecting actual columns and returning rows, is impacting our offload efficiency.  So if we run another example against SOE.CUSTOMERS_TEST (which is based on SOE.CUSTOMERS), we get the below.  First, let’s select all rows from the table and “blanket” it with a SELECT COUNT:

[Screenshot of the query and its statistics omitted]
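
A sketch of the blanketed query (the exact SQL is an assumption):

    SELECT COUNT(*)
    FROM   (SELECT * FROM soe.customers_test);
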
Next, let’s put in a WHERE clause condition:

[Screenshot of the filtered query and its statistics omitted]
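
The predicate actually used in the test isn’t shown, so the one below is purely hypothetical:

    SELECT COUNT(*)
    FROM   (SELECT *
            FROM   soe.customers_test
            WHERE  cust_last_name = 'Smith');  -- hypothetical filter
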
As we can see, the IO returned dropped from about 230 MB to 9 MB, which is what we’d expect with row filtering. Now, let’s actually fetch the rows by doing “SET AUTOTRACE TRACEONLY” and selecting from the table:

[Screenshot of the autotrace run and its statistics omitted]
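
A sketch of the same filtered query, this time actually fetching the rows; AUTOTRACE TRACEONLY runs the query and fetches the full result set without displaying it (the predicate is the same hypothetical one as above):

    SET AUTOTRACE TRACEONLY

    SELECT *
    FROM   soe.customers_test
    WHERE  cust_last_name = 'Smith';  -- hypothetical filter

    SET AUTOTRACE OFF
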
We see here that our IO bytes increased from 9,574,096 to 50,683,072, an increase of 41,108,976 bytes, or about 39.2 MB.  Since we selected 702,952 rows, this equates to about 58.48 bytes per row (41,108,976 / 702,952).  Now if we check the average row size of the rows returned, we get this:

[Screenshot of the average row size query omitted]
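
One way to measure this is to average the summed VSIZE() of the selected columns; the column list below is hypothetical and would need to cover every column actually selected:

    SELECT AVG(  NVL(VSIZE(customer_id), 0)
               + NVL(VSIZE(cust_first_name), 0)
               + NVL(VSIZE(cust_last_name), 0)
               + NVL(VSIZE(credit_limit), 0)) AS avg_row_bytes
    FROM   soe.customers_test
    WHERE  cust_last_name = 'Smith';  -- hypothetical filter
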
This shows an average row size of roughly 55 bytes, which implies an overhead of about 3 bytes per row for passing the actual data back from the cells.  Knowing and understanding this, it seems logical that the number of rows returned, combined with the sum of the column widths per row selected, will determine the number of bytes returned over the interconnect.

With this in mind, let’s go back to the original query against SOE.ORDER_ITEMS_TEST2, in which 4.12 GB were eligible for Smart Scan yet only 1.94 GB were returned over the interconnect, even though we selected all rows, because we blanketed the query with a “SELECT COUNT(*) FROM …”.

If we now do a “wide open” query against this table, let’s see how many bytes are eligible for offload and how many bytes are returned by Smart Scan:

[Screenshot of the wide-open query and its statistics omitted]
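
A sketch of the wide-open query, again using AUTOTRACE TRACEONLY so every row is fetched without being displayed (the exact SQL is an assumption):

    SET AUTOTRACE TRACEONLY

    SELECT *
    FROM   soe.order_items_test2;

    SET AUTOTRACE OFF

    -- Compare the two Smart Scan statistics
    SELECT sn.name, ms.value
    FROM   v$mystat ms,
           v$statname sn
    WHERE  ms.statistic# = sn.statistic#
    AND    sn.name IN ('cell physical IO bytes eligible for predicate offload',
                       'cell physical IO interconnect bytes returned by smart scan');
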
With this output, we can note the following:

  • Smart Scan was used.
  • The number of bytes returned over the interconnect was very close to the number eligible for offload processing.
  • The number of rows and columns returned to the database tier impacts the number of bytes sent from the storage cells over the interconnect.
  • If you limit your result set by imposing COUNT, MAX, MIN, or any other aggregate function, you’ll see fewer bytes sent over the interconnect (see below) because of offloading.