Smart Scan Column and Predicate Filtering in Action
The examples in the previous section did several tests in which cell offload and Smart Scan processing was measured by selecting all rows from the table, or a “count(*)”. Let’s see what the Smart Scan impact is when selecting a subset of columns from a table. In the example below, we’ll use the SOE.ORDERS table and make all indices invisible.
Above, we can see that the second query essentially performed the same magnitude of work. Let’s do another test with “select distinct” and a column with nulls:
Again, same results. Let’s put this into a simple anonymous PL/SQL block and avoid the “select count(*)” part of it.
Again, the results are similar. Now, we’ll try with a non-partitioned table with some null column values, using MYOBJ (a copy of DBA_OBJECTS loaded with 150 million rows):
As we can see above, the Smart Scan savings was more than in the example against SOE.ORDERS, but we didn’t see much of a change in “select *” vs. “select generated” in the query.
Based on the examples so far in this section, it doesn’t look as if column filtering is doing anything. Let’s try an example with using a predicate on our SOE.ORDERS table:
In this example, we’re passing far less data back to the database tier and thus getting a better Smart Scan efficiency. One more example:
In this example above, we’re selecting based on a predicate that we know what return any rows. Smart Scan processing is taking place. In the first example, the amount of Smart Scan MB returned from Exadata cells is roughly 3,800Mb, yielding a Smart Scan efficiency of 99.8%. In the second example, we’ve put in the same predicate but instead select two additional columns, ORDER_ID and ORDER_STATUS. In this test, roughly 4,900Mb of data was returned via Smart Scan processing, giving a Smart Scan efficiency of 99.74%.
Based on these test results, we can see that row filtering certainly plays a very large role in Smart Scan efficiency, but clearly column filtering also plays a role. Like anything, your mileage will likely vary based on the SQL statements being issue, the column selection list, and predicate information.
Since I’m not entirely happy with the result of the test so far (I expected a larger scan efficiency delta to be seen), I’m going to add a very wide on MYOBJ and do some additional testing.
Below, I’m going to show the Smart Scan behavior with column filtering when selecting two columns from a large table (SYSTEM.MYOBJ):
Now, I’ll do the same query while selecting just one column, the OBJECT_NAME column:
In the examples above, I’m querying MYOBJ, which looks like this:
As you can see, BIGCOL is a column of data type VARCHAR2(4000). The tests above returned 18,513 rows, so if every row had BIGCOL fully-populated, we should be doing about (18513 * 4000) bytes less of IO. But since I know that I’m only populating 255 bytes of data…
…our IO savings should be about (18,513 * 255) =~ 4.5Mb.
In our second query, we benefited from the use of a storage index, which helped the run-time, but shouldn’t impact the number of bytes returned over the InfiniBand interconnect. If you look at the numbers from the queries above, we see 7,730,380 bytes returned from the first query and 1,230,720 returned from the second query – a savings of about 6.2Mb. Which is a greater savings than what we actually expected by NOT selecting the larger column, an indication that column filtering is playing a pretty big role.
But let’s look at some different tests against these tables and see the storage index impact in both:
Here, the difference in bytes over the InfiniBand interconnect between the first query and second is 1,176,032 – 651,744 = 524,288 bytes. Each query returned 2048 rows. The row length of our BIGCOL column is 255 bytes for all 2,048 rows:
Further, 524,288/2,048 = 256 bytes per row saved when NOT selecting the BIGCOL column. So this pretty much shows the impact of column filtering wit Smart Scan.
Summary
Exadata Smart Scan processing is a means to improve query response time and reduce the amount of data passed from the storage cell servers to the database tier nodes. In the tests in this post, predicate-filtering has a much greater impact on cell offload efficiency than column filtering. In other words, the number of bytes returned over the InterConnect is what determines offload efficiency, and in our tests, reducing or changing the number of columns selected did had an impact on interconnect bytes, but not nearly as much as reducing the number of rows returned.
1050 Wilshire Drive,
Suite 170,
Troy, MI 48084
Phone: (248) 465-9533
Toll free: 1-877-868-1753
Email: [email protected]
© Centroid, Inc. All rights reserved. Contact Privacy Policy Terms of Use CCPA Policy
Centroid is a cloud services and technology company that provides Oracle enterprise workload consulting and managed services across Oracle, Azure, Amazon, Google, and private cloud. From applications to technology to infrastructure, Centroid’s depth of Oracle expertise and breadth of cloud capabilities helps clients modernize, transform, and grow their business to the next level.
© Centroid, Inc. All rights reserved. Contact Privacy Policy Terms of Use CCPA Policy