Exadata Storage Indexes

What is a Storage Index?

Storage Indexes are a feature unique to Oracle Exadata. A storage index is a memory-based structure that reduces the amount of physical IO required by the cell. A storage index keeps track of minimum and maximum values from query predicates and builds storage index regions based on usage.

 

need help? message an expert now!

 

The example above shows that for a given table, T, the values for column B are tracked such that in the example where B<2 is provided, Oracle will know to use the storage index in track 1 and refer only to blocks in the B: 1/5 range. The goal is storage indexes, again, is to eliminate unnecessary IO calls to cells and ASM AUs/disks/cells on which the requested data does not exist. The way it’s physically laid out is:

  • The storage space inside each cell is logically divided into 1Mb chunks called “storage regions”
  • The boundaries of ASM allocation units (AUs) are aligned with the boundaries of the storage regions.
  • In each storage region, data distribution statistics are maintained in a memory structure called a region index.
  • Each region index contains distribution information for up to 8 columns.
  • The storage index is a collection of region indexes.

The storage statistics represent the data distribution of columns that are considered “well-clustered” by Exadata, and Exadata provides logic to determine what columns are clustered well enough to be included in a storage index. Storage indexes work best when data is roughly ordered so the same column values are clustered together, and when the query predicates contain =, <, >, or some combination of these. The only way to influence the user of a storage index is to load your tables with sorted data.

Test Data

The tests below are done on Centroid’s Exadata X2-2 Quarter Rack Database Machine. We’ve created a 38 million-row table called MYOBJ, which looks like DBA_OBJECTS. We’ve then created MYOBJ_ORDER as a copy of MYOBJ ordered by STATUS, then OBJECT_TYPE. We may have to re-visit this test case at some point, but for now we’ll test with these tables.

 

Storage Indexes in Action (???)

Use of storage indexes can be measured by looking at the “cell physical IO bytes saved by storage index” system statistic. The tests in this section will run a SQL statement and check “cell physical IO bytes saved by storage index” values from V$MYSTAT to measure impact.

Let’s begin with a test by querying all rows from MYOBJ where OBJECT_NAME LIKE ‘A%’: 

 

 

As we can see, the query was cell offload eligible and Smart Scan did a great job saving IO, serving 2.3 million rows and 29,318Mb of data over the interconnect – but no indication of using a Storage Index. Let’s do another test with a different predicate:

 

 

Again, we see no evidence of a storage index being used. Let’s change the SELECT clause to filter the column:

 

Once again, no storage indexes in use. At this point I’m not sure whether it’s because the data in the table isn’t sorted, whether the amount of bytes returned is “too large”, whether storage indexes don’t work for “LIKE” clauses (or maybe rather, are unfavorable), or maybe something else. The next test will do an equi-join on object_name:

 

Still, we see no storage index being used. So far we’ve successfully demonstrated how notto use storage indexes, which is great and an activity we can try all day long with the same result.

Storage Indexes in Action!

Now let’s try a query on our MYOBJ_ORDER table, which is a table exactly like MYOBJ except it’s sorted by STATUS:

 

 

Wow! Not only did it use a storage index, it returned the data amazingly fast. It also disproved our weaker theories. Let’s now try the same query against MYOBJ:

 

 

We see above that a storage index was not used, so clearly the ordering of the data in the table influenced Oracle’s decision to leverage a storage index. Now since we ordered MYOBJ_ORDER based on STATUS, and then OBJECT_TYPE, lets try a query on MYOBJ_ORDER.OBJECT_TYPE for a list of known rows where STATUS=’INVALID’:

 

 

We can see that although the count of objects was very high (7.8 million) and Smart Scan was enabled, we still were able to use a storage index to very quickly retrieve the rows. Next let’s try a query with a large number of “VALID” rows: 

 

 

As we can see, a storage index was still used. Revisiting a previous test, let’s select all columns and see if thestorage index is still used:

 

 

And as we can see, our storage index was still used. Finally, let’s do a test when we add a “STATUS” predicate to the OBJECT_TYPE clause:

 

The response time was 0.07 seconds with the use of a storage index. Now that we know some general guidelines of when storage indexes are used and when they’re not used, let’s do some additional test cases.

Summary

For storage indexes to be considered, the data in thetables needs to be well-ordered Storage indexes will function on a secondarily-ordered column (depending on how well-ordered this column is on thetable) OLTP tables with sequentially-populated columns (i.e., ORDER_ID, TRANSACTION_ID, etc.) could be good candidates for storage indexes