Exadata Storage Indexes, LIKE, BETWEEN, and Wild Cards
What is a Storage Index?
Storage Indexes are a feature unique to Oracle Exadata. A storage index is a memory-based structure that reduces the amount of physical IO required by the cell. A storage index keeps track of minimum and maximum values from query predicates and builds storage index regions based on usage.
The example above shows that for a given table, T, the values for column B are tracked such that in the example where B<2 is provided, Oracle will know to use the storage index in track 1 and refer only to blocks in the B: 1/5 range. The goal is storage indexes, again, is to eliminate unnecessary IO calls to cells and ASM AUs/disks/cells on which the requested data does not exist. The way it's physically laid out is:
The storage space inside each cell is logically divided into 1Mb chunks called "storage regions"
The boundaries of ASM allocation units (AUs) are aligned with the boundaries of the storage regions.
In each storage region, data distribution statistics are maintained in a memory structure called a region index.
Each region index contains distribution information for up to 8 columns.
The storage index is a collection of region indexes.
The storage statistics represent the data distribution of columns that are considered "well-clustered" by Exadata, and Exadata provides logic to determine what columns are clustered well enough to be included in a storage index. Storage indexes work best when data is roughly ordered so the same column values are clustered together, and when the query predicates contain =, <, >, or some combination of these. The only way to influence the user of a storage index is to load your tables with sorted data.
The tests below are done on Centroid's Exadata X2-2 Quarter Rack Database Machine. We've created a 38 million-row table called MYOBJ, which looks like DBA_OBJECTS. We've then created MYOBJ_ORDER as a copy of MYOBJ ordered by STATUS, then OBJECT_TYPE. We may have to re-visit this test case at some point, but for now we'll test with these tables.
Testing LIKE, BETWEEN, and Wild Cards with Storage Indexes
First, let's test a query with a wild-card and LIKE in the predicate:
From this testing, it appears as if wild cards are not storage index eligible. It's interesting to note that using LIKE will pick up a storage index, if the table is sorted on the predicate column, as long as wild cards are not in play - in reality this is probably just an academic exercise because LIKE is rarely used in predicates without wild carding.
Now let's test BETWEEN in the predicate:
The test above show some interesting things - storage indexes will be used for BETWEEN operations depending on the size of the data requested from the cell. In the case of a predicate between 'V' and 'Z' or 'VALID' and 'Z', which is the majority of the 38 million rows, storage indexes are not used. But in cases when the predicate includes 'INVALID' rows and excludes NULL and VALID, storage indexes will be used. To further test this, let's run the below:
From this we can see that the volume of rows being returned does play a role in storage indexes.
Storage indexes will NOT work for wild-carded SQL statements.
Storage indexes can work with BETWEEN clauses, as well as ">" and "<", as long as the range is low