Exadata Storage Configuration

The purpose of this post is to outline how storage works on Exadata.  We'll look at the host-based storage on the compute nodes and storage servers, then at the cell storage layers, and finally map these through ASM storage to database storage.
Environment Description
The demonstrations in this document will be done using Centroid’s X2-2 Quarter rack.
Compute Node Storage
Let's start with a "df -k" listing:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/VGExaDb-LVDbSys1
                      30963708  22221844   7169000  76% /
/dev/sda1               126427     48728     71275  41% /boot
/dev/mapper/VGExaDb-LVDbOra1
                     103212320  57668260  40301180  59% /u01
tmpfs                 84132864     76492  84056372   1% /dev/shm
172.16.1.200:/exadump
                     14465060256 3669637248 10795423008  26% /dump
Other than an NFS mount I've got on this machine, we can see a 30GB root file-system, a small boot file-system, and a 100GB /u01 mount point.  Now let's look at an "fdisk -l" output:
[[email protected] ~]# fdisk -l
Disk /dev/sda: 598.8 GB, 598879502336 bytes
255 heads, 63 sectors/track, 72809 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          16      128488+  83  Linux
/dev/sda2              17       72809   584709772+  8e  Linux LVM
We can see a roughly 600GB drive partitioned into /dev/sda1 and /dev/sda2.  We know from the df listing that /dev/sda1 is mounted on /boot, so the / and /u01 file-systems must be built on logical volumes. Before continuing, note that the Sun Fire servers the compute nodes run on use an LSI MegaRAID controller, so we can use MegaCli64 to show the physical hardware:
[[email protected] ~]# /opt/MegaRAID/MegaCli/MegaCli64 -ShowSummary -aALL
System
        OS Name (IP Address)       : Not Recognized
        OS Version                 : Not Recognized
        Driver Version             : Not Recognized
        CLI Version                : 8.00.23
Hardware
        Controller
                 ProductName       : LSI MegaRAID SAS 9261-8i(Bus 0, Dev 0)
                 SAS Address       : 500605b002f054d0
                 FW Package Version: 12.12.0-0048
                 Status            : Optimal
BBU
                 BBU Type          : Unknown
                 Status            : Healthy
Enclosure
                 Product Id        : SGPIO
                 Type              : SGPIO
                 Status            : OK
PD 
                Connector          : Port 0 – 3<Internal>: Slot 3
                Vendor Id          : SEAGATE
                Product Id         : ST930003SSUN300G
                State              : Global HotSpare
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 278.875 GB
                Power State        : Spun down
                Connector          : Port 0 – 3<Internal>: Slot 2
                Vendor Id          : SEAGATE
                Product Id         : ST930003SSUN300G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 278.875 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal>: Slot 1
                Vendor Id          : SEAGATE
                Product Id         : ST930003SSUN300G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 278.875 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal>: Slot 0
                Vendor Id          : SEAGATE
                Product Id         : ST930003SSUN300G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 278.875 GB
                Power State        : Active
Storage
       Virtual Drives
                Virtual drive      : Target Id 0 ,VD name DBSYS
                Size               : 557.75 GB
                State              : Optimal
                RAID Level         : 5
Exit Code: 0x00
Based on this, we have four 300GB drives: one global hot spare (slot 3) and three active drives in slots 0, 1, and 2.  The virtual drive created with the internal RAID controller matches the size shown in the fdisk listing.  If we do a pvdisplay, we see this:
[[email protected] ~]# pvdisplay
  — Physical volume —
  PV Name               /dev/sda2
  VG Name               VGExaDb
  PV Size               557.62 GB / not usable 1.64 MB
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              142751
  Free PE               103327
  Allocated PE          39424
  PV UUID               xKSxo7-k8Hb-HM52-iGoD-tMKC-Vhxl-OQuNFG
Note that the PV size equals the virtual drive size from the MegaCli64 output.   There's a single VG created on /dev/sda2, called VGExaDb:
[[email protected] ~]# vgdisplay
  — Volume group —
  VG Name               VGExaDb
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                3
  Open LV               3
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               557.62 GB
  PE Size               4.00 MB
  Total PE              142751
  Alloc PE / Size       39424 / 154.00 GB
  Free  PE / Size       103327 / 403.62 GB
  VG UUID               eOfArN-08zd-1oD4-C4iu-RJbh-2Pxb-yhWmSW
As you can see, there is about 400GB of free space in the volume group.  An lvdisplay shows three logical volumes: LVDbSys1, LVDbSwap1, and LVDbOra1 (mapped to "/", swap, and "/u01", respectively):
[[email protected] ~]# lvdisplay
  — Logical volume —
  LV Name                /dev/VGExaDb/LVDbSys1
  VG Name                VGExaDb
  LV UUID                wsj1Dc-MXvd-6haj-vCb0-I8dY-dlt9-18kCwu
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                30.00 GB
  Current LE             7680
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  – currently set to     256
  Block device           253:0
  — Logical volume —
  LV Name                /dev/VGExaDb/LVDbSwap1
  VG Name                VGExaDb
  LV UUID                iH64Ie-LJSq-hchp-h1sg-OPww-pTx5-jQpj6T
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                24.00 GB
  Current LE             6144
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  – currently set to     256
  Block device           253:1
  — Logical volume —
  LV Name                /dev/VGExaDb/LVDbOra1
  VG Name                VGExaDb
  LV UUID                CnRtDt-h6T3-iMFO-EZl6-0OHP-D6de-xZms6O
  LV Write Access        read/write
  LV Status              available
  # open                 1
  LV Size                100.00 GB
  Current LE             25600
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  – currently set to     256
  Block device           253:2
These logical volumes are mapped to /dev/mapper devices like so:
[[email protected] ~]#  ls -ltar /dev/VGExaDb/LVDb*
lrwxrwxrwx 1 root root 28 Feb 20 21:59 /dev/VGExaDb/LVDbSys1 -> /dev/mapper/VGExaDb-LVDbSys1
lrwxrwxrwx 1 root root 29 Feb 20 21:59 /dev/VGExaDb/LVDbSwap1 -> /dev/mapper/VGExaDb-LVDbSwap1
lrwxrwxrwx 1 root root 28 Feb 20 21:59 /dev/VGExaDb/LVDbOra1 -> /dev/mapper/VGExaDb-LVDbOra1
So in short:
– On the compute nodes, file-systems are built on logical volumes
– The logical volumes are built on a single volume group (VGExaDb) carved from the virtual drive presented by the LSI MegaRAID controller
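Since roughly 400GB of the volume group is left unallocated, the typical way to consume it is to grow one of the existing logical volumes. Here's a minimal sketch using standard LVM tools, assuming you wanted to add space to /u01 (the 50G figure is purely illustrative – on a real system, follow Oracle's documented resize procedure and confirm the file-system type first):

lvextend -L +50G /dev/VGExaDb/LVDbOra1        # grow the LVDbOra1 logical volume by 50GB (illustrative size)
resize2fs /dev/mapper/VGExaDb-LVDbOra1        # grow the ext3 file-system to fill the resized volume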
Cell Server Storage
Each Exadata storage server has 12 SAS disks.  Every disk in a given storage server is the same type – either High Performance (600GB, 15K RPM) or High Capacity (2TB or 3TB).  The first two disk drives in each storage cell contain mirrored copies of the Exadata storage server "system area".  This system area holds the storage server software, the cell operating system, the metrics and alert repository, and so forth.  The storage servers use the same LSI MegaRAID controller as the compute nodes, and if you run lsscsi you'll see both the physical disks and the PCI flash devices:
[[email protected] ~]# lsscsi -v
[0:2:0:0]    disk    LSI      MR9261-8i        2.12  /dev/sda
  dir: /sys/bus/scsi/devices/0:2:0:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:0/0:2:0:0]
[0:2:1:0]    disk    LSI      MR9261-8i        2.12  /dev/sdb
  dir: /sys/bus/scsi/devices/0:2:1:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:1/0:2:1:0]
[0:2:2:0]    disk    LSI      MR9261-8i        2.12  /dev/sdc
  dir: /sys/bus/scsi/devices/0:2:2:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:2/0:2:2:0]
[0:2:3:0]    disk    LSI      MR9261-8i        2.12  /dev/sdd
  dir: /sys/bus/scsi/devices/0:2:3:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:3/0:2:3:0]
[0:2:4:0]    disk    LSI      MR9261-8i        2.12  /dev/sde
  dir: /sys/bus/scsi/devices/0:2:4:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:4/0:2:4:0]
[0:2:5:0]    disk    LSI      MR9261-8i        2.12  /dev/sdf
  dir: /sys/bus/scsi/devices/0:2:5:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:5/0:2:5:0]
[0:2:6:0]    disk    LSI      MR9261-8i        2.12  /dev/sdg
  dir: /sys/bus/scsi/devices/0:2:6:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:6/0:2:6:0]
[0:2:7:0]    disk    LSI      MR9261-8i        2.12  /dev/sdh
  dir: /sys/bus/scsi/devices/0:2:7:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:7/0:2:7:0]
[0:2:8:0]    disk    LSI      MR9261-8i        2.12  /dev/sdi
  dir: /sys/bus/scsi/devices/0:2:8:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:8/0:2:8:0]
[0:2:9:0]    disk    LSI      MR9261-8i        2.12  /dev/sdj
  dir: /sys/bus/scsi/devices/0:2:9:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:9/0:2:9:0]
[0:2:10:0]   disk    LSI      MR9261-8i        2.12  /dev/sdk
  dir: /sys/bus/scsi/devices/0:2:10:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:10/0:2:10:0]
[0:2:11:0]   disk    LSI      MR9261-8i        2.12  /dev/sdl
  dir: /sys/bus/scsi/devices/0:2:11:0  [/sys/devices/pci0000:00/0000:00:05.0/0000:13:00.0/host0/target0:2:11/0:2:11:0]
[1:0:0:0]    disk    Unigen   PSA4000          1100  /dev/sdm
  dir: /sys/bus/scsi/devices/1:0:0:0  [/sys/devices/pci0000:00/0000:00:1a.7/usb1/1-1/1-1:1.0/host1/target1:0:0/1:0:0:0]
[8:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdn
  dir: /sys/bus/scsi/devices/8:0:0:0  [/sys/devices/pci0000:00/0000:00:07.0/0000:19:00.0/0000:1a:02.0/0000:1b:00.0/host8/port-8:0/end_device-8:0/target8:0:0/8:0:0:0]
[8:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdo
  dir: /sys/bus/scsi/devices/8:0:1:0  [/sys/devices/pci0000:00/0000:00:07.0/0000:19:00.0/0000:1a:02.0/0000:1b:00.0/host8/port-8:1/end_device-8:1/target8:0:1/8:0:1:0]
[8:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdp
  dir: /sys/bus/scsi/devices/8:0:2:0  [/sys/devices/pci0000:00/0000:00:07.0/0000:19:00.0/0000:1a:02.0/0000:1b:00.0/host8/port-8:2/end_device-8:2/target8:0:2/8:0:2:0]
[8:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdq
  dir: /sys/bus/scsi/devices/8:0:3:0  [/sys/devices/pci0000:00/0000:00:07.0/0000:19:00.0/0000:1a:02.0/0000:1b:00.0/host8/port-8:3/end_device-8:3/target8:0:3/8:0:3:0]
[9:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdr
  dir: /sys/bus/scsi/devices/9:0:0:0  [/sys/devices/pci0000:00/0000:00:07.0/0000:19:00.0/0000:1a:04.0/0000:21:00.0/host9/port-9:1/end_device-9:1/target9:0:0/9:0:0:0]
[9:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sds
  dir: /sys/bus/scsi/devices/9:0:1:0  [/sys/devices/pci0000:00/0000:00:07.0/0000:19:00.0/0000:1a:04.0/0000:21:00.0/host9/port-9:0/end_device-9:0/target9:0:1/9:0:1:0]
[9:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdt
  dir: /sys/bus/scsi/devices/9:0:2:0  [/sys/devices/pci0000:00/0000:00:07.0/0000:19:00.0/0000:1a:04.0/0000:21:00.0/host9/port-9:2/end_device-9:2/target9:0:2/9:0:2:0]
[9:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdu
  dir: /sys/bus/scsi/devices/9:0:3:0  [/sys/devices/pci0000:00/0000:00:07.0/0000:19:00.0/0000:1a:04.0/0000:21:00.0/host9/port-9:3/end_device-9:3/target9:0:3/9:0:3:0]
[10:0:0:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdv
  dir: /sys/bus/scsi/devices/10:0:0:0  [/sys/devices/pci0000:00/0000:00:09.0/0000:27:00.0/0000:28:02.0/0000:29:00.0/host10/port-10:1/end_device-10:1/target10:0:0/10:0:0:0]
[10:0:1:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdw
  dir: /sys/bus/scsi/devices/10:0:1:0  [/sys/devices/pci0000:00/0000:00:09.0/0000:27:00.0/0000:28:02.0/0000:29:00.0/host10/port-10:0/end_device-10:0/target10:0:1/10:0:1:0]
[10:0:2:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdx
  dir: /sys/bus/scsi/devices/10:0:2:0  [/sys/devices/pci0000:00/0000:00:09.0/0000:27:00.0/0000:28:02.0/0000:29:00.0/host10/port-10:2/end_device-10:2/target10:0:2/10:0:2:0]
[10:0:3:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdy
  dir: /sys/bus/scsi/devices/10:0:3:0  [/sys/devices/pci0000:00/0000:00:09.0/0000:27:00.0/0000:28:02.0/0000:29:00.0/host10/port-10:3/end_device-10:3/target10:0:3/10:0:3:0]
[11:0:0:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdz
  dir: /sys/bus/scsi/devices/11:0:0:0  [/sys/devices/pci0000:00/0000:00:09.0/0000:27:00.0/0000:28:04.0/0000:2f:00.0/host11/port-11:1/end_device-11:1/target11:0:0/11:0:0:0]
[11:0:1:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdaa
  dir: /sys/bus/scsi/devices/11:0:1:0  [/sys/devices/pci0000:00/0000:00:09.0/0000:27:00.0/0000:28:04.0/0000:2f:00.0/host11/port-11:0/end_device-11:0/target11:0:1/11:0:1:0]
[11:0:2:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdab
  dir: /sys/bus/scsi/devices/11:0:2:0  [/sys/devices/pci0000:00/0000:00:09.0/0000:27:00.0/0000:28:04.0/0000:2f:00.0/host11/port-11:2/end_device-11:2/target11:0:2/11:0:2:0]
[11:0:3:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdac
  dir: /sys/bus/scsi/devices/11:0:3:0  [/sys/devices/pci0000:00/0000:00:09.0/0000:27:00.0/0000:28:04.0/0000:2f:00.0/host11/port-11:3/end_device-11:3/target11:0:3/11:0:3:0]
In the above listing, we can tell:
– The "MARVELL" devices are the ATA-attached PCI flash modules – we'll cover these shortly
– The "MR9261-8i" LSI devices represent our 12 physical SAS disks.  Since they're controlled by the LSI MegaRAID controller, we can use MegaCli64 to show more information:
[[email protected] ~]# /opt/MegaRAID/MegaCli/MegaCli64 -ShowSummary -aALL
System
        OS Name (IP Address)       : Not Recognized
        OS Version                 : Not Recognized
        Driver Version             : Not Recognized
        CLI Version                : 8.00.23
Hardware
        Controller
                 ProductName       : LSI MegaRAID SAS 9261-8i(Bus 0, Dev 0)
                 SAS Address       : 500605b002f4aac0
                 FW Package Version: 12.12.0-0048
                 Status            : Optimal
BBU
                 BBU Type          : Unknown
                 Status            : Healthy
Enclosure
                 Product Id        : HYDE12
                 Type              : SES
                 Status            : OK
                 Product Id        : SGPIO
                 Type              : SGPIO
                 Status            : OK
PD 
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 11
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 10
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 9
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 8
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 7
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 6
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 4
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 3
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 2
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 1
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 0
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
                Connector          : Port 0 – 3<Internal><Encl Pos 0 >: Slot 5
                Vendor Id          : SEAGATE
                Product Id         : ST360057SSUN600G
                State              : Online
                Disk Type          : SAS,Hard Disk Device
                Capacity           : 557.861 GB
                Power State        : Active
Storage
       Virtual Drives
                Virtual drive      : Target Id 0 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
                Virtual drive      : Target Id 1 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
                Virtual drive      : Target Id 2 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
                Virtual drive      : Target Id 3 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
                Virtual drive      : Target Id 4 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
                Virtual drive      : Target Id 6 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
                Virtual drive      : Target Id 7 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
                Virtual drive      : Target Id 8 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
                Virtual drive      : Target Id 9 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
                Virtual drive      : Target Id 10 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
                Virtual drive      : Target Id 11 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
                Virtual drive      : Target Id 5 ,VD name
                Size               : 557.861 GB
                State              : Optimal
                RAID Level         : 0
Exit Code: 0x00
You'll notice above that we've got twelve (12) SEAGATE ST360057SSUN600G drives of 557.861 GB each – the High Performance disks in this storage server.  Using cellcli, we can confirm this and note the corresponding sizes:
CellCLI> list physicaldisk attributes name,diskType,physicalSize
20:0     HardDisk 558.9109999993816G
20:1     HardDisk 558.9109999993816G
20:2     HardDisk 558.9109999993816G
20:3     HardDisk 558.9109999993816G
20:4     HardDisk 558.9109999993816G
20:5     HardDisk 558.9109999993816G
20:6     HardDisk 558.9109999993816G
20:7     HardDisk 558.9109999993816G
20:8     HardDisk 558.9109999993816G
20:9     HardDisk 558.9109999993816G
20:10     HardDisk 558.9109999993816G
20:11     HardDisk 558.9109999993816G
FLASH_1_0 FlashDisk 22.8880615234375G
FLASH_1_1 FlashDisk 22.8880615234375G
FLASH_1_2 FlashDisk 22.8880615234375G
FLASH_1_3 FlashDisk 22.8880615234375G
FLASH_2_0 FlashDisk 22.8880615234375G
FLASH_2_1 FlashDisk 22.8880615234375G
FLASH_2_2 FlashDisk 22.8880615234375G
FLASH_2_3 FlashDisk 22.8880615234375G
FLASH_4_0 FlashDisk 22.8880615234375G
FLASH_4_1 FlashDisk 22.8880615234375G
FLASH_4_2 FlashDisk 22.8880615234375G
FLASH_4_3 FlashDisk 22.8880615234375G
FLASH_5_0 FlashDisk 22.8880615234375G
FLASH_5_1 FlashDisk 22.8880615234375G
FLASH_5_2 FlashDisk 22.8880615234375G
FLASH_5_3 FlashDisk 22.8880615234375G
Cell Server OS Storage
We know from the documentation that the operating system on the Exadata storage servers resides on the first two SAS disks in the cell.  Let's do a "df -h" on the host:
Filesystem            Size  Used Avail Use% Mounted on
/dev/md6              9.9G  3.6G  5.9G  38% /
tmpfs                  12G     0   12G   0% /dev/shm
/dev/md8              2.0G  647M  1.3G  34% /opt/oracle
/dev/md4              116M   60M   50M  55% /boot
/dev/md11             2.3G  130M  2.1G   6% /var/log/oracle
Based on the /dev/md* device names, we know we've got software RAID in play for these devices, and that it was created with mdadm.  Let's query the mdadm configuration for /dev/md6, /dev/md8, /dev/md5, and /dev/md11:
[[email protected] ~]# mdadm -Q -D /dev/md6
/dev/md6:
        Version : 0.90
  Creation Time : Mon Feb 21 13:06:27 2011
     Raid Level : raid1
     Array Size : 10482304 (10.00 GiB 10.73 GB)
  Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 6
    Persistence : Superblock is persistent
    Update Time : Sun Mar 25 20:50:28 2012
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
         UUID : 2ea655b5:89c5cafc:b8bacc8c:27078485
         Events : 0.49
    Number   Major   Minor   RaidDevice State
       0       8        6        0      active sync   /dev/sda6
       1       8       22        1      active sync   /dev/sdb6
[[email protected] ~]# mdadm -Q -D /dev/md8
/dev/md8:
        Version : 0.90
  Creation Time : Mon Feb 21 13:06:29 2011
     Raid Level : raid1
     Array Size : 2096384 (2047.59 MiB 2146.70 MB)
  Used Dev Size : 2096384 (2047.59 MiB 2146.70 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 8
    Persistence : Superblock is persistent
    Update Time : Sun Mar 25 20:50:16 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
           UUID : 4c4b589f:a2e42e48:8847db6b:832284bd
         Events : 0.78
    Number   Major   Minor   RaidDevice State
       0       8        8        0      active sync   /dev/sda8
       1       8       24        1      active sync   /dev/sdb8
[[email protected] ~]# mdadm -Q -D /dev/md5
/dev/md5:
        Version : 0.90
  Creation Time : Mon Feb 21 13:06:20 2011
     Raid Level : raid1
     Array Size : 10482304 (10.00 GiB 10.73 GB)
  Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 5
    Persistence : Superblock is persistent
    Update Time : Sun Mar 25 04:27:05 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
         UUID : bf701820:0c124b92:9c9bfc74:7d418b3f
         Events : 0.36
    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       1       8       21        1      active sync   /dev/sdb5
[[email protected] ~]# mdadm -Q -D /dev/md11
/dev/md11:
        Version : 0.90
  Creation Time : Mon Feb 21 13:06:29 2011
     Raid Level : raid1
     Array Size : 2433728 (2.32 GiB 2.49 GB)
  Used Dev Size : 2433728 (2.32 GiB 2.49 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 11
    Persistence : Superblock is persistent
    Update Time : Sun Mar 25 20:50:32 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
         UUID : 9d76d724:5a2e31a1:fa34e9e7:a875f020
         Events : 0.82
    Number   Major   Minor   RaidDevice State
       0       8       11        0      active sync   /dev/sda11
       1       8       27        1      active sync   /dev/sdb11
From the above output, we can see that the /dev/sda and /dev/sdb physical devices are software-mirrored via mdadm.  If we do an "fdisk -l", we see the following:
[[email protected] ~]# fdisk -l
Disk /dev/sda: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16          16        8032+  83  Linux
/dev/sda3              17       69039   554427247+  83  Linux
/dev/sda4           69040       72824    30403012+   f  W95 Ext’d (LBA)
/dev/sda5           69040       70344    10482381   fd  Linux raid autodetect
/dev/sda6           70345       71649    10482381   fd  Linux raid autodetect
/dev/sda7           71650       71910     2096451   fd  Linux raid autodetect
/dev/sda8           71911       72171     2096451   fd  Linux raid autodetect
/dev/sda9           72172       72432     2096451   fd  Linux raid autodetect
/dev/sda10          72433       72521      714861   fd  Linux raid autodetect
/dev/sda11          72522       72824     2433816   fd  Linux raid autodetect
Disk /dev/sdb: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1          15      120456   fd  Linux raid autodetect
/dev/sdb2              16          16        8032+  83  Linux
/dev/sdb3              17       69039   554427247+  83  Linux
/dev/sdb4           69040       72824    30403012+   f  W95 Ext’d (LBA)
/dev/sdb5           69040       70344    10482381   fd  Linux raid autodetect
/dev/sdb6           70345       71649    10482381   fd  Linux raid autodetect
/dev/sdb7           71650       71910     2096451   fd  Linux raid autodetect
/dev/sdb8           71911       72171     2096451   fd  Linux raid autodetect
/dev/sdb9           72172       72432     2096451   fd  Linux raid autodetect
/dev/sdb10          72433       72521      714861   fd  Linux raid autodetect
/dev/sdb11          72522       72824     2433816   fd  Linux raid autodetect
Disk /dev/sdc: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdc doesn’t contain a valid partition table
Disk /dev/sdd: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdd doesn’t contain a valid partition table
Disk /dev/sde: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sde doesn’t contain a valid partition table
Disk /dev/sdf: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdf doesn’t contain a valid partition table
Disk /dev/sdg: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdg doesn’t contain a valid partition table
Disk /dev/sdh: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdh doesn’t contain a valid partition table
Disk /dev/sdi: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdi doesn’t contain a valid partition table
Disk /dev/sdj: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdj doesn’t contain a valid partition table
Disk /dev/sdk: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdk doesn’t contain a valid partition table
Disk /dev/sdl: 598.9 GB, 598999040000 bytes
255 heads, 63 sectors/track, 72824 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdl doesn’t contain a valid partition table
Disk /dev/sdm: 4009 MB, 4009754624 bytes
126 heads, 22 sectors/track, 2825 cylinders
Units = cylinders of 2772 * 512 = 1419264 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdm1               1        2824     3914053   83  Linux
Disk /dev/md1: 731 MB, 731906048 bytes
2 heads, 4 sectors/track, 178688 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md1 doesn’t contain a valid partition table
Disk /dev/md11: 2492 MB, 2492137472 bytes
2 heads, 4 sectors/track, 608432 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md11 doesn’t contain a valid partition table
Disk /dev/md2: 2146 MB, 2146697216 bytes
2 heads, 4 sectors/track, 524096 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md2 doesn’t contain a valid partition table
Disk /dev/md8: 2146 MB, 2146697216 bytes
2 heads, 4 sectors/track, 524096 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md8 doesn’t contain a valid partition table
Disk /dev/md7: 2146 MB, 2146697216 bytes
2 heads, 4 sectors/track, 524096 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md7 doesn’t contain a valid partition table
Disk /dev/md6: 10.7 GB, 10733879296 bytes
2 heads, 4 sectors/track, 2620576 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md6 doesn’t contain a valid partition table
Disk /dev/md5: 10.7 GB, 10733879296 bytes
2 heads, 4 sectors/track, 2620576 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md5 doesn’t contain a valid partition table
Disk /dev/md4: 123 MB, 123273216 bytes
2 heads, 4 sectors/track, 30096 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk /dev/md4 doesn’t contain a valid partition table
Disk /dev/sdn: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdn doesn’t contain a valid partition table
Disk /dev/sdo: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdo doesn’t contain a valid partition table
Disk /dev/sdp: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdp doesn’t contain a valid partition table
Disk /dev/sdq: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdq doesn’t contain a valid partition table
Disk /dev/sdr: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdr doesn’t contain a valid partition table
Disk /dev/sds: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sds doesn’t contain a valid partition table
Disk /dev/sdt: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdt doesn’t contain a valid partition table
Disk /dev/sdu: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdu doesn’t contain a valid partition table
Disk /dev/sdv: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdv doesn’t contain a valid partition table
Disk /dev/sdw: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdw doesn’t contain a valid partition table
Disk /dev/sdx: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdx doesn’t contain a valid partition table
Disk /dev/sdy: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdy doesn’t contain a valid partition table
Disk /dev/sdz: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdz doesn’t contain a valid partition table
Disk /dev/sdaa: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdaa doesn’t contain a valid partition table
Disk /dev/sdab: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdab doesn’t contain a valid partition table
Disk /dev/sdac: 24.5 GB, 24575868928 bytes
255 heads, 63 sectors/track, 2987 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdac doesn’t contain a valid partition table
This is telling us the following:
– /dev/sda[4,6,8,11] and /dev/sdb[4,6,8,11] (along with the other small "Linux raid autodetect" partitions) hold the OS storage and are mirrored to each other via mdadm software RAID
– /dev/sdc through /dev/sdl don't contain valid partition tables because they're wholly reserved for database storage
– /dev/sda3 and /dev/sdb3 don't carry a file-system; they provide the database storage portion of the first two disks
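Rather than interrogating each /dev/md device one at a time, the overall health of these software RAID mirrors can be checked in one pass. A quick sketch (output will vary from cell to cell):

cat /proc/mdstat               # kernel summary of every md array and its member partitions
mdadm --detail --scan          # one ARRAY line per md device, including its UUID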
LUNs
The hierarchy of non-flash database storage in Exadata storage servers is: physical disks are presented as LUNs, cell disks are created on LUNs, grid disks are carved from cell disks, and ASM disk groups are built on grid disks.  A LUN is created on each physical disk.  On the first two drives the LUN also spans the system area (note the isSystemLun flag below), and the cell disk built on it uses only the non-system-area portion; on the remaining drives, the entire physical disk is available for database storage.  Let's look at a cellcli output:
CellCLI> list lun attributes name, deviceName, isSystemLun, physicalDrives, lunSize where disktype=harddisk
0_0 /dev/sda TRUE 20:0 557.861328125G
0_1 /dev/sdb TRUE 20:1 557.861328125G
0_2 /dev/sdc FALSE 20:2 557.861328125G
0_3 /dev/sdd FALSE 20:3 557.861328125G
0_4 /dev/sde FALSE 20:4 557.861328125G
0_5 /dev/sdf FALSE 20:5 557.861328125G
0_6 /dev/sdg FALSE 20:6 557.861328125G
0_7 /dev/sdh FALSE 20:7 557.861328125G
0_8 /dev/sdi FALSE 20:8 557.861328125G
0_9 /dev/sdj FALSE 20:9 557.861328125G
0_10 /dev/sdk FALSE 20:10 557.861328125G
0_11 /dev/sdl FALSE 20:11 557.861328125G
CellCLI>
From the above, we can see that the first two LUNs (on drives 20:0 and 20:1) contain the system areas and the remaining ten do not.
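If you want the complete picture for a single LUN – including its status and the cell disk built on it – cellcli can dump every attribute for it. A quick sketch against the first system LUN (the exact attribute list varies by storage server software version):

CellCLI> list lun 0_0 detail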
Cell Disks
Cell disks are created on LUNs and are the storage abstraction on which grid disks are created.
From cellcli, our celldisks look like this:
CellCLI> list celldisk attributes name,deviceName,devicePartition,lun,size where disktype=harddisk
CD_00_cm01cel01 /dev/sda /dev/sda3 0_0 528.734375G
CD_01_cm01cel01 /dev/sdb /dev/sdb3 0_1 528.734375G
CD_02_cm01cel01 /dev/sdc /dev/sdc 0_2 557.859375G
CD_03_cm01cel01 /dev/sdd /dev/sdd 0_3 557.859375G
CD_04_cm01cel01 /dev/sde /dev/sde 0_4 557.859375G
CD_05_cm01cel01 /dev/sdf /dev/sdf 0_5 557.859375G
CD_06_cm01cel01 /dev/sdg /dev/sdg 0_6 557.859375G
CD_07_cm01cel01 /dev/sdh /dev/sdh 0_7 557.859375G
CD_08_cm01cel01 /dev/sdi /dev/sdi 0_8 557.859375G
CD_09_cm01cel01 /dev/sdj /dev/sdj 0_9 557.859375G
CD_10_cm01cel01 /dev/sdk /dev/sdk 0_10 557.859375G
CD_11_cm01cel01 /dev/sdl /dev/sdl 0_11 557.859375G
CellCLI>
A couple of things to note about the above:
– The cell disks on the first two drives are roughly 29GB smaller than the remaining ten – this is because the system area resides on those two drives
– The device partition for the first two cell disks is /dev/sda3 and /dev/sdb3 – exactly what we expected from the fdisk output in the Cell Server OS Storage section
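Cell disks are normally created only once, during the initial cell deployment. For reference, the CellCLI statement looks roughly like the following – a sketch only, and not something to run on a configured cell, since dropping and recreating cell disks is destructive:

CellCLI> create celldisk all harddisk interleaving='normal_redundancy'

The interleaving clause is what produces the offset layout we'll observe in the grid disk section below.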
Grid Disks
Grid disks are created on cell disks and represent the storage available to ASM – in other words, when you create ASM disk groups, the devices you specify are grid disks.  From cellcli, we can see the following grid disks:
CellCLI> list griddisk
DATA_CD_00_cm01cel01   active
DATA_CD_01_cm01cel01   active
DATA_CD_02_cm01cel01   active
DATA_CD_03_cm01cel01   active
DATA_CD_04_cm01cel01   active
DATA_CD_05_cm01cel01   active
DATA_CD_06_cm01cel01   active
DATA_CD_07_cm01cel01   active
DATA_CD_08_cm01cel01   active
DATA_CD_09_cm01cel01   active
DATA_CD_10_cm01cel01   active
DATA_CD_11_cm01cel01   active
DBFS_DG_CD_02_cm01cel01 active
DBFS_DG_CD_03_cm01cel01 active
DBFS_DG_CD_04_cm01cel01 active
DBFS_DG_CD_05_cm01cel01 active
DBFS_DG_CD_06_cm01cel01 active
DBFS_DG_CD_07_cm01cel01 active
DBFS_DG_CD_08_cm01cel01 active
DBFS_DG_CD_09_cm01cel01 active
DBFS_DG_CD_10_cm01cel01 active
DBFS_DG_CD_11_cm01cel01 active
RECO_CD_00_cm01cel01   active
RECO_CD_01_cm01cel01   active
RECO_CD_02_cm01cel01   active
RECO_CD_03_cm01cel01   active
RECO_CD_04_cm01cel01   active
RECO_CD_05_cm01cel01   active
RECO_CD_06_cm01cel01   active
RECO_CD_07_cm01cel01   active
RECO_CD_08_cm01cel01   active
RECO_CD_09_cm01cel01   active
RECO_CD_10_cm01cel01   active
RECO_CD_11_cm01cel01   active
CellCLI>
In this configuration, we have:
– Three different types of grid disks, prefixed DATA, RECO, and DBFS_DG
– A naming convention of "<PREFIX>_<CELLDISK>_<CELL_SERVER>", though the prefixes can be whatever you'd like
– DATA and RECO grid disks on every cell disk, and DBFS_DG grid disks on cell disks 02 through 11.  Spreading each type across the cell disks isn't a requirement, but it's usually what you want – when creating ASM disk groups you'd typically wildcard the disk string, and ideally the storage should be spread across every physical disk in every cell
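For reference, grid disks of this shape are typically created one prefix at a time with a single CellCLI statement. A hedged sketch of what the DATA grid disk creation might have looked like on this cell (the size matches what we'll see below, but the command actually used isn't captured in this post):

CellCLI> create griddisk all harddisk prefix=DATA, size=423G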
Let’s take a look at a couple of DATA_CD% disks, DATA_CD_00_cm01cel01 and DATA_CD_10_cm01cel01:
CellCLI> list griddisk attributes name,asmDiskGroupName,celldisk,offset,size where name=DATA_CD_10_cm01cel01
DATA_CD_10_cm01cel01 DATA_CM01 CD_10_cm01cel01 32M 423G
CellCLI> list griddisk attributes name,asmDiskGroupName,celldisk,offset,size where name=DATA_CD_00_cm01cel01
DATA_CD_00_cm01cel01 DATA_CM01 CD_00_cm01cel01 32M 423G
CellCLI>
This shows:
– A uniform grid disk size of 423G, as specified during grid disk creation.  If you create grid disks one at a time without wildcarding, you have the flexibility to use different sizes on different cell disks, but that's generally a bad idea because it upsets the balance of extents across physical disks.
– A byte offset of 32M, which means these grid disks start 32MB from the outer section of the physical drive.
Let’s look at all of our grid disks:
CellCLI> list griddisk attributes name,asmDiskGroupName,celldisk,offset,size
DATA_CD_00_cm01cel01   DATA_CM01 CD_00_cm01cel01 32M         423G
DATA_CD_01_cm01cel01   DATA_CM01 CD_01_cm01cel01 32M         423G
DATA_CD_02_cm01cel01   DATA_CM01 CD_02_cm01cel01 32M         423G
DATA_CD_03_cm01cel01   DATA_CM01 CD_03_cm01cel01 32M         423G
DATA_CD_04_cm01cel01   DATA_CM01 CD_04_cm01cel01 32M         423G
DATA_CD_05_cm01cel01   DATA_CM01 CD_05_cm01cel01 32M         423G
DATA_CD_06_cm01cel01   DATA_CM01 CD_06_cm01cel01 32M         423G
DATA_CD_07_cm01cel01   DATA_CM01 CD_07_cm01cel01 32M         423G
DATA_CD_08_cm01cel01   DATA_CM01 CD_08_cm01cel01 32M         423G
DATA_CD_09_cm01cel01   DATA_CM01 CD_09_cm01cel01 32M         423G
DATA_CD_10_cm01cel01   DATA_CM01 CD_10_cm01cel01 32M         423G
DATA_CD_11_cm01cel01   DATA_CM01 CD_11_cm01cel01 32M         423G
DBFS_DG_CD_02_cm01cel01 DBFS_DG   CD_02_cm01cel01 264.046875G 29.125G
DBFS_DG_CD_03_cm01cel01 DBFS_DG   CD_03_cm01cel01 264.046875G 29.125G
DBFS_DG_CD_04_cm01cel01 DBFS_DG   CD_04_cm01cel01 264.046875G 29.125G
DBFS_DG_CD_05_cm01cel01 DBFS_DG   CD_05_cm01cel01 264.046875G 29.125G
DBFS_DG_CD_06_cm01cel01 DBFS_DG   CD_06_cm01cel01 264.046875G 29.125G
DBFS_DG_CD_07_cm01cel01 DBFS_DG   CD_07_cm01cel01 264.046875G 29.125G
DBFS_DG_CD_08_cm01cel01 DBFS_DG   CD_08_cm01cel01 264.046875G 29.125G
DBFS_DG_CD_09_cm01cel01 DBFS_DG   CD_09_cm01cel01 264.046875G 29.125G
DBFS_DG_CD_10_cm01cel01 DBFS_DG   CD_10_cm01cel01 264.046875G 29.125G
DBFS_DG_CD_11_cm01cel01 DBFS_DG   CD_11_cm01cel01 264.046875G 29.125G
RECO_CD_00_cm01cel01   RECO_CM01 CD_00_cm01cel01 211.546875G 105G
RECO_CD_01_cm01cel01   RECO_CM01 CD_01_cm01cel01 211.546875G 105G
RECO_CD_02_cm01cel01   RECO_CM01 CD_02_cm01cel01 211.546875G 105G
RECO_CD_03_cm01cel01   RECO_CM01 CD_03_cm01cel01 211.546875G 105G
RECO_CD_04_cm01cel01   RECO_CM01 CD_04_cm01cel01 211.546875G 105G
RECO_CD_05_cm01cel01   RECO_CM01 CD_05_cm01cel01 211.546875G 105G
RECO_CD_06_cm01cel01   RECO_CM01 CD_06_cm01cel01 211.546875G 105G
RECO_CD_07_cm01cel01   RECO_CM01 CD_07_cm01cel01 211.546875G 105G
RECO_CD_08_cm01cel01   RECO_CM01 CD_08_cm01cel01 211.546875G 105G
RECO_CD_09_cm01cel01   RECO_CM01 CD_09_cm01cel01 211.546875G 105G
RECO_CD_10_cm01cel01   RECO_CM01 CD_10_cm01cel01 211.546875G 105G
RECO_CD_11_cm01cel01   RECO_CM01 CD_11_cm01cel01 211.546875G 105G
CellCLI>
We can see from the above that although the DATA% grid disks are 423G in size, the RECO grid disks start at an offset of about 211G and the DBFS_DG grid disks at about 264G.  If the cell disks weren't interleaved, the RECO grid disks would start at roughly 423G – immediately after the DATA grid disks – so the smaller offsets tell us that our grid disks are built on cell disks defined with interleaving.  We can confirm this by checking our cell disk configuration:
CellCLI> list celldisk attributes name, interleaving
CD_00_cm01cel01 normal_redundancy
CD_01_cm01cel01 normal_redundancy
CD_02_cm01cel01 normal_redundancy
CD_03_cm01cel01 normal_redundancy
CD_04_cm01cel01 normal_redundancy
CD_05_cm01cel01 normal_redundancy
CD_06_cm01cel01 normal_redundancy
CD_07_cm01cel01 normal_redundancy
CD_08_cm01cel01 normal_redundancy
CD_09_cm01cel01 normal_redundancy
CD_10_cm01cel01 normal_redundancy
CD_11_cm01cel01 normal_redundancy
Flash Disks
Before moving on to ASM storage, let's talk about the PCI flash cards in each storage cell.  There are four (4) 96GB PCI flash cards in each server, for a total of 384GB of PCI flash per cell. We can see that each flash card is split into four 22.88GB flash modules (FDOMs):
CellCLI> list physicaldisk attributes name,physicalsize,slotnumber where disktype=FlashDisk
FLASH_1_0 22.8880615234375G “PCI Slot: 1; FDOM: 0”
FLASH_1_1 22.8880615234375G “PCI Slot: 1; FDOM: 1”
FLASH_1_2 22.8880615234375G “PCI Slot: 1; FDOM: 2”
FLASH_1_3 22.8880615234375G “PCI Slot: 1; FDOM: 3”
FLASH_2_0 22.8880615234375G “PCI Slot: 2; FDOM: 0”
FLASH_2_1 22.8880615234375G “PCI Slot: 2; FDOM: 1”
FLASH_2_2 22.8880615234375G “PCI Slot: 2; FDOM: 2”
FLASH_2_3 22.8880615234375G “PCI Slot: 2; FDOM: 3”
FLASH_4_0 22.8880615234375G “PCI Slot: 4; FDOM: 0”
FLASH_4_1 22.8880615234375G “PCI Slot: 4; FDOM: 1”
FLASH_4_2 22.8880615234375G “PCI Slot: 4; FDOM: 2”
FLASH_4_3 22.8880615234375G “PCI Slot: 4; FDOM: 3”
FLASH_5_0 22.8880615234375G “PCI Slot: 5; FDOM: 0”
FLASH_5_1 22.8880615234375G “PCI Slot: 5; FDOM: 1”
FLASH_5_2 22.8880615234375G “PCI Slot: 5; FDOM: 2”
FLASH_5_3 22.8880615234375G “PCI Slot: 5; FDOM: 3”
CellCLI>
We can also determine whether this is configured for Smart Flash Cache:
CellCLI> list flashcache detail
name:               cm01cel01_FLASHCACHE
cellDisk:           FD_07_cm01cel01,FD_12_cm01cel01,FD_09_cm01cel01,FD_04_cm01cel01,FD_02_cm01cel01,FD_01_cm01cel01,FD_13_cm01cel01,FD_14_cm01cel01,FD_08_cm01cel01,FD_00_cm01cel01,FD_06_cm01cel01,FD_03_cm01cel01,FD_10_cm01cel01,FD_15_cm01cel01,FD_05_cm01cel01,FD_11_cm01cel01
creationTime:       2012-02-20T23:09:15-05:00
degradedCelldisks:
effectiveCacheSize: 364.75G
id:                 08e69f5d-48ca-4c5e-b614-25989a33b269
size:               364.75G
status:             normal
CellCLI>
In the above output we can see that all of the flash disks are allocated to the Smart Flash Cache, for a total size of 364.75G.
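Smart Flash Cache isn't the only possible use of the flash modules – part of the flash can instead be carved into flash grid disks and presented to ASM. A hedged sketch of that alternative layout (this is not what's configured on this cell, and the 300G split is arbitrary):

CellCLI> drop flashcache
CellCLI> create flashcache all size=300G
CellCLI> create griddisk all flashdisk prefix=FLASH

The first two commands rebuild a smaller flash cache; the third exposes the remaining flash capacity as FLASH-prefixed grid disks that ASM can use like any other grid disk.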
ASM Storage
As mentioned previously, ASM disk groups are built on storage cell grid disks.  In this environment, we've created a disk group called DATA_CM01 with normal redundancy using an InfiniBand-aware disk string wildcard, 'o/*/DATA*'.  Here's what the wildcard means:
– "o" means to look for the devices over the InfiniBand network (i.e., on the storage cells)
– The "*" after "o/" means the disk group is built on matching devices across all of the storage servers' InfiniBand IP addresses.  From the compute node, Oracle determines these by examining cellip.ora.  See below:
[[email protected] ~]$ locate cellip.ora
/etc/oracle/cell/network-config/cellip.ora
/opt/oracle.SupportTools/onecommand/tmp/cellip.ora
[[email protected] ~]$ cat /etc/oracle/cell/network-config/cellip.ora
cell="192.168.10.3"
cell="192.168.10.4"
cell="192.168.10.5"
– The "DATA*" part means the disk group is built on every grid disk whose name starts with "DATA"
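Putting the wildcard together, the disk group creation would have looked something like the following.  This is a sketch only – the actual DDL isn't shown in this post, and the attribute values are simply typical Exadata settings:

SQL> create diskgroup DATA_CM01 normal redundancy
     disk 'o/*/DATA*'
     attribute 'compatible.asm'='11.2.0.0.0',
               'compatible.rdbms'='11.2.0.0.0',
               'cell.smart_scan_capable'='TRUE',
               'au_size'='4M';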
Let’s look more closely at what this ASM disk group looks like:
SQL> select a.name,b.path,b.state,b.failgroup
  2  from v$asm_diskgroup a, v$asm_disk b
  3  where a.group_number=b.group_number
  4  and a.name like '%DATA%'
  5  order by 4,1
  6  /
DATA_CM01       o/192.168.10.3/DATA_CD_01_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.3/DATA_CD_04_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.3/DATA_CD_10_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.3/DATA_CD_02_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.3/DATA_CD_06_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.3/DATA_CD_05_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.3/DATA_CD_07_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.3/DATA_CD_08_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.3/DATA_CD_00_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.3/DATA_CD_11_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.3/DATA_CD_03_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.3/DATA_CD_09_cm01cel01  NORMAL   CM01CEL01
DATA_CM01       o/192.168.10.4/DATA_CD_08_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.4/DATA_CD_07_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.4/DATA_CD_02_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.4/DATA_CD_06_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.4/DATA_CD_09_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.4/DATA_CD_05_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.4/DATA_CD_11_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.4/DATA_CD_10_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.4/DATA_CD_04_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.4/DATA_CD_03_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.4/DATA_CD_00_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.4/DATA_CD_01_cm01cel02  NORMAL   CM01CEL02
DATA_CM01       o/192.168.10.5/DATA_CD_02_cm01cel03  NORMAL   CM01CEL03
DATA_CM01       o/192.168.10.5/DATA_CD_01_cm01cel03  NORMAL   CM01CEL03
DATA_CM01       o/192.168.10.5/DATA_CD_06_cm01cel03  NORMAL   CM01CEL03
DATA_CM01       o/192.168.10.5/DATA_CD_10_cm01cel03  NORMAL   CM01CEL03
DATA_CM01       o/192.168.10.5/DATA_CD_05_cm01cel03  NORMAL   CM01CEL03
DATA_CM01       o/192.168.10.5/DATA_CD_09_cm01cel03  NORMAL   CM01CEL03
DATA_CM01       o/192.168.10.5/DATA_CD_08_cm01cel03  NORMAL   CM01CEL03
DATA_CM01       o/192.168.10.5/DATA_CD_11_cm01cel03  NORMAL   CM01CEL03
DATA_CM01       o/192.168.10.5/DATA_CD_04_cm01cel03  NORMAL   CM01CEL03
DATA_CM01       o/192.168.10.5/DATA_CD_07_cm01cel03  NORMAL   CM01CEL03
DATA_CM01       o/192.168.10.5/DATA_CD_00_cm01cel03  NORMAL   CM01CEL03
DATA_CM01       o/192.168.10.5/DATA_CD_03_cm01cel03  NORMAL   CM01CEL03
36 rows selected.
As we can see, we've got 36 disks in this DATA_CM01 disk group – one for each DATA grid disk on each of the 3 storage servers (recall we're on a quarter rack, which has 3 storage cells).  The DBFS_DG and RECO_CM01 ASM disk groups would look very similar.
When we created this ASM disk group, we specified normal redundancy.  With Exadata, external redundancy is not an option – you must use either normal or high redundancy.  With normal redundancy, each extent is mirrored to one other cell; with high redundancy, it's mirrored via ASM to two additional cells.  Specifically, extents are mirrored to partner disks in different failure groups.  Let's take a look at these relationships, focusing on DATA_CM01:
SQL> select group_number,name from v$asm_diskgroup;
GROUP_NUMBER NAME
———— ——————————
  1 DATA_CM01
  2 DBFS_DG
  3 RECO_CM01
SQL>
  1  SELECT count(disk_number)
  2  FROM v$asm_disk
  3* WHERE group_number = 1
SQL> /
COUNT(DISK_NUMBER)
——————
36
Now we’ll see how many partners the disks have:
  1  SELECT disk “Disk”, count(number_kfdpartner) “Number of partners”
  2  FROM x$kfdpartner
  3  WHERE grp=1
  4  GROUP BY disk
  5* ORDER BY 1
SQL> /
Disk Number of partners
———- ——————
0    8
1    8
2    8
3    8
4    8
5    8
6    8
7    8
8    8
9    8
<< output truncated >>
We’ve got 8 partners for each disk.  Now let’s see where they actually reside:
SQL> SELECT d.group_number “Group#”, d.disk_number “Disk#”, p.number_kfdpartner “Partner disk#”
  2  FROM x$kfdpartner p, v$asm_disk d
  3  WHERE p.disk=d.disk_number and p.grp=d.group_number
  4  ORDER BY 1, 2, 3;
Group#      Disk# Partner disk#
———- ———- ————-
         1          0            13
         1          0            16
         1          0            17
         1          0            23
         1          0            24
         1          0            29
         1          0            30
         1          0            34
         1          1             5
         1          1            12
         1          1            18
         1          1            20
         1          1            22
         1          1            23
         1          1            28
         1          1            34
         1          2            12
         1          2            17
         1          2            18
         1          2            20
<< output truncated >>
As we can see, the partner disks span multiple cells.
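To verify that for a single disk, the partner list can be joined back to v$asm_disk to show which failure group (and therefore which cell) each partner lives in. A sketch for disk 0 of group 1, run against the ASM instance:

SQL> select d.failgroup, p.number_kfdpartner partner_disk, pd.failgroup partner_failgroup
     from x$kfdpartner p, v$asm_disk d, v$asm_disk pd
     where p.grp = d.group_number and p.disk = d.disk_number
     and p.grp = pd.group_number and p.number_kfdpartner = pd.disk_number
     and d.group_number = 1 and d.disk_number = 0;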
Grid Infrastructure Storage
On the compute nodes we're running Oracle 11gR2 Grid Infrastructure and Oracle RAC.  You don't have to use RAC with Exadata, but most companies do.  With Grid Infrastructure, each compute node accesses the cluster registry (OCR) and the mirrored voting disks.  Where do these physically reside on the Exadata X2-2?
Let’s take a look:
[[email protected] ~]$ cd $ORACLE_HOME/bin
[[email protected] bin]$ ./ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          3
Total space (kbytes)     :     262120
Used space (kbytes)      :       3420
Available space (kbytes) :     258700
ID                       : 1833511320
Device/File Name         :   +DBFS_DG
                                    Device/File integrity check succeeded
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
                                    Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
[[email protected] bin]$ asmcmd
ASMCMD> ls
DATA_CM01/
DBFS_DG/
RECO_CM01/
ASMCMD> cd DBFS_DG
ASMCMD> ls
cm01-cluster/
ASMCMD> cd cm01-cluster
ASMCMD> ls
OCRFILE/
ASMCMD> cd OCRFILE
ASMCMD> ls
REGISTRY.255.753579427
ASMCMD> ls -l
Type     Redund  Striped  Time             Sys  Name
OCRFILE  MIRROR  COARSE   MAR 25 22:00:00  Y    REGISTRY.255.753579427
ASMCMD>
[[email protected] bin]$ ./crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
—  —–    —————–                ——— ———
 1. ONLINE   948f35d3d9c44f94bfe7bb831758104a (o/192.168.10.4/DBFS_DG_CD_06_cm01cel02) [DBFS_DG]
 2. ONLINE   61fb620328a24f87bf8c4a0ac0275cd1 (o/192.168.10.5/DBFS_DG_CD_05_cm01cel03) [DBFS_DG]
 3. ONLINE   60ab0b9e7dfe4f0abfb16b4344f5ede6 (o/192.168.10.3/DBFS_DG_CD_05_cm01cel01) [DBFS_DG]
Located 3 voting disk(s).
From the above, it looks like:
– The OCR is stored in the DBFS_DG ASM disk group
– The 3 copies of the voting disk are also stored in DBFS_DG, with one copy on a DBFS_DG grid disk in each storage cell
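The registered OCR location can also be confirmed from the local configuration file on each compute node. A quick check – the path is standard on Linux, and on this cluster the ocrconfig_loc entry should simply point at +DBFS_DG:

cat /etc/oracle/ocr.loc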
Database Storage
This is the easy part – Oracle uses ASM for database file storage on Exadata.  You're allowed to store files on NFS file-systems, but it's generally discouraged because Exadata software features won't be available for I/O against those files.  Let's take a look at a sample database's files:
  1  select name from v$datafile
  2  union
  3  select name from v$tempfile
  4  union
  5* select member from v$logfile
SQL> set echo on
SQL> /
+DATA_CM01/dwprd/datafile/dw_data.559.777990713
+DATA_CM01/dwprd/datafile/dw_indx.563.777990715
+DATA_CM01/dwprd/datafile/dwdim_data.558.777990713
+DATA_CM01/dwprd/datafile/dwdim_indx.560.777990715
+DATA_CM01/dwprd/datafile/dwdiss_data.534.777990711
+DATA_CM01/dwprd/datafile/dwdiss_indx.561.777990715
+DATA_CM01/dwprd/datafile/dwfact_data.557.777990713
+DATA_CM01/dwprd/datafile/dwfact_indx.562.777990715
+DATA_CM01/dwprd/datafile/dwlibrary_data.556.777990713
+DATA_CM01/dwprd/datafile/dwportal_data.541.777990713
+DATA_CM01/dwprd/datafile/dwstage_data.564.777990715
+DATA_CM01/dwprd/datafile/dwstore_data.540.777990713
+DATA_CM01/dwprd/datafile/dwstore_indx.565.777990715
+DATA_CM01/dwprd/datafile/dwsum_data.531.777990709
+DATA_CM01/dwprd/datafile/inf.530.777990709
+DATA_CM01/dwprd/datafile/infolog_data.533.777990711
+DATA_CM01/dwprd/datafile/sysaux.507.774050315
+DATA_CM01/dwprd/datafile/system.505.774050303
+DATA_CM01/dwprd/datafile/undotbs1.506.774050327
+DATA_CM01/dwprd/datafile/undotbs2.448.774050349
+DATA_CM01/dwprd/datafile/usagedim_data.539.777990713
+DATA_CM01/dwprd/datafile/usagedim_indx.566.777990717
+DATA_CM01/dwprd/datafile/usagefact_data.532.777990709
+DATA_CM01/dwprd/datafile/usagefact_indx.567.777990717
+DATA_CM01/dwprd/datafile/usagereport_data.538.777990711
+DATA_CM01/dwprd/datafile/usagereport_indx.568.777990717
+DATA_CM01/dwprd/datafile/usagestage_data.537.777990711
+DATA_CM01/dwprd/datafile/usagestage_indx.569.777990717
+DATA_CM01/dwprd/datafile/usagestore_data.536.777990711
+DATA_CM01/dwprd/datafile/usagestore_indx.570.777990717
+DATA_CM01/dwprd/datafile/usagesum_data.535.777990711
+DATA_CM01/dwprd/datafile/usagesum_indx.571.777990717
+DATA_CM01/dwprd/datafile/users.499.774050361
+DATA_CM01/dwprd/redo01.log
+DATA_CM01/dwprd/redo02.log
+DATA_CM01/dwprd/redo03.log
+DATA_CM01/dwprd/redo04.log
+DATA_CM01/dwprd/redo05.log
+DATA_CM01/dwprd/redo06.log
+DATA_CM01/dwprd/redo07.log
+DATA_CM01/dwprd/redo08.log
+RECO_CM01/dwprd/tempfile/temp.270.778028067
+RECO_CM01/dwprd/tempfile/temp.272.778027033
+RECO_CM01/dwprd/tempfile/temp.438.778027027
+RECO_CM01/dwprd/tempfile/temp.463.774052319
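These paths follow from the database pointing its Oracle Managed Files destinations at the ASM disk groups.  A hedged sketch of how the relevant parameters would typically be set for this layout (the parameter names are standard; the recovery-area size is an arbitrary illustrative value):

SQL> alter system set db_create_file_dest='+DATA_CM01' scope=both sid='*';
SQL> alter system set db_recovery_file_dest_size=500G scope=both sid='*';
SQL> alter system set db_recovery_file_dest='+RECO_CM01' scope=both sid='*';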