Exadata: From Order Booking to Operation ("Crate-to-Production" in Approximately 10 Hours)

May 31st, 2011 | Written by John Clarke

It’s well documented and marketed that the Oracle Exadata Database Machine installation and configuration process is smooth, easy, and fast, and at Centroid we certainly found this to be the case.  But despite how clean the process was, we thought it would be valuable to outline the specific steps required to get our X2-2 Quarter Rack “fully functional”, starting from the time the order was booked.

 

The Process

 

Shortly after placement of an Exadata X2-2 Database Machine order, Oracle initiates a process to ensure a successful delivery, configuration, and installation.  The process involves the client filling out a couple of documents and running a script:

 

  • The “Exadata Pre-Delivery Survey” (“EXADATA PRE-DELIVERY SURVEY.xlsx”) is an Excel spreadsheet, intended to be filled in by the customer, that requests various pieces of information:
    • A “general client information” section, which asks for customer contact names, contact details, addresses, and so on.
    • A series of “Site Delivery” questions.  This section requires the client to answer various data center logistics questions, such as security requirements, loading dock specifications, and so forth.
    • A section for power-related information. 
    • KVM compatibility questions related to language
    • A series of “Internal delivery” questions geared toward ensuring the dimensions of the Exadata will “fit” in the data center.
  • After the pre-delivery survey is filled out and received by Oracle, Oracle sends a configuration template document.  This configuration template was a PDF document in our case, and it asks a variety of database, cluster, ASM storage, and networking questions.
  • Before scheduling installation resources, Oracle requires a successful completion of a script called “checkip.sh”.  This checkip.sh script validates network configuration details provided in the configuration template detailed above.

Each of the steps above is vital to ensuring a smooth Exadata installation.  Because of the demand for Exadata, scheduling installation resources is challenging and prone to delays if clients don’t fill out the documents above in a timely and accurate manner.  For us, the two most important pre-delivery tasks were:

 

  • Successfully filling out power specifications in the pre-delivery survey.  Getting these wrong means shipment of an Exadata rack with the wrong PDUs installed, which for us caused a month-long delay.
  • Successfully getting checkip.sh to complete.  For Centroid, since we were building our own data center network from scratch, this task involved some additional work that would likely not be required for most customers.

 

Upon successful completion of the configuration template, Oracle builds a set of installation scripts tailored to the customer’s requirements and uses these to install all of the Exadata components.

 

Running the checkip.sh script

 

Prior to Oracle scheduling Hardware or Software (ACS) engineers to arrive, Oracle emails a “checkip.sh” script that must successfully complete.  This checkip.sh script is provided by Oracle as part of a zip file, which also includes some other data files including a “dbm.dat” file.  The “dbm.dat” file is basically a template seeded with values specific to the customer’s configuration template, and looks like this:

 

# DOMAIN
DOMAIN "Domain_name" centroid.com

# Name Servers
NAME "Name_Server" 172.16.10.20

# NTP Servers
NTP "NTP_Server" pool.ntp.org

# Gateways
GATEWAY "eth0_Gateway" 172.16.1.1
GATEWAY "eth1/bondeth0_Gateway" 172.16.10.1
GATEWAY "net3_Gateway" 172.16.100.1

# SCAN
SCAN "Scan" cm01-scan 172.16.10.14
SCAN "Scan" cm01-scan 172.16.10.15
SCAN "Scan" cm01-scan 172.16.10.16

# Compute
COMPUTE "Compute" cm01db01 172.16.1.10
COMPUTE "Compute" cm01db02 172.16.1.11
COMPUTE "Compute" cm0101 172.16.10.10
COMPUTE "Compute" cm0102 172.16.10.11

# Cell
CELL "Cell" cm01cel01 172.16.1.12
CELL "Cell" cm01cel02 172.16.1.13
CELL "Cell" cm01cel03 172.16.1.14

# Ilom
ILOM "Ilom" cm01db01-ilom 172.16.1.15
ILOM "Ilom" cm01db02-ilom 172.16.1.16
ILOM "Ilom" cm01cel01-ilom 172.16.1.17
ILOM "Ilom" cm01cel02-ilom 172.16.1.18
ILOM "Ilom" cm01cel03-ilom 172.16.1.19

# Switches
SWITCH "Switch" cm01sw-kvm 172.16.1.20
SWITCH "Switch" cm01sw-ip 172.16.1.21
SWITCH "Switch" cm01sw-ib2 172.16.1.23
SWITCH "Switch" cm01sw-ib3 172.16.1.24
SWITCH "Switch" cm01-pdua 172.16.1.25
SWITCH "Switch" cm01-pdub 172.16.1.26

# Vips
VIP "Vip" cm0101-vip 172.16.10.12
VIP "Vip" cm0102-vip 172.16.10.13

# Net2
# Net3
NET3 "Net3" cm0101-ext 172.16.100.10
NET3 "Net3" cm0102-ext 172.16.100.11

# Cell Alerting

 

The checkip.sh script runs a series of ping, nslookup, and dig commands to validate that the network addresses provided in the configuration template document (and in dbm.dat) are properly configured in DNS and, as expected before installation, do not yet respond to ping.  The goal is to confirm the details in the configuration template, which the hardware and software installers use to configure the machine.
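As a rough illustration of the input checkip.sh works from, the sketch below pulls the hostname/address pairs out of a dbm.dat-style file.  This is not Oracle's script: the function name list_host_entries is made up for this example, and it assumes the four-field layout shown in the sample above.

```shell
# Hypothetical helper (not part of checkip.sh): print "TYPE hostname ip" for
# every dbm.dat-style line that carries both a hostname and an address, i.e.
# exactly four whitespace-separated fields.  Comment lines, blank lines, and
# single-value lines such as DOMAIN, NTP, or GATEWAY are skipped.
list_host_entries() {
    awk '!/^#/ && NF == 4 { print $1, $3, $4 }' "$1"
}

# Example use on Linux (ping -c/-W as in iputils): report any address that
# already answers, which should not happen before the rack is installed.
#
#   list_host_entries dbm.dat | while read -r type host ip; do
#       ping -c 1 -W 1 "$ip" >/dev/null 2>&1 &&
#           echo "$type $host ($ip) already responds to ping"
#   done
```

Keeping the parsing separate from the network probes makes it easy to review the list of hosts before any pings or lookups are sent.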

 

DNS Configuration

 

checkip.sh requires that all server addresses for the Exadata X2-2 are registered in DNS and have proper reverse lookup PTR records.  It also ensures gateways are reachable, NTP servers are reachable, and so forth.
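The forward and reverse lookup requirement can be spot-checked by hand before running checkip.sh.  The sketch below is one way to do that with dig; the function name check_dns_pair and the DIG override variable are inventions for this example, not anything shipped by Oracle.

```shell
# Resolver command; overridable so the check can be exercised without a network.
DIG=${DIG:-dig}

# Hypothetical helper: succeed only when the A record for FQDN returns IP and
# the PTR record for IP returns FQDN.
check_dns_pair() (
    fqdn=$1
    ip=$2
    fwd=$($DIG +short "$fqdn" A | head -n 1)
    rev=$($DIG +short -x "$ip" | head -n 1)
    status=0
    if [ "$fwd" != "$ip" ]; then
        echo "forward mismatch: $fqdn -> ${fwd:-no answer} (expected $ip)"
        status=1
    fi
    # dig prints PTR targets as absolute names, so drop the trailing dot.
    if [ "${rev%.}" != "$fqdn" ]; then
        echo "reverse mismatch: $ip -> ${rev:-no answer} (expected $fqdn)"
        status=1
    fi
    exit "$status"
)

# Example: check_dns_pair cm01db01.centroid.com 172.16.1.10
```

The comparison is a plain string match, so it assumes the DNS server returns names in the same case as they were entered.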

Unlike nearly all Exadata clients, Centroid built its data center network infrastructure from the ground up in conjunction with ordering the Exadata, so in addition to planning for our Quarter Rack we also needed to do all the normal “network build” and configuration tasks.  We didn’t have an existing internal DNS server (or network) to utilize, so we acquired a small 1U server, installed Windows 2003 Small Business Edition on it, plugged it into the “client access” VLAN on the Juniper SRX-240 firewall/router, and configured DNS.

 

Installation Steps

 

We learned in the Oracle Exadata Administrator’s class that when powered on, the Exadata nodes go through their initial boot sequence and then the following tasks are done:

  • Initial network preparation
  • Configure Exadata servers
  • Configure Exadata software
  • Configure database hosts to use Exadata
  • Configure ASM and database instances
  • Configure ASM disk group for Exadata

In reality, the actual delivery and configuration process involves multiple actions taken by different members of the Oracle team, each with specific actions required.  Some of these activities are done according to Oracle documentation or what we learn in training classes, but many are done by a series of configuration scripts (that read from configuration templates mentioned above) that automatically carry out tasks on the various hardware components in the Exadata X2-2 Database Machine.  In short:

  • The Oracle Hardware team gets the rack powered up and validates components are working correctly.  They also perform some (but not all) component and network configuration.
  • Oracle ACS (Advanced Customer Services) completes the remainder of the node/network setup, patches the system, and installs Oracle software components.
  • Oracle ACS creates a database based on our configuration template details.

 

What Oracle’s Hardware Team Does

 

The Oracle hardware team, comprised mostly of ex-Sun hardware engineers, is responsible for ensuring that the hardware is functional, the network components are configured and working properly, proper power and environment details are in order, and so forth.  For Centroid, since we had to replace 3-phase PDUs with 2-phase PDUs, they also had to perform a PDU swap.  The tasks below outline the steps undertaken by the Oracle and Centroid teams:

  • We performed additional build-out in our co-location data center to accommodate our rack
  • We worked with electricians to install power circuits in the data center
  • We physically moved the Exadata to its spot in the data center
  • We racked our Juniper network equipment and DNS server inside the Oracle Exadata rack
  • We connected our DNS server to our network
  • Oracle swapped out the 3-phase PDUs with 2-phase PDUs
  • Oracle validated PDUs had power on all breakers and on both sides
  • Oracle powered up the rack
  • Oracle configured the KVM, added DNS server IP address information, and confirmed firmware
  • Oracle configured the Sun Datacenter 36-port Managed QDR InfiniBand Switch.  They connected the switches, configured IP addresses on them, and modified the IB configuration file.
  • Oracle configured NTP on servers
  • Oracle configured the internal Cisco switch
  • Oracle configured IP addresses on the PDUs
  • Oracle booted up the 3 cell servers and 2 compute nodes for the first time
  • Oracle confirmed that the cells booted successfully: they checked connectivity, validated that the nodes came up successfully, checked memory, checked hard disks, and checked InfiniBand connectivity
  • Oracle validated internal cabling
  • Oracle validated and verified topology
  • We then connected the KVM switch to the data center network’s management VLAN
  • We connected PDUA and PDUB to the data center network’s management VLAN
  • We connected the Cisco uplink to the data center network’s management VLAN
  • We connected both compute nodes’ NET1 Ethernet connections to our data center network’s client access VLAN
  • We connected both NET3 interfaces to our additional VLAN

When the cabling was complete, the team validated via the KVM by going to:

https://cm01sw-kvm.centroid.com/index.php

 

From here, we could see the 3 cells and 2 compute nodes and were able to successfully login to each of them.

 

[Figure: Target Devices]

 

We launched a KVM session on each node to validate connectivity (as root/welcome1) – see below.

 

[Figure: KVM Session]

 

What Oracle ACS Does

 

At the completion of the hardware installation, the following tasks were (and should be) complete:

  • Power is validated on the rack, and all components inside the rack have booted successfully
  • All network connections are established
  • The embedded Cisco switch is configured
  • The KVM is configured and functional
  • The PDUs are operational
  • The InfiniBand switches have proper IP addresses and are functional

Oracle ACS performs the remainder of the installation.  At a high level, this consists of the following major tasks, each with sub-tasks:

  • Oracle loads Centroid-specific configuration templates/settings on one of the two compute nodes
  • Oracle loads necessary patches on one of the compute nodes – these patches are used during the subsequent steps and get our Exadata prepared to run the most current certified software using the current configuration utilities.
  • Oracle performs the initial network configuration, based on a template file provided as part of the pre-delivery checklist and checkip.sh output, that sets up network connectivity for the compute nodes and storage cells.  This is done using an Oracle “onecommand” script.
  • Oracle runs through a set of 31 deployment steps, each of which installs and configures various components of the Exadata X2-2 Database Machine

 

Staging and Preparing onecommand     

 

The ACS engineer arrived at Centroid with Centroid-specific configuration “data” on a USB drive.  The USB drive is plugged in to cm01dbm01 and the following takes place:

  • Engineer mounts USB drive
    • # mkdir /mnt/usb
    • # mount /dev/sdb1 /mnt/usb
  • Files are copied to /tmp and unzipped
    • # cp /mnt/usb/onecommand/p10387024_112220_Linux-x86-64.zip /tmp
    • # cd /tmp; unzip ./p1038*zip
  • onecommand  files are placed in /opt/oracle.SupportTools/onecommand
    • # cd /opt/oracle.SupportTools/onecommand
    • # tar -Ppxvf /tmp/onecmd.tar
  • Patch files from /mnt/usb are copied into /opt/oracle.SupportTools/onecommand/patches
    • # cp /mnt/usb/p10098816_112020_Linux-x86-64_*.zip ./patches/

 

Running applyconfig.sh

 

After the necessary files are copied from the USB drive and staged to /opt/oracle.SupportTools/onecommand, the ACS engineer navigated to /opt/oracle.SupportTools/firstconf and ran the following:

# ./applyconfig.sh quarter /opt/oracle.SupportTools/onecommand/preconf.csv

This script sets IP addresses and other network configuration details on both compute nodes and all three cell servers, and then reboots them. 

When complete, the 31-step deployment process can begin.  First, we installed VNC server on one of the compute nodes and established VNC connectivity, as some of the steps take time and shouldn’t be interrupted.

 

Thirty One Steps, More or Less

 

The 31-step deployment process is what transitions a working Exadata hardware infrastructure into a fully functional Oracle Exadata Database Machine.  Each step should be run sequentially, as root, with the following command:

# /opt/oracle.SupportTools/onecommand/deploy112.sh -i -s <step number>
Log files are written to /opt/oracle.SupportTools/onecommand/tmp and are prefixed with “STEP[N]”, where N is the step number.  The table below outlines the steps.  Note that in certain places, manual steps might need to be done.

 

Step    Command                   Description
------  ------------------------  -----------
0       ./deploy112.sh -i -s 0    Validates DNS, NTP, params.sh, dbmachine.params, and all files generated by the DB Machine Configurator
1       ./deploy112.sh -i -s 1    Sets up SSH for root between nodes
2       ./deploy112.sh -i -s 2    Validates configuration on all nodes
3       ./deploy112.sh -i -s 3    Unzips files
4       ./deploy112.sh -i -s 4    Updates /etc/hosts
5       ./deploy112.sh -i -s 5    Creates cellip.ora and cellinit.ora
6       ./deploy112.sh -i -s 6    Validates hardware
7       ./deploy112.sh -i -s 7    Validates InfiniBand switches
8       ./deploy112.sh -i -s 8    Validates cells and cell groups
9       ./deploy112.sh -i -s 9    Checks connectivity
10      ./deploy112.sh -i -s 10   Calibrates cells
11      ./deploy112.sh -i -s 11   Validates time (NTP)
12      ./deploy112.sh -i -s 12   Updates /etc/security/limits.conf
13      ./deploy112.sh -i -s 13   Creates user accounts on all nodes
14      ./deploy112.sh -i -s 14   Sets up SSH for users
15      ./deploy112.sh -i -s 15   Creates ORACLE_HOMEs
16      ./deploy112.sh -i -s 16   Creates Grid Disks
17      ./deploy112.sh -i -s 17   Installs Grid software
18      ./deploy112.sh -i -s 18   Runs orainstRoot.sh on Grid home
19      ./deploy112.sh -i -s 19   Installs 11gR2 DB software
20      ./deploy112.sh -i -s 20   Creates 11g listener
21      ./deploy112.sh -i -s 21   Creates ASM disk groups
22      ./deploy112.sh -i -s 22   Unlocks GI Home
23      ./deploy112.sh -i -s 23   Updates OPatch
Manual  Apply Bundle Patch 6      See section below
24      ./deploy112.sh -i -s 24   Upgrades GI and DB software binaries to latest versions.  We skipped this step and applied Bundle Patch 6 as outlined below
25      ./deploy112.sh -i -s 25   Relinks all binaries in both GI and DB Homes
26      ./deploy112.sh -i -s 26   Locks GI Home and starts CRS stack.  Prior to this step, we shut down CRS services
27      ./deploy112.sh -i -s 27   Sets up Cell email alerts.  We skipped this step because the template files didn’t contain email details.  We will configure this later
28      ./deploy112.sh -i -s 28   Creates DBM RAC database using DBCA and existing ASM disk groups
29      ./deploy112.sh -i -s 29   Configures Enterprise Manager Database Control on both nodes
30      ./deploy112.sh -i -s 30   Applies security fixes and bounces entire Oracle stack on both nodes
Manual  Install latest OS image   See section below
Manual  Validate Flash Cache      See section below
Manual  Conduct Health Check      See section below
31      ./deploy112.sh -i -s 31   Re-secures entire Exadata stack, cleans up temporary files, and forces password changes
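
Because the steps are meant to run in order, with manual work in between some of them, a small wrapper that runs a range of steps and stops at the first failure can be handy.  The sketch below is only an illustration: run_steps and the DEPLOY variable are names made up here, and it assumes deploy112.sh reports failure through a non-zero exit status, which we did not verify.  Checking the STEP log files after each step is still advisable.

```shell
# Path to the deployment script named in the text; overridable for dry runs.
DEPLOY=${DEPLOY:-/opt/oracle.SupportTools/onecommand/deploy112.sh}

# Hypothetical wrapper: run steps FIRST through LAST in order and stop at the
# first one that fails.
# ASSUMPTION: deploy112.sh exits non-zero when a step fails.
run_steps() (
    first=$1
    last=$2
    step=$first
    while [ "$step" -le "$last" ]; do
        echo "=== deploy step $step ==="
        if ! "$DEPLOY" -i -s "$step"; then
            echo "step $step failed; stopping" >&2
            exit 1
        fi
        step=$((step + 1))
    done
)

# Example: run_steps 0 23   (then carry out the manual patching before step 24)
```

Taking a range rather than always running 0 through 31 leaves room for the manual steps and for skipping steps, as we did with steps 24 and 27.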

 

Applying Bundle Patch 6

 

Typically, deploy step 24 automatically patches the Grid Infrastructure and RDBMS Oracle Homes to the latest Exadata-certified patch bundle.  In Centroid’s case, Oracle released Bundle Patch 6 (12326685) the same week we performed our configuration, so we transferred the patch and applied it manually.  The steps to apply the patch are listed below:

  • Upload p12326685_112020_Linux-x86-64.zip from the USB drive to /opt/oracle.SupportTools/onecommand/patches and unzip it.
  • Upload the latest OPatch patch (p6880880_112000_Linux-x86-64.zip, OPatch version 11.2.0.1.5) to /opt/oracle.SupportTools/onecommand/patches and unzip it.
  • Re-run “/opt/oracle.SupportTools/onecommand/deploy112.sh -i -s 23”, the previous step, to update OPatch in the GI and DB Homes on both the cm01dbm01 and cm01dbm02 nodes
  • Go to /u01/app/11.2.0/grid/OPatch/ocm/bin and run “emocmrsp” to create the response file, ocm.rsp
  • Repeat the same inside /u01/app/oracle/product/11.2.0/dbhome_1/OPatch/ocm/bin
  • At this point, the OPatch response files will exist on both homes on cm01dbm01.  Since the contents would be the same on both nodes, scp the ocm.rsp file to the locations on cm01dbm02
  • On cm01dbm01:
    • # /u01/app/11.2.0/grid/OPatch/opatch auto ./ -och /u01/app/11.2.0/grid
    • # /u01/app/oracle/product/11.2.0/dbhome_1/OPatch/opatch auto ./ -oh /u01/app/oracle/product/11.2.0/dbhome_1
    • # /u01/app/11.2.0/grid/bin/crsctl stop cluster -all -f
  • On cm01dbm02:
    • # /u01/app/11.2.0/grid/OPatch/opatch auto ./ -och /u01/app/11.2.0/grid
    • # /u01/app/oracle/product/11.2.0/dbhome_1/OPatch/opatch auto ./ -oh /u01/app/oracle/product/11.2.0/dbhome_1
    • # /u01/app/11.2.0/grid/bin/crsctl stop cluster -all -f
  • Reboot both nodes
  • Run “crsctl stat res -t” from the GI home on both nodes after the reboot until all resources are up.  All but the gsd services should start.  We actually had to reboot cm01dbm02 twice to get things to “work”: the first time, it couldn’t connect to the InfiniBand network, so its services didn’t start.

When complete, both Oracle Homes will be upgraded to the latest versions.
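
Waiting for the resources to come up after the reboot can be scripted rather than re-running crsctl by hand.  The sketch below is a rough approach, not something Oracle provides: the function names are made up, and it assumes the tabular “crsctl stat res -t” layout in which each resource line shows TARGET followed by STATE.  Resources whose target is OFFLINE (such as gsd) are ignored.

```shell
# crsctl location taken from the GI home used in this article; overridable.
CRSCTL=${CRSCTL:-/u01/app/11.2.0/grid/bin/crsctl}

# Hypothetical helper: print how many resource lines have TARGET ONLINE but a
# STATE that is not ONLINE.  Exits 2 if crsctl itself fails (stack not up yet).
# ASSUMPTION: TARGET and STATE appear as adjacent columns in the output.
pending_resources() (
    out=$("$CRSCTL" stat res -t) || exit 2
    printf '%s\n' "$out" | awk '
        {
            for (i = 1; i < NF; i++)
                if ($i == "ONLINE" && ($(i + 1) == "OFFLINE" ||
                    $(i + 1) == "INTERMEDIATE" || $(i + 1) == "UNKNOWN")) {
                    n++
                    break
                }
        }
        END { print n + 0 }'
)

# Poll until nothing is pending; TRIES attempts, DELAY seconds apart.
wait_for_cluster() (
    tries=${1:-30}
    delay=${2:-20}
    while [ "$tries" -gt 0 ]; do
        if n=$(pending_resources) && [ "$n" -eq 0 ]; then
            exit 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    echo "resources still not ONLINE after waiting" >&2
    exit 1
)
```

A non-zero result from wait_for_cluster is the cue to look at the node by hand, as we had to with cm01dbm02.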

 

Install latest OS Image

 

After the deployment process is complete, the OS on each node is upgraded to the latest OS image.  Here, we upgraded the Active Image version on all 5 nodes (compute and cell nodes) to 11.2.2.2.2.

  • Ensure p11721471_112220_Linux-x86-64.zip is in /opt/oracle.SupportTools/onecommand/patches
  • Unzip p11721471_112220_Linux-x86-64.zip
  • Change directories to ./patch_11.2.2.2.2.110311
  • Ensure “cell_group” file exists in current directory (use “locate cell_group” and copy to ./patch_11.2.2.2.2.110311) so that we can use dcli to patch
  • Shutdown entire GI and DB stack
  • # ./patchmgr -cells cell_group -patch

This will update the OS on all 3 cells and reboot them all.

  • When done, cd to /opt/oracle.SupportTools/onecommand/patches/patch_11.2.2.2.2.110311/db_patch_11.2.2.2.2.110311
  • Run “imageinfo” to validate current image
  • Run ./install.sh
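
After patching, it is worth confirming that every node reports the expected image.  The sketch below filters dcli-prefixed imageinfo output for nodes that are not at a given version.  The function name image_mismatches is made up, and the “Active image version:” label is an assumption about the imageinfo output format that should be checked against your own system.

```shell
# Hypothetical helper: read lines such as
#   cm01cel01: Active image version: 11.2.2.2.2.110311
# from standard input and print "host version" for every node whose active
# image differs from the expected version given as $1.
# ASSUMPTION: imageinfo labels the line "Active image version:".
image_mismatches() {
    awk -v want="$1" '
        /Active image version:/ {
            host = $1
            sub(/:$/, "", host)
            if ($NF != want) print host, $NF
        }'
}

# Example:
#   dcli -l root -g ./cell_group imageinfo | image_mismatches 11.2.2.2.2.110311
```

No output means every node that reported a version is at the expected one.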

 

Validate Flash Cache

 

From time to time, due to a bug in the deployment process, Flash Cache is not built on all cell servers.  To check whether it exists on every cell, and to create it where it is missing:
# dcli -g ./cell_group cellcli -e list flashcache detail
# dcli -g ./cell_group cellcli -e create flashcache all

 

 [root@cm01dbm01 onecommand]# dcli -l root  -g ./cell_group cellcli -e list flashcache
cm01cel03: name:                   cm01cel03_FLASHCACHE

As we can see above, we’ve only got Flash Cache configured on one of the three cell servers. To build it on the other two:

[root@cm01dbm01 onecommand]# dcli -l root -g ./cell_group cellcli -e create flashcache all
cm01cel01: Flash cache cm01cel01_FLASHCACHE successfully created
cm01cel02: Flash cache cm01cel02_FLASHCACHE successfully created
cm01cel03:
cm01cel03: CELL-02645: Flash cache already exists.
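
With only three cells the gap is easy to spot by eye, but on larger racks a small filter helps.  The sketch below compares the cell_group file against the dcli output and prints the cells with no Flash Cache line.  The function name cells_missing_flashcache is made up for this example; the dcli and cellcli commands are the ones shown above.

```shell
# Hypothetical helper: print every cell listed in the cell_group file ($1)
# for which the dcli output on standard input has no FLASHCACHE line.
cells_missing_flashcache() {
    awk -F: '
        mode == "cells" { if (NF) cells[++n] = $1; next }
        /FLASHCACHE/    { have[$1] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(cells[i] in have)) print cells[i]
        }
    ' mode=cells "$1" mode=output -
}

# Example:
#   dcli -l root -g ./cell_group cellcli -e list flashcache |
#       cells_missing_flashcache ./cell_group
```

Run against the output shown earlier, this would list cm01cel01 and cm01cel02.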

 

Health Check

 

After the completion of the deployment, Oracle provides a set of “Health Check” scripts located in /opt/oracle.SupportTools/onecommand/HealthCheck.  These health check scripts are designed to do just what you’d expect: validate the health of your Exadata Database Machine.  The scripts check all the components and ensure everything is in working order.  The Oracle ACS engineer will examine the output and save it to his/her USB drive for purposes of documenting the installation.  I won’t go into the details of the verbose health check output at our site, but suffice it to say that we had no reports of problems.

 

Summary

 

The process of installing and configuring Exadata was indeed “as easy as advertised” for us.  Oracle has managed to take a complex Oracle architecture solution and simplify the deployment through a set of simple configuration templates.  For Oracle architects who do this sort of thing for a living, consider what this means to an implementation timeline.  Mixed vendor, high-performing server, storage, networking, and Oracle solutions typically take weeks or months to deploy and test – with our Exadata X2-2 Quarter Rack, we went from “crate-to-production” in about 10 hours.

 

[Figure: Centroid Exadata Machine]