Upgrading Exadata to 11.2.0.3 and Applying Bundle Patch 14
In this blog, I’ll walk you through the abbreviated steps to apply Bundle Patch 14 on our Exadata X2-2 Quarter Rack.  I say “abbreviated” because I’m simply going to bullet all the steps – this is no substitute for reading the various README files.
For BP14, I’m going to apply all patches in a rolling upgrade fashion.  The nodes in our Exadata are:
– cm01dbm01 (Compute node 1)
– cm01dbm02 (Compute node 2)
– cm01cel01 (Cell 1)
– cm01cel02 (Cell 2)
– cm01cel03 (Cell 3)
Preparation
1) Downloaded p13551280_112030_Linux-x86-64.zip from MOS
2) Transferred p13551280_112030_Linux-x86-64.zip to our first compute node, cm01dbm01
3) Unzipped p13551280_112030_Linux-x86-64.zip
4) Read the various README.txt files
5) Log in to each storage server, compute node, and InfiniBand switch and validate the current versions using "imageinfo"
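If root ssh equivalence is already in place, dcli makes this version sweep quick; a rough sketch, assuming cell_group and dbs_group files listing the cell and compute node hostnames (otherwise just ssh to each node and run imageinfo there):
# dcli -g cell_group -l root imageinfo | grep 'Active image version'
# dcli -g dbs_group -l root /usr/local/bin/imageinfo | grep 'Image version'
# ssh root@cm01sw-ib2 version
# ssh root@cm01sw-ib3 version
The switch "version" command is the same one shown in the InfiniBand section further down.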
Patch Contents
Bundle Patch 14 (13551280) contains the latest software versions for the entire Exadata technology stack.  The patch contents are split into 3 sections:
* Infrastructure
– Includes patches for Exadata Storage Server nodes, version 11.2.2.4.2
– InfiniBand switches, version 1.3.3-2
– PDUs, firmware version 1.04
* Database
– Oracle RDBMS 11.2.0.3
– Grid Infrastructure , 11.2.0.3
– OPatch 11.2.0.1.9
– OPlan 11.2.0.2.7
* Systems Management
– EM Agent, 11.1.0.1.0
– EM Plugins for InfiniBand Switches, Cisco switches, PDUs, KVMs, ILOMs
– OMS patches for any/all OMS homes monitoring Exadata targets (11.1.0.1.0)
Patching Storage Servers
1) Transfer the 13551280/Infrastructure/ExadataStorageServer/11.2.2.4.2 contents to /tmp on storage cell cm01cel01, our first cell, and unzip the zip file
2) Read MOS note 1388400.1 and do the following:
– "# cellcli -e list griddisk where diskType=FlashDisk".  Make sure we don't have any flash grid disks, which we didn't.
– "# cellcli -e list physicaldisk attributes name, status, slotNumber".  Make sure no duplicate disks exist with the same slot number.  In our case, we didn't have any.
– "# cellcli -e list physicaldisk".  Make sure they're all normal.
– "# grep -in 'Failed to parse the command' $CELLTRACE/ms-odl.trc*".  Make sure we don't have any flash disk population errors.  We didn't.
– Since our current cell version image is > 11.2.2.2.x, we skipped steps 3a and 3b.
– Transfer validatePhysicalDisks from MOS note to /tmp and run it. It should look like this:
[root@cm01cel01 patch_11.2.2.4.2.111221]# /tmp/validatePhysicalDisks
[SUCCESS] CellCLI output and MegaCLI output are consistent.
[root@cm01cel01 patch_11.2.2.4.2.111221]#
– Ensure database tier hosts are > 11.2.0.1 to support rolling upgrades.  In our case, they are.
3) Validate that all physical disks have valid physicalInsertTime:
[root@cm01cel01 patch_11.2.2.4.2.111221]# cellcli -e 'list physicaldisk attributes luns where physicalInsertTime = null'
[root@cm01cel01 patch_11.2.2.4.2.111221]#
4) Verify that no duplicate slotNumbers exist.  This was done per MOS note 1388400.1, step 2
5) Obtain LO and serial console access for cell
– Login to cm01cel01-ilom as root
– Type “start /SP/console”
– Login to console as root
6) Check version of ofa by doing “rpm -qa|grep ofa”.  Ours was higher than the minimum version, so we’re OK
7) Since we're patching in rolling fashion, confirm it's safe to deactivate the grid disks and then take them all offline:
[root@cm01cel01 ~]# cellcli -e "LIST GRIDDISK ATTRIBUTES name WHERE asmdeactivationoutcome != 'Yes'"
[root@cm01cel01 ~]# cellcli -e "ALTER GRIDDISK ALL INACTIVE"
GridDisk DATA_CD_00_cm01cel01 successfully altered
GridDisk DATA_CD_01_cm01cel01 successfully altered
GridDisk DATA_CD_02_cm01cel01 successfully altered
GridDisk DATA_CD_03_cm01cel01 successfully altered
GridDisk DATA_CD_04_cm01cel01 successfully altered
GridDisk DATA_CD_05_cm01cel01 successfully altered
GridDisk DATA_CD_06_cm01cel01 successfully altered
GridDisk DATA_CD_07_cm01cel01 successfully altered
GridDisk DATA_CD_08_cm01cel01 successfully altered
GridDisk DATA_CD_09_cm01cel01 successfully altered
GridDisk DATA_CD_10_cm01cel01 successfully altered
GridDisk DATA_CD_11_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_02_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_03_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_04_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_05_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_06_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_07_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_08_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_09_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_10_cm01cel01 successfully altered
GridDisk DBFS_DG_CD_11_cm01cel01 successfully altered
GridDisk RECO_CD_00_cm01cel01 successfully altered
GridDisk RECO_CD_01_cm01cel01 successfully altered
GridDisk RECO_CD_02_cm01cel01 successfully altered
GridDisk RECO_CD_03_cm01cel01 successfully altered
GridDisk RECO_CD_04_cm01cel01 successfully altered
GridDisk RECO_CD_05_cm01cel01 successfully altered
GridDisk RECO_CD_06_cm01cel01 successfully altered
GridDisk RECO_CD_07_cm01cel01 successfully altered
GridDisk RECO_CD_08_cm01cel01 successfully altered
GridDisk RECO_CD_09_cm01cel01 successfully altered
GridDisk RECO_CD_10_cm01cel01 successfully altered
GridDisk RECO_CD_11_cm01cel01 successfully altered
[root@cm01cel01 ~]# cellcli -e "LIST GRIDDISK WHERE STATUS != 'inactive'"
8) Shut down and reboot the cell:
[root@cm01cel01 ~]# shutdown -F -r now
Broadcast message from root (ttyS0) (Sat Feb 11 19:57:42 2012):
The system is going down for reboot NOW!
audit(1329008264.759:2153236): audit_pid=0 old=7383 by auid=4294967295
type=1305 audit(1329008264.850:2153237): auid=4294967295 op=remove rule key=”time-change” list=4 res=1
9) Since we're doing this in rolling fashion, reactivate all grid disks once the cell is back up and check the grid disk attributes until everything is ONLINE (the activation command itself is shown after the output below).  I'm wondering if steps 7 and 8 were actually required, but I believe they were to ensure we had a healthy disk status:
[root@cm01cel01 ~]# cellcli -e 'list griddisk attributes name,asmmodestatus'
DATA_CD_00_cm01cel01   OFFLINE
DATA_CD_01_cm01cel01   OFFLINE
DATA_CD_02_cm01cel01   OFFLINE
(wait)
[root@cm01cel01 ~]# cellcli -e 'list griddisk attributes name,asmmodestatus' \
> |grep -v ONLINE
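The reactivation command itself wasn't captured in the session output above; it's the standard CellCLI call, after which you keep polling as shown until nothing is left offline:
[root@cm01cel01 ~]# cellcli -e "ALTER GRIDDISK ALL ACTIVE"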
10) Ensure network configuration is consistent with cell.conf by running “/opt/oracle.cellos/ipconf -verify”
[root@cm01cel01 ~]# /opt/oracle.cellos/ipconf -verify
Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Done. Configuration file /opt/oracle.cellos/cell.conf passed all verification checks
11) Prep for patchmgr – ensure that root has user equivalence by running dcli commands below:
[root@cm01cel01 ~]# dcli -g cell_group -l root 'hostname -i'
cm01cel01: 172.16.1.12
cm01cel02: 172.16.1.13
cm01cel03: 172.16.1.14
12) Check pre-requisites by running “./patchmgr -cells ~/cell_group -patch_check_prereq -rolling” from patch stage location:
[root@cm01cel01 patch_11.2.2.4.2.111221]# ./patchmgr -cells ~/cell_group \
> -patch_check_prereq -rolling
[NOTICE] You will need to patch this cell by starting patchmgr from some other cell or database host.
20:10-11-Feb:2012        :Working: DO: Check cells have ssh equivalence for root user. Up to 10 seconds per cell …
20:10-11-Feb:2012        :SUCCESS: DONE: Check cells have ssh equivalence for root user.
20:10-11-Feb:2012        :Working: DO: Check space and state of Cell services on target cells. Up to 1 minute …
20:10-11-Feb:2012        :SUCCESS: DONE: Check space and state of Cell services on target cells.
20:10-11-Feb:2012        :Working: DO: Copy and extract the prerequisite archive to all cells. Up to 1 minute …
20:10-11-Feb:2012        :SUCCESS: DONE: Copy and extract the prerequisite archive to all cells.
20:10-11-Feb:2012        :Working: DO: Check prerequisites on all cells. Up to 2 minutes …
20:11-11-Feb:2012        :SUCCESS: DONE: Check prerequisites on all cells.
[root@cm01cel01 patch_11.2.2.4.2.111221]#
13) Check ASM disk group repair time.  I’m leaving mine at 3.6 hours:
SQL> select dg.name, a.value from v$asm_diskgroup dg, v$asm_attribute a
  2  where dg.group_number = a.group_number and a.name = 'disk_repair_time';

NAME                           VALUE
------------------------------ ------
DATA_CM01                      3.6h
DBFS_DG                        3.6h
RECO_CM01                      3.6h
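If you did want a longer window before ASM drops the disks, the attribute can be raised per disk group; a hypothetical example (8.5h is just an illustration):
SQL> alter diskgroup DATA_CM01 set attribute 'disk_repair_time' = '8.5h';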
14) Make sure the session you're patching from is not on the LO or serial console, but stay logged into the LO console separately to monitor things in case something goes wrong:
[root@cm01cel01 ~]# echo $consoletype
pty
15) Apply the patch in rolling fashion – note that this will patch cm01cel02 and cm01cel03, since I'm launching it from cm01cel01.  After it's done, we'll have to patch cm01cel01 separately.  I should have launched this from a compute node; for some reason I always forget =)
[root@cm01cel01 patch_11.2.2.4.2.111221]# ./patchmgr -cells ~/cell_group -patch -rolling
NOTE Cells will reboot during the patch or rollback process.
NOTE For non-rolling patch or rollback, ensure all ASM instances using
<< output truncated >>
16) Validate cm01cel02 and cm01cel03 using “imageinfo”:
[root@cm01cel02 ~]# imageinfo
Kernel version: 2.6.18-238.12.2.0.2.el5 #1 SMP Tue Jun 28 05:21:19 EDT 2011 x86_64
Cell version: OSS_11.2.2.4.2_LINUX.X64_111221
Cell rpm version: cell-11.2.2.4.2_LINUX.X64_111221-1
Active image version: 11.2.2.4.2.111221
Active image activated: 2012-02-11 20:58:06 -0500
Active image status: success
Active system partition on device: /dev/md6
Active software partition on device: /dev/md8
17) Validate that the grid disks are active and in the correct state, on both cm01cel02 and cm01cel03, using "cellcli -e 'list griddisk attributes name,status,asmmodestatus'"
18) Check /var/log/cellos/validations.log on both cm01cel02 and cm01cel03
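dcli can run both checks against the two cells in one shot; a quick sketch, run from any host that has root ssh equivalence to the cells (the grep on validations.log is just a crude filter):
# dcli -c cm01cel02,cm01cel03 -l root "cellcli -e list griddisk attributes name,status,asmmodestatus"
# dcli -c cm01cel02,cm01cel03 -l root "grep -i fail /var/log/cellos/validations.log"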
19) From cm01dbm01 (first compute node), stage and unzip the Infrastructure patch in /tmp/patch_11.2.2.4.2.111221
20) Create cell_group file containing only “cm01cel01”
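Nothing fancy is needed for that file; for example:
[root@cm01dbm01 patch_11.2.2.4.2.111221]# echo cm01cel01 > cell_group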
21) Check user-equivalence by doing below:
[root@cm01dbm01 patch_11.2.2.4.2.111221]# dcli -g cell_group -l root 'hostname -i'
cm01cel01: 172.16.1.12
[root@cm01dbm01 patch_11.2.2.4.2.111221]#
22) Run “./patchmgr -cells cell_group -patch_check_prereq -rolling”
23) Patch cm01cel01 by doing “./patchmgr -cells cell_group -patch -rolling”:
[root@cm01dbm01 patch_11.2.2.4.2.111221]# ./patchmgr -cells cell_group -patch -rolling
NOTE Cells will reboot during the patch or rollback process.
NOTE For non-rolling patch or rollback, ensure all ASM instances using
24) Check grid disk and ASM status on cm01cel01 using "cellcli -e 'list griddisk attributes name,status,asmmodestatus'"
25) Check imageinfo and /var/log/cellos/validations.log on cm01cel01
26) Cleanup using “./patchmgr -cells cell_group -cleanup” (from cm01dbm01 – it will cleanup on all 3 cells)
27) Log in to cm01cel01 to check InfiniBand.  As a side note, we should have patched the IB switches first according to something buried far down in the README, but luckily our IB versions are in good shape:
[root@cm01cel01 oracle.SupportTools]# ./CheckSWProfile.sh -I cm01sw-ib2,cm01sw-ib3
Checking if switch cm01sw-ib2 is pingable…
Checking if switch cm01sw-ib3 is pingable…
Use the default password for all switches? (y/n) [n]: n
Use same password for all switches? (y/n) [n]: y
Enter admin or root password for All_Switches:
Confirm password:
[INFO] SUCCESS Switch cm01sw-ib2 has correct software and firmware version:
           SWVer: 1.3.3-2
[INFO] SUCCESS Switch cm01sw-ib2 has correct opensm configuration:
           controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5
[INFO] SUCCESS Switch cm01sw-ib3 has correct software and firmware version:
           SWVer: 1.3.3-2
[INFO] SUCCESS Switch cm01sw-ib3 has correct opensm configuration:
           controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5
[INFO] SUCCESS All switches have correct software and firmware version:
           SWVer: 1.3.3-2
[INFO] SUCCESS All switches have correct opensm configuration:
           controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 for non spine and 8 for spine switch
[root@cm01cel01 oracle.SupportTools]#
28) Apply the minimal pack to the database tier hosts (Section 6 of the README.txt).  Start by opening an LO console: SSH into cm01dbm01-ilom and run "start /SP/console"
29) Check the image history by running "imagehistory".  We're in good shape, since we recently applied BP13
30) Stop dbconsole for each database on cm01dbm01 (and cm01dbm02)
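That's just dbconsole's emctl from the current (pre-upgrade) database home; a sketch with dwprd as the example database and an assumed 11.2.0.2 home path:
[oracle@cm01dbm01 ~]$ export ORACLE_HOME=/u01/app/oracle/product/11.2.0.2/dbhome_1   # assumed old home path
[oracle@cm01dbm01 ~]$ export ORACLE_SID=dwprd1
[oracle@cm01dbm01 ~]$ export ORACLE_UNQNAME=dwprd
[oracle@cm01dbm01 ~]$ $ORACLE_HOME/bin/emctl stop dbconsole
The same form with "start" applies when bringing dbconsole back up at the very end.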
31) Stop the cluster using /u01/app/11.2.0/grid/bin/crsctl stop cluster -f -all
32) Stop OSWatcher by running "/opt/oracle.oswatcher/osw/stopOSW.sh"
33) Set memory settings in /etc/security/limits.conf.  On this step, since we'd set up hugepages, the previous values calculated by "let -i x=($((`cat /proc/meminfo | grep 'MemTotal:' | awk '{print $2}'` * 3 / 4))); echo $x" are commented out, and this is OK.
34) SCP db_patch_11.2.2.4.2.111221.zip from /tmp/patch_11.2.2.4.2.111221 to /tmp on cm01dbm01 and cm01dbm02.  When we apply these patches, we'll be applying from an SSH session on each database tier host
35) Unzip /tmp/db_patch_11.2.2.4.2.111221.zip and go to the /tmp/db_patch_11.2.2.4.2.111221 directory
36) Run "./install.sh -force" on cm01dbm02.  This will take a little while …
37) While this is running, repeat the above on cm01dbm01
38) On cm01dbm02 (first node patched), check imageinfo.  It should look like below:
[root@cm01dbm02 ~]# /usr/local/bin/imageinfo
Kernel version: 2.6.18-238.12.2.0.2.el5 #1 SMP Tue Jun 28 05:21:19 EDT 2011 x86_64
Image version: 11.2.2.4.2.111221
Image activated: 2012-02-11 23:26:55 -0500
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys1
39) Verify the ofa rpm by running "rpm -qa | grep ofa", comparing against the kernel version.  It should look like this:
[root@cm01dbm02 ~]# rpm -qa | grep ofa
ofa-2.6.18-238.12.2.0.2.el5-1.5.1-4.0.53
[root@cm01dbm02 ~]# uname -a
Linux cm01dbm02.centroid.com 2.6.18-238.12.2.0.2.el5 #1 SMP Tue Jun 28 05:21:19 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
40) Verify the controller cache is on using "/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -a0".  You should see this:
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
41) Run "/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aAll | grep 'FW Package Build'" and ensure it says "FW Package Build: 12.12.0-0048"
42) Reboot the server (cm01dbm02) after running "crsctl stop crs"
43) Repeat steps 38-42 on cm01dbm01 when it's back up
Patching InfiniBand Switches
1) Login to cm01sw-ib2 as root
2) Check the version of the software – on our switches we're already at 1.3.3-2, so we didn't actually need to do anything:
[root@cm01sw-ib2 ~]# version
SUN DCS 36p version: 1.3.3-2
Build time: Apr  4 2011 11:15:19
SP board info:
Manufacturing Date: 2010.08.21
Serial Number: “NCD4V1753”
Hardware Revision: 0x0005
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010
3) Validate on cm01sw-ib3
Patching PDUs
1) Go to 13551280/Infrastructure/SunRackIIPDUMeteringUnitFirmware/1.04 and unzip the zip file
2) Transfer the *DL files to laptop
3) Login to PDUA (http://cm01-pdua.centroid.com/, in our case)
4) Click on Network Configuration and login as admin
5) Go down to Firmware Upgrade and Choose MKAPP_V1.0.4.DL and click Submit
6) When that's done, upload the HTML DL file the same way.  It seems to "hang" for a very long time, but it eventually completes
7) Repeat on cm01-pdub.centroid.com
Upgrade GI Home and RDBMS Home from 11.2.0.2 to 11.2.0.3
Prior to patching the latest BP14 updates to 11.2.0.3, we need to get our GI and RDBMS Homes updated to 11.2.0.3 by following MOS note 1373255.1.  There are a few groups of steps required:
– Prepare environments
– Install and Upgrade GI to 11.2.0.3
– Install 11.2.0.3 database software
– Upgrade databases to 11.2.0.3.  In our case, this includes the dwprd, dwprod, and visx cluster databases
– Do some post-upgrade steps
1) Download 11.2.0.3 from https://updates.oracle.com/ARULink/PatchDetails/process_form?patch_num=1… and transfer to /u01/stg on cm01dbm01.
2) Since we’re already at BP13, we don’t need to apply 12539000
3) Run Exachk to validate that the cluster is ready to patch.  MOS Document 1070954.1 contains details.  Download exachk_213_bundle.zip, unzip it, and run:
[oracle@cm01dbm01 ~]$ cd /u01/stg/exachk/
[oracle@cm01dbm01 exachk]$ ls
collections.dat  exachk_213_bundle.zip       exachk_dbm_121311_115203-public.html  ExachkUserGuide.pdf  readme.txt  UserGuide.txt
exachk ExachkBestPracticeChecks.xls  Exachk_Tool_How_To.pdf     exachk.zip  rules.dat
[oracle@cm01dbm01 exachk]$ ./exachk
Our Exachk run showed a couple of issues, and we fixed the following (example commands after the list):
– Set processes initialization parameter to 200 for both ASM instances
– Set cluster_interconnects to appropriate interface for dwprd and dwprod1
– Cleaned up audit dest files and trace/trm file for ASM instances, both nodes
– Set filesystemio_options=setall on all instances
– When done, bounced cluster using “crsctl stop cluster -f -all”, followed by “crsctl start cluster -all”
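The fixes themselves are just alter system calls; a hedged sketch of the kind of commands involved, using our values and assumed RAC instance names (dwprd1/dwprd2) – adjust for your environment:
SQL> -- ASM instances: raise processes (takes effect after the cluster bounce)
SQL> alter system set processes=200 scope=spfile sid='*';
SQL> -- database instances: pin the InfiniBand interconnect and set filesystemio_options
SQL> alter system set cluster_interconnects='192.168.10.1' scope=spfile sid='dwprd1';
SQL> alter system set cluster_interconnects='192.168.10.2' scope=spfile sid='dwprd2';
SQL> alter system set filesystemio_options=setall scope=spfile sid='*';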
4) Validate readiness of CRS by running cluvfy. Go to <stage>/grid, login as grid, and run this:
[grid@cm01dbm01 grid]$ ./runcluvfy.sh stage -pre crsinst -upgrade \
> -src_crshome /u01/app/11.2.0/grid \
> -dest_crshome /u01/app/11.2.0.3/grid \
> -dest_version 11.2.0.3.0 \
> -n cm01dbm01,cm01dbm02 \
> -rolling \
> -fixup -fixupdir /home/grid/fixit
– Failed on kernel parameters because grid didn’t have access to /etc/sysctl.conf.  Ignore it.
– Failed on bondeth0 and some VIP stuff – ignore it.  I think this is a cluvfy bug
5) Create new GI Homes for 11.2.0.3.  Example below from cm01dbm01, but do this on both nodes:
[root@cm01dbm01 ~]# mkdir -p /u01/app/11.2.0.3/grid/
[root@cm01dbm01 ~]# chown grid /u01/app/11.2.0.3/grid
[root@cm01dbm01 ~]# chgrp -R oinstall /u01/app/11.2.0.3
6) Unzip all the 10404530 software
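For reference, the 10404530 grid and database zips unzip along these lines (file names are from memory, so verify against your download):
[grid@cm01dbm01 stg]$ unzip p10404530_112030_Linux-x86-64_3of7.zip     # grid infrastructure
[oracle@cm01dbm01 stg]$ unzip p10404530_112030_Linux-x86-64_1of7.zip   # database, 1 of 2
[oracle@cm01dbm01 stg]$ unzip p10404530_112030_Linux-x86-64_2of7.zip   # database, 2 of 2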
7) No need to update the OPatch software here – I'll do that along with the BP14 patch 13513783 steps (see next section)
8) Disable AMM in favor of ASMM for ASM instance. Follow the steps in the 1373255.1 document.  In our case, our SPFILE is actually a data file in $GI_HOME/dbs/DBFS_DG instead of the ASM disk group – I’m thinking about moving it with spmove and asmcmd, but I think I’ll hold off for now.  When done it should look like below for both instances:
SQL> select instance_name from v$instance;
INSTANCE_NAME
—————-
+ASM2
SQL> show sga
Total System Global Area 1319473152 bytes
Fixed Size    2226232 bytes
Variable Size 1283692488 bytes
ASM Cache   33554432 bytes
SQL>
9) Bounce databases and ASM and validate __shared_pool_size and __large_pool_size, along with values changed above.  Again, refer to the output above
10) Validate the cluster interconnects for ASM.  This is how it should look – they need to be manually set (example alter system commands follow the output below):
SQL> select inst_id, name, ip_address from gv$cluster_interconnects
  2  /
   INST_ID NAME   IP_ADDRESS
———- ————— —————-
2 bondib0   192.168.10.2
1 bondib0   192.168.10.1
SQL> create pfile='/tmp/asm.ora' from spfile;
File created.
SQL> !cat /tmp/asm.ora|grep inter
*.cluster_interconnects='192.168.10.1'
+ASM1.cluster_interconnects='192.168.10.1'
+ASM2.cluster_interconnects='192.168.10.2'
SQL>
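If they weren't already set, this is roughly how you'd pin them for the ASM instances (a sketch; the IPs are our environment's bondib0 addresses):
SQL> alter system set cluster_interconnects='192.168.10.1' scope=spfile sid='+ASM1';
SQL> alter system set cluster_interconnects='192.168.10.2' scope=spfile sid='+ASM2';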
11) Shut down the visx, dwprd, and dwprod databases.  I'm going to shut down everything for now for simplicity's sake
12) Login as "grid" to cm01dbm01 and unset ORACLE_HOME, ORACLE_BASE, and ORACLE_SID.  Get a VNC session established so we can launch the installer.  Run "./runInstaller" and follow the instructions in 1373255.1.  It will fail on VIP, node connectivity, and patch 12539000, but I'm crossing my fingers and assuming this is a bug.
(insert deep breath …)
Things did install/upgrade fine; it took about 45 minutes.  The post-install CVU step failed with the same errors as the pre-install CVU stage (network/VIP stuff), but this is OK.
13) Stop CRS on both nodes
14) Relink GI oracle executable with RDS
[grid@cm01dbm01 ~]$ dcli -g ./dbs_group ORACLE_HOME=/u01/app/11.2.0.3/grid \
> make -C /u01/app/11.2.0.3/grid/rdbms/lib -f ins_rdbms.mk \
> ipc_rds ioracle
15) Start CRS on both nodes
16) Login as oracle on cm01dbm01 and start a VNC session.  At this point, the 11.2.0.3 software has already been installed.
17) Unset ORACLE_HOME, ORACLE_BASE, and ORACLE_SID, and launch runInstaller.  The pre-req checks will fail on subnet and VIP details, as above – choose to ignore.
18) When installation completes, link oracle with RDS:
dcli -l oracle -g ~/dbs_group ORACLE_HOME=/u01/app/oracle/product/11.2.0.3/dbhome_1 \
          make -C /u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/lib -f ins_rdbms.mk ipc_rds ioracle
19) Copy OPatch from the 11.2.0.2 directory to the 11.2.0.3 directory – at the same time, might as well do the GI home as "grid".  When this is done, we can move on to the bundle patch
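A rough sketch of that copy, assuming the old database home lives at /u01/app/oracle/product/11.2.0.2/dbhome_1 (adjust the paths to yours):
[oracle@cm01dbm01 ~]$ cp -rp /u01/app/oracle/product/11.2.0.2/dbhome_1/OPatch /u01/app/oracle/product/11.2.0.3/dbhome_1/
[grid@cm01dbm01 ~]$ cp -rp /u01/app/11.2.0/grid/OPatch /u01/app/11.2.0.3/grid/
The next section lays down the newer OPatch from the bundle anyway, so this is mostly to have a working opatch in the meantime.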
Patching Compute Nodes to 11.2.0.3 BP 14 (13513783)
1) Go to 13551280/Database/11.2.0.3 on first node and unzip p13513783_112030_Linux-x86-64.zip
2) Extract OPatch from 13551280/Database/OPatch on both cm01dbm01 and cm01dbm02, RDBMS and GI Homes.  For example:
[root@cm01dbm01 OPatch]# unzip p6880880_112000_Linux-x86-64.zip -d /u01/app/11.2.0/grid/
Archive:  p6880880_112000_Linux-x86-64.zip
replace /u01/app/11.2.0/grid/OPatch/docs/FAQ? [y]es, [n]o, [A]ll, [N]one, [r]ename:
3) Make sure GI/OPatch files are owned by grid:oinstall and RDBMS/OPatch files are owned by oracle
4) Check inventory for RDBMS home (both nodes):
[oracle@cm01dbm01 ~]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch lsinventory -detail -oh /u01/app/oracle/product/11.2.0.3/dbhome_1/
5) Check inventory for GI home (both nodes):
[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch lsinventory -detail -oh /u01/app/11.2.0/grid/
6) Set ownership of the patch location to oracle:oinstall:
[root@cm01dbm01 11.2.0.3]# chown -R oracle:oinstall 13513783/
[root@cm01dbm01 11.2.0.3]#
7) Check for patch conflicts in GI home.  Login as “grid” and run the below.  You will see some conflicts:
[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch prereq \
> CheckConflictAgainstOHWithDetail -phBaseDir /u01/stg/13551280/Database/11.2.0.3/13513783/13513783/
[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch prereq \
> CheckConflictAgainstOHWithDetail -phBaseDir /u01/stg/13551280/Database/11.2.0.3/13513783/13540563/
[grid@cm01dbm01 ~]$ /u01/app/11.2.0/grid/OPatch/opatch prereq \
> CheckConflictAgainstOHWithDetail -phBaseDir /u01/stg/13551280/Database/11.2.0.3/13513783/13513982/
8) Check for patch conflicts on the RDBMS home, as oracle:
[oracle@cm01dbm01 11.2.0.3]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir ./13513783/13513783/
[oracle@cm01dbm01 11.2.0.3]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -phBaseDir ./13513783/13540563/custom/server/13540563
9) Login as root and add the new GI home's OPatch directory to the PATH
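Something along these lines (the GI home path matches the opatch auto output below):
[root@cm01dbm01 ~]# export PATH=/u01/app/11.2.0.3/grid/OPatch:$PATH
[root@cm01dbm01 ~]# which opatch
/u01/app/11.2.0.3/grid/OPatch/opatch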
10) Patch by running the opatch auto command below – don't try to patch the GI and RDBMS homes one at a time:
[root@cm01dbm01 11.2.0.3]# opatch auto ./13513783/
Executing /usr/bin/perl /u01/app/11.2.0.3/grid/OPatch/crs/patch112.pl -patchdir . -patchn 13513783 -paramfile /u01/app/11.2.0.3/grid/crs/install/crsconfig_params
opatch auto log file location is /u01/app/11.2.0.3/grid/OPatch/crs/../../cfgtoollogs/opatchauto2012-02-13_00-34-53.log
Detected Oracle Clusterware install
Using configuration parameter file: /u01/app/11.2.0.3/grid/crs/install/crsconfig_params
OPatch  is bundled with OCM, Enter the absolute OCM response file path:
/u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/ocm/bin/ocm.rsp
11) The above succeeded on the first patch but failed on the second, so I'm going to patch manually.  First, as oracle, ensure ORACLE_HOME is the new 11.2.0.3 home and run this:
[oracle@cm01dbm01 ~]$ srvctl stop home -o $ORACLE_HOME -s /tmp/x.status -n cm01dbm01
12) Unlock crs by running this:
[root@cm01dbm01 11.2.0.3]# /u01/app/11.2.0.3/grid/crs/install/rootcrs.pl -unlock
Using configuration parameter file: /u01/app/11.2.0.3/grid/crs/install/crsconfig_params
13) Apply the first GI patch:
[grid@cm01dbm01 ~]$ /u01/app/11.2.0.3/grid/OPatch/opatch napply -oh /u01/app/11.2.0.3/grid/ -local /u01/stg/13551280/Database/11.2.0.3/13513783/13540563/
14) Apply the second GI patch:
[grid@cm01dbm01 ~]$ /u01/app/11.2.0.3/grid/OPatch/opatch napply -oh /u01/app/11.2.0.3/grid/ -local /u01/stg/13551280/Database/11.2.0.3/13513783/13513982/
15) Login as oracle (database owner) on cm01dbm01 and run pre-script:
[oracle@cm01dbm01 scripts]$ pwd
/u01/stg/13551280/Database/11.2.0.3/13513783/13540563/custom/server/13540563/custom/scripts
[oracle@cm01dbm01 scripts]$ ./prepatch.sh -dbhome /u01/app/oracle/product/11.2.0.3/dbhome_1/
./prepatch.sh completed successfully.
16) Apply BP patch to RDBMS home on cm01dbm01:
[oracle@cm01dbm01 13513783]$ pwd
/u01/stg/13551280/Database/11.2.0.3/13513783
[oracle@cm01dbm01 13513783]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch napply -oh /u01/app/oracle/product/11.2.0.3/dbhome_1 -local ./13513783/
[oracle@cm01dbm01 13513783]$ /u01/app/oracle/product/11.2.0.3/dbhome_1/OPatch/opatch napply -oh /u01/app/oracle/product/11.2.0.3/dbhome_1 -local ./13540563/custom/server/13540563
17) Run post DB script:
[oracle@cm01dbm01 13513783]$ ./13540563/custom/server/13540563/custom/scripts/postpatch.sh -dbhome /u01/app/oracle/product/11.2.0.3/dbhome_1/
Reading /u01/app/oracle/product/11.2.0.3/dbhome_1//install/params.ora..
18) Run post scripts as root:
[root@cm01dbm01 11.2.0.3]# cd /u01/app/oracle/product/11.2.0
[root@cm01dbm01 11.2.0]# cd /u01/app/11.2.0.3/grid/
[root@cm01dbm01 grid]# cd rdbms/install/
[root@cm01dbm01 install]# ./rootadd_rdbms.sh
[root@cm01dbm01 install]# cd ../../crs/install
[root@cm01dbm01 install]# ./rootcrs.pl -patch
Using configuration parameter file: ./crsconfig_params
19) Repeat steps 11-18 on cm01dbm02, except in this case we need to first apply the first GI patch
20) At this point, we've got GI completely upgraded to 11.2.0.3 and a patched 11.2.0.3 home for our RDBMS tier, but our databases still live on 11.2.0.2.  Let's go on to the next section
Upgrading Databases to 11.2.0.3 and Applying the Bundle Patch (see 1373255.1)
1) Start all databases on 11.2.0.2 and make sure they're healthy.  One thing I screwed up: during the GI installation I put in the wrong ASMDBA/ASMOPER/ASMADMIN groups.  This made it impossible for the database instances to start after things were patched (on 11.2.0.2).  I worked around it by adding "oracle" to all the "asm" groups (i.e. made its group membership look like grid's).  I'll fix this later.
2) Run the upgrade prep tool for each database (NEW_HOME/rdbms/admin/utlu112i.sql).  Note that this took quite a long time on our Oracle EBS R12 database
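For example, spooling the output so it can be reviewed afterwards (the spool file name is just an illustration):
[oracle@cm01dbm01 ~]$ sqlplus / as sysdba
SQL> spool /tmp/utlu112i_dwprd.log
SQL> @/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/admin/utlu112i.sql
SQL> spool off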
3) Set cluster_interconnects to correct InfiniBand IP.  Below is an example from one of the 3 databases, but they look the same on all:
SQL> select inst_id, name, ip_address from gv$cluster_interconnects;
   INST_ID NAME   IP_ADDRESS
———- ————— —————-
1 bondib0   192.168.10.1
2 bondib0   192.168.10.2
SQL>
4) I don’t have any Data Guard environments or listener_networks setup, so I can skip these sections from the README.txt
5) Launch dbua from the 11.2.0.3 home to upgrade the first database.  It complained about a few underscore parameters and dictionary statistics, but I chose to ignore them and move on.
6) After my first database was upgraded, I validated things by running “srvctl status database”, checking /etc/oratab, checking V$VERSION, etc.
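A few quick checks of that sort, with dwprd as the example (adjust database names):
[oracle@cm01dbm01 ~]$ srvctl status database -d dwprd
[oracle@cm01dbm01 ~]$ grep -i dwprd /etc/oratab
SQL> select banner from v$version;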
7) Repeat steps 5 and 6 for the remaining databases.
8) On each database that was upgraded, a couple of underscore parameters need to be reset – see below:
SYS @ dwprd1> alter system set "_lm_rcvr_hang_allow_time"=140 scope=both;
System altered.
Elapsed: 00:00:00.04
SYS @ dwprd1> alter system set "_kill_diagnostics_timeout"=140 scope=both;
System altered.
Elapsed: 00:00:00.01
SYS @ dwprd1> alter system set "_file_size_increase_increment"=2143289344 scope=both
  2  ;
System altered.
Elapsed: 00:00:00.00
9) Apply Exadata Bundle Patch.  See below:
/u01/app/oracle/product/11.2.0.3/dbhome_1/rdbms/admin
[oracle@cm01dbm01 admin]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.3.0 Production on Mon Feb 13 13:28:55 2012
Copyright (c) 1982, 2011, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
SQL> @catbundle.sql exa apply
– Make sure no ORA- errors exist in logs in /u01/app/oracle/cfgtoollogs/catbundle
– Check DBA_REGISTRY
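For both checks, something along these lines works (the grep is just a rough filter for errors):
[oracle@cm01dbm01 ~]$ grep -l "ORA-" /u01/app/oracle/cfgtoollogs/catbundle/*.log
SQL> select comp_name, version, status from dba_registry;
SQL> select * from dba_registry_history order by action_time;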
10) Start applications and test
11) Start EM dbconsole for each database on first compute node
Finishing Up
1) Validate all your databases, CRS/GI components, etc.
2) Validate all ASM grid disks, Cell status
3) Clean up staged patches from /tmp and/or other locations
4) Clean up the 11.2.0.2 GI and RDBMS homes and ensure that all initialization parameters are pointing to the right places
5) Fix the group membership issue noted in the database upgrade section above