Archive for July, 2010
EMC RecoverPoint Behavior under Heavy Load
I’m sure this is well-known and documented behavior to RecoverPoint appliance experts, but we recently found what we think to be a very nice feature in EMC’s RecoverPoint behavior that perhaps some folks new to the technology are unaware of …
Today we did a large (1 TB) backup of an assortment of Oracle database files on HP-UX file systems to a set of ASM disk groups. The ASM disk groups were the source LUNs in a RecoverPoint remote replication consistency group. Our RMAN backup to the ASM disk groups was done using 6 channels, and the source ASM LUNs resided in a RAID5 Enterprise Flash Drive RAID group.
As the RMAN backup job allocated channels and wrote backup copies to the target LUNs, we noticed that the RecoverPoint appliances essentially “suspended” transmission of data with a helpful message about the source volumes being too busy. As channels de-allocated and the RMAN job switched from tablespace to tablespace (it was a tablespace at a time RMAN “backup as copy” operation), the RecoverPoint data transfer would kick back in, then suspend itself again as the drives became busy.
Granted, we configured our consistency groups with pretty much default settings, didn’t set any non-standard or overly aggressive RPOs, and were essentially “out of the box” with a 100 GB RecoverPoint journal. The nice implication of this behavior is this: in the event we need to do “large” data migration or storage-related activities on our source RP volumes, it looks like we won’t have to worry about suspending our data transfer manually (through the RP administrative web interface or otherwise). It all just happens by itself.
One less thing to worry about …
Cloning Oracle Databases with EMC SnapView and RecoverPoint
Many are familiar with the steps required to clone an Oracle database using “rman duplicate” or “hot backup” cloning. Many are also familiar with the steps required to create EMC SnapView Clones or SnapView Snapshots, either with the Navisphere web interface or the CLI. In this post, I’ll outline the steps required to build consistent, usable Oracle database “clones” within the framework of the following environment/architecture:
- Oracle 11.1.0.7, HP-UX 11iV2
- EMC CLARiiON CX4 storage arrays at production and DR site
- EMC RecoverPoint appliances at production and DR site
- Source database (production) uses Oracle ASM for its storage
- Requirement is to replicate production data from production storage array to remote “DR” array.
- Requirement is to use this replicated data at the DR site as the source for both SnapView Clones and SnapView Snapshots
- Requirement is to Clone or Snap from DR Replica LUNs, re-create an Oracle control file to build a new database, recover this target database, open with resetlogs, and use it
- The LUN numbers and names for the LUNs that comprise or will comprise the RecoverPoint consistency group
- Ensure (or assume) the LUNs are in a storage group and zoned to the production host
- LUN numbers/names of Replica LUNs (i.e., LUNs in the RecoverPoint Consistency Group)
- LUN numbers/names for all to-be SnapView Clones. When using SnapView Clones, the number and size of Replica LUNs needs to match that on the Clone LUNs for each clone group you’ll be creating
- Sufficient LUNs carved into the Reserve LUN pool to hold snapshot data
- How many SnapView Clones will I need? (will govern how many LUNs to build on the DR array, and place in the primary host storage group)
- How many SnapView Snapshots will I need? (this information, combined with source database size, will help size reserve LUN pool)
- What will the shelf-life be for my snapshots?
- How much DML/DDL will occur in my snapshot instances over time?
- Implement a standard ASM diskgroup naming convention (i.e., PROD_DG1, DEV_DG1, etc)
- Implement a strategy for consistent symbolic linking of O/S files to the ASM devices that will be defined in the ASM disk group. For example, if ASM diskgroup PROD_DG1 is designed to use /dev/rdsk/c57t0d0, which is LUN1 on the production CX4 storage array, we should symbolically link /asm/disk1 to /dev/rdsk/c57t0d0 and build the ASM diskgroup with the “/asm/disk1” string
- Set asm_diskstring in both production and DR server ASM instance to the same thing, with wild-cards. For example, “/asm*/disk*”
- Map Source to Replica LUN numbers
- Map Replica to Clone Group LUN for each Clone Group, and ensure “target” clone LUNs are added to the right storage group
- Implement a strategy for snapshot LUN number conventions. For example, if you expect to build 3 different snapshots on the Replica LUNs, you can start LUN numbering for the first set at LUN 3000, the second set at LUN 4000, and the third set at LUN 5000.
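The symbolic-link convention described in the list above can be sketched as a small shell helper. This is a minimal sketch rather than a tested procedure: the function name `link_asm_disk` and the `ASM_ROOT` variable are hypothetical names introduced for illustration, and the device path in the usage line is the example device from the text.

```shell
#!/bin/sh
# Sketch: give every ASM device a stable, host-independent name.
# ASM_ROOT and link_asm_disk are illustrative names, not part of any
# EMC or Oracle tool.
ASM_ROOT=${ASM_ROOT:-/asm}

# link_asm_disk <disk-number> <raw-device>
# Creates $ASM_ROOT/disk<N> as a symbolic link to the raw device.
link_asm_disk() {
    disk_no=$1
    raw_dev=$2
    mkdir -p "$ASM_ROOT" || return 1
    # -f replaces a stale link, so the helper can be re-run if the
    # underlying HP-UX device name changes
    ln -sf "$raw_dev" "$ASM_ROOT/disk$disk_no"
}
```

For the example in the text, `link_asm_disk 1 /dev/rdsk/c57t0d0` would create /asm/disk1. Setting `ASM_ROOT=/asm_c1` before calling it on drhost would produce /asm_c1/disk1, which the “/asm*/disk*” discovery string also matches.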
- prodhost = production HP-UX host
- drhost = DR HP-UX host
- PROD = production database name
- CLN1 = 1st clone of PROD using SnapView clones
- CLN2 = 2nd clone of PROD using SnapView clones
- SNP1 = 1st snapshot of PROD using SnapView snapshots
- SNP2 = 2nd snapshot of PROD using SnapView snapshots
- rpa1 = host name of RecoverPoint appliance’s admin interface
- cx4-dr = DNS name of DR CLARiiON CX4, used for NaviSphere
- EMC PowerPath is installed and configured on both prodhost and drhost
- 3 ASM Diskgroups: PROD_DG1, PROD_DG2, and PROD_DG3, all replicating in the RP Consistency group and all used as sources to Clones/Snapshots
- Ensure RPA is transmitting data from primary storage array to DR array (and ensure the consistency groups are setup and functional)
- Ensure an ASM instance is running on drhost
- Obtain LUN numbers to use for the CLN1 clone group from NaviSphere
- Make sure the LUNs are in the proper storage group, zoned to drhost, and visible via EMC PowerPath
- Run “powermt display dev=all” as root and search contents for the Replica LUN and Clone LUN names/numbers.
- Consider primary (PROD) ASM device to HP-UX device mappings and ensure you’ve got it documented. For sake of example:
- Create symbolic link from /asm_c1/disk1 to the target Clone LUN that will be synced from the Replica LUN mapped to the primary LUN for PROD_DG1
- Repeat for /asm_c1/disk2 and /asm_c1/disk3.
- Create a SnapView clone group for each of the 3 Replica LUNs. Below, assume the Replica LUNs are 1, 2, and 3
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -createclonegroup -name lun1CloneGrp_1 -luns 1 -o
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -createclonegroup -name lun2CloneGrp_1 -luns 2 -o
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -createclonegroup -name lun3CloneGrp_1 -luns 3 -o
- Add target LUNs to the clone groups and begin synchronizing data. When you create a clone group (above) and specify the “-luns” clause, the LUN number following the “-luns” argument is the source LUN for the clone, which in this case is the Replica LUN on the DR storage array. The following adds clone LUN 11 to the clone group for Replica LUN 1, LUN 12 to the group for Replica LUN 2, and LUN 13 to the group for Replica LUN 3
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -addclone -name lun1CloneGrp_1 -luns 11 -syncrate high
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -addclone -name lun2CloneGrp_1 -luns 12 -syncrate high
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -addclone -name lun3CloneGrp_1 -luns 13 -syncrate high
- Wait for clone synchronization to complete. You can use the below to monitor this based on the clone group configurations above:
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -listclone -name lun1CloneGrp_1 | egrep '(^Name|^CloneState|^CloneCon|^Percent)'
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -listclone -name lun2CloneGrp_1 | egrep '(^Name|^CloneState|^CloneCon|^Percent)'
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -listclone -name lun3CloneGrp_1 | egrep '(^Name|^CloneState|^CloneCon|^Percent)'
- Put source database (PROD) in backup mode. First though, grab the max first_change# from V$ARCHIVED_LOG to show the earliest archived redo log we’ll need at a later step …
- Enable “Image Access” on the RecoverPoint Appliance (RPA). This is required to put the source of the clones, which are the RPA Replica LUNs, in a consistent state. If you omit this step you’ll get to the end of this, try to recover your database, and will be left with the only option to recover all the way up through the most current redo log on the primary site - something we don’t want to do … To enable image access through the RPA CLI:
# ssh admin@rpa1 'enable_image_access group=<your RP consistency group> copy=<name of copy site> image=latest'
- Fracture your SnapView clone
- Disable image access to RPA
- End backup mode on source
- Modify ASM diskgroup name. On ASM 11gR2, we can use the “renamedg” utility, but since our test is on 11gR1, we need to use kfed to modify the header block of the ASM devices. First, do a “kfed read” on all devices that comprise the target ASM diskgroups you want to mount. Direct this to a text file, edit the file and search for the string “grpname”. Change the diskgroup from “PROD_DG” to “CLN1_DG” and save the file. Then, use “kfed merge” to write the modified header back to the disk.
- Mount ASM diskgroups
- Generate backup controlfile from source environment, edit and save so you have a “CREATE CONTROLFILE” script to use on your CLN1 database
- Build controlfile for CLN1
- At this point, in order for CLN1 to be recoverable, you need the archive log preceding the “begin backup” and archive log after the “end backup” in a place where CLN1 can see them. I use RMAN to copy these archivelogs to a location CLN1 can “see”
- Login to SQL*Plus with CLN1 set and set LOG_ARCHIVE_DEST_1 to the location you’ve copied the source archive logs to.
- Issue a “recover database using backup controlfile”
- Specify the archive logs copied from 3 steps ago, and cancel after the last one
- Open with RESETLOGS
- Add TEMP files
- Do whatever other post-cloning needs to be done
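The kfed header edit from the “Modify ASM diskgroup name” step above can be sketched as follows. Treat this as a sketch under stated assumptions, not a tested procedure: the function names `rename_grp_in_dump` and `rename_dg_header`, the `KFED` variable, and the temporary file names are hypothetical. It only rewrites the “grpname” line, as the step describes, and it assumes the old name appears literally on that line of the “kfed read” output.

```shell
#!/bin/sh
# Sketch of the kfed-based diskgroup rename described in the steps.
# KFED is overridable so the flow can be rehearsed with a stub command.
KFED=${KFED:-kfed}

# rename_grp_in_dump <old-prefix> <new-prefix>
# Filters a "kfed read" dump on stdin; only the grpname line is changed.
rename_grp_in_dump() {
    sed "/grpname/s/$1/$2/"
}

# rename_dg_header <device> <old-prefix> <new-prefix>
rename_dg_header() {
    dev=$1
    dump=${TMPDIR:-/tmp}/kfed_$(basename "$dev").$$
    # Keep the original dump so the header can be merged back if needed
    "$KFED" read "$dev" > "$dump.orig" || return 1
    rename_grp_in_dump "$2" "$3" < "$dump.orig" > "$dump.new" || return 1
    "$KFED" merge "$dev" text="$dump.new"
}
```

With the device links used in this post, the call for CLN1 would be along the lines of `for d in /asm_c1/disk1 /asm_c1/disk2 /asm_c1/disk3; do rename_dg_header "$d" PROD_DG CLN1_DG; done`. Diffing the `.orig` and `.new` dumps before running the merge is a cheap sanity check.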
- Put source database in backup mode and note latest archive log
- Enable image access on RPA Replica LUNs (see previous section)
- Start SnapView Session. In the below example, the “-lun 1 2 3” creates a snapshot session on Replica LUNs 1, 2, and 3
- Create Snapshots
- Activate Snapshots
- Add Snapshot LUNs to the EMC storage group. Assume the storage group is SG_drhost and the snapshot LUNs will be numbered 3000, 3001, and 3002 respectively. It’s good to map out which snapshot LUN numbers you want to use ahead of time
- Find and fix host (HP-UX) devices so they’re usable. Since we’ve added a new set of LUNs to our storage group (3000, 3001, and 3002), we need to do the following on HP-UX for them to be visible and mountable:
# /sbin/init.d/agent stop
# ioscan -fnCdisk
# insf
# /sbin/init.d/agent start
# /sbin/powermt check force dev=all
# /sbin/powermt config
# /sbin/powermt save
Then do a “powermt display dev=all” and search for snp1_1, snp1_2, and snp1_3. Once you find them, note the HP-UX device for each and symbolically link /asm_s1/disk1, /asm_s1/disk2, and /asm_s1/disk3 to these devices
- Disable image access on RPA (see previous section)
- End backup mode on source (see previous section)
- Modify ASM block header on target SNP1 (see previous section). Use devices /asm_s1/disk1, /asm_s1/disk2, and /asm_s1/disk3 based on previous steps
- Create ASM diskgroups for SNP1 (see previous section). Reference above devices
- Mount ASM diskgroups for SNP1 (see previous section)
- Generate script to create controlfile (see previous section)
- Build controlfile for SNP1
- Find and backup needed archive logs to destination SNP1 can see (see previous section)
- Recover SNP1 (see previous section)
- Open SNP1 with RESETLOGS and add temp files
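Both procedures bracket the storage-level copy with “begin backup” and “end backup” on PROD. A minimal sketch of that bracket is below. The function names and the `SQLPLUS` variable are hypothetical, ORACLE_SID is assumed to point at PROD, and the final log switch is my addition to make sure the redo covering the “end backup” gets archived. The original steps only say the archive log after the “end backup” is needed.

```shell
#!/bin/sh
# Sketch: bracket the clone fracture / snapshot activation with backup mode.
# SQLPLUS is overridable (for example SQLPLUS=cat) to print the SQL
# instead of running it.
SQLPLUS=${SQLPLUS:-"sqlplus -s / as sysdba"}

begin_backup() {
    $SQLPLUS <<'EOF'
-- Note the newest archived change before entering backup mode;
-- it identifies the earliest archived log needed for recovery later
SELECT MAX(first_change#) FROM v$archived_log;
ALTER DATABASE BEGIN BACKUP;
EOF
}

end_backup() {
    $SQLPLUS <<'EOF'
ALTER DATABASE END BACKUP;
-- Assumption: force a log switch so the redo covering the backup
-- window is archived and can be copied for the clone
ALTER SYSTEM ARCHIVE LOG CURRENT;
EOF
}
```

The intended order is `begin_backup`, then enable image access and fracture the clone (or start and activate the snapshot session), then disable image access, then `end_backup`. Keeping the window short limits how long PROD spends in backup mode.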
- You obviously won’t have to drop/dismount/create/mount ASM disk groups
- You won’t have to modify ASM block headers
- No need to symbolically link to HP-UX device names
- PROD is a production database running on prodhost
- CLN1 is an Oracle copy of production running on drhost and will be a complete SnapView clone of production
- CLN2 is an Oracle copy of production running on drhost and will be a complete SnapView clone of production
- SNP1 is an Oracle copy of production running on drhost and will be a SnapView snapshot of production
- SNP2 is an Oracle copy of production running on drhost and will be a SnapView snapshot of production
- Ensure clone LUNs are in the proper CX4 Storage Group
- # /sbin/init.d/agent stop
- # ioscan -fnCdisk
- # insf
- # /sbin/init.d/agent start
- # /sbin/powermt display dev=all
- Examine output of PowerPath command above and note device names. For sake of example, we’ll focus on the first LUN, /u01, which we want to mount as /u01_cln1. The device for this is /dev/dsk/c80t0d1 (again, for example)
- # vgchgid /dev/rdsk/c80t0d1 (vgchgid takes the raw device for the same disk)
- # mkdir /dev/vgcln1
- # mknod /dev/vgcln1/group c 64 0x100000 (the minor number “0x100000” must be unique on the host; check /dev/vg*/group)
- # vgimport /dev/vgcln1 /dev/dsk/c80t0d1
- # vgchange -a y /dev/vgcln1
- # fsck /dev/vgcln1/lvol1
- # mkdir /u01_cln1
- # mount -o delaylog /dev/vgcln1/lvol1 /u01_cln1
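The uniqueness check on the group file’s minor number (the last argument to mknod above) can be scripted rather than done by eye. The sketch below assumes that `ls -l` on HP-UX prints the major and minor numbers of a character device in the fifth and sixth columns, with the minor in the form 0xNN0000. The function name `next_vg_minor` is hypothetical.

```shell
#!/bin/sh
# Sketch: pick a volume group minor number that no existing group file uses.
# Reads "ls -l /dev/vg*/group" output on stdin.
# Assumption: column 6 holds the minor number, e.g. 0x010000.
next_vg_minor() {
    used=$(awk '{ print $6 }' | tr 'A-F' 'a-f')
    n=1
    while [ "$n" -le 255 ]; do
        cand=$(printf '0x%02x0000' "$n")
        # Unquoted $used collapses newlines into single spaces
        case " $(echo $used) " in
            *" $cand "*) ;;                 # already taken, try the next one
            *) echo "$cand"; return 0 ;;
        esac
        n=$((n + 1))
    done
    return 1
}
```

Usage would be `ls -l /dev/vg*/group | next_vg_minor`, with the printed value passed to mknod. It is worth eyeballing the `ls -l` output once to confirm the column layout matches the assumption before trusting the result.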











