Archive for the ‘Oracle 11i’ Category
Cloning Oracle Databases with EMC SnapView and RecoverPoint
Many are familiar with the steps required to clone Oracle database using “rman duplicate” or “hot backup” cloning . Many are also familiar with steps required to create EMC SnapView Clones or SnapView Snapshots, either with the Navisphere web interface or CLI. In this post, I’ll outline steps required to build consistent, usable Oracle database “clones” within the framework of the following environment/architecture:
- Oracle 11.1.0.7, HP-UX 11iV2
- EMC CLARiiON CX4 storage arrays at production and DR site
- EMC RecoverPoint appliances at production and DR site
- Source database (production) uses Oracle ASM for its storage
- Requirement is to replicate production data from production storage array to remote “DR” array.
- Requirement is to use this replicated data at the DR site as the source for both SnapView Clones or SnapView Snapshots
- Requirement is to Clone or Snap from DR Replica LUNs, re-create an Oracle control file to build a new database, recover this target database, open with resetlogs, and use it
- The LUN numbers and names for the LUNs that comprise or will comprise the RecoverPoint consistency group
- Ensure (or assume) the LUNs are in a storage group and zoned to the production host
- LUN numbers/names of Replica LUNs (i.e., LUNs in the RecoverPoint Consistency Group)
- LUN numbers/names for all to-be SnapView Clones. When using SnapView Clones, the number and size of Replica LUNs needs to match that on the Clone LUNs for each clone group you’ll be creating
- Sufficient LUNs carved into the Reserve LUN pool to hold snapshot data
- How many SnapView Clones will I need? (will govern how many LUNs to build on the DR array, and place in the primary host storage group)
- How many SnapView Snapshots will I need? (this information, combined with source database size, will help size reserve LUN pool)
- What will the shelf-life be for my snapshots?
- How much DML/DDL will occur in my snapshot instances over time?
- Implement a standard ASM diskgroup naming convention (i.e., PROD_DG1, DEV_DG1, etc)
- Implement strategy for consistent symbolic linking of O/S files to the ASM devices that will be defined in the ASM disk group. For example, if ASM diskgroup PROD_DG1 is designed to use /dev/rdsk/c57t0d0, which is LUN1 on the production CX4 storage array, we should symbolically link /asm/disk1 to /dev/rdsk/c57t0d0 and build the ASM diskgroup with the “/asm/disk1″ string
- Set asm_diskstring in both production and DR server ASM instance to the same thing, with wild-cards. For example, “/asm*/disk*”
- Map Source to Replica LUN numbers
- Map Replica to Clone Group LUN for each Clone Group, and ensure “target” clone LUNs are added to the right storage group
- Implement a strategy for snapshot LUN number conventions. For example, if you will expect to build 3 different snapshots on the Replica LUNs, you can start LUN numbering on the first set at LUN 3000, the second set at LUN 4000, the third set at LUN 5000.
- prodhost = production HP-UX host
- drhost = DR HP-UX host
- PROD = production database name
- CLN1 = 1st clone of PROD using SnapView clones
- CLN2 = 2nd clone of PROD using SnapView clones
- SNP1 = 1st snapshot of PROD using SnapView snapshots
- SNP2 = 2nd snapshot of PROD using SnapView snapshots
- rpa1 = host name of RecoverPoint appliance’s admin interface
- cx4-dr = DNS name of DR CLARiiON CX4, used for NaviSphere
- EMC PowerPath is installed and configured on both prodhost and drhost
- 3 ASM Diskgroups: PROD_DG1, PROD_DG2, and PROD_DG3, all replicating in the RP Consistency group and all used as sources to Clones/Snapshots
- Ensure RPA is transmitting data from primary storage array to DR array (and ensure the consistency groups are setup and functional)
- Ensure an ASM instance is running on drhost
- Obtain LUN numbers to use for the CLN1 clone group from NaviSphere
- Make sure the LUNs are in the proper storage group, zoned to drhost, and visible via EMC PowerPath
- Run “powermt display dev=all” as root and search contents for the Replica LUN and Clone LUN names/numbers.
- Consider primary (PROD) ASM device to HP-UX device mappings and ensure you’ve got it documented. For sake of example:
- Create symbolic link from /asm_c1/disk1 to the target Clone LUN that will be synced from the Replica LUN mapped to the primary LUN for PROD_DG1
- Repeat for /asm_c1/disk2 and /asm_c1/disk3.
- Create SnapView clone on the 3 LUN. Below, assume the Replica LUNs are 1, 2, and 3
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -createclonegroup -name lun1CloneGrp_1 -luns 1 -o
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -createclonegroup -name lun2CloneGrp_1 -luns 2 -o
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -createclonegroup -name lun3CloneGrp_1 -luns 3 -o
- Add target LUNs to clone group and begin synchronizing data. When you create a clone group (above) and specify the “-luns” clause, the LUN number following the “-luns” argument is the source LUN for the clone, which in this case is the Replica LUN on the DR storage array. The following will create a clone group for LUN 11 (mapped to Replica LUN 1), LUN 12 (mapped to Replica LUN 2), and LUN 13 (mapped to Replica LUN 3)
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -addclone -name lun1CloneGrp_1 -luns 11 -syncrate high
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -addclone -name lun1CloneGrp_2 -luns 12 -syncrate high
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -addclone -name lun1CloneGrp_3 -luns 13 -syncrate high
- Wait for clone synchronization to complete. You can use the below to monitor this based on the clone group configurations above:
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -listclone -name lun1CloneGrp_1 | egrep ‘(^Name|^CloneState|^CloneCon|^Percent)’
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -listclone -name lun1CloneGrp_2 | egrep ‘(^Name|^CloneState|^CloneCon|^Percent)’
# naviseccli -h cx4-dr -Scope 0 -User admin -Password <Nav pwd> snapview -listclone -name lun1CloneGrp_3 | egrep ‘(^Name|^CloneState|^CloneCon|^Percent)’
- Put source database (PROD) in backup mode. First though, grab the max first_change# from V$ARCHIVED_LOG to show the earliest archived redo log we’ll need at a later step …
- Enable “Image Access” on the RecoverPoint Appliance (RPA). This is required to put the source of the clones, which are the RPA Replica LUNs, in a consistent state. If you omit this step you’ll get to the end of this, try to recover your database, and will be left with the only option to recover all the way up through the most current redo log on the primary site - something we don’t want to do … To enable image access through the RPA CLI:
# ssh admin@rpa1 ‘enable_image_access group=<your RP consistency group> copy=<name of copy site> image=latest’
- Fracture your SnapView clone
- Disable image access to RPA
- End backup mode on source
- Modify ASM diskgroup name. On ASM 11gR2, we can use “renamedg” from asmcmd, but since our test is on 11gR1, we need to use kfed to modify the header block of the ASM devices. First, do a “kfed read” on all devices that comprise the target ASM diskgroups you want to mount. Direct this to a text file, edit the file and search for string “grpname”. Change the diskgroup from “PROD_DG” to “CLN1_DG” and save the file. Then, use “kfed merge” to modify the disk.
- Mount ASM diskgroups
- Generate backup controlfile from source environment, edit and save so you have a “CREATE CONTROLFILE” script to use on your CLN1 database
- Build controlfile for CLN1
- At this point, in order for CLN1 to be recoverable, you need the archive log preceding the “begin backup” and archive log after the “end backup” in a place where CLN1 can see them. I use RMAN to copy these archivelogs to a location CLN1 can “see”
- Login to SQL*Plus with CLN1 set and set LOG_ARCHIVE_DEST_1 to the location you’ve copied the source archive logs to.
- Issue a “recover database using backup controlfile”
- Specify the archive logs copied from 3 steps ago, and cancel after the last one
- Open with RESETLOGS
- Add TEMP files
- Do whatever other post-cloning needs to be done
- Put source database in backup mode and note latest archive log
- Enable image access on RPA Replica LUNs (see previous section)
- Start Snapview Session. In the below example, the “-lun 1 2 3″ creates a snapshot session on Replca LUNs 1, 2, and 3
- Create Snapshots
- Activate Snapshots
- Add Snapshot LUNs to EMC Storage group. Assuming storage group is SG_drhost and snapshot LUNs will be named, 3000, 3001, and 3002 respectively. It’s good to map out which snapshot LUN numbers you want to use ahead of time
- Find and fix host (HP-UX) devices so they’re usable. Since we’re added a new set of LUNs to our storage group (3000, 3001, and 3002), we need to do the following on HP-UX for them to be visible and mountable:
# /sbin/init.d/agent stop
# ioscan -fnCdisk
# insf
# /sbin/init.d/agent start
#/sbin/powermt check force dev=all
# /sbin/powermt config
# /sbin/powermt save
Then do a “powermt display dev=all” and search for snp1_1, snp1_2, and snp1_3. Once you find these find the HP-UX device for these and symbolically link /asm_s1/disk1, /asm_s1/disk2, and /asm_s3/disk3 to these
- Disable image access on RPA (see previous section)
- End backup mode on source (see previous section
- Modify ASM block header on target SNP1 (see previous section). Use devices /asm_s1/disk1, /asm_s1/disk2, and /asm_s1/disk3 based on previous steps
- Create ASM diskgroups for SNP1 (see previous section). Reference above devices
- Mount ASM diskgroups for SNP1 (see previous section)
- Generate script to create controlfile (see previous section)
- Build controlfile for SNP1
- Find and backup needed archive logs to destination SNP1 can see (see previous section)
- Recover SNP1 (see previous section)
- Open SNP1 with RESETLOGS and add temp files
- You obviously won’t have to drop/dismount/create/mount ASM disk groups
- You won’t have to modify ASM block headers
- No need to symbolically link to HP-UX device names
- PROD is a production database running on prodhost
- CLN1 is an Oracle copy of production running on drhost and will be a complete SnapView clone of production
- CLN2 is an Oracle copy production running on drhost and will be a complete SnapView clone of production
- SNP1 is an Oracle copy production running on drhost and will be a SnapView snapshot of production
- SNP1 is an Oracle copy production running on drhost and will be a SnapView snapshot of production
- Ensure clone LUNs are in the proper CX4 Storage Group
- # /sbin/init.d/agent stop
- # ioscan -fnCdisk
- # insf
- # /sbin/init.d/agent start
- # /sbin/powermt display dev=all
- Examine output of PowerPath command above and note device names. For sake of example, we’ll focus on the first LUN, /u01, which we want to mount as /u01_cln1. The device for this is /dev/dsk/c80t0d1 (again, for example)
- # vgchgid /dev/dsk/c80t0d1
- # mkdir /dev/vgcln1
- # mknod /dev/vgcln1/group c 64 0×100000 — this “0×100000″ should be unique, check /dev/vg*/group*)
- # vgimport /dev/vgcln1 /dev/dsk/c80t0d1
- # vgchange -a y /dev/vgcln1
- # fsck /dev/vgcln1/lvol1
- # mkdir /u01_cln1
- # mount -o delaylog /dev/vgcln1/lvol1 /u01_cln1
Large SGA on 32-bit Redhat Linux
Many DBAs are familiar with the SGA size limitations on 32-bit platforms. This post shows how to allocate a 3Gb buffer cache on 32-bit RHAS 3.
- Step 1: Mount /dev/shm to as type ramfs. Edit /etc/fstab and add an entry like this:
none /dev/shm ramfs defaults,size=4G 0 0
- Step 2: do a “mount -a” to mount /dev/shm. I use ramfs instead of tmpfs because it doesn’t use swap; tmpfs does. With ramfs memory allocation will also grow dynamically, whereas when using tmpfs it will not.
- Step 3: As root, assuming your database is owned by the Linux account oracle, whose primary group is “dba”, do this:
# chown oracle:dba /dev/shm
- Step 4: Add the following to /etc/security/limits.conf to increase maximum memory lock parameters. Ensure “oracle” user has his environment sourced to establish these settings.
oracle soft memlock 3145728
oracle hard memlock 3145728
- Step 5: Login as oracle and do “ulimit -a”; validate that memlock is set to the above values
- Step 6: Edit /etc/sysctl.conf and add/change the following. When complete, do “sysctl -w” to activate changes into the Linux kernel. The “vm.hugetbl_pool” setting below is set based on the output of a script provided here (http://download-uk.oracle.com/docs/cd/B28359_01/server.111/b32009/appi_vlm.htm) - run this AFTER Oracle is started at the completion of this document to get a realistic value for vm.hugetbl_pool and adjust accordingly
kernel.sem = 1000 32000 100 150
kernel.shmmax = 4294967295
kernel.shmall = 4194304
net.ipv4.ip_local_port_range = 1024 65000
vm.pagecache = 10 20 30
kernel.shmmni=4096
vm.hugetlb_pool=4096
- Step 7: Unset db_cache_size, db_xk_cache_size, sga_target, sga_max_size, memory_target init.ora parameters and manually set shared_pool_size to appropriate value. You can use “show sga” to determine this
- Step 8: Set use_indirect_data_buffers=true
- Step 9: Set db_block_buffers such that the product of db_block_buffers and db_block_size = 3G
- Step 10: In oracle’s .profile/.bash_profile, set DISABLE_MAP_LOCK=1. This is required to avoid unnecessarily long connect times for databases that are connected to frequently
export DISABLE_MAP_LOCK=1
- Step 11: Stop oracle, source environment, ensure O/S limits are correct (ulimit -a), ensure /dev/shm is owned by oracle (ls -al /dev/shm), ensure DISABLE_MAP_LOCK=1, and then start Oracle
- Step 12: Add the following to /etc/rc2.d/S99local:
mount /dev/shm
chown oracle:dba /dev/shm
Test. Enjoy the benefits of a large cache. Test across reboots to ensure /dev/shm is mounted correctly.
Using DBMS_SQLDIAG
As a DBA, have you ever had an issue you suspected may be a bug and have been asked to generate “test data” for Oracle development? We ran into this at a client recently and found a cool 11g utility (at least I *think* it’s new :)) to get Oracle Support the data they need to reproduce the issue in-house.
The situation had to do with optimizer_features_enable being set to 11.1.0.7 in a database recently upgraded form 9.2.0.5 in an Oracle eBusiness Suite environment (11.5.10.2) that utilized Oracle Reports heavily. Our problem was specific to a custom Report that had optimizer hints embedded in the main query. With optimizer_features_enable set to 11.1.0.7, the report retrieved no rows. With it set to 9.2.0.x or 10.2.0.x, it returned the proper number of rows. Based on this data discrepancy, we felt it safe to set optimizer_features_enable < 11.1.0.7 across the board, as we didn’t know the scope of the issue.
During our work on an SR with Oracle, they asked for us to send the query, all the versions and optimizer settings, as well as an export of all the tables involved in the query with exported optimizer statistics. Our problem was several-fold; first, the query was complex and had many embedded views, so we didn’t really want to spend time deconstructing it to get a comprehensive list of tables to export. Second, the underlying tables were very large (hundreds of millions of rows for a few of them) and we didn’t have disk space or quite frankly, time, to export all the tables in their entirety.
Enter DBMS_SQLDIAG …
Using DBMS_SQLDIAG and 11g Data Pump features, we were able to quickly generate a complete test case to export a subset of the rows from all the impacted tables, very quickly and with minimal disk space requirements. Here’s what we did:
- Grabbed the offending query from a TKPROF’d trace file
- Used this syntax to generate a test case:
declare
tc_out clob;
begin
dbms_sqldiag.export_sql_testcase(directory=>’<directory>’,
sql_text=>’<SQL Text>’,
testcase => tc_out,
exportdata=>TRUE,
samplingpercent=>1);
end;
/
In the above example, note the following:
- <directory> is a valid directory - check DBA_DIRECTORIES
- <SQL Text> is the SQL statement from the TKPROF output
- exportdata=>TRUE tells DBMS_SQLDIAG to export the data from the base tables
- samplingpercent=>1 tells Data Pump to use a 1% sampling size. This was important to limit the number of rows
After executing, a number of XML, log, and Data Pump export dumps are generated to <directory> and available to upload to the SR!
One additional step we took was to export table statistics for all the tables involved in the query. For this, we looked in the log file for all tables export and used DBMS_STATS.EXPORT_TABLE_STATS to export segment statistics.
The obvious benefit here is that it enabled us to continue working on the SR, but some other possible applications of DBMS_SQLDIAG could be for internal testing purposes, testing functionality/performance across versions of Oracle without a complicated upgrade, regression testing, and so forth.
Future Proofing Oracle EBS: Tip 1
Most organizations leave their mission critical systems vulnerable to security threats and performance degradation. Recent studies show that organizational constraints are at the core of this issue. These constraints range from organizational policies, budgets and staff availability and even competency.
26 percent of organizations apply critical security and performance patches as they are released, 20 percent will eventually apply 6 to 9 months later while another 20 percent have no requirement at all to apply them. That leaves many companies data unprotected.
There are several ways to combat these constraints. One way is gain support from your organization. This can come in the form of support from executives, security / audit team and other critical policy makers. By establishing new and or updating existing security policies, and as a critical component, patching, these guidelines and timeliness of patching removes some of these constraints. A second way to protect and improve data integrity is to use technology. By keeping current on latest software, technology can help you reduce time and budget constraints of getting current applications and databases tested and allow your team to proactively assess the impact to your enterprise.
Centroid believes in 2 degrees of continuous change instead of wide arcs of re-activeness. These small degrees of change allows for continuous improvement across the organization. Whether it be through technology or process slight proactive adjustments make the enterprise perform at a higher degree then you can ever accomplish by large scale reactive measures.
Centroid is in the business of Future Proofing Oracle environments. In fact, Centroid is the leader in Future Proofing your Business and Technology and no one does it better. No one.
Oracle EBS 12.1.1 Upgrade from 12.0.6 - Lessons Learned
So you’re going to patch from 12.0.6 to 12.1.1? It’s basically like any other major upgrade with Oracle’s e-Business Suite; tons of pieces to upgrade, patch, validate, and a high potential for running into problems. The documentation for all the “gotchas” is all over the place, but Metalink document 752619.1 is the best place to start and has all the major steps listed. In general, the following is required:
- Confirm database version
- Upgrade 10.1.3 home to 10.1.3.4
- Upgrade 10.1.2 home to 10.1.2.3
- Apply necessary pre-req patches
- Apply maintenance pack (7303030)
- Apply post-maintenance pack patches
- Do all the post-upgrade steps
Below are some of the lessons learned in an environment in which we patched 12.0.6 to 12.1.1 and along the way, upgraded the 10.2.0.4 database to 11.1.0.7.
Oracle XDB core-dumps (look for ORA-31114)
- Follow Metalink document 760611.1
- We’re not sure exactly why this happened, but something in the XDB upgrade from 10.2.0.4 to 11.1.0.7 failed
AZ tables/objects invalid and workers fail on 7303030 (azR12active.sql)
- Result of XDB core-dump issue above
- Follow Metalink document 832459.1
Worker fails on EGOSILDU.sql
- Per Metalink document 847687.1, this is a bug patched with 8485942. This patch, 8485942, isn’t available as of 10/10/2009
- Per the recommendation in 847687.1, skipped worked and continued
Relink fails on RAXTRX
- Choose to continue. This is due to bug
- Look in 841113.1 for patches. These are the ones I found that fixed this, plus some other issues found during testing …
8239041
7721420
7660309
7673650
8393427
8414254
8414069
Failures during 7303030 compiling multiple forms and libraries
- Choose to continue. Manually recompile using adadmin after the patch, and if things still fail look for patches
Issues applying 10.1.3.4 IAS_ORACLE_HOME patch (7272722)
- Receive failures trying to start OPMN while at 83% complete on initial screen
- Environment was cloned and adcfgclone’d successfully from a 12.0.6 instance
- adcfgclone.pl will update all the right stuff in $ORA_CONFIG_HOME correctly, but several files in $IAS_ORACLE_HOME will not be updated with the correct environment-specific values
- Change hard-coded paths in the following:
$IAS_ORACLE_HOME/inventory/Clone/clone.xml
$IAS_ORACLE_HOME/config/ias.properties
$IAS_ORACLE_HOME/opmn/bin/opmnctl
$IAS_ORACLE_HOME/Apache/Apache/bin/iasobf
- Make sure you can run $IAS_ORACLE_HOME/opmn/bin/opmnctl startall|stopall|status
- Resume 7272722
Failures with 7303030 doing FndXdfCmp
- I’m pretty sure this was a site-specific issue related to not cloning cleanly
- If the adworkxxx.log file looks like this …
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at oracle.apps.ad.worker.AdJavaWorker.invokeUtility(AdJavaWorker.java:501)
at oracle.apps.ad.worker.AdJavaWorker.main(AdJavaWorker.java:356)
Caused by: java.lang.NoSuchFieldError: mTypeData
at oracle.apps.fnd.odf2.FndInitType.doCmpType(FndInitType.java:156)
at oracle.apps.fnd.odf2.FndXdfCmp.doOdfCmp(FndXdfCmp.java:98)
at oracle.apps.fnd.odf2.FndXdfCmp.applyXDF(FndXdfCmp.java:186)
- … hopefully you’ve got an existing 12.1.1 instance to refer to
- Look in $JAVA_TOP/oracle/apps/odf2 for class files containing mTypeData. You should see 3 class files
- Restore anything missing from a clean instance
- Generate all JAR Files
- Kill patch
- Resume
Patch takes a very long time on adobjcmp.sql (11.1.0.7)
- This will take a long time, it’s compiling all invalid objects and there will be 50,000+
- Look for lots of time on “resmgr:cpu quantum” wait event
- Do “alter system set resouce_manager_plan=””
- Bounce the database when everything is done
Are you trying to use RapidWiz to Upgrade to 12.1.1 from 12.0.6?
- Don’t. This is for an 11.5.x upgrade
More to come as I find them, I’m sure …
Limiting Off-shore Outsourcing firms H-1Bs in the USA
Recently “The Business Week” June 15, 2009 ran an article titled ‘Taking AIM at outsourcers on US Soil’ If passed, the bill would bar companies with more than 50 employees in the U.S. from landing any additional work visas if more than half their US workforce is made up of H-1B or L-1 visa holders. It definately appears there will be some controversy up on the hill. If acted upon it would put a stop to the addition of H-1B resources coming to the US. Senators Dick Durbin (D-Ill.) and Chuck Grassley (R-Iowa) are the two authoring the bill. It would be good to hear your opinion on the subject…..I have mine!
Using Configurator in R12
After back and forth volleys with Oracle Support in getting Configurator to work correctly in R12 with SSL and load balancers (as well as great support from our DBA Team) I thought it would be good to share what is missing from some documentation within metalink (at least at time of this post) and hope it saves you time in the setups.
1) Review Section 4.4.3.1 AltBatchValidateURL, Oracle® Configurator Implementation Guide, Release 11i, Part No. B13604-03 and also Oracle® Configurator Implementation Guide Release 12 Part No. B28682-02 page 4-15. Note that the R12 documentation should be updated as the URL needs to be slightly different for R12 as in step 2 below. This is what caused our issue!
2) Check the value of the setting in cz_db_settings for AltBatchValidateURL. Use this statement to check it:
SELECT * from cz_db_settings where upper(setting_id) = ‘ALTBATCHVALIDATEURL’ If no rows are returned then run the following sql insert statement:
INSERT INTO cz_db_settings (setting_id, section_name, data_type, value, desc_text) VALUES (’AltBatchValidateURL’,’ORAAPPS_INTEGRATE’,4,’http://hostname:portnum/OA_HTML/configurator/UiServlet’,’Non-secure URL’)
If a row is returned then validate that the value is correct based on the above url. If not then you can update the cz_db_settings table as follows: update cz_db_settings
set value = ‘http://hostname:portnum/OA_HMTL/configurator/UiServlet’ where setting_id = ‘AltBatchValidateURL’
3) Validate the the profile option for “BOM:Configurator URL of UI Manager” is set to: http://hostname:portnum/OA_HTML/configurator/UiServlet
Hopefully the R12 documentation is updated in the near future to guide others in the process.
Will Cutting Oracle Maintenance Fees Save You Money?
In the current economy you’re probably looking under every rock to see where you can cut costs and software support and maintenance fees may be something for you to consider. These fees can be significant. Take your Oracle support fees for example, roughly 22% of original license fees paid annually.
If it’s been a while since you last reviewed your support agreements, start there. You may be paying support fees for applications you no longer use. Those should be a no-brainer, get them on the chopping block ASAP. Others may not be so straightforward. A common example are those modules you purchased licenses for, but shelved for whatever reason. They were cut from the scope of the original implementation or maybe they were purchased with thoughts of using down the road. Now you are paying support fees for something you’re not using and may or may not use in the future. What should you do now?
If you’re considering terminating the support on these modules, there are a few things to keep in mind. Oracle requires that you drop the software licenses when you stop paying for annual support, which means, when/if you decide to implement those modules down the road, you’ll need to purchase new licenses. Oracle will also hit you up with reinstatement fees including retroactive support and maintenance fees and additional penalties. So keep this in mind when kicking around the options.
So I guess the answer to my original question is….it depends. It depends on many factors that you’ll need to consider before making any changes to your support agreements. Before making any changes, review your current agreements and read over the latest cancellation policies and do what makes sense for you.
“Oracle is slow, can you see if anything is going on?”
As DBAs, we’re faced with this question all the time. In order to quickly supply an accurate answer to this question, an experienced Oracle DBA needs to have a few tools in his belt - and I’m not talking about any special software or monitoring solutions, I’m talking simply SQL*Plus scripts and access to a database account with access to the V$ views.
Here’s what I do when someone asks me this question:
Step #1:
Take a look at V$SESSION_WAIT. This will show you details about sessions currently and actively waiting on named Oracle wait events. More often than not, if things are “slow”, a session or sessions is waiting on an instrumented Oracle wait event. The query I use to do this (works on 9i-11g) is below:
select A.sid,
decode(A.event,'null event','CPU Exec',A.event) WaitEvent,
decode(A.event,'slave wait','N/A',
'PX Deq: Execution Msg','N/A',
'PX Deq Credit: send blk','N/A',
'latch free','N/A',
'enqueue',
chr(bitand(A.p1,-16777216)/16777215)||chr(bitand(A.p1,16711680)/65535),
'file open','-1',to_char(A.p1)) p1,
decode(A.event,'enqueue',decode(mod(A.p1,16),'6','ROW-LOCK','4','ITL','3',
'FK?','OTHER'),
'file open',
-1,
A.p2) p2,
decode(A.event,'latch free','N/A','enqueue',null,'PX qref latch','-1',
'buffer busy waits',to_char(A.p3), A.p3) p3,
decode(A.state,'WAITING','WTG',
'WAITED UNKNOWN TIME','UNK',
'WAITED SHORT TIME','WST',
'WAITED KNOWN TIME','WKT') wait_type,
decode(A.state,'WAITING',A.seconds_in_wait,
'WAITED UNKNOWN TIME',-999,
'WAITED SHORT TIME',A.wait_time,
'WAITED KNOWN TIME',A.WAIT_TIME) wt,
round((last_call_et/60),2) lc,
substr(nvl(b.module,b.program),1,15) pgm
from v$session_wait A,
v$session B
where A.event not in ('Queue Monitor Slave Wait','wait for unread message on broadcast channel','Queue Monitor Wait','jobq slave wait','queue messages','SQL*Net message to client','Null event','rdbms ipc message','i/o slave wait','io done')
and A.event <> 'pipe get'
and A.event not like '%akeup%'
and A.event not like 'Streams AQ%'
and A.state in ('WAITING','WAITED KNOWN TIME')
and A.sid=B.sid
and B.status='ACTIVE'
order by 1
/
Sample output is below (you’ll have to set column headings and other SQL*plus formatting options, but you get the point):
Sid Wait Event P1 P2 P3 Typ Time last call What
------ ------------------------------ ---------- ---------- ---------- --- -------- ---------------
4518 gc buffer busy 24 38019 65537 WTG 0 .00
4519 gc buffer busy 24 38019 65537 WTG 0 .00 XXVG_INV_PICKLI
4680 gc buffer busy 24 38019 65537 WTG 0 .00
4830 gc buffer busy 24 38019 65537 WTG 0 .00 FNDRSSUB
4886 smon timer 300 0 0 WTG 29 18447.47 oracle@usplsvpe
4887 control file parallel write 2 4 2 WTG 0 18447.47 oracle@usplsvpe
4893 gcs remote message 24 0 0 WTG 0 18447.47 oracle@usplsvpe
4895 gcs remote message 24 0 0 WTG 0 18447.47 oracle@usplsvpe
4896 ges remote message 64 0 0 WTG 152 18447.47 oracle@usplsvpe
4899 DIAG idle wait 1 1 200 WTG 1106848 18447.47 oracle@usplsvpe
4900 pmon timer 300 0 0 WTG 680 18447.47 oracle@usplsvpe
In this output, you’ll see a handful of sessions waiting on “gc buffer busy” wait events. At this point, it’s time for the Oracle DBA to study up on what the wait events mean; in this case, sessions are waiting on RAC-related global buffer busy waits, which means that blocks are being used and are pinned in another instance’s cache. I won’t go into a description on what all the wait events mean here - you can look them up at any of the following URLs:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevents.htm#REFRN101
http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevents003.htm#BGGIBDJI
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/instance_tune.htm#i22670
http://www.adp-gmbh.ch/ora/tuning/event.html
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=34405.1
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=62172.1
At this point, you know who’s waiting on what and you can use the output to look for anomalies for the current environment. A couple of things to note:
- There are a handful of common wait events in any “busy” Oracle environment; specifically, “db file sequential read”, “db file scattered read”, latch-related, enqueue-related (locks), etc. You should be familiar with what types of waits are “normal” for a given system.
- You should become familiar with the relative quantity of each type of wait for each system at various times during the day. For example, at client A, with a new implementation, low transaction volume, not many users, you may never see more than a handful of I/O-related waits at any given time. At this client, if you see several dozen sessions waiting on the same type or class of wait event, it’s probably a cause for concern. At a different client, it may be typical to see 20 or 30 I/O-related waits at any given time. Bottom line is this - you need to have familiarity with the system you’re monitoring.
- Any DBA with his salt should become study Oracle’s wait interface and become familiar with what each of the major wait events means
- You can use Centroid’s “CCEO Infra Wait Interface.ppt” document as a quick reference on the wait interface
Step #2:
Grab the SQL for the sessions that show up repeatedly and frequently in the output from the above query. Note the SID (Session Identifier) and use it as input to the following script:
select
t.sql_fulltext ,
t.buffer_gets, t.disk_reads,t.executions
from v$session s,
v$sql t
where s.sql_address =t.address and s.sql_hash_value =t.hash_value
and s.sid = &&1
/
Format the out of this if you plan on running an execution plan on it.
If you want additional detail about the session(s) from V$SESSION_WAIT, you can query V$SESSION.
Step #3:
If the SQL statements extracted from the previous step are waiting on I/O-related or contention-related waits, you should grab an execution plan/explain plan by taking the formatted SQL and plugging into the below script:
set lines 120
explain plan for
<< insert SQL here >>
select * from table(dbms_xplan.display(null, null,'all'));
Step #4:
If the slowness is related to, for example, locks (enqueue waits), find out who the lock holder(s) is by querying V$LOCK or DBA_WAITERS and make a judgement call as to whether to kill the session(s) holding the lock, communicate with the end-user, etc.
Step #5:
Fix it. This could be a quick-fix (resolving a lock), or more likely will take some time to assess. If your cause of slowness is I/O-related waits, for example, you need to determine whether the SQL is optimized, whether indexes will help, whether concurrency patterns are abnormal (i.e., are there 50 simultaneous executions of a batch job that should only be running serially?), etc. SQL optimization is a science in itself that requires knowledge of the underlying data structures and data volumes, as well as an understanding of Oracle’s optimizer.
Step #6:
What if V$SESSION_WAIT doesn’t tell you anything meaningful? This is when you should consult ASH (Active Session History) views to give you time breakdown details:
select
decode(nvl(to_char(s.sid),-1),-1,'DISCONNECTED','CONNECTED')
"STATUS",
topsession.sid "SID",
topsession.program "PROGRAM",
max(topsession.CPU) "CPU",
max(topsession.WAIT) "WAITING",
max(topsession.IO) "IO",
max(topsession.TOTAL) "TOTAL"
from (
select * from (
select
ash.session_id sid,
ash.session_serial# serial#,
ash.user_id user_id,
ash.program,
sum(decode(ash.session_state,'ON CPU',1,0)) "CPU",
sum(decode(ash.session_state,'WAITING',1,0)) -
sum(decode(ash.session_state,'WAITING',
decode(wait_class,'User I/O',1, 0 ), 0)) "WAIT" ,
sum(decode(ash.session_state,'WAITING',
decode(wait_class,'User I/O',1, 0 ), 0)) "IO" ,
sum(decode(session_state,'ON CPU',1,1)) "TOTAL"
from v$active_session_history ash
group by session_id,user_id,session_serial#,program
order by sum(decode(session_state,'ON CPU',1,1)) desc
) where rownum < 10
) topsession,
v$session s,
all_users u
where
u.user_id =topsession.user_id and
/* outer join to v$session because the session might be disconnected */
topsession.sid = s.sid (+) and
topsession.serial# = s.serial# (+)
group by topsession.sid, topsession.serial#,
topsession.user_id, topsession.program, s.username,
s.sid,s.paddr,u.username
order by max(topsession.TOTAL) desc
/
The output may look like this:
STATUS Sid PROGRAM CPU WAITING IO TOTAL
------------ ----- ------------------------- ----------- ---------- ----- ------
DISCONNECTED 4518 11220 9 83 11312
CONNECTED 4584 9620 25 50 9695
DISCONNECTED 4683 das@usplsvpba002.verigy.n 5598 258 735 6591
et (TNS V1-V3)
CONNECTED 4888 oracle@usplsvped002.verig 483 956 0 1439
y.net (LGWR)
CONNECTED 4897 oracle@usplsvped002.verig 7 1119 0 1126
y.net (LMON)
DISCONNECTED 4614 158 75 552 785
DISCONNECTED 4698 sqlservr.exe 52 20 695 767
DISCONNECTED 4491 sqlservr.exe 102 35 496 633
DISCONNECTED 4698 sqlservr.exe 14 9 578 601
You can use the methods in Steps 2-3 above to get details about the sessions above.
Step #7:
If nothing stands out at this point, consult system logs and Oracle alert logs, as well as O/S performance tools (sar, top, glance, etc)
Migrating Off Oracle On Demand? How to Get Started
Thinking about parting ways with Oracle’s data center? Whether the move is being considered or the decision is made, I have a few tips to help you get started. I won’t be discussing the pro’s and con’s of hosting with Oracle, but offering practical advice for those of you in the early planning stages of your migration.
Review Your OOD Service Contract
This is something you’ll want to do as soon as possible. You’ll obviously want to confirm your contract end date and plan your migration accordingly, but you’ll also want to be on the lookout for any terms and conditions that affect your project plan and budget. If you plan to decommission services prior to the contract end date, check for applicable early termination fees.
Review Your Software License Agreements
You’ll want to review these agreements to be sure you actually own licenses for all the software you plan to use in the new data center. There may be cases where software gets included in the OOD base services and made available without an explicit license. For example, we’ve found this to be true with Enterprise Manager. The point here is to be sure you have what you need before you start and plan for the expense if additional licenses are going to be required.
Gather Key Facts about the Current Environment
If you don’t already have operating system and database access for the current Oracle systems, get it ASAP. Once you have it, review the systems to gather critical information to help you plan your future infrastructure needs. Take note of things like number of processors, memory requirements, file system structure, storage requirements, OS versions and configurations, database size, etc. Perform this review for all tiers. In addition to this “physical” review of the systems, Oracle will provide you documentation that will supplement the information you’ll gather on your own. Request a “Customer Storage Report” from your Oracle Service Delivery Manager (SDM) to get an idea of storage requirements for the environment. While you’re at it, request the “OSA90” as well. OSA90 is the ”Service Architecture Design Document” which describes the infrastructure supporting your Oracle hosted applications, at least as it was initially designed.
Inventory and Review Your RICE Elements
You’ll want to take an inventory of all RICE elements, (R)eports, (I)nterfaces, (C)onversions, and (E)xtensions and complete an impact assessment for the server migration. You’ll most likely spend considerable time on this task if there are integrations between your Oracle system and non Oracle and/or external systems. Take note of things like FTP processes that may require changes after the data center move. For example, data interfaces to banking institutions would need to be reviewed and impact assessed.
Notify Oracle of Intent to Decommission OOD Services
A simple email to Oracle notifies the parties that be and sets the wheels in motion with Oracle. Check with your SDM to get the special email account. Your email will need to include things like customer name, your CSI number, planned decommission date, etc. In return, expect to receive an automated reply with your official “Decommission Reference Guide” attached. The decommission guide provides key dates and tasks, essentially a very high-level to-do list.
Submit SR Requesting Decommission Media
When you’re ready to build the first environment in the new data center, you’ll need to submit an SR requesting the folks at Oracle to take the necessary backups and place your files on appropriate storage media. Plan to submit the SR 72 hours before the date you want the backups taken and media created. It may happen sooner, but OOD’s policy is 72 hours. Also be aware that OOD does not offer overnight shipping, they use 2 or 3 day. If you require overnight shipping, you need to state this in the SR and provide carrier account information for a registered carrier (ie. FedEx, UPS, etc.). You’ll also want to be very clear on the type of storage media to be used. Oracle offers a few options, a couple of different tape drives as well as USB drives. Figure out which way you want to go before submitting the SR. In our experience, this has taken anywhere from one to three weeks to get good data in hand.
Request Pre-Clone Procedures from Oracle On Demand
If your source and target environments are compatible and you plan to use Rapid Clone to create your new environment, you may want to start kissing up to your SDM now. I’ve been told from folks in the On Demand team, that by policy, they do not run autoconfig and preclone on the database tier, a requirement if you plan to clone the environment. We have had luck getting them to do this by the way. If you’re headed down this path, plan extra time for Oracle to apply Autoconfig and Rapid clone patches before creating your backups and like any other patch, you’ll need to apply and test in all environments (ie. dev, test, prod). This can add significant delays to your timeline, so plan accordingly.
Obtain Installation Media and Documentation
Log an SR requesting installation media and documentation for all of your licensed products. You may not use this stuff during your migration, but definitely something you’ll want to have on hand.
If you have questions regarding OOD migrations, feel free to comment and I’ll get back to you. For those that have gone through the process and have additional tips to add to the list, please feel free to add your comments as well.











