Archive for the ‘Performance Tuning’ Category
EMC RecoverPoint Behavior under Heavy Load
I’m sure this is well-known and documented behavior to RecoverPoint appliance experts, but we recently found what we think to be a very nice feature in EMC’s RecoverPoint behavior that perhaps some folks new to the technology are unaware of …
Today we did a large (1Tb) backup of an assortment of Oracle database files on HP-UX file-systems to a set of ASM disk groups. The ASM disk groups were the source LUNs in a RecoverPoint remote replication consistency group. Our RMAN backup to ASM disk groups was done using 6 channels, and the source ASM LUNs resided in a RAID5 Enterprise Flash Drive Raid Group.
As the RMAN backup job allocated channels and wrote backup copies to the target LUNs, we noticed that the RecoverPoint appliances essentially “suspended” transmission of data with a helpful message about the source volumes being too busy. As channels de-allocated and the RMAN job switched from tablespace to tablespace (it was a tablespace at a time RMAN “backup as copy” operation), the RecoverPoint data transfer would kick back in, then suspend itself again as the drives became busy.
Granted, we configured our consistency groups with pretty much default settings, didn’t set any non-standard or overly aggressive RPOs, and essentially were “out of the box” with a 100Gb-sized RecoverPoint journal size. The nice implication of this behavior is this - in the event we need to do “large” data migration or storage-related activities on our source RP volumes, it looks like we’ll never have to worry about suspending our data transfer manually (through the RP administrative web interface or otherwise) - it all just happens by itself.
One less thing to worry about …
Large SGA on 32-bit Redhat Linux
Many DBAs are familiar with the SGA size limitations on 32-bit platforms. This post shows how to allocate a 3Gb buffer cache on 32-bit RHAS 3.
- Step 1: Mount /dev/shm to as type ramfs. Edit /etc/fstab and add an entry like this:
none /dev/shm ramfs defaults,size=4G 0 0
- Step 2: do a “mount -a” to mount /dev/shm. I use ramfs instead of tmpfs because it doesn’t use swap; tmpfs does. With ramfs memory allocation will also grow dynamically, whereas when using tmpfs it will not.
- Step 3: As root, assuming your database is owned by the Linux account oracle, whose primary group is “dba”, do this:
# chown oracle:dba /dev/shm
- Step 4: Add the following to /etc/security/limits.conf to increase maximum memory lock parameters. Ensure “oracle” user has his environment sourced to establish these settings.
oracle soft memlock 3145728
oracle hard memlock 3145728
- Step 5: Login as oracle and do “ulimit -a”; validate that memlock is set to the above values
- Step 6: Edit /etc/sysctl.conf and add/change the following. When complete, do “sysctl -w” to activate changes into the Linux kernel. The “vm.hugetbl_pool” setting below is set based on the output of a script provided here (http://download-uk.oracle.com/docs/cd/B28359_01/server.111/b32009/appi_vlm.htm) - run this AFTER Oracle is started at the completion of this document to get a realistic value for vm.hugetbl_pool and adjust accordingly
kernel.sem = 1000 32000 100 150
kernel.shmmax = 4294967295
kernel.shmall = 4194304
net.ipv4.ip_local_port_range = 1024 65000
vm.pagecache = 10 20 30
kernel.shmmni=4096
vm.hugetlb_pool=4096
- Step 7: Unset db_cache_size, db_xk_cache_size, sga_target, sga_max_size, memory_target init.ora parameters and manually set shared_pool_size to appropriate value. You can use “show sga” to determine this
- Step 8: Set use_indirect_data_buffers=true
- Step 9: Set db_block_buffers such that the product of db_block_buffers and db_block_size = 3G
- Step 10: In oracle’s .profile/.bash_profile, set DISABLE_MAP_LOCK=1. This is required to avoid unnecessarily long connect times for databases that are connected to frequently
export DISABLE_MAP_LOCK=1
- Step 11: Stop oracle, source environment, ensure O/S limits are correct (ulimit -a), ensure /dev/shm is owned by oracle (ls -al /dev/shm), ensure DISABLE_MAP_LOCK=1, and then start Oracle
- Step 12: Add the following to /etc/rc2.d/S99local:
mount /dev/shm
chown oracle:dba /dev/shm
Test. Enjoy the benefits of a large cache. Test across reboots to ensure /dev/shm is mounted correctly.
Using DBMS_SQLDIAG
As a DBA, have you ever had an issue you suspected may be a bug and have been asked to generate “test data” for Oracle development? We ran into this at a client recently and found a cool 11g utility (at least I *think* it’s new :)) to get Oracle Support the data they need to reproduce the issue in-house.
The situation had to do with optimizer_features_enable being set to 11.1.0.7 in a database recently upgraded form 9.2.0.5 in an Oracle eBusiness Suite environment (11.5.10.2) that utilized Oracle Reports heavily. Our problem was specific to a custom Report that had optimizer hints embedded in the main query. With optimizer_features_enable set to 11.1.0.7, the report retrieved no rows. With it set to 9.2.0.x or 10.2.0.x, it returned the proper number of rows. Based on this data discrepancy, we felt it safe to set optimizer_features_enable < 11.1.0.7 across the board, as we didn’t know the scope of the issue.
During our work on an SR with Oracle, they asked for us to send the query, all the versions and optimizer settings, as well as an export of all the tables involved in the query with exported optimizer statistics. Our problem was several-fold; first, the query was complex and had many embedded views, so we didn’t really want to spend time deconstructing it to get a comprehensive list of tables to export. Second, the underlying tables were very large (hundreds of millions of rows for a few of them) and we didn’t have disk space or quite frankly, time, to export all the tables in their entirety.
Enter DBMS_SQLDIAG …
Using DBMS_SQLDIAG and 11g Data Pump features, we were able to quickly generate a complete test case to export a subset of the rows from all the impacted tables, very quickly and with minimal disk space requirements. Here’s what we did:
- Grabbed the offending query from a TKPROF’d trace file
- Used this syntax to generate a test case:
declare
tc_out clob;
begin
dbms_sqldiag.export_sql_testcase(directory=>’<directory>’,
sql_text=>’<SQL Text>’,
testcase => tc_out,
exportdata=>TRUE,
samplingpercent=>1);
end;
/
In the above example, note the following:
- <directory> is a valid directory - check DBA_DIRECTORIES
- <SQL Text> is the SQL statement from the TKPROF output
- exportdata=>TRUE tells DBMS_SQLDIAG to export the data from the base tables
- samplingpercent=>1 tells Data Pump to use a 1% sampling size. This was important to limit the number of rows
After executing, a number of XML, log, and Data Pump export dumps are generated to <directory> and available to upload to the SR!
One additional step we took was to export table statistics for all the tables involved in the query. For this, we looked in the log file for all tables export and used DBMS_STATS.EXPORT_TABLE_STATS to export segment statistics.
The obvious benefit here is that it enabled us to continue working on the SR, but some other possible applications of DBMS_SQLDIAG could be for internal testing purposes, testing functionality/performance across versions of Oracle without a complicated upgrade, regression testing, and so forth.
Future Proofing Oracle EBS: Tip 1
Most organizations leave their mission critical systems vulnerable to security threats and performance degradation. Recent studies show that organizational constraints are at the core of this issue. These constraints range from organizational policies, budgets and staff availability and even competency.
26 percent of organizations apply critical security and performance patches as they are released, 20 percent will eventually apply 6 to 9 months later while another 20 percent have no requirement at all to apply them. That leaves many companies data unprotected.
There are several ways to combat these constraints. One way is gain support from your organization. This can come in the form of support from executives, security / audit team and other critical policy makers. By establishing new and or updating existing security policies, and as a critical component, patching, these guidelines and timeliness of patching removes some of these constraints. A second way to protect and improve data integrity is to use technology. By keeping current on latest software, technology can help you reduce time and budget constraints of getting current applications and databases tested and allow your team to proactively assess the impact to your enterprise.
Centroid believes in 2 degrees of continuous change instead of wide arcs of re-activeness. These small degrees of change allows for continuous improvement across the organization. Whether it be through technology or process slight proactive adjustments make the enterprise perform at a higher degree then you can ever accomplish by large scale reactive measures.
Centroid is in the business of Future Proofing Oracle environments. In fact, Centroid is the leader in Future Proofing your Business and Technology and no one does it better. No one.
Oracle EBS 12.1.1 Upgrade from 12.0.6 - Lessons Learned
So you’re going to patch from 12.0.6 to 12.1.1? It’s basically like any other major upgrade with Oracle’s e-Business Suite; tons of pieces to upgrade, patch, validate, and a high potential for running into problems. The documentation for all the “gotchas” is all over the place, but Metalink document 752619.1 is the best place to start and has all the major steps listed. In general, the following is required:
- Confirm database version
- Upgrade 10.1.3 home to 10.1.3.4
- Upgrade 10.1.2 home to 10.1.2.3
- Apply necessary pre-req patches
- Apply maintenance pack (7303030)
- Apply post-maintenance pack patches
- Do all the post-upgrade steps
Below are some of the lessons learned in an environment in which we patched 12.0.6 to 12.1.1 and along the way, upgraded the 10.2.0.4 database to 11.1.0.7.
Oracle XDB core-dumps (look for ORA-31114)
- Follow Metalink document 760611.1
- We’re not sure exactly why this happened, but something in the XDB upgrade from 10.2.0.4 to 11.1.0.7 failed
AZ tables/objects invalid and workers fail on 7303030 (azR12active.sql)
- Result of XDB core-dump issue above
- Follow Metalink document 832459.1
Worker fails on EGOSILDU.sql
- Per Metalink document 847687.1, this is a bug patched with 8485942. This patch, 8485942, isn’t available as of 10/10/2009
- Per the recommendation in 847687.1, skipped worked and continued
Relink fails on RAXTRX
- Choose to continue. This is due to bug
- Look in 841113.1 for patches. These are the ones I found that fixed this, plus some other issues found during testing …
8239041
7721420
7660309
7673650
8393427
8414254
8414069
Failures during 7303030 compiling multiple forms and libraries
- Choose to continue. Manually recompile using adadmin after the patch, and if things still fail look for patches
Issues applying 10.1.3.4 IAS_ORACLE_HOME patch (7272722)
- Receive failures trying to start OPMN while at 83% complete on initial screen
- Environment was cloned and adcfgclone’d successfully from a 12.0.6 instance
- adcfgclone.pl will update all the right stuff in $ORA_CONFIG_HOME correctly, but several files in $IAS_ORACLE_HOME will not be updated with the correct environment-specific values
- Change hard-coded paths in the following:
$IAS_ORACLE_HOME/inventory/Clone/clone.xml
$IAS_ORACLE_HOME/config/ias.properties
$IAS_ORACLE_HOME/opmn/bin/opmnctl
$IAS_ORACLE_HOME/Apache/Apache/bin/iasobf
- Make sure you can run $IAS_ORACLE_HOME/opmn/bin/opmnctl startall|stopall|status
- Resume 7272722
Failures with 7303030 doing FndXdfCmp
- I’m pretty sure this was a site-specific issue related to not cloning cleanly
- If the adworkxxx.log file looks like this …
java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at oracle.apps.ad.worker.AdJavaWorker.invokeUtility(AdJavaWorker.java:501)
at oracle.apps.ad.worker.AdJavaWorker.main(AdJavaWorker.java:356)
Caused by: java.lang.NoSuchFieldError: mTypeData
at oracle.apps.fnd.odf2.FndInitType.doCmpType(FndInitType.java:156)
at oracle.apps.fnd.odf2.FndXdfCmp.doOdfCmp(FndXdfCmp.java:98)
at oracle.apps.fnd.odf2.FndXdfCmp.applyXDF(FndXdfCmp.java:186)
- … hopefully you’ve got an existing 12.1.1 instance to refer to
- Look in $JAVA_TOP/oracle/apps/odf2 for class files containing mTypeData. You should see 3 class files
- Restore anything missing from a clean instance
- Generate all JAR Files
- Kill patch
- Resume
Patch takes a very long time on adobjcmp.sql (11.1.0.7)
- This will take a long time, it’s compiling all invalid objects and there will be 50,000+
- Look for lots of time on “resmgr:cpu quantum” wait event
- Do “alter system set resouce_manager_plan=””
- Bounce the database when everything is done
Are you trying to use RapidWiz to Upgrade to 12.1.1 from 12.0.6?
- Don’t. This is for an 11.5.x upgrade
More to come as I find them, I’m sure …
What you can do with ASH: Top Resource Consuming SQL
The following SQL shows you the top resource-consuming pieces of SQL in your instance:
select ash.SQL_ID ,
sum(decode(ash.session_state,'ON CPU',1,0)) "CPU",
sum(decode(ash.session_state,'WAITING',1,0)) -
sum(decode(ash.session_state,'WAITING', decode(en.wait_class, 'User I/O',1,0),0)) "WAIT" ,
sum(decode(ash.session_state,'WAITING', decode(en.wait_class, 'User I/O',1,0),0)) "IO" ,
sum(decode(ash.session_state,'ON CPU',1,1)) "TOTAL"
from v$active_session_history ash,
v$event_name en
where SQL_ID is not NULL
and en.event#=ash.event#
group by sql_id
order by sum(decode(session_state,'ON CPU',1,1)) desc
“Oracle is slow, can you see if anything is going on?”
As DBAs, we’re faced with this question all the time. In order to quickly supply an accurate answer to this question, an experienced Oracle DBA needs to have a few tools in his belt - and I’m not talking about any special software or monitoring solutions, I’m talking simply SQL*Plus scripts and access to a database account with access to the V$ views.
Here’s what I do when someone asks me this question:
Step #1:
Take a look at V$SESSION_WAIT. This will show you details about sessions currently and actively waiting on named Oracle wait events. More often than not, if things are “slow”, a session or sessions is waiting on an instrumented Oracle wait event. The query I use to do this (works on 9i-11g) is below:
select A.sid,
decode(A.event,'null event','CPU Exec',A.event) WaitEvent,
decode(A.event,'slave wait','N/A',
'PX Deq: Execution Msg','N/A',
'PX Deq Credit: send blk','N/A',
'latch free','N/A',
'enqueue',
chr(bitand(A.p1,-16777216)/16777215)||chr(bitand(A.p1,16711680)/65535),
'file open','-1',to_char(A.p1)) p1,
decode(A.event,'enqueue',decode(mod(A.p1,16),'6','ROW-LOCK','4','ITL','3',
'FK?','OTHER'),
'file open',
-1,
A.p2) p2,
decode(A.event,'latch free','N/A','enqueue',null,'PX qref latch','-1',
'buffer busy waits',to_char(A.p3), A.p3) p3,
decode(A.state,'WAITING','WTG',
'WAITED UNKNOWN TIME','UNK',
'WAITED SHORT TIME','WST',
'WAITED KNOWN TIME','WKT') wait_type,
decode(A.state,'WAITING',A.seconds_in_wait,
'WAITED UNKNOWN TIME',-999,
'WAITED SHORT TIME',A.wait_time,
'WAITED KNOWN TIME',A.WAIT_TIME) wt,
round((last_call_et/60),2) lc,
substr(nvl(b.module,b.program),1,15) pgm
from v$session_wait A,
v$session B
where A.event not in ('Queue Monitor Slave Wait','wait for unread message on broadcast channel','Queue Monitor Wait','jobq slave wait','queue messages','SQL*Net message to client','Null event','rdbms ipc message','i/o slave wait','io done')
and A.event <> 'pipe get'
and A.event not like '%akeup%'
and A.event not like 'Streams AQ%'
and A.state in ('WAITING','WAITED KNOWN TIME')
and A.sid=B.sid
and B.status='ACTIVE'
order by 1
/
Sample output is below (you’ll have to set column headings and other SQL*plus formatting options, but you get the point):
Sid Wait Event P1 P2 P3 Typ Time last call What
------ ------------------------------ ---------- ---------- ---------- --- -------- ---------------
4518 gc buffer busy 24 38019 65537 WTG 0 .00
4519 gc buffer busy 24 38019 65537 WTG 0 .00 XXVG_INV_PICKLI
4680 gc buffer busy 24 38019 65537 WTG 0 .00
4830 gc buffer busy 24 38019 65537 WTG 0 .00 FNDRSSUB
4886 smon timer 300 0 0 WTG 29 18447.47 oracle@usplsvpe
4887 control file parallel write 2 4 2 WTG 0 18447.47 oracle@usplsvpe
4893 gcs remote message 24 0 0 WTG 0 18447.47 oracle@usplsvpe
4895 gcs remote message 24 0 0 WTG 0 18447.47 oracle@usplsvpe
4896 ges remote message 64 0 0 WTG 152 18447.47 oracle@usplsvpe
4899 DIAG idle wait 1 1 200 WTG 1106848 18447.47 oracle@usplsvpe
4900 pmon timer 300 0 0 WTG 680 18447.47 oracle@usplsvpe
In this output, you’ll see a handful of sessions waiting on “gc buffer busy” wait events. At this point, it’s time for the Oracle DBA to study up on what the wait events mean; in this case, sessions are waiting on RAC-related global buffer busy waits, which means that blocks are being used and are pinned in another instance’s cache. I won’t go into a description on what all the wait events mean here - you can look them up at any of the following URLs:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevents.htm#REFRN101
http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevents003.htm#BGGIBDJI
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/instance_tune.htm#i22670
http://www.adp-gmbh.ch/ora/tuning/event.html
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=34405.1
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=62172.1
At this point, you know who’s waiting on what and you can use the output to look for anomalies for the current environment. A couple of things to note:
- There are a handful of common wait events in any “busy” Oracle environment; specifically, “db file sequential read”, “db file scattered read”, latch-related, enqueue-related (locks), etc. You should be familiar with what types of waits are “normal” for a given system.
- You should become familiar with the relative quantity of each type of wait for each system at various times during the day. For example, at client A, with a new implementation, low transaction volume, not many users, you may never see more than a handful of I/O-related waits at any given time. At this client, if you see several dozen sessions waiting on the same type or class of wait event, it’s probably a cause for concern. At a different client, it may be typical to see 20 or 30 I/O-related waits at any given time. Bottom line is this - you need to have familiarity with the system you’re monitoring.
- Any DBA with his salt should become study Oracle’s wait interface and become familiar with what each of the major wait events means
- You can use Centroid’s “CCEO Infra Wait Interface.ppt” document as a quick reference on the wait interface
Step #2:
Grab the SQL for the sessions that show up repeatedly and frequently in the output from the above query. Note the SID (Session Identifier) and use it as input to the following script:
select
t.sql_fulltext ,
t.buffer_gets, t.disk_reads,t.executions
from v$session s,
v$sql t
where s.sql_address =t.address and s.sql_hash_value =t.hash_value
and s.sid = &&1
/
Format the out of this if you plan on running an execution plan on it.
If you want additional detail about the session(s) from V$SESSION_WAIT, you can query V$SESSION.
Step #3:
If the SQL statements extracted from the previous step are waiting on I/O-related or contention-related waits, you should grab an execution plan/explain plan by taking the formatted SQL and plugging into the below script:
set lines 120
explain plan for
<< insert SQL here >>
select * from table(dbms_xplan.display(null, null,'all'));
Step #4:
If the slowness is related to, for example, locks (enqueue waits), find out who the lock holder(s) is by querying V$LOCK or DBA_WAITERS and make a judgement call as to whether to kill the session(s) holding the lock, communicate with the end-user, etc.
Step #5:
Fix it. This could be a quick-fix (resolving a lock), or more likely will take some time to assess. If your cause of slowness is I/O-related waits, for example, you need to determine whether the SQL is optimized, whether indexes will help, whether concurrency patterns are abnormal (i.e., are there 50 simultaneous executions of a batch job that should only be running serially?), etc. SQL optimization is a science in itself that requires knowledge of the underlying data structures and data volumes, as well as an understanding of Oracle’s optimizer.
Step #6:
What if V$SESSION_WAIT doesn’t tell you anything meaningful? This is when you should consult ASH (Active Session History) views to give you time breakdown details:
select
decode(nvl(to_char(s.sid),-1),-1,'DISCONNECTED','CONNECTED')
"STATUS",
topsession.sid "SID",
topsession.program "PROGRAM",
max(topsession.CPU) "CPU",
max(topsession.WAIT) "WAITING",
max(topsession.IO) "IO",
max(topsession.TOTAL) "TOTAL"
from (
select * from (
select
ash.session_id sid,
ash.session_serial# serial#,
ash.user_id user_id,
ash.program,
sum(decode(ash.session_state,'ON CPU',1,0)) "CPU",
sum(decode(ash.session_state,'WAITING',1,0)) -
sum(decode(ash.session_state,'WAITING',
decode(wait_class,'User I/O',1, 0 ), 0)) "WAIT" ,
sum(decode(ash.session_state,'WAITING',
decode(wait_class,'User I/O',1, 0 ), 0)) "IO" ,
sum(decode(session_state,'ON CPU',1,1)) "TOTAL"
from v$active_session_history ash
group by session_id,user_id,session_serial#,program
order by sum(decode(session_state,'ON CPU',1,1)) desc
) where rownum < 10
) topsession,
v$session s,
all_users u
where
u.user_id =topsession.user_id and
/* outer join to v$session because the session might be disconnected */
topsession.sid = s.sid (+) and
topsession.serial# = s.serial# (+)
group by topsession.sid, topsession.serial#,
topsession.user_id, topsession.program, s.username,
s.sid,s.paddr,u.username
order by max(topsession.TOTAL) desc
/
The output may look like this:
STATUS Sid PROGRAM CPU WAITING IO TOTAL
------------ ----- ------------------------- ----------- ---------- ----- ------
DISCONNECTED 4518 11220 9 83 11312
CONNECTED 4584 9620 25 50 9695
DISCONNECTED 4683 das@usplsvpba002.verigy.n 5598 258 735 6591
et (TNS V1-V3)
CONNECTED 4888 oracle@usplsvped002.verig 483 956 0 1439
y.net (LGWR)
CONNECTED 4897 oracle@usplsvped002.verig 7 1119 0 1126
y.net (LMON)
DISCONNECTED 4614 158 75 552 785
DISCONNECTED 4698 sqlservr.exe 52 20 695 767
DISCONNECTED 4491 sqlservr.exe 102 35 496 633
DISCONNECTED 4698 sqlservr.exe 14 9 578 601
You can use the methods in Steps 2-3 above to get details about the sessions above.
Step #7:
If nothing stands out at this point, consult system logs and Oracle alert logs, as well as O/S performance tools (sar, top, glance, etc)
Oracle 10g ASH: What is it Anyway?
As a DBA, how many times have you been asked… “what happened at 3:00am today, everything slowed down?” Try answering this question with V$SESSION_WAIT – you can’t. Try answering by looking at AWR, Automatic Workload Repository, or Statpack reports – you may be able to make some guesses, but they’d only be guesses. Try answering with data in the alert log or data in O/S logs – you’ll only be able to answer this question if you get very lucky. Try answering by looking at Oracle Enterprise Manager Grid Control 10g. Well actually, this gets you on the right track, because Grid Control will show you ASH, Activity Session History data!
ASH is built into the kernel in Oracle 10g and MMON and MMNL do the work! ASH collects data every second from V$SESSION, V$SESSTAT, V$SESSION_WAIT, etc, and populates V$ACTIVE_SESSION_HISTORY. An hour of data is kept in memory for easy access, just query V$ACTIVE_SESSION_HISTORY (think of it as V$SESSION_WAIT plus some more stuff, but longer lasting). Data is flushed to disk every hour and externalized via views WRH$_<ASH name>. The collection interval and volume self-adjusts based on load. ASH will collect more aggressively when the system is busy, less so when not.
In short, ASH gives you a way to use all your “right now” tuning methods for time periods in the past. When someone asks you what happened at X:00AM when you were fast asleep, you have an option now. Give it a try!
Look for deeper dives into ASH in future posts. Stay tuned!











