"Oracle is slow, can you see if anything is going on?"

April 10th, 2009 | Written by Centroid
As DBAs, we're faced with this question all the time.  In order to quickly supply an accurate answer to this question, an experienced Oracle DBA needs to have a few tools in his belt - and I'm not talking about any special software or monitoring solutions, I'm talking simply SQL*Plus scripts and access to a database account with access to the V$ views. Here's what I do when someone asks me this question: Step #1: Take a look at V$SESSION_WAIT.  This will show you details about sessions currently and actively waiting on named Oracle wait events.  More often than not, if things are "slow",  a session or sessions is waiting on an instrumented Oracle wait event.  The query I use to do this (works on 9i-11g) is below: select     A.sid, decode(A.event,'null event','CPU Exec',A.event) WaitEvent, decode(A.event,'slave wait','N/A', 'PX Deq: Execution Msg','N/A', 'PX Deq Credit: send blk','N/A', 'latch free','N/A', 'enqueue', chr(bitand(A.p1,-16777216)/16777215)||chr(bitand(A.p1,16711680)/65535), 'file open','-1',to_char(A.p1)) p1, decode(A.event,'enqueue',decode(mod(A.p1,16),'6','ROW-LOCK','4','ITL','3', 'FK?','OTHER'), 'file open', -1, A.p2) p2, decode(A.event,'latch free','N/A','enqueue',null,'PX qref latch','-1', 'buffer busy waits',to_char(A.p3), A.p3) p3, decode(A.state,'WAITING','WTG', 'WAITED UNKNOWN TIME','UNK', 'WAITED SHORT TIME','WST', 'WAITED KNOWN TIME','WKT') wait_type, decode(A.state,'WAITING',A.seconds_in_wait, 'WAITED UNKNOWN TIME',-999, 'WAITED SHORT TIME',A.wait_time, 'WAITED KNOWN TIME',A.WAIT_TIME) wt, round((last_call_et/60),2) lc, substr(nvl(b.module,b.program),1,15) pgm from    v$session_wait A, v$session B where  A.event not in ('Queue Monitor Slave Wait','wait for unread message on broadcast channel','Queue Monitor Wait','jobq slave wait','queue messages','SQL*Net message to client','Null event','rdbms ipc message','i/o slave wait','io done') and A.event <> 'pipe get' and A.event not like '%akeup%' and A.event not like 'Streams AQ%' and A.state in ('WAITING','WAITED KNOWN TIME') and A.sid=B.sid and B.status='ACTIVE' order by 1 / Sample output is below (you'll have to set column headings and other SQL*plus formatting options, but you get the point): Sid Wait Event                     P1         P2         P3         Typ     Time       last call What ------ ------------------------------ ---------- ---------- ---------- --- -------- --------------- 4518 gc buffer busy                 24         38019      65537      WTG        0             .00 4519 gc buffer busy                 24         38019      65537      WTG        0             .00 XXVG_INV_PICKLI 4680 gc buffer busy                 24         38019      65537      WTG        0             .00 4830 gc buffer busy                 24         38019      65537      WTG        0             .00 FNDRSSUB 4886 smon timer                     300        0          0          WTG       29        18447.47 oracle@usplsvpe 4887 control file parallel write    2          4          2          WTG        0        18447.47 oracle@usplsvpe 4893 gcs remote message             24         0          0          WTG        0        18447.47 oracle@usplsvpe 4895 gcs remote message             24         0          0          WTG        0        18447.47 oracle@usplsvpe 4896 ges remote message             64         0          0          WTG      152        18447.47 oracle@usplsvpe 4899 DIAG idle wait                 1          1          200        WTG  1106848        18447.47 oracle@usplsvpe 4900 pmon timer                     300        0          0          WTG      680        18447.47 oracle@usplsvpe In this output, you'll see a handful of sessions waiting on "gc buffer busy" wait events.  At this point, it's time for the Oracle DBA to study up on what the wait events mean; in this case, sessions are waiting on RAC-related global buffer busy waits, which means that blocks are being used and are pinned in another instance's cache.   I won't go into a description on what all the wait events mean here - you can look them up at any of the following URLs: http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevent... http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevent... http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/instance_... http://www.adp-gmbh.ch/ora/tuning/event.html http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_d... http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_d... At this point,  you know who's waiting on what and you can use the output to look for anomalies for the current environment.  A couple of things to note:
  • There are a handful of common wait events in any "busy" Oracle environment; specifically, "db file sequential read", "db file scattered read", latch-related, enqueue-related (locks), etc.  You should be familiar with what types of waits are "normal" for a given system.
  • You should become familiar with the relative quantity of each type of wait for each system at various times during the day.  For example, at client A, with a new implementation, low transaction volume, not many users, you may never see more than a handful of I/O-related waits at any given time.  At this client, if you see several dozen sessions waiting on the same type or class of wait event, it's probably a cause for concern.  At a different client, it may be typical to see 20 or 30 I/O-related waits at any given time.  Bottom line is this - you need to have familiarity with the system you're monitoring.
  • Any DBA with his salt should become study Oracle's wait interface and become familiar with what each of the major wait events means
  • You can use Centroid's "CCEO Infra Wait Interface.ppt" document as a quick reference on the wait interface
Step #2: Grab the SQL for the sessions that show up repeatedly and frequently in the output from the above query.  Note the SID (Session Identifier) and use it as input to the following script: select t.sql_fulltext , t.buffer_gets, t.disk_reads,t.executions from v$session s, v$sql t where s.sql_address =t.address and s.sql_hash_value =t.hash_value and s.sid = &&1 / Format the out of this if you plan on running an execution plan on it. If you want additional detail about the session(s) from V$SESSION_WAIT, you can query V$SESSION. Step #3: If the SQL statements extracted from the previous step are waiting on I/O-related or contention-related waits,  you should grab an execution plan/explain plan by taking the formatted SQL and plugging into the below script: set lines 120 explain plan for << insert SQL here >> select * from table(dbms_xplan.display(null, null,'all')); Step #4: If the slowness is related to, for example, locks (enqueue waits), find out who the lock holder(s) is by querying V$LOCK or DBA_WAITERS and make a judgement call as to whether to kill the session(s) holding the lock, communicate with the end-user, etc. Step #5: Fix it.  This could be a quick-fix (resolving a lock), or more likely will take some time to assess.  If your cause of slowness is I/O-related waits, for example, you need to determine whether the SQL is optimized, whether indexes will help, whether concurrency patterns are abnormal (i.e., are there 50 simultaneous executions of a batch job that should only be running serially?), etc.  SQL optimization is a science in itself that requires knowledge of the underlying data structures and data volumes, as well as an understanding of Oracle's optimizer. Step #6: What if V$SESSION_WAIT doesn't tell you anything meaningful?  This is when you should consult ASH (Active Session History) views to give you time breakdown details: select decode(nvl(to_char(s.sid),-1),-1,'DISCONNECTED','CONNECTED') "STATUS", topsession.sid             "SID", topsession.program                  "PROGRAM", max(topsession.CPU)              "CPU", max(topsession.WAIT)       "WAITING", max(topsession.IO)                  "IO", max(topsession.TOTAL)           "TOTAL" from ( select * from ( select ash.session_id sid, ash.session_serial# serial#, ash.user_id user_id, ash.program, sum(decode(ash.session_state,'ON CPU',1,0))     "CPU", sum(decode(ash.session_state,'WAITING',1,0))    - sum(decode(ash.session_state,'WAITING', decode(wait_class,'User I/O',1, 0 ), 0))    "WAIT" , sum(decode(ash.session_state,'WAITING', decode(wait_class,'User I/O',1, 0 ), 0))    "IO" , sum(decode(session_state,'ON CPU',1,1))     "TOTAL" from v$active_session_history ash group by session_id,user_id,session_serial#,program order by sum(decode(session_state,'ON CPU',1,1)) desc ) where rownum < 10 )    topsession, v$session s, all_users u where u.user_id =topsession.user_id and /* outer join to v$session because the session might be disconnected */ topsession.sid         = s.sid         (+) and topsession.serial# = s.serial#   (+) group by  topsession.sid, topsession.serial#, topsession.user_id, topsession.program, s.username, s.sid,s.paddr,u.username order by max(topsession.TOTAL) desc / The output may look like this: STATUS         Sid PROGRAM                           CPU    WAITING    IO  TOTAL ------------ ----- ------------------------- ----------- ---------- ----- ------ DISCONNECTED  4518                                 11220          9    83  11312 CONNECTED     4584                                  9620         25    50   9695 DISCONNECTED  4683 das@usplsvpba002.verigy.n        5598        258   735   6591 et (TNS V1-V3) CONNECTED     4888 oracle@usplsvped002.verig         483        956     0   1439 y.net (LGWR) CONNECTED     4897 oracle@usplsvped002.verig           7       1119     0   1126 y.net (LMON) DISCONNECTED  4614                                   158         75   552    785 DISCONNECTED  4698 sqlservr.exe                       52         20   695    767 DISCONNECTED  4491 sqlservr.exe                      102         35   496    633 DISCONNECTED  4698 sqlservr.exe                       14          9   578    601 You can use the methods in Steps 2-3 above to get details about the sessions above. Step #7: If nothing stands out at this point, consult system logs and Oracle alert logs, as well as O/S performance tools (sar, top, glance, etc)