As DBAs, we're faced with this question all the time. In order to quickly supply an accurate answer to this question, an experienced Oracle DBA needs to have a few tools in his belt - and I'm not talking about any special software or monitoring solutions, I'm talking simply SQL*Plus scripts and access to a database account with access to the V$ views.
Here's what I do when someone asks me this question:
Step #1:
Take a look at V$SESSION_WAIT. This will show you details about sessions currently and actively waiting on named Oracle wait events. More often than not, if things are "slow", a session or sessions is waiting on an instrumented Oracle wait event. The query I use to do this (works on 9i-11g) is below:
select A.sid,
decode(A.event,'null event','CPU Exec',A.event) WaitEvent,
decode(A.event,'slave wait','N/A',
'PX Deq: Execution Msg','N/A',
'PX Deq Credit: send blk','N/A',
'latch free','N/A',
'enqueue',
chr(bitand(A.p1,-16777216)/16777215)||chr(bitand(A.p1,16711680)/65535),
'file open','-1',to_char(A.p1)) p1,
decode(A.event,'enqueue',decode(mod(A.p1,16),'6','ROW-LOCK','4','ITL','3',
'FK?','OTHER'),
'file open',
-1,
A.p2) p2,
decode(A.event,'latch free','N/A','enqueue',null,'PX qref latch','-1',
'buffer busy waits',to_char(A.p3), A.p3) p3,
decode(A.state,'WAITING','WTG',
'WAITED UNKNOWN TIME','UNK',
'WAITED SHORT TIME','WST',
'WAITED KNOWN TIME','WKT') wait_type,
decode(A.state,'WAITING',A.seconds_in_wait,
'WAITED UNKNOWN TIME',-999,
'WAITED SHORT TIME',A.wait_time,
'WAITED KNOWN TIME',A.WAIT_TIME) wt,
round((last_call_et/60),2) lc,
substr(nvl(b.module,b.program),1,15) pgm
from v$session_wait A,
v$session B
where A.event not in ('Queue Monitor Slave Wait','wait for unread message on broadcast channel','Queue Monitor Wait','jobq slave wait','queue messages','SQL*Net message to client','Null event','rdbms ipc message','i/o slave wait','io done')
and A.event <> 'pipe get'
and A.event not like '%akeup%'
and A.event not like 'Streams AQ%'
and A.state in ('WAITING','WAITED KNOWN TIME')
and A.sid=B.sid
and B.status='ACTIVE'
order by 1
/
Sample output is below (you'll have to set column headings and other SQL*plus formatting options, but you get the point):
Sid Wait Event P1 P2 P3 Typ Time last call What
------ ------------------------------ ---------- ---------- ---------- --- -------- ---------------
4518 gc buffer busy 24 38019 65537 WTG 0 .00
4519 gc buffer busy 24 38019 65537 WTG 0 .00 XXVG_INV_PICKLI
4680 gc buffer busy 24 38019 65537 WTG 0 .00
4830 gc buffer busy 24 38019 65537 WTG 0 .00 FNDRSSUB
4886 smon timer 300 0 0 WTG 29 18447.47 oracle@usplsvpe
4887 control file parallel write 2 4 2 WTG 0 18447.47 oracle@usplsvpe
4893 gcs remote message 24 0 0 WTG 0 18447.47 oracle@usplsvpe
4895 gcs remote message 24 0 0 WTG 0 18447.47 oracle@usplsvpe
4896 ges remote message 64 0 0 WTG 152 18447.47 oracle@usplsvpe
4899 DIAG idle wait 1 1 200 WTG 1106848 18447.47 oracle@usplsvpe
4900 pmon timer 300 0 0 WTG 680 18447.47 oracle@usplsvpe
In this output, you'll see a handful of sessions waiting on "gc buffer busy" wait events. At this point, it's time for the Oracle DBA to study up on what the wait events mean; in this case, sessions are waiting on RAC-related global buffer busy waits, which means that blocks are being used and are pinned in another instance's cache. I won't go into a description on what all the wait events mean here - you can look them up at any of the following URLs:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevent...
http://download.oracle.com/docs/cd/B19306_01/server.102/b14237/waitevent...
http://download.oracle.com/docs/cd/B19306_01/server.102/b14211/instance_...
http://www.adp-gmbh.ch/ora/tuning/event.html
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_d...
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_d...
At this point, you know who's waiting on what and you can use the output to look for anomalies for the current environment. A couple of things to note:
- There are a handful of common wait events in any "busy" Oracle environment; specifically, "db file sequential read", "db file scattered read", latch-related, enqueue-related (locks), etc. You should be familiar with what types of waits are "normal" for a given system.
- You should become familiar with the relative quantity of each type of wait for each system at various times during the day. For example, at client A, with a new implementation, low transaction volume, not many users, you may never see more than a handful of I/O-related waits at any given time. At this client, if you see several dozen sessions waiting on the same type or class of wait event, it's probably a cause for concern. At a different client, it may be typical to see 20 or 30 I/O-related waits at any given time. Bottom line is this - you need to have familiarity with the system you're monitoring.
- Any DBA with his salt should become study Oracle's wait interface and become familiar with what each of the major wait events means
- You can use Centroid's "CCEO Infra Wait Interface.ppt" document as a quick reference on the wait interface
Step #2:
Grab the SQL for the sessions that show up repeatedly and frequently in the output from the above query. Note the SID (Session Identifier) and use it as input to the following script:
select
t.sql_fulltext ,
t.buffer_gets, t.disk_reads,t.executions
from v$session s,
v$sql t
where s.sql_address =t.address and s.sql_hash_value =t.hash_value
and s.sid = &&1
/
Format the out of this if you plan on running an execution plan on it.
If you want additional detail about the session(s) from V$SESSION_WAIT, you can query V$SESSION.
Step #3:
If the SQL statements extracted from the previous step are waiting on I/O-related or contention-related waits, you should grab an execution plan/explain plan by taking the formatted SQL and plugging into the below script:
set lines 120
explain plan for
<< insert SQL here >>
select * from table(dbms_xplan.display(null, null,'all'));
Step #4:
If the slowness is related to, for example, locks (enqueue waits), find out who the lock holder(s) is by querying V$LOCK or DBA_WAITERS and make a judgement call as to whether to kill the session(s) holding the lock, communicate with the end-user, etc.
Step #5:
Fix it. This could be a quick-fix (resolving a lock), or more likely will take some time to assess. If your cause of slowness is I/O-related waits, for example, you need to determine whether the SQL is optimized, whether indexes will help, whether concurrency patterns are abnormal (i.e., are there 50 simultaneous executions of a batch job that should only be running serially?), etc. SQL optimization is a science in itself that requires knowledge of the underlying data structures and data volumes, as well as an understanding of Oracle's optimizer.
Step #6:
What if V$SESSION_WAIT doesn't tell you anything meaningful? This is when you should consult ASH (Active Session History) views to give you time breakdown details:
select
decode(nvl(to_char(s.sid),-1),-1,'DISCONNECTED','CONNECTED')
"STATUS",
topsession.sid "SID",
topsession.program "PROGRAM",
max(topsession.CPU) "CPU",
max(topsession.WAIT) "WAITING",
max(topsession.IO) "IO",
max(topsession.TOTAL) "TOTAL"
from (
select * from (
select
ash.session_id sid,
ash.session_serial# serial#,
ash.user_id user_id,
ash.program,
sum(decode(ash.session_state,'ON CPU',1,0)) "CPU",
sum(decode(ash.session_state,'WAITING',1,0)) -
sum(decode(ash.session_state,'WAITING',
decode(wait_class,'User I/O',1, 0 ), 0)) "WAIT" ,
sum(decode(ash.session_state,'WAITING',
decode(wait_class,'User I/O',1, 0 ), 0)) "IO" ,
sum(decode(session_state,'ON CPU',1,1)) "TOTAL"
from v$active_session_history ash
group by session_id,user_id,session_serial#,program
order by sum(decode(session_state,'ON CPU',1,1)) desc
) where rownum < 10
) topsession,
v$session s,
all_users u
where
u.user_id =topsession.user_id and
/* outer join to v$session because the session might be disconnected */
topsession.sid = s.sid (+) and
topsession.serial# = s.serial# (+)
group by topsession.sid, topsession.serial#,
topsession.user_id, topsession.program, s.username,
s.sid,s.paddr,u.username
order by max(topsession.TOTAL) desc
/
The output may look like this:
STATUS Sid PROGRAM CPU WAITING IO TOTAL
------------ ----- ------------------------- ----------- ---------- ----- ------
DISCONNECTED 4518 11220 9 83 11312
CONNECTED 4584 9620 25 50 9695
DISCONNECTED 4683 das@usplsvpba002.verigy.n 5598 258 735 6591
et (TNS V1-V3)
CONNECTED 4888 oracle@usplsvped002.verig 483 956 0 1439
y.net (LGWR)
CONNECTED 4897 oracle@usplsvped002.verig 7 1119 0 1126
y.net (LMON)
DISCONNECTED 4614 158 75 552 785
DISCONNECTED 4698 sqlservr.exe 52 20 695 767
DISCONNECTED 4491 sqlservr.exe 102 35 496 633
DISCONNECTED 4698 sqlservr.exe 14 9 578 601
You can use the methods in Steps 2-3 above to get details about the sessions above.
Step #7:
If nothing stands out at this point, consult system logs and Oracle alert logs, as well as O/S performance tools (sar, top, glance, etc)