Monitoring Exadata Cell Servers with Alerts

Alerts are messages about operations within a cell. Each alert carries a severity of warning, critical, clear, or informational.

 

To list the alert definitions available on a cell, along with the metric each one is tied to, run the following:

 

CellCLI> LIST ALERTDEFINITION ATTRIBUTES name, metricName, description;
ADRAlert                                              "Incident Alert"
HardwareAlert                                         "Hardware Alert"
StatefulAlert_CD_IO_ERRS_MIN      CD_IO_ERRS_MIN      "Threshold Alert"
StatefulAlert_CG_FC_IO_BY_SEC     CG_FC_IO_BY_SEC     "Threshold Alert"
StatefulAlert_CG_FC_IO_RQ         CG_FC_IO_RQ         "Threshold Alert"
StatefulAlert_CG_FC_IO_RQ_SEC     CG_FC_IO_RQ_SEC     "Threshold Alert"
StatefulAlert_CG_FD_IO_BY_SEC     CG_FD_IO_BY_SEC     "Threshold Alert"
StatefulAlert_CG_FD_IO_LOAD       CG_FD_IO_LOAD       "Threshold Alert"
StatefulAlert_CG_FD_IO_RQ_LG      CG_FD_IO_RQ_LG      "Threshold Alert"
StatefulAlert_CG_FD_IO_RQ_LG_SEC  CG_FD_IO_RQ_LG_SEC  "Threshold Alert"
StatefulAlert_CG_FD_IO_RQ_SM      CG_FD_IO_RQ_SM      "Threshold Alert"
StatefulAlert_CG_FD_IO_RQ_SM_SEC  CG_FD_IO_RQ_SM_SEC  "Threshold Alert"
StatefulAlert_CG_IO_BY_SEC        CG_IO_BY_SEC        "Threshold Alert"
StatefulAlert_CG_IO_LOAD          CG_IO_LOAD          "Threshold Alert"
StatefulAlert_CG_IO_RQ_LG         CG_IO_RQ_LG         "Threshold Alert"
StatefulAlert_CG_IO_RQ_LG_SEC     CG_IO_RQ_LG_SEC     "Threshold Alert"
StatefulAlert_CG_IO_RQ_SM         CG_IO_RQ_SM         "Threshold Alert"
StatefulAlert_CG_IO_RQ_SM_SEC     CG_IO_RQ_SM_SEC     "Threshold Alert"
StatefulAlert_CG_IO_UTIL_LG       CG_IO_UTIL_LG       "Threshold Alert"
StatefulAlert_CG_IO_UTIL_SM       CG_IO_UTIL_SM       "Threshold Alert"
StatefulAlert_CG_IO_WT_LG         CG_IO_WT_LG         "Threshold Alert"
StatefulAlert_CG_IO_WT_LG_RQ      CG_IO_WT_LG_RQ      "Threshold Alert"
StatefulAlert_CG_IO_WT_SM         CG_IO_WT_SM         "Threshold Alert"
StatefulAlert_CG_IO_WT_SM_RQ      CG_IO_WT_SM_RQ      "Threshold Alert"
StatefulAlert_CL_FSUT             CL_FSUT             "Threshold Alert"
StatefulAlert_CL_MEMUT            CL_MEMUT            "Threshold Alert"
StatefulAlert_CT_FC_IO_BY_SEC     CT_FC_IO_BY_SEC     "Threshold Alert"
StatefulAlert_CT_FC_IO_RQ         CT_FC_IO_RQ         "Threshold Alert"
StatefulAlert_CT_FC_IO_RQ_SEC     CT_FC_IO_RQ_SEC     "Threshold Alert"
StatefulAlert_CT_FD_IO_BY_SEC     CT_FD_IO_BY_SEC     "Threshold Alert"
StatefulAlert_CT_FD_IO_LOAD       CT_FD_IO_LOAD       "Threshold Alert"
StatefulAlert_CT_FD_IO_RQ_LG      CT_FD_IO_RQ_LG      "Threshold Alert"
StatefulAlert_CT_FD_IO_RQ_LG_SEC  CT_FD_IO_RQ_LG_SEC  "Threshold Alert"
StatefulAlert_CT_FD_IO_RQ_SM      CT_FD_IO_RQ_SM      "Threshold Alert"
StatefulAlert_CT_FD_IO_RQ_SM_SEC  CT_FD_IO_RQ_SM_SEC  "Threshold Alert"
StatefulAlert_CT_IO_BY_SEC        CT_IO_BY_SEC        "Threshold Alert"
StatefulAlert_CT_IO_LOAD          CT_IO_LOAD          "Threshold Alert"
StatefulAlert_CT_IO_RQ_LG         CT_IO_RQ_LG         "Threshold Alert"
StatefulAlert_CT_IO_RQ_LG_SEC     CT_IO_RQ_LG_SEC     "Threshold Alert"
StatefulAlert_CT_IO_RQ_SM         CT_IO_RQ_SM         "Threshold Alert"
StatefulAlert_CT_IO_RQ_SM_SEC     CT_IO_RQ_SM_SEC     "Threshold Alert"
StatefulAlert_CT_IO_UTIL_LG       CT_IO_UTIL_LG       "Threshold Alert"
StatefulAlert_CT_IO_UTIL_SM       CT_IO_UTIL_SM       "Threshold Alert"
StatefulAlert_CT_IO_WT_LG         CT_IO_WT_LG         "Threshold Alert"
StatefulAlert_CT_IO_WT_LG_RQ      CT_IO_WT_LG_RQ      "Threshold Alert"
StatefulAlert_CT_IO_WT_SM         CT_IO_WT_SM         "Threshold Alert"
StatefulAlert_CT_IO_WT_SM_RQ      CT_IO_WT_SM_RQ      "Threshold Alert"
StatefulAlert_DB_FC_IO_BY_SEC     DB_FC_IO_BY_SEC     "Threshold Alert"
StatefulAlert_DB_FC_IO_RQ         DB_FC_IO_RQ         "Threshold Alert"
StatefulAlert_DB_FC_IO_RQ_SEC     DB_FC_IO_RQ_SEC     "Threshold Alert"
StatefulAlert_DB_FD_IO_BY_SEC     DB_FD_IO_BY_SEC     "Threshold Alert"
StatefulAlert_DB_FD_IO_LOAD       DB_FD_IO_LOAD       "Threshold Alert"
StatefulAlert_DB_FD_IO_RQ_LG      DB_FD_IO_RQ_LG      "Threshold Alert"
StatefulAlert_DB_FD_IO_RQ_LG_SEC  DB_FD_IO_RQ_LG_SEC  "Threshold Alert"
StatefulAlert_DB_FD_IO_RQ_SM      DB_FD_IO_RQ_SM      "Threshold Alert"
StatefulAlert_DB_FD_IO_RQ_SM_SEC  DB_FD_IO_RQ_SM_SEC  "Threshold Alert"
StatefulAlert_DB_IO_BY_SEC        DB_IO_BY_SEC        "Threshold Alert"
StatefulAlert_DB_IO_LOAD          DB_IO_LOAD          "Threshold Alert"
StatefulAlert_DB_IO_RQ_LG         DB_IO_RQ_LG         "Threshold Alert"
StatefulAlert_DB_IO_RQ_LG_SEC     DB_IO_RQ_LG_SEC     "Threshold Alert"
StatefulAlert_DB_IO_RQ_SM         DB_IO_RQ_SM         "Threshold Alert"
StatefulAlert_DB_IO_RQ_SM_SEC     DB_IO_RQ_SM_SEC     "Threshold Alert"
StatefulAlert_DB_IO_UTIL_LG       DB_IO_UTIL_LG       "Threshold Alert"
StatefulAlert_DB_IO_UTIL_SM       DB_IO_UTIL_SM       "Threshold Alert"
StatefulAlert_DB_IO_WT_LG         DB_IO_WT_LG         "Threshold Alert"
StatefulAlert_DB_IO_WT_LG_RQ      DB_IO_WT_LG_RQ      "Threshold Alert"
StatefulAlert_DB_IO_WT_SM         DB_IO_WT_SM         "Threshold Alert"
StatefulAlert_DB_IO_WT_SM_RQ      DB_IO_WT_SM_RQ      "Threshold Alert"
StatefulAlert_GD_IO_ERRS_MIN      GD_IO_ERRS_MIN      "Threshold Alert"
Stateful_HardwareAlert                                "Hardware Stateful Alert"
Stateful_SoftwareAlert                                "Software Stateful Alert"

CellCLI>
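
If you capture this listing to a file, a short script can summarize it. The following is a minimal Python sketch, not part of CellCLI. It assumes each definition sits on one line as a name, an optional metric name, and a quoted description. The function name parse_alert_definitions and the SAMPLE text are illustrative only; the sample rows are copied from the listing above.

```python
import shlex
from collections import Counter


def parse_alert_definitions(text):
    """Parse lines of the form: name [metricName] "description"."""
    rows = []
    for line in text.splitlines():
        tokens = shlex.split(line)  # keeps the quoted description as one token
        if len(tokens) == 3:
            name, metric, description = tokens
        elif len(tokens) == 2:
            # Definitions such as ADRAlert have no metric name.
            name, description = tokens
            metric = None
        else:
            continue  # skip blank or unrecognized lines
        rows.append((name, metric, description))
    return rows


# Hypothetical captured output: a few rows taken from the listing above.
SAMPLE = '''
ADRAlert "Incident Alert"
HardwareAlert "Hardware Alert"
StatefulAlert_CD_IO_ERRS_MIN CD_IO_ERRS_MIN "Threshold Alert"
StatefulAlert_CT_IO_WT_LG_RQ CT_IO_WT_LG_RQ "Threshold Alert"
StatefulAlert_DB_IO_LOAD DB_IO_LOAD "Threshold Alert"
'''

rows = parse_alert_definitions(SAMPLE)

# Count threshold alerts by the metric's leading prefix (CD, CT, DB, ...).
by_prefix = Counter(metric.split("_")[0] for _, metric, _ in rows if metric)
print(by_prefix)
```

Grouping by prefix gives a quick view of how many metric-based definitions exist per object type without reading the whole listing.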

 

As an example, to list all critical alerts that have not yet been examined, run the following command:

 

CellCLI> LIST ALERTHISTORY WHERE severity = 'critical' AND examinedBy = '' DETAIL;
name: 1_1
alertMessage: "Cell configuration check
discovered the following problems: Check Exadata configuration via
ipconf utility Config file exists : FAILED Error. the main
loop(ipconf.pl:768): Overall status for config verification: FAILED
[INFO] The ipconf check may generate a failure for temporary
inability to reach NTP or DNS server. You may ignore this alert, if
the NTP or DNS servers are valid and available. [INFO] You may
ignore this alert, if the NTP or DNS servers are valid and
available. [INFO] As root user run /usr/local/bin/ipconf -verify
-semantic to verify consistent network configurations. Could not
open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such
file or directory Could not open device at /dev/ipmi0 or
/dev/ipmi/0 or /dev/ipmidev/0: No such file or directory Get Device
ID command failed Could not open device at /dev/ipmi0 or
/dev/ipmi/0 or /dev/ipmidev/0: No such file or directory Get Device
ID command failed Unable to open SDR for reading Could not open
device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file
or directory Get Device ID command failed [EXCEPTION] Running
/opt/oracle.SupportTools/CheckHWnFWProfile"
alertSequenceID: 1
alertShortName: Software
alertType: Stateful
beginTime: 2011-02-21T13:27:41-05:00
endTime: 2011-02-21T14:22:52-05:00
examinedBy:
metricObjectName: checkconfig
notificationState: 0
sequenceBeginTime: 2011-02-21T13:27:41-05:00
severity: critical
alertAction: "Correct the configuration
problems. Then run cellcli command: ALTER CELL VALIDATE
CONFIGURATION Verify that the new configuration is correct."

name: 2
alertMessage: "RS-7445 [Required IP parameters
missing] [Check cellinit.ora] [] [] [] [] [] [] [] [] [] []"
alertSequenceID: 2
alertShortName: ADR
alertType: Stateless
beginTime: 2011-02-21T14:17:51-05:00
endTime:
examinedBy:
notificationState: 0
sequenceBeginTime: 2011-02-21T14:17:51-05:00
severity: critical
alertAction: "Errors in file
/opt/oracle/cell11.2.2.2.0_LINUX.X64_101206.2/log/diag/asm/cell/cell01/trace/rstrc_15435_4.trc
(incident=1). Please create an incident package for incident 1
using ADRCI and upload the incident package to Oracle Support. This
can be done as shown below. From a shell session on cell cell01,
enter the following commands: $ cd
/opt/oracle/cell11.2.2.2.0_LINUX.X64_101206.2/log $ adrci adrci>
set home diag/asm/cell/cell01 adrci> ips pack incident 1 in /tmp
<<<adrci displays a message including the name of
generated zip file>>> Add this zip file as an attachment
to an email message and send the message to Oracle Support."

name: 4
alertMessage: "RS-7445 [Serv CELLSRV comm
failed] [It will be restarted] [] [] [] [] [] [] [] [] [] []"
alertSequenceID: 4
alertShortName: ADR
alertType: Stateless
beginTime: 2011-05-03T14:06:19-04:00
endTime:
examinedBy:
notificationState: 0
sequenceBeginTime: 2011-05-03T14:06:19-04:00
severity: critical
alertAction: "Errors in file
/opt/oracle/cell11.2.2.2.0_LINUX.X64_101206.2/log/diag/asm/cell/cm01cel01/trace/rstrc_13402_4.trc
(incident=1). Please create an incident package for incident 1
using ADRCI and upload the incident package to Oracle Support. This
can be done as shown below. From a shell session on cell cell01,
enter the following commands: $ cd
/opt/oracle/cell11.2.2.2.0_LINUX.X64_101206.2/log $ adrci adrci>
set home diag/asm/cell/cm01cel01 adrci> ips pack incident 1 in
/tmp <<<adrci displays a message including the name of
generated zip file>>> Add this zip file as an attachment
to an email message and send the message to Oracle Support."
CellCLI>
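
The DETAIL output is a sequence of "attribute: value" lines, with long values wrapped across several lines. If you want to post-process it, the following Python sketch shows one way to turn saved output into a list of dictionaries. It is an assumption-laden illustration rather than an Oracle-supplied tool: the function name parse_alert_detail is hypothetical, the attribute names are the ones visible in the output above, and the sample text is an abbreviated copy of that output.

```python
import re

# Attribute names as they appear in the DETAIL output shown above.
KNOWN = {
    "name", "alertMessage", "alertSequenceID", "alertShortName", "alertType",
    "beginTime", "endTime", "examinedBy", "metricObjectName",
    "notificationState", "sequenceBeginTime", "severity", "alertAction",
}
ATTR = re.compile(r"^(\w+):\s*(.*)$")


def parse_alert_detail(text):
    """Return one dict per alert; wrapped lines are joined to the previous attribute."""
    records = []
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("CellCLI>"):
            continue
        match = ATTR.match(line)
        if match and match.group(1) in KNOWN:
            key, value = match.groups()
            if key == "name":
                records.append({})  # a "name" line starts a new alert
            if not records:
                key = None  # attribute seen before any "name" line: ignore
                continue
            records[-1][key] = value
        elif key is not None and records:
            # Continuation of a wrapped value.
            records[-1][key] = (records[-1][key] + " " + line).strip()
    return records


# Abbreviated sample based on the output above.
SAMPLE = '''
name: 1_1
alertMessage: "Cell configuration check
discovered the following problems"
alertType: Stateful
endTime: 2011-02-21T14:22:52-05:00
severity: critical

name: 2
alertMessage: "RS-7445 [Required IP parameters
missing] [Check cellinit.ora]"
alertType: Stateless
endTime:
severity: critical
'''

records = parse_alert_detail(SAMPLE)
stateless = [r["name"] for r in records if r["alertType"] == "Stateless"]
print(stateless)
```

Matching only known attribute names keeps a wrapped message line that happens to contain a colon from being mistaken for a new attribute.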

 

To clear alerts, you need to mark them as examined by setting the examinedBy attribute. The command below marks every alert in the alert history as examined.

 

CellCLI> alter alerthistory all examinedBy='JC';
Alert 1_1 successfully altered
Alert 1_2 successfully altered
Alert 2 successfully altered
Alert 3 successfully altered
Alert 4 successfully altered
CellCLI> list alerthistory where examinedBy='';

< notice it’s empty >

 

CellCLI>

 

The first listing in this section shows several dozen alert definitions that are pre-delivered with Exadata. You can also create your own alert conditions by defining thresholds on metrics, as follows:

 

CellCLI> create threshold ct_io_wt_lg_rq.interactive warning=1000, critical=2000, comparison='>', occurrences=2, observation=5;
Threshold ct_io_wt_lg_rq.interactive successfully created

CellCLI>

 

In this example, we've created an alert threshold on the CT_IO_WT_LG_RQ metric for the INTERACTIVE category, with a warning level of 1000 and a critical level of 2000. The comparison operator is '>', occurrences is set to 2, and observation is set to 5.
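
To build intuition for how these settings interact, here is a simplified Python model. It is only a sketch under stated assumptions, not the cell's actual implementation: it assumes that observation is the number of recent measurements averaged before comparison, and that occurrences is the number of consecutive averaged values that must fall in a new level before the alert state changes. The function names level and simulate are made up for illustration.

```python
from collections import deque


def level(value, warning, critical):
    """Classify one value using the '>' comparison from the example."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "normal"


def simulate(samples, warning=1000, critical=2000, occurrences=2, observation=5):
    """Return the modeled alert state after each sample.

    Assumption: the last `observation` samples are averaged, and the state
    changes only after `occurrences` consecutive averages agree on a new level.
    """
    window = deque(maxlen=observation)
    state, candidate, streak = "normal", None, 0
    history = []
    for sample in samples:
        window.append(sample)
        current = level(sum(window) / len(window), warning, critical)
        if current == state:
            candidate, streak = None, 0      # no pending change
        elif current == candidate:
            streak += 1                      # same new level again
        else:
            candidate, streak = current, 1   # a different new level appears
        if candidate is not None and streak >= occurrences:
            state, candidate, streak = candidate, None, 0
        history.append(state)
    return history


# With observation=1 there is no averaging, so only `occurrences` matters.
print(simulate([500, 1500, 1500, 2500, 2500, 500, 500], observation=1))
```

In this model a single spike does not change the state; the value has to stay past the warning or critical level for two measurements in a row, which is the kind of noise suppression these two settings are meant to provide.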

Summary

Monitoring Exadata with alerts lets you see alert conditions for every component of your Exadata storage cell on which alert conditions have been configured. You can create your own custom thresholds with the CellCLI CREATE THRESHOLD command to tailor alerts to your business needs.