mcelog also implements page offlining internally, through the kernel. I created a trigger called /etc/mcelog/joel.sh which just sends a basic email to my Gmail account. As far as I can tell from the example trigger, mcelog sets a number of environment variables before invoking the script. The knowledge article might contain additional actions that you or a service provider should take beyond those listed on line 14.
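mcelog does the offlining itself, so there is normally no need to script it. A short sketch still helps show what the kernel side looks like. It assumes a kernel built with memory-failure support (CONFIG_MEMORY_FAILURE), which provides /sys/devices/system/memory/soft_offline_page; writing a physical address to that file asks the kernel to migrate the page's contents and stop using the page. The function name soft_offline_page and the SOFT_OFFLINE_FILE override are hypothetical, the latter only there so the function can be tried against an ordinary file.

```shell
#!/bin/sh
# Sketch: ask the kernel to soft-offline the page containing a physical address.
# Assumption: CONFIG_MEMORY_FAILURE kernel exposing this sysfs file (root only).
# SOFT_OFFLINE_FILE is a hypothetical override so the function can be tested.
SOFT_OFFLINE_FILE=${SOFT_OFFLINE_FILE:-/sys/devices/system/memory/soft_offline_page}

soft_offline_page() {
    addr=$1
    # Accept only hexadecimal addresses such as 0x1234f000.
    case $addr in
        0x*) ;;
        *) echo "usage: soft_offline_page 0xADDRESS" >&2; return 2 ;;
    esac
    digits=${addr#0x}
    case $digits in
        ''|*[!0-9a-fA-F]*) echo "not a hex address: $addr" >&2; return 2 ;;
    esac
    # The kernel converts the address to a page frame, i.e. rounds it
    # down to the start of its page.
    printf '%s\n' "$addr" > "$SOFT_OFFLINE_FILE"
}
```

Usage, as root: soft_offline_page 0x1234f000. The input check is deliberately strict so that a typo cannot send an unintended value to the kernel.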

The error flow gives an overview of the various triggers (note that some are missing).

The DIMM and socket memory error triggers

The /etc/mcelog/dimm-error-trigger and /etc/mcelog/socket-memory-error-trigger scripts are executed when a DIMM or socket exceeds a pre-configured error threshold.

The thresholds are configured in the [dimm] and [socket] sections of mcelog.conf.

Always good to get someone else to look at the problem for issues like that. –Bratchley May 18 '13 at 21:15

Any ideas on why my joel.sh script just … I'm looking for information on how to write triggers for it.
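To see which thresholds are actually in effect, it can help to print just those sections of the config. The sketch below assumes the file lives at /etc/mcelog/mcelog.conf (the location can differ between distributions) and uses INI-style [section] headers with # comments; the function name show_section and the MCELOG_CONF override are hypothetical. The exact keys you will see depend on your mcelog version.

```shell
#!/bin/sh
# Sketch: print the settings inside one [section] of an INI-style file,
# skipping comments and blank lines.
# Assumption: mcelog.conf location; adjust MCELOG_CONF for your distribution.
MCELOG_CONF=${MCELOG_CONF:-/etc/mcelog/mcelog.conf}

show_section() {
    awk -v want="[$1]" '
        # section header: remember whether it is the wanted one
        /^[ \t]*\[/ { insec = ($1 == want); next }
        # skip comments and blank lines
        /^[ \t]*#/ || /^[ \t]*;/ || /^[ \t]*$/ { next }
        # print settings inside the wanted section
        insec { print }
    ' "${2:-$MCELOG_CONF}"
}
```

For example, show_section dimm followed by show_section socket lists both threshold sections; an optional second argument points the function at a different file.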

However, I would suggest checking for known issues and the system BIOS version first. For more information on memory protection technology, please refer to the following HP white paper:
http://h20000.www2.hp.com/bc/docs/support/SupportManual/c02878598/c02878598.pdf
If the system BIOS is already the latest version, you should run HP Insight Diagnostics for a clearer identification.

Triggers are usually shell scripts in the /etc/mcelog directory, but they can also be other internal actions.

Otherwise I'd need a bit more information about the memory offset from a more detailed error. –Chopper3 May 7 '09 at 10:08

We're not running any of

See the Oracle ILOM documentation at: http://www.oracle.com/goto/ILOM/docs

In addition, Oracle Auto Service Request can be configured to automatically request Oracle service when specific hardware problems occur from supported telemetry resources (such

Sample trigger script, dimm-error-trigger:

#!/bin/sh
# This shell script can be executed by mcelog in daemon mode when a DIMM
# exceeds a pre-configured error threshold
#
# environment:
# THRESHOLD

Scroll down and watch all memory. I'm not sure whether this is a normal error or a problem with the memory or the OS.

In my case the errors were only on MC1, csrow1, channel 0:

# grep "[0-9]" /sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count
/sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow1/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow1/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow2/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow3/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow4/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow4/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow5/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow5/ch1_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow6/ch0_ce_count:0
/sys/devices/system/edac/mc/mc0/csrow6/ch1_ce_count:0

Arguments are passed as environment variables:
MESSAGE: human-readable consolidated error message.
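With many csrows and channels, the interesting lines are easy to miss among the zeros. A small loop can report only the counters that are non-zero. It assumes the legacy csrow layout used in the grep above (/sys/devices/system/edac/mc/mc*/csrow*/ch*_ce_count); newer kernels may expose per-DIMM files instead. The function name nonzero_ce_counts and the EDAC_MC_ROOT override are hypothetical, the latter added so the loop can be pointed at a copy of the tree.

```shell
#!/bin/sh
# Sketch: list EDAC corrected-error counters that are non-zero.
# Assumption: legacy csrow sysfs layout; EDAC_MC_ROOT is a hypothetical override.
EDAC_MC_ROOT=${EDAC_MC_ROOT:-/sys/devices/system/edac/mc}

nonzero_ce_counts() {
    for f in "$EDAC_MC_ROOT"/mc*/csrow*/ch*_ce_count; do
        [ -r "$f" ] || continue            # glob did not match anything
        n=$(cat "$f")
        # Non-numeric content makes the test fail quietly.
        if [ "$n" -gt 0 ] 2>/dev/null; then
            # Print e.g. "mc1/csrow1/ch0_ce_count: 42"
            printf '%s: %s\n' "${f#"$EDAC_MC_ROOT"/}" "$n"
        fi
    done
}
```

Running nonzero_ce_counts on a system like the one above would print nothing for mc0 and only the affected mc/csrow/channel entries, which maps directly to the memory controller, chip-select row and channel to check.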

If so, that'll offer a lot more information.

When this happens, the mcelog daemon adds an entry to /var/log/mcelog.

Testing with mce-inject shows that the threshold is exceeded on every event up to the bucket capacity.

Arguments are passed as environment variables:
THRESHOLD: human-readable threshold status
MESSAGE: human-readable consolidated error message
TOTALCOUNT: total corrected or uncorrected count of errors for the current DIMM, depending on what triggered the script

Here is an example of how bad it looks.
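Because mcelog passes everything through the environment, a trigger like the joel.sh mentioned earlier only has to read those variables. Below is a minimal sketch of such a trigger, assuming a working mail(1) command (from mailx or similar) on the host; RECIPIENT, its example.com address and the function names are hypothetical placeholders. It reads only THRESHOLD, MESSAGE and TOTALCOUNT, and it does nothing when MESSAGE is unset, on the assumption that mcelog always provides it, so running the file by hand without that environment is harmless.

```shell
#!/bin/sh
# Sketch of a notification trigger (e.g. /etc/mcelog/joel.sh).
# mcelog passes its arguments as environment variables:
#   THRESHOLD   human readable threshold status
#   MESSAGE     human readable consolidated error message
#   TOTALCOUNT  total corrected or uncorrected error count for the DIMM
# RECIPIENT is a hypothetical placeholder; mail(1) is assumed to be installed.
RECIPIENT=${RECIPIENT:-admin@example.com}

format_report() {
    printf 'Host: %s\n' "$(uname -n)"
    printf 'Threshold: %s\n' "${THRESHOLD:-unknown}"
    printf 'Total count: %s\n' "${TOTALCOUNT:-unknown}"
    printf '\n%s\n' "${MESSAGE:-}"
}

send_report() {
    # mail reads the message body from standard input.
    format_report | mail -s "mcelog: ${THRESHOLD:-memory error}" "$RECIPIENT"
}

# Only act when MESSAGE is set, i.e. when run from mcelog (assumption).
if [ -n "${MESSAGE:-}" ]; then
    send_report
fi
```

Keeping the formatting in its own function makes it easy to check the message text separately from mail delivery, which is usually the part that fails on a server without a configured mail relay.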

If this occurs too often (whatever this means), you will receive this message. Often, the first interaction with the Fault Manager daemon is a system message indicating that a fault or defect has been diagnosed.

If we can't work out which DIMM is dead while the system is online, it's not a showstopper; I'm just on the lookout for ways to save time :~) –markdrayton May 7 '09

The page error trigger

The /etc/mcelog/page-error-trigger script is executed by mcelog in daemon mode when a page in memory exceeds a pre-configured corrected or uncorrected error threshold.
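To check a trigger such as joel.sh without waiting for a real threshold event, one option is to run it by hand with a simulated environment. The sketch below sets THRESHOLD, MESSAGE and TOTALCOUNT to made-up values; real events may carry more variables, and run_trigger_test is a hypothetical helper. For an end-to-end test through mcelog itself, mce-inject (mentioned earlier) is the better tool.

```shell
#!/bin/sh
# Sketch: run an mcelog trigger script by hand with simulated arguments,
# to check that it works before a real threshold event arrives.
# The values are made up; only the variable names come from mcelog.
run_trigger_test() {
    script=$1
    # mcelog presumably executes the script directly, so require the x bit.
    [ -x "$script" ] || { echo "not executable: $script" >&2; return 1; }
    env THRESHOLD="test threshold status" \
        MESSAGE="test message from run_trigger_test" \
        TOTALCOUNT=1 \
        "$script"
}
```

Usage: run_trigger_test /etc/mcelog/joel.sh. If the script works here but not when mcelog runs it, the environment mcelog provides (for example PATH) is worth comparing with your login shell.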

All messages from the Fault Manager daemon use the following format:

1 SUNW-MSG-ID: SPX86A-8002-30, TYPE: Fault, VER: 1, SEVERITY: Minor
2 EVENT-TIME: Wed Nov 27 10:36:30 PST 2013
3 PLATFORM: SUN

The environment arguments are the same as for the dimm-error-trigger script. After the default action, local actions in /etc/mcelog/page-error-trigger.local are executed.
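The .local convention lets site-specific actions live outside the shipped trigger, so a package upgrade does not overwrite them. The sketch below shows one way a trigger could run such a hook after its default action. It is an assumption about the mechanism, not the text of mcelog's own script (which may source the hook rather than execute it), and run_local_hook is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: after the default action, run an optional site-local hook
# named <trigger>.local that lives next to the trigger script.
# run_local_hook is a hypothetical helper, not part of mcelog.
run_local_hook() {
    hook="$1.local"
    if [ -x "$hook" ]; then
        # Variables received from mcelog's environment are exported,
        # so the hook sees THRESHOLD, MESSAGE, etc. as well.
        "$hook"
    fi
}

# Example: run_local_hook /etc/mcelog/page-error-trigger
```

A missing or non-executable hook is simply skipped, so the default action alone still works on systems without any local customisation.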