Tuesday, September 20, 2011

BSOD (Blue Screen Of Death)


I experienced a few BSODs last week after updating some of the device drivers on my Windows 7 SP1 (x86) workstation. Since the updates came from Microsoft Update, I did not suspect that the updated drivers might be the problem. So I scheduled a chkdsk surface scan of my HDD, suspecting bad sectors. Fortunately, the HDD was free of bad sectors, so I decided to examine the crash dump file using WinDbg. For more info on how to obtain WinDbg, how to read a small memory dump and how to set up symbols, please check the Microsoft articles : http://support.microsoft.com/kb/311503 and http://support.microsoft.com/kb/315263 .

Here is the output from my minidump :

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
VIDEO_TDR_FAILURE (116)
Attempt to reset the display driver and recover from timeout failed.
Arguments:
Arg1: 864cf100, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: 92906326, The pointer into responsible device driver module (e.g. owner tag).
Arg3: c00000b5, Optional error code (NTSTATUS) of the last failed operation.
Arg4: 0000000a, Optional internal context dependent data.
Debugging Details:
------------------
Unable to load image \SystemRoot\system32\DRIVERS\nvlddmkm.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for nvlddmkm.sys
*** ERROR: Module load completed but symbols could not be loaded for nvlddmkm.sys
FAULTING_IP:
nvlddmkm+de326
92906326 55              push    ebp
DEFAULT_BUCKET_ID:  GRAPHICS_DRIVER_TDR_FAULT
CUSTOMER_CRASH_COUNT:  1
BUGCHECK_STR:  0x116
PROCESS_NAME:  System
CURRENT_IRQL:  0
STACK_TEXT: 
8e4fdb74 9333507b 00000116 864cf100 92906326 nt!KeBugCheckEx+0x1e
8e4fdb98 93329937 92906326 c00000b5 0000000a dxgkrnl!TdrBugcheckOnTimeout+0x8d
8e4fdbbc 9336592c c00000b5 00000102 8710a008 dxgkrnl!TdrIsRecoveryRequired+0xb8
8e4fdc34 9338f944 fffffcfb 00565127 00000000 dxgmms1!VidSchiReportHwHang+0x3c0
8e4fdc5c 93390065 00000000 00000000 00000000 dxgmms1!VidSchiCheckHwProgress+0x68
8e4fdc98 9336c8f0 8e4fdc90 86104c20 8636dd08 dxgmms1!VidSchiWaitForSchedulerEvents+0x1b1
8e4fdd28 933913c9 8710a008 82c4d509 8710a008 dxgmms1!VidSchiScheduleCommandToRun+0xaa
8e4fdd3c 93391485 8710a008 00000000 87121410 dxgmms1!VidSchiRun_PriorityTable+0xf
8e4fdd50 82e1efda 8710a008 be498b21 00000000 dxgmms1!VidSchiWorkerThread+0x7f
8e4fdd90 82cc71d9 93391406 8710a008 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19

STACK_COMMAND:  .bugcheck ; kb
FOLLOWUP_IP:
nvlddmkm+de326
92906326 55              push    ebp
SYMBOL_NAME:  nvlddmkm+de326
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: nvlddmkm
IMAGE_NAME:  nvlddmkm.sys
DEBUG_FLR_IMAGE_TIMESTAMP:  4c379162
FAILURE_BUCKET_ID:  0x116_IMAGE_nvlddmkm.sys
BUCKET_ID:  0x116_IMAGE_nvlddmkm.sys
Followup: MachineOwner
---------

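From the output I could see that the BSOD was caused by the Nvidia video card and its graphics driver, nvlddmkm.sys: the bugcheck is VIDEO_TDR_FAILURE (0x116) and both IMAGE_NAME and FAILURE_BUCKET_ID point to that driver. After updating the graphics driver to the latest version from the Nvidia site, the case was successfully closed.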
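
If you have to go through several minidumps, the interesting fields can be pulled out of saved !analyze -v text with a few lines of script. The following is only a minimal Python sketch: the function name summarize_analyze_output and the choice of fields are my own assumptions, and the sample text is copied from the dump above.

```python
import re

# Fields worth a first look; this selection is an assumption, adjust as needed.
WANTED = ("BUGCHECK_STR", "DEFAULT_BUCKET_ID", "PROCESS_NAME",
          "MODULE_NAME", "IMAGE_NAME", "FAILURE_BUCKET_ID")


def summarize_analyze_output(text):
    """Collect 'KEY: value' pairs for the WANTED keys from !analyze -v text."""
    summary = {}
    for line in text.splitlines():
        # Matches lines such as "IMAGE_NAME:  nvlddmkm.sys"; lines with a key
        # but no value on the same line (e.g. "FAULTING_IP:") are skipped.
        match = re.match(r"^([A-Z_]+):\s*(\S.*)$", line.strip())
        if match and match.group(1) in WANTED:
            # Keep the first occurrence of each key.
            summary.setdefault(match.group(1), match.group(2).strip())
    return summary


# Sample lines copied from the dump above.
sample = """\
DEFAULT_BUCKET_ID:  GRAPHICS_DRIVER_TDR_FAULT
BUGCHECK_STR:  0x116
PROCESS_NAME:  System
MODULE_NAME: nvlddmkm
IMAGE_NAME:  nvlddmkm.sys
FAILURE_BUCKET_ID:  0x116_IMAGE_nvlddmkm.sys
"""

summary = summarize_analyze_output(sample)
for key, value in summary.items():
    print(f"{key:20} {value}")
```

IMAGE_NAME and FAILURE_BUCKET_ID are usually the quickest hint about which driver to look at first, but they name a suspect, not a proven cause.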

Wednesday, September 14, 2011

Asking yourself how many messages are transported in your Exchange organization ?

In my migration scenario for Exchange 2010 I was using the Exchange calculator from the Exchange team http://blogs.technet.com/b/exchange/archive/2009/11/09/3408737.aspx . In order to calculate the required number of IOPS per database (or server), two of the required input parameters are the total messages per mailbox per day and the average message size in KB. If you don't want to just guess these two values, you can use Rob's script for gathering email statistics http://gallery.technet.microsoft.com/scriptcenter/bb94b422-eb9e-4c53-a454-f7da6ddfb5d6?SRC=Home . From the data gathered you can use "Received Total Messages" and "Sent Unique Total" to get the total number of messages per user. You can also use "Received MB Total" and "Sent Unique MB Total" to get the daily traffic in MB for each user. From these values you can calculate the average number of total messages per mailbox per day and the average size in MB (KB) of each message, and fill in the Exchange calculator fields Total Send/Receive Capability / Mailbox / Day and Average Message Size (KB).
With a small modification of the script you can gather statistics for a maximum of 30 days, provided that you have not changed the default settings for message tracking log history on the hub transport servers and that the hub transport server has not hit the 1GB disk space limit for the logs. To eliminate so many "ifs" you can use Rob's great script for hub transport message tracking log information : http://mjolinor.wordpress.com/2011/02/11/how-far-back-do-your-message-tracking-logs-really-go/ .
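
The arithmetic behind those two calculator inputs can be sketched in a few lines of Python. This is only an illustration: the function mailbox_profile and all the numbers are hypothetical, and it assumes you have already summed the four counters from the script output over all mailboxes and all days in the sample.

```python
def mailbox_profile(received_msgs, sent_unique_msgs,
                    received_mb, sent_unique_mb,
                    mailboxes, days):
    """Return (messages per mailbox per day, average message size in KB).

    All inputs are totals over every mailbox and every day in the sample.
    """
    total_msgs = received_msgs + sent_unique_msgs
    total_mb = received_mb + sent_unique_mb
    if total_msgs <= 0 or mailboxes <= 0 or days <= 0:
        raise ValueError("message count, mailboxes and days must be positive")
    msgs_per_mailbox_per_day = total_msgs / (mailboxes * days)
    avg_size_kb = total_mb * 1024 / total_msgs  # 1 MB = 1024 KB
    return msgs_per_mailbox_per_day, avg_size_kb


# Hypothetical totals: 50 mailboxes sampled over 30 days.
per_day, size_kb = mailbox_profile(
    received_msgs=90000, sent_unique_msgs=30000,
    received_mb=6000, sent_unique_mb=3000,
    mailboxes=50, days=30,
)
print(f"Messages per mailbox per day: {per_day:.1f}")
print(f"Average message size (KB):    {size_kb:.1f}")
```

The first value goes into Total Send/Receive Capability / Mailbox / Day and the second into Average Message Size (KB).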

Tuesday, September 13, 2011

You can't make your Exchange 2010 Jetstress tests to pass

If you're testing your disk subsystem for Exchange 2010 with Jetstress and you're unable to get a "green" pass because the database read latency is higher than 20 msec, then it's time to reduce the number of threads and do some fine tuning. But if you reduce the number of threads, then you will lose IOPS.
For example: with 2 threads you can't achieve the required number of IOPS, and with 3 threads you achieve the required IOPS but database read latencies are higher than 20 msec. In that case you can use the "SluggishSessions" parameter, which you can find in the JetstressConfig.xml file. By default this parameter is set to 1; you can increase it in steps of 1, which makes Jetstress add a pause between tasks. Increasing the "SluggishSessions" parameter will also cost you IOPS.

For example : with thread count 3 and SluggishSessions 2, I was able to achieve the required number of IOPS, but database read latencies were still higher than 20 msec.

After increasing SluggishSessions to 3 I was losing IOPS, but database read latencies dropped below 20 msec.


For more on threads and "SluggishSessions" check the following TechNet article : http://technet.microsoft.com/en-us/library/ff459238.aspx

Monday, September 12, 2011

Your VM running on Hyper V is not in sync (time)

I experienced something strange on a Hyper-V cluster based on Windows Server 2008 R2 SP1 Enterprise, with a VM guest running the same OS in Standard edition: time synchronization was enabled, but the VM was still out of sync.
I checked that the Windows Time service was running and queried the source for synchronization, and it was the free-running system clock !
After restarting the Windows Time service, the VM started to sync with the local CMOS clock instead of synchronizing with the host (parent partition).
I'm guessing that the VM was not reading the time synchronization setting from the VM configuration, so I disabled the setting, re-enabled it and restarted the Windows Time service. Within a few seconds the VM was finally synchronizing its time with the host.
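
To check the time source on more than one guest, the result of w32tm /query /source can be classified by a small script. The Python sketch below is only an illustration: the function names and category labels are my own, and the provider string "VM IC Time Synchronization Provider" is an assumption about what a guest reports when it syncs with the Hyper-V host, so verify it against your own output.

```python
import subprocess


def classify_time_source(source):
    """Map the text returned by `w32tm /query /source` to a rough category."""
    text = source.strip().lower().replace("-", " ")
    if "vm ic time synchronization provider" in text:
        return "hyper-v host"        # assumed string for host sync
    if "free running system clock" in text:
        return "unsynchronized"      # not syncing with anything
    if "local cmos clock" in text:
        return "local cmos clock"    # syncing with its own hardware clock
    return "other"                   # e.g. a domain controller or NTP server


def query_time_source():
    """Run `w32tm /query /source` on the local machine (Windows only)."""
    result = subprocess.run(["w32tm", "/query", "/source"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


# The three states described in this post, classified from sample strings.
for sample in ("Free-running System Clock",
               "Local CMOS Clock",
               "VM IC Time Synchronization Provider"):
    print(f"{sample:40} -> {classify_time_source(sample)}")
```

On a Windows guest you would pass the result of query_time_source() to classify_time_source(); the loop above only uses sample strings, so it runs anywhere.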

One more thing : if the server runs Windows Server 2008 R2 and the machine is not domain joined, the Windows Time service will stop automatically. On reboot you can see the following event :

This is the default behaviour, and you can find more info in the translated Japanese KB article :
http://translate.google.com/translate?hl=en&sl=ja&u=http://support.microsoft.com/kb/2385818/ja