This week one of my colleagues has been performing updates on our hypervisor hosts, including those running Hyper-V. During this work our monitoring system alerted us that two of the hosts had essentially run out of disk space, which was obviously a little worrying. We stopped the update process while troubleshooting was carried out, and in this post I’m going to cover the steps I took to diagnose the issue. These hosts run Windows Server 2012 R2 Core edition, so local GUI tools are very limited and much of this work was carried out via PowerShell.
First I connected to one of the suspect hosts and retrieved the current disk space usage to confirm what the monitoring system had reported. It’s important to note that we use Microsoft System Center Virtual Machine Manager (SCVMM) to build our hosts. These are HPE BL460c Gen8 blades with a pair of local disks in a RAID1 and the D:\ drive is basically the main drive while C:\ is actually the VHDX file with our host operating system – it will all become apparent as we explore the disks below.
Let’s use the Get-Volume PowerShell cmdlet to view our current volume attributes.
PS C:\> Get-Volume -DriveLetter c, d

DriveLetter FileSystemLabel FileSystem DriveType HealthStatus SizeRemaining     Size
----------- --------------- ---------- --------- ------------ -------------     ----
C                           NTFS       Fixed     Healthy           45.27 GB 59.48 GB
D           OS              NTFS       Fixed     Healthy              32 MB 136.4 GB
OK so we can see that the D:\ drive only has 32MB of space left which is certainly not ideal. Let’s take a look on the drive itself to see what objects are consuming space.
PS C:\> Get-ChildItem "D:"

    Directory: D:\

Mode                LastWriteTime       Length Name
----                -------------       ------ ----
-a---        28/04/2017     14:55  64428703744 S2012R2DC-CORE.vhdx
It looks like there is only a single file, our operating system VHDX. However, at roughly 64GB it is far too small to account for all of the 137GB of drive space, so there must be something else we are not seeing by default. Let’s take another look using the Get-ChildItem cmdlet, but this time adding the -Hidden parameter.
PS C:\> Get-ChildItem "D:" -Hidden

    Directory: D:\

Mode                LastWriteTime       Length Name
----                -------------       ------ ----
-a-hs        28/04/2017     09:14  81882251264 pagefile.sys
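As an aside, the two listings above can be collapsed into a single pass. The following is a minimal sketch, not something we ran at the time: the function name Get-LargestFile and its parameters are my own invention, and it relies only on the -Force switch of Get-ChildItem, which includes hidden and system files alongside normal ones.

```powershell
# Hypothetical helper (not part of the original troubleshooting):
# list the largest files in a folder, including hidden and system files.
function Get-LargestFile {
    param(
        [Parameter(Mandatory)][string]$Path,
        [int]$Top = 10
    )

    # -Force returns hidden/system items that a plain listing leaves out.
    Get-ChildItem -LiteralPath $Path -File -Force |
        Sort-Object -Property Length -Descending |
        Select-Object -First $Top -Property Name,
            @{ Name = 'SizeGB'; Expression = { [math]::Round($_.Length / 1GB, 2) } }
}
```

Calling it as Get-LargestFile -Path 'D:\' should list both the VHDX and pagefile.sys together, largest first, which makes an unexpectedly large hidden file hard to miss.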
Well now would you look at that – we have a page file which is rather large to say the least, in fact it’s double the size of the one on a ‘healthy’ host! At this point I think we have found our culprit; the question is why did pagefile.sys grow to double the size it was previously? The only change had been the HPE SPP update pack installation, but I knew this wouldn’t have caused the problem directly. It had to be related to the reboot involved and something else which had occurred recently. I know crash dumps are written via the page file, so I went looking for memory dumps but couldn’t find any. Next it was time to look at the crash dump settings, and below I will show a few ways of doing this for those interested.
We can remotely connect to the server registry using regedit.exe and then browse to the relevant key. I think first off it is worth viewing a host which does not have the larger than expected page file to see how it is configured. The value we are interested in is CrashDumpEnabled, which is set to 7 on a healthy host. This is the default on Server 2012 R2 and equates to an automatic memory dump. If you would like more information on the values check this MSDN article – https://msdn.microsoft.com/en-us/library/windows/hardware/mt586681(v=vs.85).aspx
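To make the numbers easier to read in the output that follows, here is a small lookup sketch. The function name ConvertTo-CrashDumpTypeName is hypothetical; the value-to-name mapping reflects my understanding of the documented CrashDumpEnabled values (7 being the automatic memory dump, which still produces a kernel dump), so do check it against the MSDN article for your Windows version.

```powershell
# Hypothetical helper: translate a CrashDumpEnabled registry value into a readable name.
function ConvertTo-CrashDumpTypeName {
    param([Parameter(Mandatory)][int]$Value)

    # Mapping as I understand it from the documentation; verify for your OS version.
    $names = @{
        0 = 'None'
        1 = 'Complete memory dump'
        2 = 'Kernel memory dump'
        3 = 'Small memory dump'
        7 = 'Automatic memory dump'
    }

    if ($names.ContainsKey($Value)) { $names[$Value] } else { "Unknown ($Value)" }
}
```

For example, ConvertTo-CrashDumpTypeName -Value 7 returns 'Automatic memory dump', while a value of 2 returns 'Kernel memory dump'.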
We can retrieve the same information via the reg query command as below.
C:\> reg query HKLM\SYSTEM\CurrentControlSet\Control\CrashControl

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl
    LogEvent            REG_DWORD        0x1
    Overwrite           REG_DWORD        0x1
    AutoReboot          REG_DWORD        0x1
    DumpFile            REG_EXPAND_SZ    %SystemRoot%\MEMORY.DMP
    DisableEmoticon     REG_DWORD        0x1
    CrashDumpEnabled    REG_DWORD        0x7
    MinidumpDir         REG_EXPAND_SZ    %SystemRoot%\Minidump
    MinidumpsCount      REG_DWORD        0x32
PowerShell can also be used here by leveraging Get-ItemProperty. We can add the -Name parameter to limit the returned data to just the CrashDumpEnabled value.
PS C:\> Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl\"

LogEvent         : 1
Overwrite        : 1
AutoReboot       : 1
DumpFile         : C:\Windows\MEMORY.DMP
DisableEmoticon  : 1
CrashDumpEnabled : 7
MinidumpDir      : C:\Windows\Minidump
MinidumpsCount   : 50
PSPath           : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl\
PSParentPath     : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
PSChildName      : CrashControl
PSDrive          : HKLM
PSProvider       : Microsoft.PowerShell.Core\Registry

PS C:\> Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl\" -Name CrashDumpEnabled

CrashDumpEnabled : 7
PSPath           : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl\
PSParentPath     : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
PSChildName      : CrashControl
PSDrive          : HKLM
PSProvider       : Microsoft.PowerShell.Core\Registry
So the healthy host is set with the default – I wonder what the problem hosts are set to? Let’s run through the same process of commands to compare.
First off we have our screenshot from the remote registry connection.
Next let’s try our reg query command and then we shall use our PowerShell Get-ItemProperty cmdlet.
C:\> reg query HKLM\SYSTEM\CurrentControlSet\Control\CrashControl

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl
    LogEvent            REG_DWORD        0x1
    Overwrite           REG_DWORD        0x1
    AutoReboot          REG_DWORD        0x1
    DumpFile            REG_EXPAND_SZ    %SystemRoot%\MEMORY.DMP
    DisableEmoticon     REG_DWORD        0x1
    CrashDumpEnabled    REG_DWORD        0x2
    MinidumpDir         REG_EXPAND_SZ    %SystemRoot%\Minidump
    MinidumpsCount      REG_DWORD        0x32

PS C:\> Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl\"

LogEvent         : 1
Overwrite        : 1
AutoReboot       : 1
DumpFile         : C:\Windows\MEMORY.DMP
DisableEmoticon  : 1
CrashDumpEnabled : 2
MinidumpDir      : C:\Windows\Minidump
MinidumpsCount   : 50
PSPath           : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl\
PSParentPath     : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
PSChildName      : CrashControl
PSDrive          : HKLM
PSProvider       : Microsoft.PowerShell.Core\Registry
We can see that the CrashDumpEnabled attribute has been modified on the problem host to a value of 2 which as we know from our MSDN link is a kernel memory dump. A kernel memory dump (according to MSDN https://msdn.microsoft.com/en-us/library/windows/hardware/ff551867(v=vs.85).aspx) is typically roughly a third the size of physical memory. These blades have 256GB of RAM so the 80GB pagefile.sys is about right. Let’s try setting the value back to the expected default on Server 2012 R2 then reboot the host to see what our new pagefile.sys looks like.
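As a quick sanity check on that estimate, the arithmetic can be done directly in PowerShell. This is only a sketch: ConvertTo-GiB is a hypothetical helper name, the byte counts are copied from the listings in this post, and note that PowerShell's 1GB constant is binary (1,073,741,824 bytes), so the results come out slightly lower than the rounded decimal figures quoted in the text.

```powershell
# Hypothetical helper: convert a byte count to binary gigabytes, rounded to 2 places.
function ConvertTo-GiB {
    param([Parameter(Mandatory)][long]$Bytes)
    [math]::Round([double]$Bytes / 1GB, 2)
}

$ramGB = 256                                   # physical memory per blade, from the post
$ruleOfThumbGB = [math]::Round($ramGB / 3, 1)  # "roughly a third of physical memory"

$ruleOfThumbGB                     # 85.3
ConvertTo-GiB -Bytes 81882251264   # 76.26 (pagefile.sys on the problem host)
ConvertTo-GiB -Bytes 40802189312   # 38    (pagefile.sys on a healthy host)
```

So the problem host's page file sits a little under the one-third rule of thumb, and it is almost exactly twice the 38 GB page file seen on a healthy host.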
To do this we make use of the PowerShell cmdlet Set-ItemProperty.
PS C:\> Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl\" -Name "CrashDumpEnabled" -Value 7
Next it’s time to reboot and then check our space usage to see if anything has changed.
PS C:\> Get-Volume -DriveLetter c, d

DriveLetter FileSystemLabel FileSystem DriveType HealthStatus SizeRemaining     Size
----------- --------------- ---------- --------- ------------ -------------     ----
C                           NTFS       Fixed     Healthy            45.3 GB 59.48 GB
D           OS              NTFS       Fixed     Healthy           38.29 GB 136.4 GB

PS C:\> Get-ChildItem d: -Hidden

    Directory: D:\

Mode                LastWriteTime       Length Name
----                -------------       ------ ----
-a-hs        28/04/2017     15:02  40802189312 pagefile.sys
Hmm, pagefile.sys has now returned to ~40GB, which is the same as a healthy host – it would appear the issue is resolved, which is always nice! Thing is, I have more than one host to configure, so how can I quickly modify them all in one go? PowerShell ftw! I’ll create a variable and assign it the output of the Get-ClusterNode cmdlet. Then I can run a foreach loop over the servers and use the Invoke-Command cmdlet to modify the registry setting on each one.
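# Collect every node in the cluster
$servers = Get-ClusterNode -Cluster "ClusterName"
foreach ($server in $servers) {
    # Reset the crash dump type to 7 (automatic memory dump) on each node
    Invoke-Command -ComputerName $server.Name -ScriptBlock {
        Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl" -Name "CrashDumpEnabled" -Value 7
    }
}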
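Once the loop has run it is worth reading the value back from every node before scheduling reboots. The sketch below is one way to do that; both function names are my own, it assumes the FailoverClusters module and PowerShell remoting are available, and it leans on the PSComputerName property that Invoke-Command adds to remote results.

```powershell
# Hypothetical helper: read CrashDumpEnabled from every node in a cluster.
# Assumes the FailoverClusters module and PowerShell remoting are available.
function Get-ClusterCrashDumpSetting {
    param([Parameter(Mandatory)][string]$Cluster)

    $nodes = (Get-ClusterNode -Cluster $Cluster).Name
    Invoke-Command -ComputerName $nodes -ScriptBlock {
        Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl' -Name CrashDumpEnabled
    } | Select-Object -Property PSComputerName, CrashDumpEnabled
}

# Hypothetical helper: keep only the results that differ from the expected value.
function Select-UnexpectedCrashDumpSetting {
    param(
        [object[]]$Result,
        [int]$Expected = 7
    )

    $Result | Where-Object { $_.CrashDumpEnabled -ne $Expected }
}
```

Something like Select-UnexpectedCrashDumpSetting -Result (Get-ClusterCrashDumpSetting -Cluster "ClusterName") should return nothing once every node is back on 7. Bear in mind that in our case the page file only shrank after the host had been rebooted.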
We may wonder why, if the CrashDumpEnabled values of 2 and 7 both produce a kernel memory dump, we see a difference in page file size. This is explained in the following TechNet article – https://blogs.technet.microsoft.com/askcore/2012/09/12/windows-8-and-windows-server-2012-automatic-memory-dump/
The TL;DR is as follows, quoted from the previous link –
The “System Managed” page file has been updated to reduce the page file size on disk, primarily for small SSDs but will also benefit servers with large amounts or ram.
The “Automatic memory dump” is not really a new memory dump type. In previous versions of Windows, we already have Mini, Kernel, and Complete memory dump options. The Automatic memory dump option produces a Kernel memory dump, the difference is when you select Automatic it allows the SMSS process to reduce the page file smaller than the size of RAM.
As per the Hyper-V team best practice we leave our hosts with a system managed page file size. The automatic memory dump setting does not require our system to reserve roughly a third of physical memory (~80GB), resulting in our smaller page file of ~40GB. Further review with the team revealed that the setting had been changed as part of troubleshooting an issue with Microsoft support. We have since reverted all hosts to the expected default value of 7 in the registry and following a round of reboots everything is good in the world… at least until something else breaks!