Unexpected Large Page File Troubleshooting

This week one of my colleagues has been performing updates on our hypervisor hosts, including those running Hyper-V. During this work we received alerts from our monitoring system indicating that two of the hosts had essentially run out of disk space, which was obviously a little worrying. We stopped the update process while troubleshooting was carried out, and in this post I’m going to cover the steps I took to diagnose the issue. These hosts run Windows Server 2012 R2 Core edition so local GUI tools are very limited, meaning much of this work was carried out via PowerShell.


First I connected to one of the suspect hosts and retrieved the current disk space usage to confirm what the monitoring system had reported. It’s important to note that we use Microsoft System Center Virtual Machine Manager (SCVMM) to build our hosts. These are HPE BL460c Gen8 blades with a pair of local disks in a RAID1 and the D:\ drive is basically the main drive while C:\ is actually the VHDX file with our host operating system – it will all become apparent as we explore the disks below.

Let’s use the Get-Volume PowerShell cmdlet to view our current volume attributes.

PS C:\> get-volume -DriveLetter c, d
 
DriveLetter FileSystemL FileSystem  DriveType  HealthStat SizeRemain       Size
            abel                               us                ing
----------- ----------- ----------  ---------  ---------- ----------       ----
C                       NTFS        Fixed      Healthy      45.27 GB   59.48 GB
D           OS          NTFS        Fixed      Healthy         32 MB   136.4 GB

OK so we can see that the D:\ drive only has 32MB of space left which is certainly not ideal. Let’s take a look on the drive itself to see what objects are consuming space.

PS C:\> Get-ChildItem "D:"
 
    Directory: D:\
 
Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---        28/04/2017     14:55 64428703744 S2012R2DC-CORE.vhdx

It looks like there is only a single file, our operating system VHDX. At roughly 60GB, however, it is far too small to consume all of the 137GB of drive space available, so there must be something else we are not seeing by default. Let’s take another look using the Get-ChildItem cmdlet but this time adding the -Hidden parameter.

PS C:\> Get-ChildItem "D:" -hidden
 
    Directory: D:\
 
Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a-hs        28/04/2017     09:14 81882251264 pagefile.sys
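
As a side note, the two listings above can be combined into one. This is a minimal sketch rather than what was run at the time: Get-ChildItem's -Force switch returns hidden and system files alongside normal ones, so a single pass can sort everything by size. The function name Get-LargestFile and its parameters are made up for this illustration.

```powershell
# Hypothetical helper: list the largest files in a folder, hidden/system ones included.
function Get-LargestFile {
    param(
        [Parameter(Mandatory)][string]$Path,
        [int]$First = 10
    )

    # -Force includes hidden and system files; -File skips directories.
    Get-ChildItem -Path $Path -Force -File |
        Sort-Object -Property Length -Descending |
        Select-Object -First $First -Property Name,
            @{ Name = 'SizeGB'; Expression = { [math]::Round($_.Length / 1GB, 2) } }
}

# Example: Get-LargestFile -Path 'D:\'
```

Reporting the size in GB rather than bytes makes an oversized file easier to spot at a glance.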

Well now would you look at that: we have a page file which is rather large to say the least. In fact it’s double the size of the page file on a ‘healthy’ host! At this point I think we have found our culprit; the question is why did pagefile.sys grow to double the size it was previously? The only change had been the HPE SPP update pack installation but I knew this wouldn’t have caused the problem directly. It had to be related to the reboot involved and something else which had occurred recently. I know crash dump data is written to pagefile.sys at the time of a crash, so I went looking for memory dumps but couldn’t find any. Next it was time to look at the crash dump settings. Below I will show a few ways of doing this for those interested.

We can remotely connect to the server registry using regedit.exe and then browse to the relevant key. I think first off it is worth viewing a host which does not have the larger than expected page file to see how it is configured. The value we are interested in is CrashDumpEnabled, which is set to 7 on a healthy host. This is the default on Server 2012 R2 and equates to an automatic memory dump. If you would like more information on the values check this MSDN article: https://msdn.microsoft.com/en-us/library/windows/hardware/mt586681(v=vs.85).aspx

Healthy Host Crash Dump Registry Settings

We can retrieve the same information via the reg query command as below.

C:\>reg query HKLM\SYSTEM\CurrentControlSet\Control\CrashControl
 
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl
    LogEvent    REG_DWORD    0x1
    Overwrite    REG_DWORD    0x1
    AutoReboot    REG_DWORD    0x1
    DumpFile    REG_EXPAND_SZ    %SystemRoot%\MEMORY.DMP
    DisableEmoticon    REG_DWORD    0x1
    CrashDumpEnabled    REG_DWORD    0x7
    MinidumpDir    REG_EXPAND_SZ    %SystemRoot%\Minidump
    MinidumpsCount    REG_DWORD    0x32

PowerShell can also be used here by leveraging Get-ItemProperty. We can add the -Name parameter to limit the returned data to just the CrashDumpEnabled value.

PS C:\> Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl\"
 
LogEvent         : 1
Overwrite        : 1
AutoReboot       : 1
DumpFile         : C:\Windows\MEMORY.DMP
DisableEmoticon  : 1
CrashDumpEnabled : 7
MinidumpDir      : C:\Windows\Minidump
MinidumpsCount   : 50
PSPath           : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl\
PSParentPath     : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
PSChildName      : CrashControl
PSDrive          : HKLM
PSProvider       : Microsoft.PowerShell.Core\Registry

PS C:\> Get-ItemProperty -path "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl\" -Name CrashDumpEnabled

CrashDumpEnabled : 7
PSPath           : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl\
PSParentPath     : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
PSChildName      : CrashControl
PSDrive          : HKLM
PSProvider       : Microsoft.PowerShell.Core\Registry
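
The raw number is easier to read with a name attached. The sketch below maps the documented CrashDumpEnabled values to their dump types; the function name Get-CrashDumpTypeName is invented here for illustration and is not a built-in cmdlet.

```powershell
# Hypothetical helper: translate a CrashDumpEnabled value into its dump type.
function Get-CrashDumpTypeName {
    param([Parameter(Mandatory)][int]$Value)

    switch ($Value) {
        0 { 'None' }
        1 { 'Complete memory dump' }
        2 { 'Kernel memory dump' }
        3 { 'Small memory dump' }
        7 { 'Automatic memory dump' }
        default { "Unknown ($Value)" }
    }
}

# Example against the local registry:
# $key = "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl"
# Get-CrashDumpTypeName -Value (Get-ItemProperty -Path $key -Name CrashDumpEnabled).CrashDumpEnabled
```

On the healthy host above, where the value is 7, this would report an automatic memory dump.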

So the healthy host is set with the default – I wonder what the problem hosts are set to? Let’s run through the same process of commands to compare.

First off we have our screenshot from the remote registry connection.

Problem Host Crash Dump Registry Settings

Next let’s try our reg query command and then we shall use our PowerShell Get-ItemProperty cmdlet.

C:\> reg query HKLM\SYSTEM\CurrentControlSet\Control\CrashControl

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl
    LogEvent    REG_DWORD    0x1
    Overwrite    REG_DWORD    0x1
    AutoReboot    REG_DWORD    0x1
    DumpFile    REG_EXPAND_SZ    %SystemRoot%\MEMORY.DMP
    DisableEmoticon    REG_DWORD    0x1
    CrashDumpEnabled    REG_DWORD    0x2
    MinidumpDir    REG_EXPAND_SZ    %SystemRoot%\Minidump
    MinidumpsCount    REG_DWORD    0x32

PS C:\> Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl\"

LogEvent         : 1
Overwrite        : 1
AutoReboot       : 1
DumpFile         : C:\Windows\MEMORY.DMP
DisableEmoticon  : 1
CrashDumpEnabled : 2
MinidumpDir      : C:\Windows\Minidump
MinidumpsCount   : 50
PSPath           : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl\
PSParentPath     : Microsoft.PowerShell.Core\Registry::HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control
PSChildName      : CrashControl
PSDrive          : HKLM
PSProvider       : Microsoft.PowerShell.Core\Registry

We can see that the CrashDumpEnabled value on the problem host has been changed to 2 which, as we know from our MSDN link, is a kernel memory dump. A kernel memory dump (according to MSDN https://msdn.microsoft.com/en-us/library/windows/hardware/ff551867(v=vs.85).aspx) is typically around a third of the size of physical memory. These blades have 256GB of RAM, and a third of that is roughly 85GB, so the ~80GB pagefile.sys is about right. Let’s try setting the value back to 7, the expected default on Server 2012 R2, then reboot the host to see what our new pagefile.sys looks like.

To do this we make use of the PowerShell cmdlet Set-ItemProperty.

PS C:\> Set-ItemProperty -path "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl\" -Name "CrashDumpEnabled" -Value "7"

Next it’s time to reboot and then check our space usage to see if anything has changed.

PS C:\> Get-Volume -DriveLetter c, d
 
DriveLetter FileSystemL FileSystem  DriveType  HealthStat SizeRemain       Size
            abel                               us                ing
----------- ----------- ----------  ---------  ---------- ----------       ----
C                       NTFS        Fixed      Healthy       45.3 GB   59.48 GB
D           OS          NTFS        Fixed      Healthy      38.29 GB   136.4 GB
 
 
 
PS C:\> Get-ChildItem d: -hidden
 
    Directory: D:\
 
Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a-hs        28/04/2017     15:02 40802189312 pagefile.sys
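
The byte counts in the directory listings are hard to compare by eye, so here is a small worked calculation. It is a sketch that only uses figures already shown in this post (the two pagefile.sys lengths and the 256GB of RAM); the variable names are my own.

```powershell
# Byte counts copied from the two pagefile.sys listings in this post.
$before = 81882251264   # with CrashDumpEnabled = 2
$after  = 40802189312   # with CrashDumpEnabled = 7

# 1GB is PowerShell's built-in constant for 1,073,741,824 bytes.
$beforeGB = [math]::Round($before / 1GB, 2)
$afterGB  = [math]::Round($after / 1GB, 2)
$freedGB  = [math]::Round(($before - $after) / 1GB, 2)

# Rule of thumb for a kernel dump: about a third of physical memory.
$thirdOfRamGB = [math]::Round(256 / 3, 2)

"Before: $beforeGB GB, after: $afterGB GB, freed: $freedGB GB"
"A third of 256 GB of RAM: $thirdOfRamGB GB"
```

The page file drops from about 76GB to 38GB, and the roughly 38GB freed lines up with the jump in free space on D:\ reported by Get-Volume.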

Hmm, pagefile.sys has now returned to ~40GB which is the same as a healthy host, so it would appear the issue is resolved, which is always nice! Thing is, I have more than one host to configure, so how can I quickly modify them all in one go? PowerShell ftw! I’ll create a variable and assign it the output of the Get-ClusterNode cmdlet. Then I can run a foreach loop over the servers and use the Invoke-Command cmdlet to have each one modify the registry setting.

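# Collect every node in the cluster ("ClusterName" is a placeholder)
$servers = Get-ClusterNode -Cluster "ClusterName"

foreach ($server in $servers) {
    # Set the crash dump type back to 7 (automatic memory dump) on each node
    Invoke-Command -ComputerName $server.Name -ScriptBlock {
        Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl" -Name "CrashDumpEnabled" -Value 7
    }
}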
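
After a bulk change like this it is worth reading the value back from every node. The following is a rough sketch, assuming the results come from Invoke-Command (which adds a PSComputerName property to each returned object). The function name Select-UnexpectedDumpSetting is hypothetical.

```powershell
# Hypothetical helper: keep only results whose CrashDumpEnabled differs from the expected value.
function Select-UnexpectedDumpSetting {
    param(
        [Parameter(ValueFromPipeline)]$InputObject,
        [int]$Expected = 7
    )

    process {
        if ($InputObject.CrashDumpEnabled -ne $Expected) {
            $InputObject | Select-Object PSComputerName, CrashDumpEnabled
        }
    }
}

# Example against real hosts ("ClusterName" is a placeholder, as above):
# $names = (Get-ClusterNode -Cluster "ClusterName").Name
# Invoke-Command -ComputerName $names -ScriptBlock {
#     Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl" -Name CrashDumpEnabled
# } | Select-UnexpectedDumpSetting
```

An empty result means every node reports the expected value; anything returned is a host that still needs attention.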

 


We may wonder why, if the CrashDumpEnabled values of 2 and 7 both produce a kernel memory dump, we see a difference in page file size. This is expanded upon in the following TechNet article: https://blogs.technet.microsoft.com/askcore/2012/09/12/windows-8-and-windows-server-2012-automatic-memory-dump/

The TL;DR is as follows, quoted from the previous link –

The “System Managed” page file has been updated to reduce the page file size on disk, primarily for small SSDs but will also benefit servers with large amounts or ram.

The “Automatic memory dump” is not really a new memory dump type. In previous versions of Windows, we already have Mini, Kernel, and Complete memory dump options. The Automatic memory dump option produces a Kernel memory dump, the difference is when you select Automatic it allows the SMSS process to reduce the page file smaller than the size of RAM.

As per the Hyper-V team best practice we leave our hosts with a system managed page file size. The automatic memory dump setting does not require our system to reserve roughly a third of physical memory (~80GB) resulting in our smaller page file of ~40GB. Further review with the team highlighted a setting change had been made as part of troubleshooting an issue with Microsoft support. We have since reverted all hosts to the expected default value of 7 in the registry and following a round of reboots everything is good in the world… at least until something else breaks!
