VMware CPU Oversubscription!

Recently I was asked to look into performance issues on a VMware cluster. The IT department was adamant that the issues they were seeing were caused by poor SAN (storage area network) I/O (input/output). Having worked in infrastructure for many years, and having recently embarked on a journey into the world of DevOps (development and operations), I was quick to put my hand up to look into the issues, as I felt I had plenty of value to add!

INITIAL ANALYSIS:

Normally in a smaller environment, or more to the point, a smaller business, I would do a simple calculation to see how oversubscribed the CPU is. This quick and largely unscientific calculation came about for two reasons:

  • Labor cost
  • Time frame (or lack of)

I call this calculation the “CPU Contention Ratio”, or CCR as it will be known from here on. CCR is a crude method for quickly gauging how oversubscribed the physical CPU resource is relative to the virtual CPU resource, i.e. how much physical CPU resource is actually backing each virtual CPU. The lower the CCR, the heavier the oversubscription.

Calculating CCR:
$a = (number of host logical CPU cores within the cluster)
$b = (number of virtual cores assigned to running Virtual Machines within the cluster)
$a / $b = (CCR)

Or, here is a function I’ve written that gives us exactly that. The requirements to run it are as follows:

  • VMware PowerCLI installed
  • VMware PowerCLI modules loaded into the current shell session (I’ve written a function for just that found here)
  • A username and password for the vCenter with at least read-only rights to the cluster and VMs

function Get-VMCCR {
    param (
        [Parameter(Mandatory = $true)][string]$Datacenter,
        [Parameter(Mandatory = $true)][string]$Cluster
    )

    # Get the hosts in the cluster, then the powered-on VMs running on those hosts
    $VMHosts = Get-Datacenter -Name $Datacenter | Get-Cluster -Name $Cluster | Get-VMHost
    $VMs = $VMHosts | Get-VM | Where-Object { $_.PowerState -eq 'PoweredOn' }

    # NumCpu is the host's physical core count; double it when Hyper-Threading is active
    $VMHostCPUCounts = foreach ($VMHost in $VMHosts) {
        $ThreadsPerCore = if ($VMHost.HyperthreadingActive) { 2 } else { 1 }
        [PSCustomObject]@{
            Name                 = $VMHost.Name
            PhysicalCPUCount     = $VMHost.NumCpu
            TotalLogicalCPUCount = $VMHost.NumCpu * $ThreadsPerCore
        }
    }

    # Total the logical cores and the assigned vCPUs across the cluster
    $TotalLogicalCPUCount = ($VMHostCPUCounts | Measure-Object -Property TotalLogicalCPUCount -Sum).Sum
    $AssignedVCPU = ($VMs | Measure-Object -Property NumCpu -Sum).Sum

    # Leave CCR empty rather than divide by zero when no VMs are powered on
    $CCR = $null
    if ($AssignedVCPU) {
        $CCR = '{0:N2}' -f ($TotalLogicalCPUCount / $AssignedVCPU)
    }

    # Return a single summary object for the cluster
    [PSCustomObject]@{
        Datacenter           = $Datacenter
        Cluster              = $Cluster
        TotalLogicalCPUCount = $TotalLogicalCPUCount
        AssignedVCPU         = $AssignedVCPU
        CCR                  = $CCR
    }
}

Running this function against my cluster gives the following output:


Datacenter Cluster    TotalLogicalCPUCount AssignedVCPU CCR 
---------- -------    -------------------- ------------ --- 
Region01   Production                  136          434 0.31

As you can see, I have 0.31 of a logical core backing each virtual CPU assigned to a powered-on virtual machine within the cluster. In my opinion, this is too little. As a rule of thumb, I consider a CCR of no less than about 0.9 acceptable for real-time applications, and no less than 0.75 acceptable for general application servers that require performance but are not real time.
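To put those thresholds in context, here is a rough sketch that turns a target CCR into a vCPU budget for the cluster. It is only the CCR formula rearranged (vCPUs = logical cores / CCR). The helper name Get-MaxVCPUForCCR is made up for this example and is not part of PowerCLI; the figures are taken from the output above.

```powershell
# Hypothetical helper (not part of PowerCLI): the largest vCPU count that keeps
# a cluster at or above a target CCR, given its total logical core count.
function Get-MaxVCPUForCCR {
    param (
        [Parameter(Mandatory = $true)][int]$TotalLogicalCPUCount,
        [Parameter(Mandatory = $true)][double]$TargetCCR
    )
    # CCR = logical cores / vCPUs, so vCPUs = logical cores / CCR, rounded down
    [int][math]::Floor($TotalLogicalCPUCount / $TargetCCR)
}

# Figures from the Get-VMCCR output for the Production cluster
$LogicalCores = 136
$AssignedVCPU = 434

# Compare the current vCPU assignment against the budget for each rule-of-thumb target
foreach ($Target in 0.9, 0.75) {
    $MaxVCPU = Get-MaxVCPUForCCR -TotalLogicalCPUCount $LogicalCores -TargetCCR $Target
    [PSCustomObject]@{
        TargetCCR      = $Target
        MaxVCPU        = $MaxVCPU
        VCPUOverBudget = $AssignedVCPU - $MaxVCPU
    }
}
```

With 136 logical cores, that works out to a budget of 151 vCPUs at a CCR of 0.9 and 181 vCPUs at 0.75, against the 434 currently assigned. Like CCR itself, this is a crude sizing aid and not a measurement of actual contention.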

BUT ALAS – THIS WASN’T ENOUGH

So, more analysis is now required, as IT want a more scientific analysis of the exact issue, along with proof that the storage isn’t the bottleneck they are suggesting. More on the in-depth analysis, and specifically VMware statistics collection using VMware PowerCLI, in my next post.
