Find CPU-hogging VMs using PowerCLI

by Grzegorz Kulikowski

Hello,
this time I would like to show how we can check whether any VMs have been hogging CPU for too long. I bet you are already watching for this with VC alarms, so let’s build a simple script that uses them to list the VMs whose CPU usage has stayed near 100% for some time.
What we have to do here is:
1) figure out where alarms are triggered, and for which VMs
2) figure out how to get the root folder that holds the alarms
3) figure out how to get the definitions of the alarms
4) figure out how to select the VMs that have been using too much CPU for some predefined period of time
5) figure out how to get the CPU stats for the VMs from point 4

Let’s go:
Let’s first get the service instance object:
[sourcecode language=”powershell”]
$si=get-view serviceinstance
[/sourcecode]
I have decided to put more explanations into this post.
So, ServiceInstance, what is that? I think it is best described in the documentation: “The ServiceInstance managed object is the singleton root object of the inventory on both vCenter servers and servers running on standalone host agents. The server creates the ServiceInstance automatically, and also automatically creates the various manager entities that provide services in the virtual environment. Some examples of manager entities are LicenseManager, PerformanceManager, and ViewManager. You can access the manager entities through the content property.”
Make sure to read the ServiceInstance description in the documentation; it really helps to understand how it works.
We will be using the alarm manager to achieve our goal, so let’s have a look at how we get there.
[Screenshot: contents of $si.Content]
And? What do we do now? We see Alarm-AlarmManager. If we return it, we just get something with Type and Value properties. So how can this help us?
Let’s find out what it is by checking its type, and then use Get-Help on the Get-View parameter called Id. That should give us a hint about what to do next. So this magic Alarm-AlarmManager is actually a moref (ManagedObjectReference), and having that we can get its view using Get-View.
[Screenshot: the AlarmManager moref and its view]
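The inspection steps described above can be sketched roughly like this. It assumes you are already connected to a vCenter with Connect-VIServer; the exact output will differ between environments, so none is shown here.

[sourcecode language=”powershell”]
# Assumption: an active Connect-VIServer session already exists.
$si = Get-View ServiceInstance

# The property holds only a reference (a moref), not the manager itself.
$si.Content.AlarmManager.GetType().FullName
$si.Content.AlarmManager | Format-List Type, Value

# The help for -Id tells us what kind of value Get-View expects there.
Get-Help Get-View -Parameter Id

# Resolve the moref into the full AlarmManager view object.
Get-View -Id $si.Content.AlarmManager
[/sourcecode]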

Then define how far back we should build the statistics for VM CPU hogging:
[sourcecode language=”powershell”]
$days=-7
[/sourcecode]
Let’s get the VC root folder, and then find out which alarms are triggered on it.
[sourcecode language=”powershell”]
$rootfolderviewalarms=(get-view -id $si.Content.RootFolder).TriggeredAlarmState
[/sourcecode]
Then let’s get the alarm manager object:
[sourcecode language=”powershell”]
$am=get-view -id $si.Content.AlarmManager
[/sourcecode]
Let’s get the ids of the alarms defined on our root VC folder.
We can do this, for example, with the GetAlarm method of our AlarmManager. But.. but.. but.. how?! OK, let’s take a few steps back. What if we do not know how to do this, or whether there even is a method that can do it? The first thing we can do is inspect the alarm manager to see what it can do for us.
On the screenshot below, you can see our alarm manager object $am. To see what it can do for us, we use Get-Member to list its methods. From there we can see that it has a method called GetAlarm. To check what it does, we can use the documentation.
From the description of this method we know that it returns morefs (the ids of the alarms defined on a particular entity), and that in order to use it we need to give it an entity moref, the place where we look for alarms. The documentation for GetAlarm also states that if the entity is not set, it returns all visible alarms. If you would like to use it like that, you would have to run it with $null as the argument.
[Screenshot: Get-Member output for $am, listing the GetAlarm method]
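As a rough sketch of that inspection, assuming $am already holds the AlarmManager view from the earlier step (the variable $allalarmids is only an illustrative name):

[sourcecode language=”powershell”]
# Assumption: $am = Get-View -Id $si.Content.AlarmManager was run earlier.

# List what the alarm manager can do; GetAlarm should be among the methods.
$am | Get-Member -MemberType Method

# Passing $null instead of an entity asks for all visible alarms.
$allalarmids = $am.GetAlarm($null)
$allalarmids.Count
[/sourcecode]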
In this example we will get only the alarms defined on the root of our Virtual Center server.
[sourcecode language=”powershell”]
$alarmids=$am.GetAlarm($si.Content.rootfolder)
[/sourcecode]
Once again, what was returned? Morefs! Correct. As such they don’t hold much information, but we can get that information. Using what? Get-View, given those ids. The screenshot below shows how to get from the ids to actual alarm definition objects with information.
Let’s fetch the information about those alarms.
[sourcecode language=”powershell”]
$alarmdefinitions=get-view -id $alarmids
[/sourcecode]
[Screenshot: alarm ids resolved into alarm definition objects with Get-View]
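To get a readable overview of what came back, something like the following may help. It is only a sketch and assumes $alarmdefinitions was filled as shown above; the column names are my own choice.

[sourcecode language=”powershell”]
# Assumption: $alarmdefinitions = Get-View -Id $alarmids was run earlier.
# Show the display name, the system name and the moref of every definition.
$alarmdefinitions |
    Select-Object @{n='Name';e={$_.Info.Name}},
                  @{n='SystemName';e={$_.Info.SystemName}},
                  @{n='Alarm';e={$_.Info.Alarm}} |
    Sort-Object -Property Name
[/sourcecode]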
Let’s find the alarm id that describes the alarm for virtual machine CPU usage.
Now this part is a bit tricky. I am assuming here that we have only 1 alarm defined for VM CPU usage, and that it is defined in the root Virtual Center folder. This script will not work if you have defined more than 1 alarm for VM CPU usage, because I am searching only by the alarm system name ‘alarm.VmCPUUsageAlarm’ and I am not filtering by the alarm’s name. So we are looking at the $alarmdefinitions array, which holds the alarm definitions; each of them has an .Info object with a SystemName property. We filter on it to get the VM CPU usage alarm, and then select its alarm moref.
[sourcecode language=”powershell”]
$vmcpuusagealarmid=($alarmdefinitions|Where-Object {$_.Info.SystemName -eq 'alarm.VmCPUUsageAlarm' }).Info.Alarm
$vmcpuusagealarmid

Type  Value
----  -----
Alarm alarm-6
[/sourcecode]
So the id of the alarm for VM CPU usage is Alarm-alarm-6.
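Since the script relies on exactly one definition matching, a small guard can make that assumption explicit. The sketch below is not part of the original script: the function name Select-VmCpuUsageAlarm is hypothetical, and the mock definitions are made-up placeholder data that only mimic the .Info shape, so the example runs without a vCenter connection.

[sourcecode language=”powershell”]
# Hypothetical helper: returns the alarm moref only if exactly one
# definition carries the VM CPU usage system name, otherwise throws.
function Select-VmCpuUsageAlarm {
    param([object[]]$AlarmDefinition)

    $found = @($AlarmDefinition |
        Where-Object { $_.Info.SystemName -eq 'alarm.VmCPUUsageAlarm' })

    if ($found.Count -ne 1) {
        throw "Expected exactly 1 VM CPU usage alarm, found $($found.Count)."
    }
    $found[0].Info.Alarm
}

# Made-up stand-ins for alarm definitions (strings instead of real morefs).
$mockDefinitions = @(
    [pscustomobject]@{ Info = [pscustomobject]@{ Name = 'VM CPU alarm';   SystemName = 'alarm.VmCPUUsageAlarm';   Alarm = 'Alarm-alarm-6' } }
    [pscustomobject]@{ Info = [pscustomobject]@{ Name = 'Host CPU alarm'; SystemName = 'alarm.HostCPUUsageAlarm'; Alarm = 'Alarm-alarm-1' } }
)

Select-VmCpuUsageAlarm -AlarmDefinition $mockDefinitions
[/sourcecode]

Against a real environment you would pass $alarmdefinitions instead of the mock array.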
Let’s now get the ids of the virtual machines that currently have the alarm we found triggered. Our root VC folder has a property called TriggeredAlarmState that holds the morefs of entities together with the ids of the alarms triggered on them. We will now filter these to keep only the ones with the VM CPU usage alarm triggered.
[Screenshot: TriggeredAlarmState entries on the root folder]
[sourcecode language=”powershell”]
$vmswithcpualarms=$rootfolderviewalarms|?{$_.Alarm -eq $vmcpuusagealarmid}
[/sourcecode]
So now $vmswithcpualarms holds only the triggered alarm states that belong to VMs with the VM CPU usage alarm.
Let’s turn the ids into VM view objects so that we can grab the VM names. So far each entry in $vmswithcpualarms only has a property called Entity, which is just a moref.
[sourcecode language=”powershell”]
$cpuhoggingVMs=get-view -property name -id ($vmswithcpualarms | %{$_.Entity})
[/sourcecode]
Now let’s build statistics from the ‘cpu.usage.average’ metric. We will use 2h intervals, and we would like to get data from 7 days ago until now, as stated in the $days variable. PowerCLI gives us the Get-Stat cmdlet that we will use. It accepts entity names. I am using $cpuhoggingVMs|%{$_.name} in order to return only the names directly. It is the same as if you typed: -Entity 'vm1','vm2','vm3','vm4'
[sourcecode language=”powershell”]
$result=Get-Stat -Entity ($cpuhoggingVMs|%{$_.name}) -Start (Get-Date).AddDays($days) -Finish (Get-Date) -Stat 'cpu.usage.average' -IntervalMins 120
[/sourcecode]
Our data is now stored in the $result variable. There is a lot of it: CPU usage statistics for each virtual machine.
OK, what if you say that you cannot tell which VM a given statistic belongs to? I say: “We need to go deeper” 🙂
Running gm (Get-Member) on a result entry shows that there are more properties than just the ones displayed by default.

[Screenshot: Get-Member output for a $result entry]
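A quick way to look at those extra properties yourself, assuming $result was filled by the Get-Stat call above:

[sourcecode language=”powershell”]
# Assumption: $result holds the samples returned by Get-Stat earlier.

# List all properties of a single sample, not only the default columns.
$result | Select-Object -First 1 | Get-Member -MemberType Property

# Show a few samples together with the VM they belong to.
$result | Select-Object -First 5 -Property Entity, Timestamp, Value, Unit
[/sourcecode]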

And the last line! We will group the statistics by VM name, and then for each VM we will measure its average CPU usage over that period of time.
[sourcecode language=”powershell”]
$reportVMcpu=$result | select Value,Entity | Group-Object -Property Entity | % {$temp=$_; $temp.Group | Measure-Object -Property Value -Average | select @{n='CPU % Average usage';e={[math]::Round($_.Average,3)}}, @{n='entity';e={$temp.Name} }} | Sort-Object -Property 'CPU % Average usage' -Descending
[/sourcecode]
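If that one-liner is hard to follow, here is the same group-then-average idea spread over several lines. To keep it runnable without a vCenter, it uses a few made-up samples with only Entity and Value properties in place of real Get-Stat output; the VM names and values are invented.

[sourcecode language=”powershell”]
# Made-up samples standing in for Get-Stat output.
$samples = @(
    [pscustomobject]@{ Entity = 'vm1'; Value = 99.0 }
    [pscustomobject]@{ Entity = 'vm1'; Value = 98.0 }
    [pscustomobject]@{ Entity = 'vm2'; Value = 40.0 }
    [pscustomobject]@{ Entity = 'vm2'; Value = 60.0 }
)

$report = $samples |
    Group-Object -Property Entity |
    ForEach-Object {
        # One group per VM: average all of its samples.
        $avg = ($_.Group | Measure-Object -Property Value -Average).Average
        [pscustomobject]@{
            'CPU % Average usage' = [math]::Round($avg, 3)
            Entity                = $_.Name
        }
    } |
    Sort-Object -Property 'CPU % Average usage' -Descending

$report
[/sourcecode]

With these samples vm1 averages 98.5 and vm2 averages 50, so vm1 is listed first.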
Now, if we display our report, we get a summary of the VMs and their average CPU usage over the last 7 days in our example.
[Screenshot: $reportVMcpu output listing the VMs and their average CPU usage]
We can now tell that some VMs here have had an average CPU usage of 99% for the last 7 days. That would indicate that something went wrong inside such a VM, and we need to investigate it. We do not like VMs that hog CPUs for too long without any reason 😉
Why would we want this report?
OK, I bet you are using alarms for VM CPU usage, and the alarm kicks in after 5.. 15.. 30.. minutes, for example. You might assume that something went wrong inside the VM, but there are VMs that work really hard only during some specific time window, for example systems that run calculations at the end of the month, or that use CPU only for a few days in a week or month by design. Each time that alarm is triggered you would have to go to the VM performance charts and check whether this is an abnormal situation, call the VM owner, or look for a pattern in its CPU usage. If you see that the VM behaves as expected, because it is normal for it to consume that amount of CPU only on Mondays, you would ignore the alarm and just wait to see whether it clears as it did before. The report saves you that manual check: a VM with such a periodic workload should show a moderate average over the week, while a VM that has been stuck near 100% the whole time stands out.
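If you want to look for such a weekly pattern in the samples themselves, one possible approach is to average them per weekday. This is only a sketch and not part of the original script: the samples below are made up (real Get-Stat samples carry a Timestamp property you could use the same way), and $perDay is just an illustrative name.

[sourcecode language=”powershell”]
# Made-up samples for a single VM; 2014-01-27 was a Monday.
$samples = @(
    [pscustomobject]@{ Timestamp = [datetime]'2014-01-27T10:00:00'; Value = 95.0 }
    [pscustomobject]@{ Timestamp = [datetime]'2014-01-27T12:00:00'; Value = 97.0 }
    [pscustomobject]@{ Timestamp = [datetime]'2014-01-28T10:00:00'; Value = 10.0 }
    [pscustomobject]@{ Timestamp = [datetime]'2014-01-28T12:00:00'; Value = 14.0 }
)

# Group by weekday and average the CPU usage within each day.
$perDay = $samples |
    Group-Object -Property { $_.Timestamp.DayOfWeek } |
    ForEach-Object {
        $avg = ($_.Group | Measure-Object -Property Value -Average).Average
        [pscustomobject]@{
            Day     = $_.Name
            Average = [math]::Round($avg, 1)
        }
    }

$perDay
[/sourcecode]

Here Monday averages 96 and Tuesday 12, which is the kind of "busy only on Mondays" shape described above.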

I hope this post helps you start using the alarm manager and other managers, as well as understand morefs and how to use Get-View.

Below is the code without any comments.
[sourcecode language=”powershell”]
$si=Get-View ServiceInstance
$days=-7
$rootfolderviewalarms=(Get-View -Id $si.Content.RootFolder).TriggeredAlarmState
$am=Get-View -Id $si.Content.AlarmManager
$alarmids=$am.GetAlarm($si.Content.RootFolder)
$alarmdefinitions=Get-View -Id $alarmids
$vmcpuusagealarmid=($alarmdefinitions|Where-Object {$_.Info.SystemName -eq 'alarm.VmCPUUsageAlarm' }).Info.Alarm
$vmswithcpualarms=$rootfolderviewalarms|?{$_.Alarm -eq $vmcpuusagealarmid}
$cpuhoggingVMs=Get-View -Property Name -Id ($vmswithcpualarms | %{$_.Entity})
$result=Get-Stat -Entity ($cpuhoggingVMs|%{$_.name}) -Start (Get-Date).AddDays($days) -Finish (Get-Date) -Stat 'cpu.usage.average' -IntervalMins 120
$reportVMcpu=$result | select Value,Entity | Group-Object -Property Entity | % {$temp=$_; $temp.Group | Measure-Object -Property Value -Average | select @{n='CPU % Average usage';e={[math]::Round($_.Average,3)}}, @{n='entity';e={$temp.Name} }} | Sort-Object -Property 'CPU % Average usage' -Descending
[/sourcecode]

Enjoy!

2 comments

Get-view, list viewtypes, filter usage, Get-VIObjectbyVIView, and get-vm in powercli | VMware and Powershell February 2, 2014 - 2:03 am

[…] I hope this post explains a little bit of differences between get-vm and get-view. If you have any questions please post a comment and i will update the post then. Another post that i wrote recently that explains it as well http://psvmware.wordpress.com/2014/02/02/find-cpu-hogging-vms-using-powercli/ […]

Peter Lammers October 7, 2015 - 2:53 pm

Please use an if/else to avoid error messages on empty variables:

if ($vmswithcpualarms -ne $null)
{$cpuhoggingVMs=get-view -property name -id ($vmswithcpualarms | %{$_.Entity})
$result=Get-Stat -Entity ($cpuhoggingVMs|%{$_.name}) -Start (Get-Date).AddDays($days) -Finish (get-date) -Stat 'cpu.usage.average' -IntervalMins 120
$reportVMcpu=$result | select Entity, value | Group-Object -Property entity | % {$temp=$_; $temp.group | Measure-Object -Property value -average | select @{n='Average CPU% usage';e={[math]::round($_.average,3)}}, @{n='VM Name';e={$temp.name} }} | Sort-Object -Property 'Average CPU% usage' -Descending}
else {Write-Host "Text for informing that there are no vm's found that are guilty of hogging your CPU"}
$reportVMcpu
