Getting Accurate Latency from Dynamically Expanding Hyper-V Virtual Machine Disks

 

This article is about a tool called Hyper-V Performance Monitor Tool (PowerShell).

You can download it from the TechNet Gallery link further down the page, or use the link above.

Hyper-V gets mentioned loosely these days whenever the conversation turns to virtualization, performance tuning, planning, or any other stage in the life cycle of a new host deployment.

Over the last few years, we have moved rapidly from physical host machines for production workloads to these virtual monstrosities that now host our whole company.

Along with this change, you may recall that early Hyper-V documentation gently let us know that, depending on configuration, monitoring inside the virtual machine would not give results on par with the physical counters. There are several reasons for this, which are beyond this article's scope. However, I would like to shine a light on it so that more people think differently about their virtual performance.

The most common measure of how well a server is performing is latency in milliseconds (ms). Everyone is most concerned with how much latency is in the storage system, perhaps with good reason. SAN storage vendors can perform so fast nowadays that you can throw the Empire State Building at a server and the latency is still under 10 ms. Or is it throughput?

To be clear, we are more interested in latency than throughput. Minimize latency, and throughput will generally increase.
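Little's Law gives one way to quantify how latency and throughput are related: the average number of outstanding I/Os equals IOPS multiplied by average latency. The sketch below is a worked illustration of that law, not part of the Hyper-V tool; the queue depth of 32 and the 64 KB I/O size are arbitrary example values, and both function names are hypothetical.

```powershell
# Little's Law applied to storage: average outstanding I/Os = IOPS x average latency (seconds).
# Rearranged: IOPS = QueueDepth / LatencySeconds = QueueDepth * 1000 / LatencyMs.
function Get-ExpectedIops {
    param(
        [Parameter(Mandatory)] [double] $QueueDepth,  # average number of outstanding I/Os
        [Parameter(Mandatory)] [double] $LatencyMs    # average latency per I/O, in milliseconds
    )
    $QueueDepth * 1000 / $LatencyMs
}

# Throughput (MB/s) is IOPS multiplied by the I/O size.
function Get-ThroughputMBps {
    param(
        [Parameter(Mandatory)] [double] $Iops,
        [Parameter(Mandatory)] [double] $IoSizeKB
    )
    $Iops * $IoSizeKB / 1024
}

# Example values (arbitrary): queue depth 32, 64 KB I/Os, two different latencies.
foreach ($latencyMs in 10, 5) {
    $iops = Get-ExpectedIops -QueueDepth 32 -LatencyMs $latencyMs
    $mbps = Get-ThroughputMBps -Iops $iops -IoSizeKB 64
    '{0} ms -> {1} IOPS, {2} MB/s' -f $latencyMs, $iops, $mbps
}
```

At a fixed queue depth of 32, dropping latency from 10 ms to 5 ms moves the result from 3200 to 6400 IOPS (200 to 400 MB/s at 64 KB per I/O), which is the sense in which lower latency generally raises throughput.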

Can I make a case that Counters are not reliable?

Well, let me tell you: your actual latency is not as easily measured as you would think. If I told you that your tests are lying to you, would you believe me? Let's say you're in shock. Without knowing your design, my generalized answer has a very high chance of being correct. The issue is that you have a SAN, and you are trying to get latency by measuring responses to a file. Those responses go through four different filters and then have to wait until they are queued to a disk subsystem that is always expanding. I promise you: your numbers are incorrect.

If you cannot quantify how latency and throughput differ, yet are related, then keep reading.

Storage Latency of VM Guests

Calculating storage latency has many pitfalls; we will use the virtual disk as the model to illustrate how tricky it can be to find out how your VM is performing.

The most common approach to gathering latency information is to use a command-line tool. Normally the tool works fine. The model breaks down when the disk itself is changing, along with RAM and processor availability. The bottom line is that a virtual machine may misreport resource numbers at any given time. Add to the mix that timekeeping is a weakness in any virtualization platform: the calculation of time itself can produce poor results, good math applied to bad numbers.
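To make "good math with bad numbers" concrete: for a serial test (one outstanding I/O at a time), average latency is elapsed time divided by completed I/Os. The sketch below assumes, purely for illustration, a guest clock that runs 5% fast during the run; the I/O count, the duration, and the helper name Get-AvgLatencyMs are made up.

```powershell
# Average latency per I/O for a serial (queue depth 1) test: elapsed time / I/Os completed.
function Get-AvgLatencyMs {
    param(
        [Parameter(Mandatory)] [double] $ElapsedSeconds,
        [Parameter(Mandatory)] [long]   $IoCount
    )
    $ElapsedSeconds * 1000 / $IoCount
}

# Hypothetical run: 10,000 I/Os that truly took 50 seconds of wall-clock time.
$trueSeconds = 50
$ioCount     = 10000

# Assumption for illustration only: the guest clock runs 5% fast, so it records more elapsed time.
$clockSkew       = 1.05
$measuredSeconds = $trueSeconds * $clockSkew

$trueLatency     = Get-AvgLatencyMs -ElapsedSeconds $trueSeconds -IoCount $ioCount
$reportedLatency = Get-AvgLatencyMs -ElapsedSeconds $measuredSeconds -IoCount $ioCount

'True latency:     {0:N3} ms' -f $trueLatency
'Reported latency: {0:N3} ms' -f $reportedLatency
```

The division is correct in both cases, yet the reported value comes out 5% high (5.25 ms instead of 5 ms) because the elapsed time fed into it was wrong.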

Some of you will say that is bull. All I can say is: don’t read this, and good luck solving your latency issues.

Let me list some areas where the numbers may go awry. I give just a one-line explanation with a link for each, so you can read more if you have a specific issue. I don’t want this article to be about the problems; below, I talk more about the solution:

 

I could keep going. Do you get the feeling there are a ton of variables that change how storage latencies should be calculated?

In my experience, every set of servers forms its own data set of network behavior. Here are some basic assumptions I pass along to admins who want to find the latency of their virtual machines.

Guidelines for a VM Latency Study

Who to Blame

So again, the basic message is that the calculated latency depends entirely on the sum of the deployment factors. In one data center you may find under-reporting, and in another over-reporting. Support agents are not obliged to prove why one is slower than the other. We have to look at your design and deployment and try to build a story from the things we can identify. It is unlikely that we will find the moment where the deployment deviated from your baseline storage latency measurements. We offer best effort, but we encourage you to strip your deployment down to establish a core baseline latency for a dynamically expanding VM. All VMs are compared to that one, and we go from there.
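A minimal sketch of that baseline comparison: measure the stripped-down, dynamically expanding VM first, then express every other VM's latency as a ratio of it. The function name Compare-ToBaseline, the 1.5x tolerance, and the VM names and latencies are all hypothetical, chosen only to show the idea.

```powershell
# Compare each VM's average latency with a baseline VM measured on a stripped-down deployment.
function Compare-ToBaseline {
    param(
        [Parameter(Mandatory)] [double]    $BaselineMs,   # latency of the core baseline VM
        [Parameter(Mandatory)] [hashtable] $VmLatencyMs,  # VM name -> measured average latency (ms)
        [double] $Tolerance = 1.5                         # flag VMs slower than baseline x Tolerance
    )
    foreach ($name in ($VmLatencyMs.Keys | Sort-Object)) {
        $ratio = $VmLatencyMs[$name] / $BaselineMs
        [pscustomobject]@{
            VM          = $name
            LatencyMs   = $VmLatencyMs[$name]
            Ratio       = [math]::Round($ratio, 2)
            Investigate = $ratio -gt $Tolerance
        }
    }
}

# Hypothetical numbers for illustration only.
Compare-ToBaseline -BaselineMs 4 -VmLatencyMs @{ 'VM01' = 4.4; 'VM02' = 9.0; 'VM03' = 5.8 } |
    Format-Table -AutoSize
```

The ratio, rather than the raw millisecond value, is what carries over between data centers that under-report or over-report in different ways.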

Using Stopgap Solutions for Monitoring Virtual Machines

Just a few years ago, this VM monitoring problem was not easily remedied. You could certainly use the Perfmon counters to get VM stats, but customers just want to run DiskSpd or SQLIO and get an output to look at. That did not exist for quite some time. Thankfully, there is now a script that provides something close to parity with those tools. The link is at the TechNet Gallery:

Gallery.TechNet.Com

Hyper-V Performance Monitor Tool (PowerShell)

Below is a walkthrough of the basic performance collection.

You just run the script from an elevated (Run as Administrator) PowerShell prompt. There are a few ways to run it:

.\Monitor-HyperVGuestPerformance.ps1

### export data to csv via GUI, defaults to current dir
.\Monitor-HyperVGuestPerformance.ps1 -ExportToCsv

### retrieve data as PSobjects, great for parsing and logging, -name parameter is optional, defaults to automatic discovery
.\Monitor-HyperVGuestPerformance.ps1 -PSobjects

### specify host and interval/samples manually
.\Monitor-HyperVGuestPerformance.ps1 -Name host1,host2 -PSobjects -Interval 2 -MaxSamples 5

### accepts pipeline input
'Host1','Host2' | .\Monitor-HyperVGuestPerformance.ps1 -PSobjects

### Log to SQL server with Write-ObjectToSQL , this example uses SQL auth
.\Monitor-HyperVGuestPerformance.ps1 -PSobjects | Write-ObjectToSQL -TableName table -Database db -Server server -Credential (Get-Credential)
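When you use -PSobjects, the output can be summarized with standard cmdlets such as Measure-Object. This is a sketch, assuming the objects expose a numeric latency property; its real name is not documented here, so it is passed in as -Property, and Get-LatencySummary is a hypothetical helper, not part of the script.

```powershell
# Hypothetical helper: summarize one numeric property (min / average / max) from piped objects.
# The property name is a parameter because the script's output property names are an assumption.
function Get-LatencySummary {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [psobject] $InputObject,
        [Parameter(Mandatory)] [string] $Property
    )
    begin   { $values = New-Object 'System.Collections.Generic.List[double]' }
    process { $values.Add([double]$InputObject.$Property) }
    end {
        $stats = $values | Measure-Object -Average -Minimum -Maximum
        [pscustomobject]@{
            Property = $Property
            Samples  = $stats.Count
            Minimum  = $stats.Minimum
            Average  = [math]::Round($stats.Average, 3)
            Maximum  = $stats.Maximum
        }
    }
}

# Demo with made-up objects standing in for the script's -PSobjects output.
$demo = 3.0, 5.0, 7.0 | ForEach-Object { [pscustomobject]@{ LatencyMs = $_ } }
$demo | Get-LatencySummary -Property LatencyMs
```

Against the real script this would look like `.\Monitor-HyperVGuestPerformance.ps1 -PSobjects | Get-LatencySummary -Property <LatencyProperty>`, with the placeholder replaced by whichever latency property the objects actually carry.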

 


If the domain connection fails, the script falls back to a local connection.

 


In my case, I ran the tool on the host, and a GUI popped up. All I did was hit Monitor, and I got an exported vm_perfmon_stats file, which you can use to find your latency.
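If you prefer to work from the exported file, Import-Csv can load it for per-VM averages. This is a sketch under assumptions: the export is a CSV with one row per sample, and the column names VMName and LatencyMs are placeholders for whatever headers your export actually has. CSV fields arrive as strings, so the hypothetical helper Get-PerVmAverage casts them to [double] before averaging.

```powershell
# Hypothetical helper: average a latency column per VM from CSV rows.
# Import-Csv / ConvertFrom-Csv return strings, so values are cast to [double] first.
function Get-PerVmAverage {
    param(
        [Parameter(Mandatory)] [object[]] $Rows,
        [string] $VmColumn    = 'VMName',     # placeholder column name (assumption)
        [string] $ValueColumn = 'LatencyMs'   # placeholder column name (assumption)
    )
    $Rows | Group-Object -Property $VmColumn | Sort-Object -Property Name | ForEach-Object {
        $group  = $_
        $values = $group.Group | ForEach-Object { [double]$_.$ValueColumn }
        $avg    = ($values | Measure-Object -Average).Average
        [pscustomobject]@{
            VM           = $group.Name
            Samples      = $group.Count
            AvgLatencyMs = [math]::Round($avg, 3)
        }
    }
}

# For the real export (file name and headers are assumptions; check your file):
#   $rows = Import-Csv -Path '.\vm_perfmon_stats.csv'
# Inline demo data instead, so the sketch runs on its own:
$rows = @'
VMName,LatencyMs
SQL01,4
SQL01,6
WEB01,2
'@ | ConvertFrom-Csv

Get-PerVmAverage -Rows $rows
```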


While this method may not be pretty, it does follow the rules for Hyper-V guests. The main purpose of this tool is to be used instead of SQLIO or DiskSpd when measuring VMs; tools like those should be used for hardware testing. A Hyper-V server running on iSCSI shared storage with two VHDX files attached is likely to come back with erroneous latencies from them. This tool may not be perfect, but I believe you will see consistent results that are not totally unbelievable numbers.

Next, I changed the sample count and interval.


The tool then gives me an estimate of how long to wait for the test results.
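For planning a baseline run, the wait is roughly the sample interval multiplied by the sample count. This is a sketch, assuming -Interval is in seconds (as with Get-Counter -SampleInterval) and ignoring connection overhead; both helper functions are hypothetical, not part of the script.

```powershell
# Rough run time for a collection: one sample every -Interval seconds, -MaxSamples times.
# Assumption: -Interval is in seconds; connection and export overhead are ignored.
function Get-CollectionDuration {
    param(
        [Parameter(Mandatory)] [int] $Interval,
        [Parameter(Mandatory)] [int] $MaxSamples
    )
    [timespan]::FromSeconds($Interval * $MaxSamples)
}

# Reverse question: how many samples cover a desired duration at a given interval?
function Get-SampleCount {
    param(
        [Parameter(Mandatory)] [int]      $Interval,
        [Parameter(Mandatory)] [timespan] $Duration
    )
    [int][math]::Ceiling($Duration.TotalSeconds / $Interval)
}

(Get-CollectionDuration -Interval 2 -MaxSamples 5).TotalSeconds
Get-SampleCount -Interval 15 -Duration (New-TimeSpan -Hours 1)
```

With -Interval 2 -MaxSamples 5 that is about 10 seconds; covering one hour at a 15-second interval would need 240 samples.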


The result is a nice little Excel display of the data, which I cleaned up a little by adding colors to the fields.

Find the link at the Microsoft TechNet Gallery. Thank you for taking the time to read about storage performance for Hyper-V virtual machines.

I hope this helps in your baseline studies.

 


Louis
