Some tips on fixing Warning – Reverse DNS does not match the SMTP Banner

 

 

This is a common error that I get asked about frequently. I want to take a moment to explain what the error is, what to focus on, and which tools you need to fix and monitor it.

First of all, please understand this paper covers the simplest of scenarios. Multiple sites, smart hosts, bridgeheads, and multiple accepted domains will quickly muddy the waters, but for a basic Exchange Server, this article applies directly.

 

The Error

Exchange Server 2013 SMTP banner does not match reverse lookup. or

Warning – Reverse DNS does not match the SMTP Banner

 

Disclaimer

First, be aware that there is a lot of misinformation out there. Stop, read, and understand before you decide which articles are telling you the truth. This error is likely to pop up in a few situations, so I want to take a minute to clarify the message and what is needed to clear it up.

You must understand that this error is directional and relative to a point in mail flow. Nail down your situation before you set out to solve the problem, or you risk getting yourself more confused. Let me try to explain in a simple way.

The SMTP banner is more generally a problem for outbound mail. You may still get an error for inbound connectors, but mail will not usually fail. Internal mail uses the internal banner (host) and DNS, and external mail uses the external banner and DNS. The error generally comes about when mail is received across the public internet and the SMTP header references an internal FQDN.

Inbound Banner

If you think you have an inbound banner issue, go into your inbound mail connector and try to save it without making changes. If there is a problem, you should get a pop-up message similar to Figure A.

Figure A. Inbound Banner issues are identifiable

 

Exchange will promptly give you an error when your inbound connector has a banner issue. Why, you ask? Because Exchange checks the banner against the security settings. Think of it like a security guard: they always check you coming in, but once you have cleared security, it is not as difficult to leave.

I won’t go into a full explanation of inbound banners, except to say that by the time your mail hits this server, the lookup is internal, so the banner should always be internal. In addition, you have a server with a certificate matching this FQDN, so it makes sense that these should all be the same name. Do what the error says and set the banner to the internal FQDN.

Outbound Banner

Outbound is really the same sort of thing for any outbound internal connectors: internal connector, internal FQDN. The change comes when you have an outbound Internet connector. This connector supplies the banner that external recipients check with their reverse lookups. That is, unless you have a third-party device doing store and forward for you, in which case you should be able to set the SMTP banner there as well. Assuming you don’t use a smart host, your Send connector settings would look like this:

 

Figure B. Send Connector Scoping Tab.

 

This should make sense. This is the external-facing Send connector. Once mail leaves this connector, it is external mail. From this point, mail has to rely on MX, DNS, or a smart host to reach its destination.

So, what do you think gets queried for the reverse lookup? The mail server at the destination queries the public records it finds and compares them against the header and other information it has received when it looks up your mail domain. The checks include the reverse lookup (PTR), public MX record, A record, TXT record, and SPF record. All you need to do is make sure these records contain the correct public IP address for your Exchange server, that the banner name resolves to that IP address, and that the other records contain the same name and/or IP addresses.

A light conversation

Now we get down to brass tacks. These are the main things you need to set correctly:

  1. Public MX record: domain.com points to the target mail.domain.com, which resolves to the PUBLIC IP address
  2. An “A record” for the value of the banner, “mail.domain.com”
  3. An “A record” for other names in your setup, like “autodiscover.domain.com”
  4. PTR record (your reverse lookup DNS record). The sending IP address should be assigned one PTR record, and its host name is what should match the “send” banner
  5. SPF record: a TXT record with a special format, used by anti-spam systems for domain verification. An SPF record tool will help generate your record

Tools you can use to make sure your records are correct:

  1. Install Dig on your Windows client machine. dig -x <public IP> will find your PTR record
  2. dig domain.com will give you your “A” record
  3. dig domain.com txt will show your TXT records, including SPF (query the name where the SPF record is published, normally the mail domain)
  4. dig mx domain.com queries the MX record; dig @nameserver.domain.com yourdomain.com sends the query to a specific name server

With Dig you can check and cross-check. If an IP address shows up in this mix that you are not aware of or are not using, you will need to fix it.
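If you would rather script the cross-check than eyeball the Dig output, here is a minimal Python sketch of the comparison. It is only an illustration and not part of any Exchange tooling: the function names are my own, the host names are placeholders, and the two lookup helpers go through whatever resolver the machine uses, so they only mean something when run from a machine that sees public DNS.

```python
import socket


def normalize(name: str) -> str:
    """Lower-case a host name and drop the trailing dot that DNS tools print."""
    return name.strip().rstrip(".").lower()


def banner_matches_ptr(banner: str, ptr_names: list[str]) -> bool:
    """True when the Send connector FQDN equals one of the PTR host names."""
    return normalize(banner) in {normalize(n) for n in ptr_names}


def lookup_ptr(public_ip: str) -> list[str]:
    """Reverse lookup of the public IP (a scripted stand-in for dig -x)."""
    primary, aliases, _addresses = socket.gethostbyaddr(public_ip)
    return [primary, *aliases]


def banner_resolves_to(banner: str, public_ip: str) -> bool:
    """True when the banner name resolves to the public IP (the A record check)."""
    _name, _aliases, addresses = socket.gethostbyname_ex(banner)
    return public_ip in addresses


# Offline example with made-up values: the PTR answer ends in a dot and the
# banner uses different letter case, which should still count as a match.
print(banner_matches_ptr("Mail.Domain.com", ["mail.domain.com."]))
# An internal name in the banner is the classic mismatch.
print(banner_matches_ptr("exch01.corp.local", ["mail.domain.com."]))
```

In practice you would feed lookup_ptr(your public IP) into banner_matches_ptr, and then confirm the forward direction with banner_resolves_to.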

I am not going into too much detail here, but if you have all these records in place and they point to the public IP address your Exchange server uses for mail, you should be happy. Browse to IPCHICKEN.COM from your Exchange server. It will tell you your public IP, which is normally the one used for setting public DNS records. For customers without a smart host or bridgehead, the value IPCHICKEN shows should be the public IP in these records.

In Closing

You have the public information you need to set the records above. Set them correctly. Then go to the Exchange Server and set the FQDN correctly, and the SMTP banner should no longer fail to match the reverse lookup:

  • Send Connector: Mail Flow -> Send Connectors -> Scoping -> FQDN
  • Receive Connector: Mail Flow -> Receive Connectors -> Scoping -> FQDN

Make sure each FQDN matches its function. An internal connector gets the internal FQDN.

The Internet Send connector gets the public FQDN. Then make the records match the correct public values and this issue will be resolved.

Here are some tools you can use to troubleshoot:

Exchange Connectivity.

Dig Bind Tool

MX Tool Box

I hope this is helpful and explains what you are seeing, and how you can fix your SMTP banner issue.

Thank you,

 

Louis

 

 

 

Why Network Design is so Important for Hyper-V CSV Clusters. And Some Common Items to Check

I realize I have written several Hyper-V articles lately. They all come from the unique perspective of technical support. I now see that I have been trying to put together some material to help explain how the new 2016 clusters differ from the 2012 clusters and the 2008 clusters.

I have three clear goals in this paper. First, I want to give you a list of items to review to make your cluster as supportable as possible. Second, I want to speak to the importance of your NIC hardware purchases as it relates to Microsoft’s past and current stance on network requirements for Hyper-V cluster setups. Finally, I summarize some of the command-line settings you may look at if you don’t have the optimal NIC setup for a cluster.

 

Disclaimers

 

      • Disclaimer (agent): While some items in this discussion may not be good advice for your particular infrastructure or situation, I am writing this from a familiar perspective, the one Admins and Designers approach me with. Namely, this is an “I want it all, and I want it all to work now!” type of design.
      • Translation (customer needs): Give me the settings that make the fullest use of my server, give me the most VMs with the most possible resources, and I want to Live Migrate them all day long. Furthermore, I want to be able to host a conference in an RDP session on any one of my cluster nodes and not have any problems.

 

Facts and Common issues in today’s and yesterday’s Hyper-V CSV Clusters

 

With this disclaimer in mind, let us proceed. First things first: I am providing this list based on my 10 years with Hyper-V and Clustering, along with the reading and video information I have come across:

 

      1. Never Use the (Hyper-V) shared Network Adapter as a NIC in the Host Server.
      2. Never Software Team with the NDIS driver Installed from your NIC vendor.
      3. Software teaming is fine for most workloads. If you’re having latency problems with it, the answer is hardware teaming, or vice versa.
      4. Don’t put SQL servers on your Hyper-V host.
      5. VM Queuing can be a problem. Try your workloads with and without VM Queuing and see which works best for your situation.
      6. TCP Offload is not supported for Server 2012 cluster teams. Check the other settings here
      7. The preferred software team setting is switch-independent teaming with Hyper-V Port load balancing. This is where we are at today. Remember, you have access to these statements in the current documentation.
      8. If you use the Multiplexer Driver as the virtual machine NIC, do not turn around and share that NIC with the Hyper-V host. This is not pretty.
      9. Use Jumbo Frames, QoS, etc., where you’re supposed to, according to the current guidelines.
      10. Piggybacking off of #8, today’s clusters with Hyper-V are a balance of isolation and bandwidth networks. There is no hard-and-fast rule on how many network adapters you need.
      11. You cannot just say a node is too slow or fast. When you first install the server, you need to perform clearly laid out baseline testing, where similar results can be obtained for your server in pristine condition, with no other workloads. The same is true for virtual machines.
      12. You cannot run a Hyper-V CSV cluster with all your NICs in the team. You need at least 3 networks in any version of Hyper-V. These are:
        1. Cluster Only (Cluster Communication)
        2. Cluster and Client (Management etc.…)
        3. No Cluster Communication. (ISCSI)
      13. Run Cluster Validation. If your updates don’t match across your cluster, you need to get all your nodes to match before the cluster will work properly.
      14. Clustering only recognizes one NIC per subnet when you add multiple NICs to a cluster.
      15. Backup applications and antivirus may have compatibility issues. Disable both and see if the issues disappear.
      16. Network Considerations
      • The binding order and DNS must match your current MS documentation. Do not miss this.
      • Cluster Setup now adds rules to the firewall automatically. If you are using Symantec Endpoint, these firewall rules can serve as your port list to add to the Symantec firewall.
      • You can now Sysprep with the Cluster role installed, as of Server 2012.
      • NETFT is enabled at the physical NIC, where you find your IPv4 properties. Do not disable it.
      • If you are setting up converged clusters, you now have to rely on cluster validation to tell you whether you have enough networks to effectively set up your cluster. Resolve any network issues reported in validation.
      • CSV traffic includes metadata updates and Live Migration data, as well as failure recovery (i.e., no storage connectivity). You cannot break this traffic into isolated streams.
      • CSV needs NTLM and SMB. Don’t disable either.
      • iSCSI teams now work with MPIO and Jumbo Frames. Jumbo Frames are needed for iSCSI.
      • Using multiple NIC brands is now preferred.
      17. This series of articles covers topics I did not go into in depth. Topics include:
        1. Mapping the OSI model
        2. VLANS
        3. IP routing
        4. Link Aggregation and Teaming
        5. DNS
        6. Ports, Sockets and Applications
        7. Bindings
        8. Load Balancing, Algorithms

 

 

Cluster nodes are multi-homed systems.  Network priority affects DNS Client for outbound network connectivity.  Network adapters used for client communication should be at the top in the binding order.  Non-routed networks can be placed at lower priority.  In Windows Server 2012/2012R2, the Cluster Network Driver (NETFT.SYS) adapter is automatically placed at the bottom in the binding order list

 

Network Evolution and common sense Network needs

This section addresses how we build clusters today. For example, I recently wrote a paper on using the old isolation rules for a simple 2016 cluster, based on the old method of deployment. This method is elegant and works well, with little maintenance needed.

For 2012 and forward, we have the new design, which is detailed in the TechNet article “Network Recommendations for a Hyper-V Cluster in Windows Server 2012”. That article covers the modern setup, using a software team and scripted network isolation.

This paper interleaves these two philosophies; at least, that was the intent. You are always using one or the other as a guiding principle, insofar as you have the technical reasoning to do so. What I mean is, if you have 10GB NICs, you may fully move to the 2012 method. If you have something like three 1GB NICs, you are leaning on the 2008 article to explain to the customer why live migration would not work properly.

Get logging information for Hyper-V and clustering from this article

The quick history of the CSV cluster is as follows:

2008

Heartbeats/Intra Cluster Communications -in some documentation  (1GB)

CSV I/O Redirection  (1GB)

VM Network (1GB)

Cluster Network (1GB)

Management Network (1GB)

ISCSI Network – (1GB)

 

2012 and 2016

Heartbeats/Network Health Monitoring in some documentation (QOS IMPORTANT)  (10GB)

Intra Cluster Communications (QOS IMPORTANT)  (10GB)

CSV I/O Redirection (Bandwidth Important)  (10GB)

ISCSI Network – Not registered in DNS (10GB)

 

This is where you can clearly see that new clusters in 2016 just don’t have the same specifications. The recommendation is to set up the cluster networks according to the number of network adapters and their throughput. If the NIC setup looks like the 2008 cluster, apply the 2008 network setup guidelines. If the cluster has two or more 10GB NICs, treat it with the newer 2016 logic. This has worked well for me for some time now, and it ensures that you get the best isolation and throughput for your customer.
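That rule of thumb is simple enough to write down. The Python sketch below is just my restatement of it, not a Microsoft tool: the function name is made up, NIC speeds are given in gigabits, and treating anything with fewer than two 10GB NICs as a “2008-style” layout is my own reading of the rule.

```python
def guideline_for(nic_speeds_gb: list[int]) -> str:
    """Pick which set of network guidelines to lean on for a cluster node.

    nic_speeds_gb holds one entry per physical NIC, in gigabits per second.
    """
    ten_gb_nics = sum(1 for speed in nic_speeds_gb if speed >= 10)
    # Two or more 10GB NICs: treat the node with the newer 2012/2016 logic.
    if ten_gb_nics >= 2:
        return "2012/2016"
    # Otherwise the layout looks like the older isolation-based design.
    return "2008"


print(guideline_for([1, 1, 1, 1, 1, 1]))  # six 1GB NICs
print(guideline_for([10, 10]))            # two 10GB NICs
```

A node with a single 10GB NIC plus a few 1GB NICs falls on the 2008 side here. That is a judgment call, so adjust the threshold if your experience says otherwise.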

So as you can see, the number of NICs is going down, but the NIC speed is going up. To make matters more difficult, Microsoft now states that to be optimized, a CSV cluster will have a combination of isolation and bandwidth. They are no longer able to lean on the hard 5 to 7 NIC requirement that once was the norm. For proof of this, watch the video entitled “Failover Cluster Networking Essentials.”

So really, Support may not be giving you a great explanation as to why your CSV cluster is slow. It is closely related to the network design. Does your network look more like a 2008 cluster, or a 2012 or 2016 cluster? The answer will give you justification as to why a cluster would be slow or fast.

Server 2012 Requirements are here, along with a basic script for Embedded Teams

In addition to the script above, you also have control over the heartbeat, and other things like priority of the various Cluster NICS and timeouts.

 

Settings you may look at to change if needed

The rest of this article just shows you some configuration settings, in case you find you have to make a manual change. With a 2016 cluster, the guidance is that it is all automatic and should not be changed.

While you can change the following settings, the recommendation is to leave them alone. The automatic settings should adjust properly to situational network changes:

 

Configure Cluster Heart Beating

 

(Get-Cluster).SameSubnetDelay = 2000

 

The above command is an example of how you set the following variables. The delay properties take their value in milliseconds, so 2000 is 2 seconds. They are listed below with their default values

 

  • SameSubnetDelay (1 Second)
  • SameSubnetThreshold (5 heartbeats)
  • CrossSubnetDelay  (1 Second)
  • CrossSubnetThreshold (5 heartbeats)

The above settings are for regular clustering. For Hyper-V clustering, consider the following defaults

  • SameSubnetThreshold (10 heartbeats)
  • CrossSubnetThreshold (20 heartbeats)

If you go beyond 10 to 20 on these two settings, the documentation says the overhead starts to outweigh the benefit. FYI.
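To see what these numbers mean in practice, multiply the delay by the threshold. That gives roughly how long a node can stay silent before the cluster acts on it. Here is a small Python sketch of the arithmetic. The function name is mine, the delays are in milliseconds (1 second = 1000), and it assumes a node is treated as down after the threshold number of consecutive missed heartbeats.

```python
def seconds_until_down(delay_ms: int, threshold: int) -> float:
    """Approximate silence tolerated before a node is considered down."""
    return delay_ms * threshold / 1000


# Defaults listed above for regular clustering: 1 second delay, 5 heartbeats.
print(seconds_until_down(1000, 5))
# Hyper-V defaults listed above: 10 heartbeats same-subnet, 20 cross-subnet.
print(seconds_until_down(1000, 10))
print(seconds_until_down(1000, 20))
```

So with a 1 second delay, the thresholds of 5, 10, and 20 work out to about 5, 10, and 20 seconds of tolerated silence.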

The step below is only for allowing the creation of the cluster on a slow network. Set the value of SetHeartbeatThresholdOnClusterCreate to 10, for a value of 10 seconds.

HKLM\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters
add DWORD value SetHeartbeatThresholdOnClusterCreate

 

Configure Full Mesh HeartBeat

(Get-Cluster).PlumbAllCrossSubnetRoutes = 1

 

Other Important changes to change Cluster Setup Parameters

Please be advised: all of the following syntax has been duplicated from this publicly available Microsoft article:

Change Cluster Network Roles (0 = no cluster communication, 1 = cluster communication only, 3 = client and cluster communication)

  • (Get-ClusterNetwork "Cluster Network 1").Role = 3
  • Get-ClusterNetwork | ft Name, Metric, AutoMetric, Role
  • (Get-ClusterNetwork "Cluster Network 1").Metric = 900
  • (Get-ClusterNetwork "Cluster Network 1").AutoMetric = $true

Set Quality of Service Policies (values 0-6) ( Must be enabled on all the nodes in the cluster and the physical network switch)

  • New-NetQosPolicy "Cluster" -Cluster -Priority 6
  • New-NetQosPolicy "SMB" -SMB -Priority 5
  • New-NetQosPolicy "Live Migration" -LiveMigration -Priority 3

 

Set Bandwidth policy (relative minimum bandwidth policy) (It is recommended to configure Relative Minimum Bandwidth SMB policy on CSV deployments)

  • New-NetQosPolicy "Cluster" -Cluster -MinBandwidthWeightAction 30
  • New-NetQosPolicy "Live Migration" -LiveMigration -MinBandwidthWeightAction 20
  • New-NetQosPolicy "SMB" -SMB -MinBandwidthWeightAction 50
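Relative weights are easier to reason about once you turn them into numbers. The Python sketch below shows my understanding of how the weights split a link when everything is busy at once: each class gets its weight divided by the sum of all weights. The function name and the 10 gigabit link speed are assumptions for illustration only.

```python
def bandwidth_shares(weights: dict[str, int], link_gbps: float) -> dict[str, float]:
    """Minimum share of the link per traffic class when the link is saturated."""
    total = sum(weights.values())
    return {name: link_gbps * weight / total for name, weight in weights.items()}


# The three weights from the policies above, applied to an assumed 10GB link.
shares = bandwidth_shares({"Cluster": 30, "Live Migration": 20, "SMB": 50}, 10.0)
for name, gbps in shares.items():
    print(f"{name}: at least {gbps} Gbps under contention")
```

Keep in mind these are minimums under contention. When the link is idle, any class can use more than its share.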

 

If you need to add a Hyper-V replica

  • Add-VMNetworkAdapter -ManagementOS -Name "Replica" -SwitchName "TeamSwitch"
    Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Replica" -Access -VlanId 17
    Set-VMNetworkAdapter -ManagementOS -Name "Replica" -VmqWeight 80 -MinimumBandwidthWeight 10
    # If the host is clustered, configure the cluster name and role
    (Get-ClusterNetwork | Where-Object {$_.Address -eq "10.0.17.0"}).Name = "Replica"
    (Get-ClusterNetwork -Name "Replica").Role = 3

From <https://technet.microsoft.com/en-us/library/dn550728(v=ws.11).aspx>

Configure Live Migration Network

  • # Configure the live migration network
    Get-ClusterResourceType -Name "Virtual Machine" | Set-ClusterParameter -Name MigrationExcludeNetworks -Value ([String]::Join(";",(Get-ClusterNetwork | Where-Object {$_.Name -ne "Migration_Network"}).ID))

From <https://technet.microsoft.com/en-us/library/dn550728(v=ws.11).aspx>
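The live migration one-liner is dense, so here is the same logic spelled out as a Python sketch. It only illustrates what value ends up in MigrationExcludeNetworks: the IDs of every cluster network except the one you want to migrate over, joined with semicolons. The network names and IDs below are invented sample data, not output from a real cluster.

```python
def migration_exclude_value(networks: list[dict[str, str]], keep_name: str) -> str:
    """Join the IDs of every network except keep_name, separated by ';'."""
    return ";".join(net["id"] for net in networks if net["name"] != keep_name)


# Hypothetical cluster networks; real IDs are GUIDs reported by the cluster.
sample_networks = [
    {"name": "Management", "id": "id-management"},
    {"name": "Migration_Network", "id": "id-migration"},
    {"name": "Cluster", "id": "id-cluster"},
]

print(migration_exclude_value(sample_networks, "Migration_Network"))
```

Everything in the resulting list is excluded from live migration, which leaves Migration_Network as the only network it may use.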

 

Other Commands

  • Enable VM teaming: Set-VMNetworkAdapter -VMName <VMname> -AllowTeaming On
  • Restrict SMB: New-SmbMultichannelConstraint -ServerName "FileServer1" -InterfaceAlias "SMB1", "SMB2", "SMB3", "SMB4"

 

 

Creating or migrating the CMS of a SQL 2014 Always on Availability Group for Skype for Business 2015.


 

I think once you get done with your deployment, you get to sit back and enjoy the honey of your efforts. But if you are migrating a SQL Always On Availability Group, see below for some steps!

 

 

Basic Bullets steps

 

  1. Review Prerequisites
  2. Don’t try this on Lync 2013. This must be done using fully patched SFB FE servers, as well as a fully patched Server 2012 R2 machine.
  3. Point #2 is the reason for this article. The word is that Lync 2013 is not supported, but it works. Below you may see the gotcha and be able to set it up.
  4. Install the Clustering role on both SQL Servers with Add-WindowsFeature Net-Framework-Core, Failover-Clustering, RSAT-Clustering-Mgmt, RSAT-Clustering-PowerShell -Source d:\sources\sxs
  5. Do not try to use SQL 2016. Only use SQL 2014 with SFB 2015 and Server 2012 R2.
  6. Run Test-Cluster -Node SQL1, SQL2 and make sure you have the prerequisites correct.

Follow the documentation of your choice to install Skype for Business on SQL, but please review the notes from my friend in the field, Timothy E Boudin. His notes may come in handy if you’re facing the move and find no documentation. This is what he ran across and brought to me. Thank you to Tim for bringing this issue up so I could write about it:

First Find your Links for the Job

Skype

SQL

 

To summarize, in my opinion you should move the CMS when you set up the SQL Always On Availability Group. Otherwise, you are going to have to come back later and do the second part of this cold. This is just an opinion. Below, please find Tim’s comments on his move of the CMS with SQL Always On during a Lync 2013 to SFB 2015 migration.

 

Skype for Business setup using SQL always on Clustering

This configuration requires some unique settings for the build of the Skype for Business (SFB) support when dealing with the issue of supporting the CMS migration from Lync 2013 or previous versions to SFB 2015.

 

For this discussion the servers are as follows

Enterprise Front End Servers (3)

FE1, FE2, FE3

Always on SQL Cluster servers (2)

SQLNode1, SQLNode2, SQL Listener

During the process of defining a new Enterprise Front End pool, you provide the names of the Servers that are members of the pool as normal, but when providing the new Back end servers information you have to handle this in a specific way to be able to support fail-over.

Create the new SQL Server Store as follows

 

Fill in the name of the SQL Listener in the SQL Server FQDN field

 

Provide the name of the SQL instance if you’re using one.

Check the High Availability Settings option

Select SQL Always-on Availability Groups

Fill in the name of the SQL Node1 server

Click OK and complete the Front End pool Wizard.  When you Publish the changes to topology you should be prompted to provide details on how you want to establish the tables for the Cluster.  If this is setup by a DBA check with them on location of the Data and Log files.

Once publishing is completed, go back and edit the SQL Store settings and change the SQLNode1 setting to SQLNode2 server name.

 

Publish the change and then run Install or Upgrade a Database.

Once publishing is completed, go back and edit the SQL Store settings again, and change that same node setting to the SQL Listener server name.  Now publish a final time.  There is no need to do an Install Database for this change.

 

Once the final publish is complete you should be able to start services on the Front End Pool servers.

Move the CMS

To move the CMS from the previous location to the Front End pool using the Always-on Cluster use the following process.

  • Stop Services on the Front End Pool to be hosting the new CMS location
  • Open the Lync Power-Shell and use the following commands:

Stop-CsWindowsService

  • Backup the CMS to a file with the following

Export-CsConfiguration -FileName c:\media\cmsbackup.zip

Create the new tables for the CMS on the new Front End pool, specifying node 1 of the cluster. When you specify the database paths, list the log file location first and then the data file location.

Install-CsDatabase -CentralManagementDatabase -SqlServerFqdn SQLNode1.contoso.com -DatabasePaths g:\RTC_logs,f:\RTC_data -SqlInstanceName RTC -Verbose

Once the tables have been created, have the DBA verify Mirroring between the nodes is in place.

Enable-CsComputer

Get-CsManagementStoreReplicationStatus

If replication is good then move the CMS to new server by using the Move command on the Front end server in the target pool.

Move-CsManagementServer

 

Once the move is complete, allow for server replication, then run local Setup in the Deployment Wizard on all affected Front End servers, reboot them, and monitor replication.

Get-CsManagementConnection

Get-CsService -CentralManagement

 

I hope this is helpful documentation if you have to face this situation.

 

Thank you,

 

Louis

Use a Baseline Database Generator Script for reviewing performance of SQL Instance


For anyone trying to troubleshoot a slow SQL server, I wanted to come up with a test that takes the SQL issue and generalizes it. Why does this need to be generalized? I have found that a customer or a support team may introduce bias into every aspect of the tests. Begin with the data. With your own data, it is impossible to produce an unbiased result. You may say this database does not go as fast as your favorite one on a separate server, but that way you cannot accurately prove one server is faster or slower than another. Why? For a basic idea, take a look at another case, where I lay out some basic testing tenets to go by. I will restate them here. They sound like car rules, but they are universal testing rules you can apply to any situation.

From Car Rules to Computers

  1. The performance should be documented and repeatable.
  2. More than one test should be run, and simple is usually more realistic.
  3. Tests should be standardized, down to a science, so that if applied to another matching scenario, you would expect similar results.
  4. Keep the time down to a short test. The longer the test, the more variables can be introduced.
  5. Do not focus on two separate car models not functioning the same. Find a way to introduce a baseline for how a reasonable car will perform, then prove or disprove your baseline.

To get a good, unbiased test result for SQL, I came up with a dynamically created SQL database that gets created once. Once it is created, you can run tests on this standardized database and compare the results with those from, say, your laptop or another machine where the processor, memory, and disk resources are similar. All you have to do is follow the method. One simply must not use one’s own data.

Read on. Grab the download from here or the top of the page.

The SQL Baseline for Customers who report Server A is slower than Server B.

Disclaimer

When a customer claims that one machine is slower than the other, there is always the possibility the customer has an actual baseline. However, when they say one is slower than another, this usually indicates they don’t know what a baseline is.

A Baseline is a collection of metrics, about the server, when it is installed at Greenfield time. When the Server is first Deployed with SQL, a baseline should be taken. Then, future claims as to a slow server, should be taken against itself; not another server.

When a person wants to compare two servers, this is almost an impossible ask. It’s like asking us to compare why two people do not complete a personality test in a similar way. From a support standpoint, it is a fruitless pursuit, and often creates a bad CE, in trying to fulfill their request.

The goal of this process is to give Support and the customer a way to meet on common ground. The customer claim that the server is slow may as well be translated into “The data on my servers does not match!!” And they are correct. And we don’t support data. The key word is data. This question of “SLOW-ER” pulls us into the customer’s data sets.

This process gives us a way to use our own data set. The advantage cannot be overstated. We will be telling them one machine is slower or it’s not.

Accepting that generally one machine is slower, do not underestimate this result as the customer re-introduces his production elements. If the baseline test shows a machine 20% slower, then any difference beyond 20% will be due to specific workloads introduced by the customer. All of the SQL subject matter experts have known this, but we all spend weeks trying to find the leverage to prove it. Without an “absolute”, we could not substantiate that claim. This caused these cases to last for months. The method below should cut these cases down to a two-day case at most.
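Here is the 20% reasoning as a tiny Python sketch. It is just subtraction, and the function name and the 50% production figure are made up for illustration. Whatever gap the neutral baseline shows belongs to the machine, and anything beyond that is attributed to the customer’s own workload.

```python
def split_slowdown(baseline_pct: int, production_pct: int) -> dict[str, int]:
    """Split an observed production slowdown into machine and workload parts.

    Both arguments are percentages: how much slower server B is than server A
    on the neutral baseline test and on the customer's production workload.
    """
    machine = min(baseline_pct, production_pct)
    workload = max(production_pct - baseline_pct, 0)
    return {"machine": machine, "workload": workload}


# Baseline says 20% slower; the customer reports 50% slower in production.
print(split_slowdown(20, 50))
```

In this example 20 points of the gap are explained by the machine itself and the other 30 points by what the customer is running on it.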

 

The Process

 

In the following test for SQL you will see four files, which together make up a method of baselining SQL performance without using biased data from the customer or a third-party company. This test is devoid of the effects of caching or indexing, so it is a perfectly simple test to illustrate capabilities between two machines.

The reason this test has been devised is due to customer demand. Customers often ask us to compare two adjacent machines. Often times these comparisons can only be done using apples to oranges methods. Often times, these cases end up being a point of contention for the customer and for support teams. The goal of this test is to mitigate that disparity.

 

Here are the files you will need

Figure 1. Files you will need

 

SupportBaseline.xlsx

The Excel spreadsheet is to be filled out and returned to Support. We keep a master copy of this spreadsheet to monitor the script’s performance against a variety of machines and situations. Over time, we will have a database of how this script performs, on average, across a multitude of platforms. The simple measure we are obtaining is time: how long does it take the baseline query to complete?

 

Test Parameters

The results of the script should answer the question: is my “server” really slower than average, or slower than another server? To answer it, the rules must be strictly followed. This test must be run with all other operations terminated on the SQL server. There should be no antivirus running and no other applications running. Other than a baseline Windows machine with core applications and services, the server should be running SQL with no client connections. In other words, the SQL machine needs to be out of production. There are columns in SupportBaseline.xlsx for a machine that was tested while in production, but it will be noted in the analysis that the machine was in production, and the results may not reflect a true baseline.

Several baseline runs can be collected, with the single variable being the total number of rows this script will create. The default is set to 1 million, and the recommendation is a million rows on average. However, depending on how powerful the server is, or how much downtime you are allowed, you can adjust this variable to fit your needs. CreateSupportTest.sql is the file where this change is made; see below.

 

Figure 2. Where to adjust how long the script will run

How long do I run the initial test?

As a general rule, 1 million rows should take less than 15 minutes on a reasonable SQL server. However, performance degrades fast. For example, a SQL VM with only 3 GB of RAM will take 121 minutes to run the query. So the first run should be 100,000 rows. Then multiply the length of time it takes to complete by 10.

This is how long a million rows should take to complete. You can judge how many rows you should choose, depending on the amount of time you want the query to take to complete.
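The multiply-by-10 rule is a straight-line estimate, and you can turn it around to pick a row count that fits your time window. Here is a minimal Python sketch. The function names and sample timings are mine, and it assumes run time grows in proportion to row count. That is only a rule of thumb, since an undersized machine may degrade faster than that.

```python
def estimate_minutes(pilot_rows: int, pilot_minutes: float,
                     target_rows: int = 1_000_000) -> float:
    """Scale the pilot run time up to the target row count (linear estimate)."""
    return pilot_minutes * target_rows / pilot_rows


def rows_for_budget(pilot_rows: int, pilot_minutes: float,
                    budget_minutes: float) -> int:
    """Largest row count expected to finish within the time budget."""
    return int(pilot_rows * budget_minutes / pilot_minutes)


# A 100,000 row pilot that took 1.5 minutes suggests about 15 minutes
# for the full 1 million rows.
print(estimate_minutes(100_000, 1.5))
# If that same pilot took 2 minutes and you only have a 10 minute window,
# pick a smaller row count for the real run.
print(rows_for_budget(100_000, 2, 10))
```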

Process

  1. Determine how long you want to run the query. Follow “How long do I run the initial test?”
  2. Set the value of the number of rows. Follow “Test Parameters”
  3. Record the initial values of the server in the Excel spreadsheet SupportBaseline.xlsx
  4. Run the query named CreateSupportTest.sql. Here is a how-to if you need it
  5. Record the results in the Excel spreadsheet SupportBaseline.xlsx. Use the start and stop times and it will auto-populate the time of execution
  6. Repeat as necessary, populating the spreadsheet and returning it to Louis Reeves in Support. He is keeping the overall list of how the query runs in several different scenarios and can give you more information about how your query results compare to other machines running the same query
  7. When you are finished testing a server, there are two cleanup scripts. Run DropSupportTest.SQL. Here is a how-to if you need it
  8. Then run DropSupportMaster.SQL. Here is a how-to if you need it

That’s it. Now, you can complicate things by running tools like Diskspd against these machines, but it is best to keep it simple and stay with the program laid out. If you want to look at Diskspd, go ahead and read The Fallacy of Performance or; Are you bringing your Support Agent Apples or Oranges? This will help you plan for running Diskspd commands. So here you really have two ways of testing the claim of a slow server: the SQL baseline script described here, and Diskspd.

 

I hope this series of articles is helpful in troubleshooting issues with model data.

Louis.

 

 

 

 

 

The Fallacy of Performance or; Are you bringing your Support Agent Apples or Oranges? VM Virtualization Performance with DiskSpd.

The Fallacy of performance.

I doubt any of this is new to you. Still, my experience tells me that we all group things together naturally, and sometimes the performance issues we find are really assertions made with one piece of evidence. This kind of performance claim is generally hard for a support agent to frame. Not that your case won’t be worked on. It will. It just may take support teams hours or weeks to get to the truth of your statements.

I have to write this because it is so prevalent. When someone calls me and they want to open a support case, I generally try to standardize the case to some truthful statements, which I can prove, disprove, or alter.

However, one case type that does not fit into such neat lines is the virtualization performance case. Rather than describe it in computer terms, let’s use the American automakers Ford and Chevy.

Baselines Matter

I studied Ford and Chevy specifications for months. I know the performance characteristics of each very well. Let’s say I own a Chevy, and I am now looking for a second car of equal specifications. Or let’s say I own a Ford and am looking for a second Ford. Then I purchase that second car, and on the way home, I find that it does not seem to meet the specifications of the first!

Of course I must call the car company and complain that one car is not as fast as the other, or not as quick to brake, or falls short on some other specification. How about this: the air is not as cold in one as it is in the other! This is what happens in performance calls.

Just so you know, we technically should not even entertain these types of questions. But in support we do, to some degree, because we want to help, and we’re not sure what you’re showing us yet. We don’t see the pieces for several calls. You can’t force it; it just takes time. This is because you are asking us to form a relationship between things that are not related. Two cars are not related; two computers are not related.

 

How about taking a cross-country trip? One car crossed the country in three days. The other took four. At this point you may be seeing my point. Trying to get down to the difference between different items of any type can be like comparing apples to oranges. What’s even worse is when one of the items is off limits. So my Ford is definitely slower, but my wife takes the other Ford to work, so we really can’t use that one for testing! Now what do we do?

Basic Performance assumptions

So there are some simple rules that you should apply to any performance problem:

 

  1. The performance should be documented and repeatable.
  2. More than one test should be run, and simple is usually more realistic.
  3. Tests should be standardized, down to a science, so that if applied to another matching scenario, you would expect similar results.
  4. Keep the time down to a short test. The longer the test, the more variables can be introduced.
  5. Do not focus on two separate car models not functioning the same. Find a way to introduce a baseline for how a reasonable car should perform. Then prove or disprove your baseline.

 

Now obviously, the complexity of computers can result in more rules, but if you follow these basics, you can at least find some sanity in your test results. In fact, Support has an absolute need for this to happen. It is very possible that nothing is really wrong, and we will not know that until we get down to brass tacks.

So real world

Say you call into support and report that when you run this command on one machine, things are fine, but when you run it in the other environment, things fail. This is the Disk Speed (Diskspd) command, the replacement for SQLIO. I really like this tool.

 

  • diskspd -b8K -d30 -o4 -t8 -h -r -w25 -L -Z1G -c20G D:\iotest.dat > DiskSpeedResults.txt

 

However, what is hiding in this statement violates all 5 rules above. This is an assertion based on one command. Furthermore, you ran this command in the test location over and over, while other VMs were also running, creating a random pattern of storage fragmentation, while in the production environment it was run only once, in a very controlled situation. These commands were not run in a scientific fashion.

It literally took me days to think of a way to baseline this situation and to test it correctly. This is where the 5 rules came from. I think they are solid rules for support to go by. So here is how you test to make your case to Support:

User Guide and Product here

Introduce a Baseline. Anything is better than nothing

The above diskspd command is complex and long. First, come up with some simple tests and run more of them, over time. Second, test your commands on a laptop or desktop with a specific RAM, storage, and processor profile. Once you record all the results on the client machine, duplicate the test in the virtual machine. Make sure it’s the only virtual machine running. Make sure nothing is running on the host but this one VM, with specific resources.

Now, below I am not giving you results. I am just giving you the commands, along with some instructions on how to use Diskspd. I am also leaving you with articles that VMware and Microsoft Hyper-V use when asking for baseline testing. Notice how many little requirements they have. Seem familiar? There is a reason for this! We are all trying to be scientific.

Tests to establish a Baseline.

  • .\diskspd -c100M -d20 c:\test1 d:\test2
  • .\diskspd -c2G -b4K -F8 -r -o32 -W30 -d30 -Sh d:\testfile.dat
  • .\diskspd -t1 -o1 -s8k -b8k -Sh -w100 -g80 c:\test1.dat d:\test2.dat
  • .\diskspd.exe -c5G -d60 -r -w90 -t8 -o8 -b8K -h -L
  • .\diskspd.exe -c10G -d10 -r -w0 -t8 -o8 -b8K -h -L d:\testfile.dat
  • .\scriptname.ps1
  • Same as above- second location
  • .\Diskspd -b8k -d30 -Sh  -o8 -t8  -w20 -c2G d:\iotest.dat

 

This list will generate about 15 unique results. Any of these will run on a laptop or a server. Just make sure you read the text character decoder sheet available with the product.
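As a companion to the decoder sheet, here is a small Python sketch that breaks a Diskspd command line into its switches. The meanings in the table are a summary of the Diskspd command-line reference and cover only some of the switches used in this article, so treat the documentation shipped with the product as the authority. The names SWITCHES and decode are made up for this example.

```python
import re

# Summary of switch meanings (assumed from the Diskspd reference; verify
# against the documentation that ships with your version of the tool).
SWITCHES = {
    "b": "block size of each I/O",
    "c": "create the test file with this size",
    "d": "test duration in seconds",
    "F": "total number of threads",
    "h": "disable software caching and hardware write caching (older form of -Sh)",
    "L": "measure latency statistics",
    "o": "outstanding I/O requests per target per thread",
    "r": "random I/O",
    "Sh": "disable software caching and hardware write caching",
    "t": "threads per target",
    "w": "percentage of write requests",
    "W": "warm-up time in seconds",
    "Z": "size of the source buffer used for writes",
}

# Switches are case sensitive, so -w and -W are different entries.
_TOKEN = re.compile(r"-(Sh|[A-Za-z])(\S*)")


def decode(command):
    """Return (switch, value, meaning) for every switch in a Diskspd command."""
    decoded = []
    for token in command.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            continue  # program name, target file, or output redirection
        switch, value = match.groups()
        meaning = SWITCHES.get(switch, "not in this table, check the Diskspd reference")
        decoded.append((switch, value, meaning))
    return decoded


if __name__ == "__main__":
    sample = r"diskspd -b8K -d30 -o4 -t8 -h -r -w25 -L -Z1G -c20G D:\iotest.dat"
    for switch, value, meaning in decode(sample):
        print(f"-{switch:<2} {value:<4} {meaning}")
```

Reading a command this way before you run it makes it easier to keep the laptop test and the VM test identical.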

So the instructions are very simple. The specs on the Hyper-V or VMware VM must be the same as the laptop. My laptop has 16 GB of RAM and 8 processors.

The VM must be the only one running, and the OS should be a fresh install. Now, if the results of testing are in the ballpark of your comparison client, then you are not having a performance issue.
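What counts as “in the ballpark” is a judgment call. Here is a minimal Python sketch, assuming you compare the average of several runs and accept a 20 percent difference. Both the tolerance and the sample numbers are hypothetical, not values from Support.

```python
from statistics import mean


def in_ballpark(baseline_runs, vm_runs, tolerance=0.20):
    """True when the VM average is within `tolerance` of the baseline average.

    Each list holds the same metric (for example total IOPS) from repeated
    runs of the same Diskspd command.
    """
    baseline = mean(baseline_runs)
    vm = mean(vm_runs)
    return abs(vm - baseline) / baseline <= tolerance


if __name__ == "__main__":
    laptop_iops = [1000, 1100, 900]  # hypothetical laptop results
    vm_iops = [950, 1000, 1050]      # hypothetical VM results
    print("In the ballpark:", in_ballpark(laptop_iops, vm_iops))
```

Averaging several runs follows rule 2 above: one result on its own is an assertion, not a baseline.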

The moral of the story is to test from different perspectives, and to use the scientific method as much as you are able to.

I hope this is helpful in your troubleshooting.

 

A few other Details

Here is a way to manually pre-create the files if desired:

  • fsutil file createnew d:\iotest.dat 20000000
  • fsutil file createnew d:\iotest.dat 2000000000
  • fsutil file createnew d:\iotest.dat 20000000000
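fsutil takes the file length in bytes, so the three commands above create files of roughly 20 MB, 2 GB, and 20 GB in decimal units. If you want the pre-created file to match a Diskspd size such as -c20G exactly, the byte count is slightly larger, assuming Diskspd treats K, M, and G as powers of 1024 (check the Diskspd documentation). A small Python sketch, with hypothetical helper names, that does the arithmetic:

```python
# Binary multipliers (assumption: Diskspd sizes use powers of 1024).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def size_to_bytes(size):
    """Convert a size such as '20G' or '100M' to a byte count."""
    text = size.strip()
    unit = text[-1].upper()
    if unit in UNITS:
        return int(text[:-1]) * UNITS[unit]
    return int(text)  # no suffix: already a byte count


def fsutil_command(path, size):
    """Build the fsutil command line that pre-creates a file of that size."""
    return f"fsutil file createnew {path} {size_to_bytes(size)}"


if __name__ == "__main__":
    print(fsutil_command(r"d:\iotest.dat", "20G"))
```

The script only prints the command; run the printed line yourself from an elevated prompt.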

Here are all of the best articles on storage and I/O online right now. I was surprised that so many storage performance needs are covered in one place.

This could be an important point. If you came to this site because your numbers are not matching reality, your monitoring tools may not be collecting the right perfmon numbers, and you may need the Hyper-V performance script to see your actual VM numbers. Try using this tool: run it on the host while using Diskspd on your VM.

Do not run more than one instance of Diskspd at once! This will invalidate your tests!

 

Finally, as promised, here is how VMware and Microsoft handle these issues:

 

Louis

Microsoft Licensing Issues may Require a Tool Called MGADiag or the Web Version of Genuine Microsoft Advantage!

Good day,

 

A friend of mine hit me up looking for an application called MGADiag. Wow what an old tool!! But yes, I still have a copy.

I sent it to him. After review, I decided not to include a copy in this article. I did post a link to it, but I want to let people know they should really look at the new web-based tool (here). The new tool does not do the same thing at first glance, but on further review, the web page relies on a plugin, which seemingly collects similar information.

We don’t have much choice on newer systems. Just look at Figure 1. MGADiag doesn’t work too well with Windows 10. Other tabs have worse errors, but some tabs work OK.

 

Figure 1.

Reason for most failures

Hey, in its time, this was a great tool. It collected a lot of the licensing information, all in a few tabs. Great for Windows XP and maybe Vista and Windows 7. Beyond that, caveat emptor!

It looks like MGADiag was retired for a reason. So my evaluation of MGADiag in 2017 is that it’s in need of a revamp.

Well, the revamp is basically this article and the web tool (here).

Basically there is a 90% chance your activation issue is captured in this simple document (here). Did you activate the key from this server already? One license equals one machine. These are basic tenets that will tell you if you are genuine or not.

So My Licensing is really Broke

If everything checks out against the article, then go ahead and check licensing with the Genuine Advantage Web tool. If that checks out, then you can check with MS.

What to do about Activation (MS)

If you end up going to Microsoft (MS) and you are not in the wrong, you should use standard procedures to get activation and license issues fixed. So let’s get that out of the way.

So you can just do the right thing. Just go to a command line and type SLUI 3 or SLUI 4. SLUI 3 opens the dialog for entering a product key, and SLUI 4 starts phone (voice) activation. When the voice activation line is answered, you will be asked to explain why you think your license is genuine. This may require a supervisor. Apparently only the supervisors have the information to make a determination, or perhaps only the supervisors have the ability to fix a wrong determination in their job description.

So here is the thing. Microsoft can see all license keys. They can see if they are activated. So your story needs to explain why a license would be activated in their system. If your story makes sense, and you don’t mind a deactivation to activate your machine, or multiple activations are justified for the key you are using, they will generally make good on your license.

To conclude, we have MGADiag, the Genuine Advantage website, and SLUI as tools we can use to work on licensing issues.

But Be careful! MGADiag is no longer a public MS download

For example, look at this link. It’s wrapped in another application completely! Beware!

Here is the Newer tool, in all its glory:

 

And finally, another blog example of the basic activation steps, as they have not changed much over the years:

 

Louis Reeves

Windows Performance Recorder, Xperf123 and CLUE all collect ETW traces for use with Windows Performance analyzer!

Good Evening.

I wanted to make a quick article for those support cases where I need to perform an analysis on the issue, in a way that will allow me to see the data set in the most creative way possible.

I do think you will prefer the graphical interface method of doing this, but the site where it is hosted is going to close down at some point. So I will be attaching a link to the download, in case it becomes a lost web site.

Actually, there are a couple of tools we should be aware of. So this article is about ways to use Xperf to collect logs for support evaluations. Specifically, I am calling out three ways: the command line, the core Windows Performance Recorder, and two additional tools.

XPERF

So all of these tools require xperf to be installed. It is part of the Windows Performance Toolkit, which also contains Windows Performance Recorder and Windows Performance Analyzer. The truth is you can just run the Windows Performance Recorder, and this will achieve the objective of this article. But you can’t just let the recorder run in perpetuity. There is a hit to the system for running it, and it will eventually fill up your hard drive.

Not that these other tools have methods which are any better. The main thing you need to know is that you must monitor resources and know when to start and stop these tools yourself. They can be dangerous if not used by an IT person with experience. The bottom line is: use caution!

 

Command Line

For the command line options, I am just going to show you how to start and stop a trace and obtain the ETW log (.etl) file. You will then return the file to the support department, and they can give you an analysis.

Some examples of commands which will result in a file you can give to your support team:

  • Start trace
    • Xperf -on DiagEasy
    • Xperf -providers KG
  • Stop trace (and generate the ETL file)
    • Xperf -d trace.etl
  • Display trace
    • Xperf trace.etl
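Because a trace left running can fill the disk, it can help to wrap the start and stop commands in a small script. Here is a minimal Python sketch, assuming xperf is on the PATH and the script is run from an elevated prompt. The 5 GB free-space floor, the drive letter, and the function names are made-up values for illustration.

```python
import shutil
import subprocess
import time


def free_gb(path):
    """Free space on the drive that holds `path`, in gigabytes."""
    return shutil.disk_usage(path).free / 1024 ** 3


def start_command(kernel_group="DiagEasy"):
    """Command that starts the kernel trace, as in the list above."""
    return ["xperf", "-on", kernel_group]


def stop_command(etl_path="trace.etl"):
    """Command that stops the trace and writes the .etl file."""
    return ["xperf", "-d", etl_path]


def run_trace(seconds, min_free_gb=5.0, drive="C:\\"):
    """Trace for a fixed time; refuse to start when disk space is low."""
    if free_gb(drive) < min_free_gb:
        raise RuntimeError("not enough free disk space to trace safely")
    subprocess.run(start_command(), check=True)
    try:
        time.sleep(seconds)  # reproduce the issue during this window
    finally:
        # Always stop the trace, even if the wait is interrupted.
        subprocess.run(stop_command(), check=True)
```

Calling run_trace(60) would collect about a minute of data and then stop on its own, so the recorder never runs in perpetuity.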

Well, that was easy, wasn’t it? The rest of this is not much harder. This is some complex stuff, but we want to make it easy to collect if possible. So the next tool on the block is the Windows Performance Recorder. I will not spend any article time on this. This is the simple Next, Next, Finish Windows method. You can do some searching on the internet if you need a few screenshots.

Now to the meat of the show: two tools I think you may find helpful, Xperf123 and the Clue tool. Clue is Collection of Logs and the User Experience. Xperf123 is at CodePlex, but they say CodePlex is closing. I will not include links to their site. I will have a copy of the tool in this article. Xperf123 Download

So we are starting with Xperf123. This is the tool on CodePlex. Download it here: Xperf123 Download

So this tool will let you form the syntax of a shell command to start and stop a log collection. It allows for all the variables you would want, like circular logging, etc.

The basic article that was on the CodePlex site is below, for your convenience. This is in case the CodePlex data is gone from the internet:

XPERF123

Project Description
This tool automates the process of collecting xperf traces, without the user worrying about the various settings and configuration options.

UPDATE
The tool does not package XPerf.exe, perfctrl.dll, xbootmgr.exe, xbootmgrSleep.exe or xperfview.exe. Please download the Windows Performance Toolkit separately from http://msdn.microsoft.com/en-us/performance/cc752957 and then run this tool from the same location as the files.

Why this tool?
Collecting ETW traces was never this easy. With this new utility, xperf/xbootmgr logs can be collected without breaking a sweat. Just a few clicks and the required data gets collected. You no longer need to enter complicated commands to collect the data. Just select the kind of data/monitoring you desire and XPerf123 is going to get that data for you just like 1 – 2 – 3.
It also creates a simultaneous perfmon running at 5 seconds interval.

System Requirements
.NET Framework 3.0
Administrator rights on the machine.
Windows 2003/Windows Vista/Windows 7/Windows Server 2008/Windows 2008R2.

So how do I use it????
1. Follow the wizard interface of the tool.
2. From the drop down menu, select the kind of trace you want to capture.
3. Click on Start button.
4. Reproduce the issue.
5. Click on Stop button.
6. The file will be created in the same location as XPerf123.exe

Main features
– In Normal mode, the default parameters for BufferSize, MinBuffers and MaxBuffers are 1024.
– It can be customized for advanced settings.
– There is an option to log the trace file in circular mode, which is enabled by default. If required, it can be unchecked.
– Logs are created in the same directory by default.
– We can also save the logs to a location other than the one we run it from.
– It also creates a perfmon counter and starts it when we start the xperf capture.
– If Perfmon was also collected, the Perfmon logs are located in the C:\PerfLogs\ directory with the name perflognnnnnn.blg
– If we select stack walk, then the default stack walks for the respective traces will be enabled unless the user manually selects the stackwalk parameters. This is beneficial for someone who wants to do stack tracing but doesn’t know which options to select for stack walk.
– The creation of the registry and the reboot prompts for stack walks have been automated. In the next build, I will try to log that information as well to the log file so that we know what registries were modified or created.
– Advanced options in the xbootmgr parameters to set the Buffer Options and the Enable Property .
– The Pool Trace will only work if we are using a version of xperf that supports the feature.

What do I need to get started
We need to have all the files in the same directory as xperf123.exe –
XPerf.exe
perfctrl.dll
xbootmgr.exe
xbootmgrSleep.exe
xperf.exe

Unless necessary, the General option should be able to get all the required information.
The program is designed to auto elevate, but if not getting the required results, please try running it as an administrator.
For reviewing XPerf logs, we need the xperfview.exe.

Figure 1. Starting up Xperf123.exe

Figure 2. Select the kind of data collection you need

Figure 3. Enable Perfmon logging (if you want)

Figure 4. And we are done. Click Start to start the capture

CLUE TOOL
Now we have one tool left. This is the newest I have seen. This tool will collect logs when there is a problem on the system. This could be a good tool to use under some circumstances.
This tool is the CLUE tool:

Clue stands for Collection of Logs and the User Experience. This tool is an automated way to collect the logs only when the issue is occurring. This is helpful, because the log collection itself can be part of a slowness or latency problem.

 

Requirements for this tool:

 

  1. Download tool from – http://aka.ms/ClueTool
  2. Download and Install the Windows Performance Toolkit (WPT)
  3. Toolkit can be installed during setup. See the Clue Usage Guide.docx
  4. Right-click the zip file and choose Properties, then choose Unblock
  5. Unzip to a long-term location
  6. Run the Setup.bat file with Admin Rights

 

All features of the application will run out of the C:\ProgramData\Clue directory. If you need to run in a different directory, then change the config.xml file.

Output files will be located at \Microsoft\Windows\Clue\IncidentFolderManagement , again unless you specify otherwise in the config.xml file.

The bottom line is there are two things you want to check out. One is the scheduled tasks that start with CLUE_. Make sure they meet your needs as to when to collect data, and for how long.

Second is the config.xml file. You can set many things before the install, which saves you from making multiple changes after the install.

Below is what you will see in the scheduled tasks in Windows:

You will then see inside the CLUE folder, the tasks that you can change to meet your needs:
This is a great tool, in that you have some control over when and why the log collection runs. It can even survive a reboot. So this is a great tool when you don’t know when the problem is going to occur.

To conclude, I have presented 4 ways you may get an ETL log collected and ready to send to your support person. If you have any issues, call your support team and they should be able to help you out with it.

Windows Performance Recorder, Xperf, Xperf123, and Clue all try to do the same thing. However, it is our way of having many ways that makes us a great country!! Well, maybe a great world, because I am certain the players in these tools are quite diverse. Indeed, Hail Diversity! And Hail Molvania!
Louis