SPECvirt_sc2010 VMware ESX 4.1 vs RHEL 5.5 KVM benchmarks: digging a little deeper…

2010 is quickly drawing to a close – both consumer interest in and implementation of virtualized computing infrastructures are high and will only continue to increase going into 2011.  Keeping pace with consumer interest is competition in the hypervisor arena, which is really heating up.  Vendors and customers love to create feature matrices comparing the various hypervisor + central management solutions, but nothing gets competitive blood pumping like good old benchmark scores.  The trouble is that a benchmark suite is worth little if it is not vendor-neutral or goes unsupported – which, until recently, described the major virtualization benchmark suites available.

The non-profit Standard Performance Evaluation Corp (SPEC) has stepped up this year and released their vendor-neutral SPECvirt_sc2010 benchmark suite.  According to the press release on SPEC’s web site, “SPECvirt_sc2010 uses a realistic workload and SPEC’s proven performance- and power-measurement methodologies to enable vendors, users and researchers to compare system performance across multiple hardware, virtualization platforms, and applications.” – Great!

In Q3 2010, Red Hat and IBM submitted SPECvirt_sc2010 benchmarks for the Kernel-based Virtual Machine (KVM) hypervisor running on RHEL5.5, but things really got interesting when in Q4 2010 VMware and HP submitted their own set of benchmarks for ESX 4.1.  Check out the benchmark scores for the two hypervisors in the table below – RHEL5.5 KVM got a SPECvirt_sc2010 score of 1169 while ESX 4.1 scored a 1221.

These scores are very close but doing the math, that comes out to an ESX 4.1 performance benchmark advantage of 4.44% over RHEL5.5 KVM.  I know that a number of people are going to simply compare these two numbers and jump to the conclusion, “SPECvirt_sc2010 benchmarks show that ESX beats KVM in overall performance,” but I would advise against this.  The purpose of this post is to dig just a bit deeper into both of these benchmarks – that 4.44% performance advantage isn’t nearly as clear-cut as some might think.
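If you want to double-check the headline math yourself, here is a trivial Python sketch (just for illustration – the only inputs are the two published scores above):

    # Relative advantage of the published ESX 4.1 score over the RHEL 5.5 KVM score
    kvm_score = 1169.0   # RHEL 5.5 KVM on the IBM x3650 M3 SUT
    esx_score = 1221.0   # ESX 4.1 on the HP DL380 G7 SUT

    advantage_pct = (esx_score - kvm_score) / kvm_score * 100
    print("ESX 4.1 advantage over RHEL 5.5 KVM: %.1f%%" % advantage_pct)   # roughly 4.4%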

Before we go any further, keep in mind that virtualization places significant stress on the CPU, memory, network, and storage (whether local or high-speed shared) available to the host.  Any significant weakness in one of these four areas can act as a bottleneck and bring down overall performance.

The one thing that remains consistent between the HP and IBM system under test (SUT) servers used as ESX and KVM virtualization hosts, respectively, is the CPU – an Intel Xeon X5680 “Westmere” hexacore chip clocked at 3.33 GHz per core.  With two populated sockets per SUT, that brings us to 12 physical cores, or 24 logical hyper-threaded cores.

The hardware used in benchmarking the two different hypervisors differs from this point onward.  First, let’s take a look at the IBM x3650 M3 SUT used for the RHEL5.5 KVM benchmark:

Notice the amount and speed of RAM in the SUT – 144 GB clocked at 800 MHz.  It is important to note why the IBM x3650 M3 SUT memory is clocked at 800 MHz when PC3L-10600 ECC DDR3 1333 MHz modules were actually used.  The explanation has to do with the Westmere architecture, so take a look at the diagram below depicting the dual X5680 CPUs, along with each socket’s dedicated triple-channel memory bus.

For the IBM x3650 M3 SUT, all three RDIMM slots in each memory channel feeding into the CPU were populated with 8 GB PC3L-10600 DDR3 1333 MHz memory modules.  That comes out to 24 GB per channel, 72 GB per CPU socket, and 144 GB for the entire system.  This memory configuration provided enough resources for the 72-VM load running on the IBM SUT, but the tradeoff is that when all three RDIMM slots in a channel are used on this Westmere system, the clock speed of the RAM is limited to 800 MHz.  This may not seem especially significant at the moment, but it is very important to keep in mind the cost of the memory used in the SUT.  I did some quick price lookups and found that a single 8 GB PC3L-10600 DDR3 1333 MHz RDIMM runs about $450.  By my rough estimate, the 144 GB required to fully load the IBM SUT used in the RHEL5.5 KVM benchmark would have cost about $8,100.
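Here is that memory math spelled out as a quick Python sketch – the $450-per-RDIMM figure is my rough street-price estimate from above, not an official list price:

    # IBM x3650 M3 SUT memory layout: 2 sockets x 3 channels x 3 RDIMM slots, all populated
    sockets = 2
    channels_per_socket = 3
    rdimms_per_channel = 3        # all three slots filled -> memory drops to 800 MHz
    rdimm_size_gb = 8             # 8 GB PC3L-10600 DDR3 1333 MHz RDIMMs
    rdimm_price_usd = 450         # rough street-price estimate per module

    rdimm_count = sockets * channels_per_socket * rdimms_per_channel   # 18 modules
    total_gb = rdimm_count * rdimm_size_gb                             # 144 GB
    total_cost_usd = rdimm_count * rdimm_price_usd                     # ~$8,100

    print("%d RDIMMs = %d GB, roughly $%d" % (rdimm_count, total_gb, total_cost_usd))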

Now let’s take a look at the HP DL380 G7 SUT used for the ESX 4.1 benchmark:

Not only was more RAM used in this server – 192 GB – but it was also clocked higher, at 1333 MHz.  Now that we have seen the memory configuration for the Westmere IBM SUT and understand why its 1333 MHz RAM was actually clocked at 800 MHz, it is definitely worth pointing out how HP and VMware got the larger memory configuration at the faster clock speed (spoiler: it didn’t come for free).  The HP DL380 G7 SUT uses the same Westmere architecture as the IBM x3650 M3 SUT, so let’s bring up the Westmere CPU socket / memory channel diagram to explain this:

We still have the same triple-memory-channel, three-RDIMM-slots-per-channel layout for each CPU socket, but note how one of the three RDIMM slots in each memory channel has been greyed out – instead we have 6 RDIMMs per CPU socket, and 12 for the whole system.  This is significant because under the Westmere architecture, populating only two of the three RDIMM slots in each DDR3 memory channel allows you to maintain the 1333 MHz native clock speed of the memory modules.  That’s good if you want better memory I/O performance out of your virtualized workloads (although performance does not scale linearly with the 67% higher clock speed), but it limits the memory capacity of your server and thus the guest-to-host consolidation ratio.  VMware and HP got around this in a rather brute-force manner – by using very high-density memory modules.  Twelve quad-ranked, dual-sided 16 GB PC3L-10600 ECC DDR3 1333 MHz RDIMMs, to be exact, at approximately $4,000 a pop.  Loading the HP DL380 G7 SUT with the 192 GB of RAM used in the benchmark would have cost on the order of $48,000 – compare that to the 144 GB of RAM in the IBM SUT priced at about $8,100.  That is nearly a 500% price increase (~$40,000) over the IBM SUT’s memory for a 4.44% higher benchmark score, and it speaks volumes.  $40k is a lot of money that could instead go toward shared storage hardware, HBAs, or even additional virtualization hosts.
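And here is the same quick sketch for the HP configuration, using my rough ~$4,000 per-RDIMM street-price estimate from above:

    # HP DL380 G7 SUT memory layout: same Westmere topology, only 2 of 3 slots per channel used
    sockets = 2
    channels_per_socket = 3
    rdimms_per_channel = 2        # two slots per channel -> memory stays at 1333 MHz
    rdimm_size_gb = 16            # quad-ranked 16 GB PC3L-10600 DDR3 1333 MHz RDIMMs
    rdimm_price_usd = 4000        # rough street-price estimate per module

    rdimm_count = sockets * channels_per_socket * rdimms_per_channel   # 12 modules
    total_gb = rdimm_count * rdimm_size_gb                             # 192 GB
    total_cost_usd = rdimm_count * rdimm_price_usd                     # ~$48,000

    ibm_memory_cost_usd = 8100.0  # from the IBM sketch above
    premium = total_cost_usd - ibm_memory_cost_usd
    print("HP memory: %d GB for roughly $%d" % (total_gb, total_cost_usd))
    print("Premium over the IBM loadout: ~$%d (~%.0f%% increase)"
          % (premium, premium / ibm_memory_cost_usd * 100))            # ~$39,900, ~490%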

…and that’s just the difference in memory between the two SUTs.  The network and shared storage configuration differences between the two sets of benchmarks are worth pointing out because VMware and HP clearly were not shy about pumping more expensive, higher performing hardware into their benchmark setup.

First, let’s review the network interface loadout in the IBM SUT used for the KVM benchmarks:

Not bad by any stretch of the imagination – eight 1 GbE interfaces provide ample room to load-balance and segment guest VM traffic.  But looking at the network interface loadout in the HP SUT reveals that things are even rosier for ESX 4.1:

There are 22 network interfaces available to this host, all of which are at least 1 GbE, and two of which are 10 GbE.  Only 16 of the 22 network interfaces were used in the benchmark, but two of those 16 were the 10 GbE interfaces.  Also of note is the 10 GbE adapter used in the DL380 G7 – a dual-port Intel X520.  10 GbE pipeline aside, this NIC has significant hardware enhancements that directly benefit virtualization:

  • 128 dedicated transmit (Tx) and receive (Rx) queues per port, enabling efficient packet prioritization without waiting or buffer overflow
  • VMDq, a feature that offloads the data-sorting functionality from the hypervisor to the network silicon, improving data throughput and CPU usage
  • Dedicated Virtual Machine Load Balancing (VMLB), which provides both Tx and Rx traffic load balancing across virtual guests bound to the team interface

Make no mistake, the Intel X520 10 GbE NIC in the DL380 G7 is an awesome piece of hardware designed to complement virtualized workloads on the host.  That and the other fourteen 1 GbE interfaces used in the SUT give ESX 4.1 plenty of room to work with.
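Setting the offload features aside and just counting raw link speed, the aggregate bandwidth gap between the interfaces actually used in each benchmark run is easy to see.  Here is a crude sketch that simply sums nominal line rates – it assumes nothing about how the ports were teamed or actually utilized:

    # Aggregate line rate of the NICs used in each benchmark run (Gb/s) - crude comparison only
    ibm_kvm_nics = {"1GbE": 8,  "10GbE": 0}   # IBM x3650 M3 SUT (RHEL5.5 KVM)
    hp_esx_nics  = {"1GbE": 14, "10GbE": 2}   # HP DL380 G7 SUT (ESX 4.1): 16 of 22 ports used

    def aggregate_gbps(nics):
        # Sum of nominal link speeds, ignoring offloads, teaming, and real-world utilization
        return nics["1GbE"] * 1 + nics["10GbE"] * 10

    print("IBM/KVM aggregate line rate: %d Gb/s" % aggregate_gbps(ibm_kvm_nics))   # 8 Gb/s
    print("HP/ESX aggregate line rate:  %d Gb/s" % aggregate_gbps(hp_esx_nics))    # 34 Gb/s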

The last hardware configuration aspect of the benchmark hardware that I want to highlight is the shared storage.  Let’s first look at the SUT storage for the RHEL5.5 KVM benchmark:

To sum it up, we have a 4 Gb Fibre Channel backbone with 96 15K RPM SAS spindles, spread across eight IBM storage appliances (two DS3400s and six DS3000s).  Not a bad setup at all, but HP and VMware had even better hardware for their SUT shared storage implementation:

Here we have an 8 Gb Fibre Channel backbone with 156 15K RPM SAS spindles, spread across 13 HP StorageWorks appliances (11 MSA2212fc, one P2000).  Keep in mind that both RAID 10 and RAID 5 arrays were used for the HP SUT storage setup, and RAID 10 has a significant write performance advantage over RAID 5, which can certainly come into play with disk-I/O-intensive virtualized workloads.
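Spindle count alone tells a big part of the story here.  As a rough illustration, if you assume the common rule of thumb of roughly 175 random IOPS per 15K RPM SAS drive (my assumption – not a figure from either benchmark disclosure), the two storage back ends compare like this:

    # Very rough back-end random-IOPS ceiling based on spindle count alone
    # (ignores RAID level, controller cache, and the 4 Gb vs 8 Gb FC fabric difference)
    IOPS_PER_15K_SAS = 175        # rule-of-thumb assumption, not a measured or disclosed value

    ibm_spindles = 96             # two DS3400s + six DS3000s on a 4 Gb FC fabric
    hp_spindles = 156             # HP StorageWorks arrays on an 8 Gb FC fabric

    print("IBM/KVM storage: ~%d IOPS" % (ibm_spindles * IOPS_PER_15K_SAS))   # ~16,800
    print("HP/ESX storage:  ~%d IOPS" % (hp_spindles * IOPS_PER_15K_SAS))    # ~27,300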

Conclusion:

For me, the take-away from this is that while ESX 4.1 had a 4.44% better SPECvirt_sc2010 score than RHEL5.5 KVM, this advantage likely has more to do with the superior memory, network, and storage configuration of the SUT used by HP and VMware than with any inherent performance benefit that could be directly attributed to the VMkernel hypervisor.

Furthermore, a look at the cost breakdown of the components in the IBM and HP SUTs used for the benchmarks reveals the financial lengths taken by HP and VMware to ensure that ESX edged out KVM.  I priced out the hardware used in the IBM System x3650 M3 and HP ProLiant DL380 G7 virtualization hosts alone by doing some quick searches on the Internet, threw everything together in a table, and rounded off the numbers for easy math.  For the sake of simplicity, I subtracted the cost of the initial RAM loadout from the base system price for both the IBM and HP servers – I wanted to reflect only the price of the RAM actually used in the benchmark (you know, the most expensive component of each system).

I have a hard time ignoring how HP and VMware decided to go with a server configuration that cost approximately 250% more (~$40,000) than the IBM x3650 M3 configuration.  This is all for the sake of a 4.44% benchmark advantage and being able to squeeze an additional 6-VM tile into the ESX guest loadout (78 VMs vs. 72 on the RHEL5.5 KVM SUT).  The cost of the DL380 G7’s high-density memory loadout is unrealistic for a production system in the current economic climate, and it sharply illustrates the steep, non-linear cost increase associated with pursuing the highest possible RDIMM memory densities in virtualization hosts – both for the sake of performance and for high guest-to-host consolidation ratios.
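One last hedged back-of-the-envelope comparison: dividing my rough RAM price estimates from earlier in this post by the number of VMs each SUT hosted gives an approximate memory-cost-per-guest figure:

    # Rough memory cost per hosted VM, using the RAM price estimates from earlier in the post
    ibm_ram_cost_usd, ibm_vms = 8100.0, 72    # RHEL5.5 KVM SUT (144 GB)
    hp_ram_cost_usd, hp_vms = 48000.0, 78     # ESX 4.1 SUT (192 GB)

    print("IBM/KVM: roughly $%.0f of RAM per VM" % (ibm_ram_cost_usd / ibm_vms))   # ~$112
    print("HP/ESX:  roughly $%.0f of RAM per VM" % (hp_ram_cost_usd / hp_vms))     # ~$615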

I also have to point out that the KVM SPECvirt_sc2010 benchmark was done on RHEL 5.5 – RHEL 6 just recently went GA on November 10, 2010 and brings significant performance increases to the table with virtualized workloads (definitely a blog post for the near future).  I would be very interested to see SPECvirt_sc2010 benchmarks for RHEL 6 KVM using BOTH of the SUT configurations.

Now this is a virt / tech blog so I won’t dwell too much more on money here, but cost savings IS a significant driving force in the virtualization of production workloads, and I would be remiss if I concluded without bringing up the obvious cost differential between RHEL 5.5 KVM and ESX 4.1.  Cost considerations certainly extend to the choice of one vendor’s hypervisor + management solution over another – even if one were to take the 4.44% performance benefit of ESX 4.1 over RHEL5.5 KVM at face value, going with VMware over open-source RHEL 5.5 KVM is CERTAINLY more than 4.44% more expensive…
