How to simulate the impact of a Compute Node failure on running VMs in PureApplication?

Originally posted on the IBM Developer blog “Exploring PureApplication System, Software Service and more” by Jonathan Deberdt on 16 January 2017 (5924 visits)

When designing and building patterns on PureApplication, most clients need to demonstrate the impact of a Compute Node hardware failure on the VMs running their software. For example, what happens when the Compute Node hosting the primary DB2 VM fails? Deploying a pattern instance and then pulling a Compute Node out of a PureApplication System in the data center is usually not an option (as suggested in this thread on the developerWorks Answers forum). Instead, you could try to simulate the failure by:

  1. Explicitly killing the process(es) on the VM.
  2. Issuing a “shutdown -h” from the shell within a VM.
  3. Issuing a “Power off” from the PureApplication UI for the VM.

The issue with (1) is that it is very specific to the software running in the VM. With (2), the OS performs a graceful shutdown, which is not what happens when the hardware fails. Option (3) looks like the best choice; however, we found that the “Power off” command for a VM actually sends a signal to the OS to perform a graceful shutdown as well.

Here we describe a mechanism to perform a more “realistic”, abrupt “Power off” of the VM. The idea is to make sure that the shutdown signal sent to the VM either does not arrive or does not trigger a graceful shutdown. Under the covers, VMware Tools is installed on every VM in PureApplication (on Intel). VMware Tools runs a set of services within the guest OS, and these services receive the request from VMware to shut down the OS gracefully. To prevent this for testing purposes, we can simply stop the VMware Tools services:

Note: Please keep in mind that simulating an abrupt shutdown of a VM in this way may corrupt the filesystem(s) within the VM!

-bash-4.1# /etc/vmware-tools/services.sh stop
Stopping VMware Tools services in the virtual machine:
   Guest operating system daemon:                          [  OK  ]
   Unmounting HGFS shares:                                 [  OK  ]
   Guest filesystem driver:                                [  OK  ]
   VM communication interface socket family:               [  OK  ]
   VM communication interface:                             [  OK  ]
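If you repeat this test on several VMs, it can help to wrap the stop command in a small function that fails loudly when the control script is missing, for example on a VM where VMware Tools was installed differently. This is a minimal sketch, assuming the script location /etc/vmware-tools/services.sh used above; the function name stop_vmware_tools is hypothetical and not part of VMware Tools or PureApplication.

```shell
#!/bin/bash
# Hypothetical helper (not part of VMware Tools or PureApplication):
# stop the VMware Tools services before issuing "Power off" from the UI.
# Assumption: the control script lives at /etc/vmware-tools/services.sh,
# as in the output above; pass a different path as $1 if yours differs.
stop_vmware_tools() {
    local script="${1:-/etc/vmware-tools/services.sh}"

    # Refuse to continue if the control script is absent or not executable;
    # otherwise the later "Power off" would still be a graceful shutdown.
    if [ ! -x "$script" ]; then
        echo "VMware Tools control script not found: $script" >&2
        return 1
    fi

    # Delegate to the script; its exit status becomes ours.
    "$script" stop
}
```

Run it as root on the VM (for example `stop_vmware_tools && echo "ready for Power off"`), and only then power off the VM from the PureApplication UI.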

With the VMware Tools services stopped, issuing a “Power off” from the PureApplication UI for the VM effectively results in an immediate shutdown. You can validate this by examining /var/log/messages once the VM has been powered on again: the boot messages should show that recovery was required on a read-only filesystem:

Jan  5 08:11:29 pure-9-3-172-232 kernel: dracut: Scanning devices sda2  for LVM logical volumes vg_root/LogVol00 vg_root/LogVol01 
Jan  5 08:11:29 pure-9-3-172-232 kernel: dracut: inactive '/dev/vg_root/LogVol01' [2.00 GiB] inherit
Jan  5 08:11:29 pure-9-3-172-232 kernel: dracut: inactive '/dev/vg_root/LogVol00' [9.75 GiB] inherit
Jan  5 08:11:29 pure-9-3-172-232 kernel: EXT4-fs (dm-1): INFO: recovery required on readonly filesystem
Jan  5 08:11:29 pure-9-3-172-232 kernel: EXT4-fs (dm-1): write access will be enabled during recovery
Jan  5 08:11:29 pure-9-3-172-232 kernel: EXT4-fs (dm-1): recovery complete
Jan  5 08:11:29 pure-9-3-172-232 kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: 
Jan  5 08:11:29 pure-9-3-172-232 kernel: dracut: Mounted root filesystem /dev/mapper/vg_root-LogVol00
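To check this without reading through the whole log, you can search for the ext4 journal-recovery message directly. This is a minimal sketch assuming the RHEL 6 style syslog file /var/log/messages shown above; the function name is hypothetical. It does not distinguish between boots, so check the timestamps of the lines it prints, and if the log has been rotated since the reboot, point it at the older messages file instead.

```shell
#!/bin/bash
# Hypothetical helper: report whether a syslog file records an ext4 journal
# replay at boot, which indicates the previous power-off was not clean.
# Assumption: RHEL 6 style syslog at /var/log/messages (override with $1).
had_unclean_shutdown() {
    local log_file="${1:-/var/log/messages}"

    # Print the matching kernel lines (if any) and return grep's status:
    # 0 = recovery message found, 1 = not found, 2 = file unreadable.
    grep 'recovery required on readonly filesystem' "$log_file"
}
```

For example, `had_unclean_shutdown && echo "abrupt power-off confirmed"` prints the matching lines followed by the confirmation.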

Another approach is to uninstall only the VMware Tools plugin that passes the graceful-shutdown request from VMware to the OS. This module is named vmware-tools-plugins-powerOps, and you can uninstall it with the following command:

yum remove vmware-tools-plugins-powerOps

The advantage of this approach is that the remaining VMware Tools modules stay in place; in general, these optimise the performance of the OS running inside the VM. Should you wish to re-install vmware-tools-plugins-powerOps, you can download it from here. Make sure you download the correct version: it must match both the OS and the VMware ESXi version in use. PureApplication System 2.2.2 runs ESXi 6.0.0, and you can find vmware-tools-plugins-powerOps for RHEL6 here. Older versions of PureApplication System use ESXi 5.1.0 Update 3; you can find vmware-tools-plugins-powerOps for RHEL6 here.

Note: You can determine the ESXi version by issuing a REST GET request to https://<PSM>/admin/resources/hypervisors/. The response shows the ESXi version installed on each Compute Node, for example:

"software_version": "VMware ESXi 6.0.0 build-3620759",
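The call can also be scripted, for example to list the distinct ESXi versions across all Compute Nodes before choosing which package to download. This is a sketch under assumptions: it uses curl with basic authentication and -k for a self-signed certificate, and it sends an X-IBM-PureSystem-API-Version: 1.0 header, which as far as we know the PureApplication REST API expects; verify both against the REST API reference for your release. The version extraction matches the software_version field with grep and sed rather than using a full JSON parser, and both function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for querying the ESXi version of each Compute Node.
# Assumptions: basic authentication, a self-signed certificate (hence -k)
# and the X-IBM-PureSystem-API-Version header; verify these for your release.

# Fetch the hypervisor list. $1 = PSM host name or IP, $2 = user:password.
fetch_hypervisors() {
    curl -sk -u "$2" \
        -H 'Accept: application/json' \
        -H 'X-IBM-PureSystem-API-Version: 1.0' \
        "https://$1/admin/resources/hypervisors/"
}

# Read the JSON response on stdin and print each distinct software_version
# value on its own line. Simple text matching, not a full JSON parser.
esx_versions() {
    grep -o '"software_version": *"[^"]*"' \
        | sed 's/.*: *"\(.*\)"$/\1/' \
        | sort -u
}
```

Usage, with a placeholder host name and credentials: `fetch_hypervisors mypsm.example.com admin:secret | esx_versions`.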

You can find more information about the VMware Tools Operating System Specific Packages (OSPs) here.

Published by jonathandeberdt

I'm a Technical Specialist in IBM Technology Expert Labs. I have been working as a consultant on IBM Cloud Pak System products (formerly IBM PureApplication) since the first release in 2012, for several European customers.
