Channel: VMware Communities: Message List

ESXi 5.0 U2 Froze when 1 SAN switch hiccuped

Environment: ESXi 5.0 U2, Cisco UCS blade servers, redundant Cisco MDS FC switches in separate fabrics, and a Fibre Channel storage array. The array is redundantly connected to the MDS switches, and ESXi is configured for active/active round robin multipathing (per the array vendor's best practices).
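For anyone comparing configurations: on ESXi 5.x you can verify from the host shell which Path Selection Policy each LUN is actually using. The `naa` device ID below is a placeholder; substitute your own LUN identifiers.

```shell
# List every device with its Path Selection Policy (PSP) and working paths
esxcli storage nmp device list

# Check one specific LUN (device ID is a placeholder)
esxcli storage nmp device list -d naa.60060160a0b0c0d0e0f01234567890ab

# Set Round Robin explicitly if a device has fallen back to Fixed or MRU
esxcli storage nmp device set -d naa.60060160a0b0c0d0e0f01234567890ab --psp VMW_PSP_RR
```

Worth confirming that every LUN really shows VMW_PSP_RR and has working paths on both fabrics before trusting failover behavior.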

 

Today one of the MDS switches had a brain fart of some type (still investigating), so one of the two fabrics went down. The physical Windows Server 2012 UCS server with LUNs presented maintained I/O connectivity through the one operating switch. However, the dozen ESXi servers did not handle the fabric hiccup gracefully and essentially froze: we could not connect via the vSphere Client, and the DCUI was unresponsive once you authenticated. The VMs also froze, though they still responded to ping.

 

After realizing it was a fabric problem, we admin-shut the MDS uplinks on the affected switch to the UCS FI, and all of a sudden the ESXi hosts unfroze and started sending I/O down the other (unaffected) fabric.
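For context, the admin-shut was done on the MDS side. A minimal NX-OS sketch of that step (interface numbers are examples, not our actual uplinks):

```
! On the affected MDS switch, admin-shut the FC uplinks toward the UCS FI
conf t
interface fc1/13 - 14
  shutdown
end
```

Once the FI saw its uplinks go down, it downed the server-side vHBA ports, which is what finally forced the hosts onto the surviving fabric.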

 

During the installation of the array a few months ago, we did thorough HA testing, including turning off SAN switches, pulling cables from the array, pulling UCS FI cables, etc., and had no problems like what we had today. But that testing was done on ESXi 5.0 U1.

 

Anyone have an idea why ESXi froze when it had two valid paths to each LUN? Only when the UCS FIs downed the server HBA ports did ESXi wake up and reroute I/O to the other fabric. An APD (all-paths-down) condition should not have been triggered, since the two paths via the other fabric were 100% available.
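If anyone wants to help dig in, these are the host-side checks I'd run next time (commands are standard ESXi 5.x; the log path is the 5.x default):

```shell
# Show the live state (active/dead/standby) of every path, per HBA/fabric
esxcli storage core path list

# Scan the vmkernel log for path-failover and all-paths-down messages
grep -iE 'apd|failover|path.*dead' /var/log/vmkernel.log
```

If the paths on the surviving fabric showed as active the whole time, that would point at the I/O stack stalling on the dead paths rather than a true APD.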

