I attended a panel discussion session at VMUG this week. A guy from the audience asked a question on how to troubleshoot connectivity issues after moving a VM. The guy had a one flat VLAN with one IP subnet and every time he vMotioned a VM to another host, users lost connectivity with that VM.
To answer his question, a guy on the stage advised him to recreate another VLAN/IP subnet, move that VM to the newly created subnet, and then try to do the vMotion again.
Well obviously that wasn’t a good advice and changing the VM IP address in this case would not help. In this post I will explain what happens when a VM moves and steps you can take to troubleshoot network delay and connectivity issues related to vMotion.
First before you do any vMotion, ensure that the VLAN is configured on all switches and allowed on all necessary trunk ports in the network. You can use the commands show vlan and show trunk to verify that that.
When you create a new VM, the host allocates and assigns a MAC address to that VM. The physical switch which the host is connected to eventually learns the VM MAC address (that happens when the VM ARPs for its gateway, starts sending traffic and the switch sees that traffic, or when the Notify Switches in vSphere option is tuned on).
When a VM moves to another host (the new host could be either connected to another port or the same physical switch or connected to a different switch), somebody has to tell the network to change its destination port for that MAC address in order to continue to deliver traffic to that VM. In vSphere that is usually handled by the host when the “Notify Switches” option is turned on. The host in this case notifies the network by sending several RARP messages on behalf of the VM to ensure that the upstream physical network updates its MAC table.
So the first step in troubleshoot this specific issue is to ensure that the “Notify Switches” settings is set to Yes before you vMotion the VM. In vSphere 5.5 you would go to the vSwitch/vDS settings and then to Teaming and Failover to verify the setting.
If that does not help, try to do the followings:
On the physical switch (the switch the destination host is connected to), issue a command to look up MAC table and find out which destination port the switch is using to reach the VM. On a Cisco Catalyst switch, you can use the “show mac address-table” command for example. If the switch is still using the old MAC-to-port mapping, then it’s either that the switch is not reaching the RARP notification from the host or the host itself is not sending it.
If you don’t do anything to correct the problem, the aging timer for that MAC address on the switch (default is set to 5 mins on Cisco switches) will eventually expire and the switch will learn the MAC address via the new port but to speed things up you can alway flush out the VM MAC address from the switch MAC table (clear mac address-table on Cisco catalyst).
Obviously having to manually intervene and troubleshoot every time you move a VM defeats the whole purpose of vMotion. vMotion is supposed to do live migration of a VM and preserves all active network connections during the process. But hopefully the above steps will give you some pointers on where to look and start to find the root cause.
Additionally VMware recommends disabling the followings on the physical ports connected to the host to minimize networking delay:
– Port Aggregation Protocol (PAgP) and Link Aggregation Control Protocol (LACP).
– Dynamic Trunking Protocol (DTP) or trunk negotiation.
– Spanning Tree Protocol (STP).