
Re: WARNING: NMP: nmp_DeviceRequestFastDeviceProbe


I might have some helpful information here. I have a lab setup where I've broken all kinds of things during the learning curve, and through carelessness. I ran into this particular issue today after I moved the IP-storage vNIC of my top-level Openfiler VM (which uses VT-d to take pass-through volume sets from my Areca controller and present them to the hosts) onto another vSwitch port group in the same VLAN on its host. I was getting all kinds of errors related to this thread: the hypervisor on the host was locking up, the iSCSI devices and datastores were flapping, the VMs were in "unknown" status, and when I could get info from some of the datastores, or when I tried to re-add them, the wizard said they were empty. Multiple reboots of the host and the filer did nothing.

 

I realized that the vSwitch I moved the vNIC onto did not have jumbo frames enabled, and I gather this can happen whenever jumbo frames are suddenly disabled anywhere in the storage network path. I have no idea whether an update would touch the jumbo-frames setting on vSwitches, but it seems feasible that an upgrade/update could muck up the VMkernel ports or port groups used by the initiator or the IP-storage virtual network. Here is what I did to fix my scenario...
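A quick way to confirm an MTU mismatch from the host console before changing anything (ESX/ESXi 4.x syntax; the filer IP 192.168.1.20 is just an example placeholder):

# List vSwitches and VMkernel NICs and check the MTU column for 9000
esxcfg-vswitch -l
esxcfg-vmknic -l

# Send one jumbo frame end-to-end with don't-fragment set
# (8972 = 9000 minus 20 bytes of IP header and 8 bytes of ICMP header)
vmkping -d -s 8972 192.168.1.20

If that vmkping fails while a plain vmkping succeeds, something in the path is dropping jumbo frames.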

 

=========================

First, stabilize the host(s):

 

1. Stop the iSCSI target service on the filer.
2. Remove the "unknown" guests from the affected host(s). The logs should stop going nuts, but for me the vSphere client was still very slow, so...
3. Reboot the ESX host(s).
4. Unmap the LUNs from the target(s) on the filer. (I had to create entirely new targets as part of the process.)
5. Make sure jumbo frames are turned on in the vSS/vDS at the switch level and on the port group and/or VMkernel port for the initiator or filer (see the commands after this list). Of course, this is only relevant if you have jumbo frames enabled on the filer and physical switch(es), which is what I'm assuming.
6. Create a NEW target on the filer, map a LUN to it, allow one ESX host in the ACL, and start the iSCSI target service.
7. Rescan the HBA on the host. If this ultimately doesn't work, I would start over and nuke/pave the switch, PG, VMK, etc. if not done already.
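For reference, steps 5 and 7 look roughly like this from the host console on ESX/ESXi 4.x (vSwitch1 and vmhba33 are example names, so substitute your own; on a vDS you set the MTU through the vSphere client instead):

# Step 5: enable jumbo frames on a standard vSwitch and verify
esxcfg-vswitch -m 9000 vSwitch1
esxcfg-vswitch -l

# Step 7: rescan the iSCSI adapter once the new target is up
esxcfg-rescan vmhba33

Steps 1, 4, and 6 happen on the filer itself; on Openfiler the iSCSI target service, targets, ACLs, and LUN mappings are all managed in the web GUI.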

 

###I've performed several types of screw-ups with the iSCSI HBAs where the entire HBA/switch setup needed to be nuked and paved. If this process doesn't work, try removing the VMkernel port(s) from the initiator(s), removing the switches, and creating them again with the relevant port group(s)/VMkernel port(s). Make sure jumbo frames are enabled everywhere relevant: switch level, PG level, VMK level. Then add the new VMkernel port(s) back to the initiator(s); a rough sketch follows below. All I can gather is that when something goes really bad, the OS no longer knows how to deal with the existing devices or targets.###
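Here is a rough sketch of that nuke/pave for the software initiator on a standard vSwitch, again in 4.x syntax. Everything below (vSwitch1, vmnic2, vmk1, vmhba33, the port group name, and the addresses) is a made-up example:

# Unbind the old VMkernel NIC from the software iSCSI initiator, then remove it and the switch
esxcli swiscsi nic remove -n vmk1 -d vmhba33
esxcfg-vmknic -d "iSCSI-VMK"
esxcfg-vswitch -d vSwitch1

# Recreate the switch with jumbo frames, re-link its uplink, and add the port group
esxcfg-vswitch -a vSwitch1
esxcfg-vswitch -m 9000 vSwitch1
esxcfg-vswitch -L vmnic2 vSwitch1
esxcfg-vswitch -A "iSCSI-VMK" vSwitch1

# Recreate the VMkernel NIC with MTU 9000 (in 4.x the vmknic MTU can only be set at creation)
esxcfg-vmknic -a -i 192.168.1.10 -n 255.255.255.0 -m 9000 "iSCSI-VMK"

# Bind it back to the initiator and rescan
esxcli swiscsi nic add -n vmk1 -d vmhba33
esxcfg-rescan vmhba33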

 

8. If the HBA devices show up normally again, check the datastores. Five of my six were not present and had to be re-added; the sixth showed up as "unknown (unmounted)". I tried to mount it and got an error, but then it mounted anyway; it was probably already in the middle of mounting, I guess. For the ones I added back, I chose "Keep existing signature" in the wizard. I don't know what creating a new signature could ultimately affect, but it didn't seem like the right choice, because I think you only need to resignature a copied datastore.
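If you would rather do the "Keep existing signature" mount from the console instead of the wizard, this is the 4.x way as far as I know (datastore1 is just an example label):

# List VMFS volumes the host has detected as snapshots/replicas
esxcfg-volume -l

# Persistently mount one, keeping its existing signature
esxcfg-volume -M datastore1

# esxcfg-volume -r would resignature instead, which you should only want for a true copy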

 

I added one LUN at a time to the target and brought all six datastores back online without any data loss, ending my streak of half a dozen irreparable catastrophes. I hope this helps.

