Monday, April 21, 2014

VMware ESXi 5.5 NFS Disconnect Issues -- Notification on General vSphere Events

Ever since we started running vSphere 5.5, we have been experiencing NFS datastore disconnects. We have delved into every facet of storage, networking, and even the advanced configuration of our vSphere hosts.

So far little is known about the root cause. VMware finally put out a KB about the issue. The only problem with this guy is A. we're not running 5.5 Update 1 and B. we're not running 5.5 Update 1.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2004684

So for all the hardworking techies out there, I offer at least a small consolation. Should you be living this dream, save the following as a .ps1 script and create a scheduled task that kicks it off every 5 minutes. This way you'll at least know where the pain is coming from without having to spend precious time finding the broken host(s). We've found the only way to recover is to power off the VM guest, which is literally the only command you'll be able to issue against it effectively, and then power it back on, which will move it to a working host. To simplify things further, you can put the host into maintenance mode; doing so will reveal any VM guests that are in a locked state.
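As a rough sketch of the "kick it off every 5 minutes" part, the task can be registered from a PowerShell prompt with the built-in schtasks.exe. The script path and task name below are made up for illustration, and the machine is assumed to have PowerCLI installed.

```powershell
# Hypothetical script path and task name; adjust both to your environment.
$scriptPath = 'C:\Scripts\Check-NfsEvents.ps1'
$taskName   = 'ESX NFS Event Check'

# /SC MINUTE /MO 5 repeats the task every 5 minutes.
# Add /RU and /RP if the task should run under a service account.
schtasks.exe /Create /TN $taskName /SC MINUTE /MO 5 `
    /TR "powershell.exe -NoProfile -ExecutionPolicy Bypass -File $scriptPath"
```

Running `schtasks.exe /Query /TN $taskName` afterwards should show the task and its next run time.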

##### Start Script #####

#Generated by JH

####################FUNCTIONS####################
Function sendMail($body){
set-content \\pathtoalogfile\nfs_logs_$(get-date -format M.d.yyyy).txt $body
Write-Host "Sending Email"
$smtpserver = "yourmailserver"
$recipient = "yourdistributionlist"
$sender = "yoursourceemailaccount"
$subject = "ESX NFS Issue Report"
send-MailMessage -SmtpServer $smtpserver -To $recipient -From $sender -Subject $subject -Body $body -Priority high
}

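Function processoutput($bodyfile){
    # Read the events already recorded so each one is only reported once
    $body = [IO.File]::ReadAllText($bodyfile)
    # $log holds one array of events per host
    ForEach ($hostevents in $log) {
        $hostevents | % {
            If ($psitem.createdtime -ne $null){
                $key = "$($psitem.createdtime) -- $($psitem.objectname)"
                # Escape the key so dots in host names are matched literally
                If (!($body -imatch [regex]::Escape($key))){
                    add-Content $bodyfile @"

$($psitem.createdtime) -- $($psitem.objectname) -- $($psitem.fullformattedmessage)

"@
                    # Remember the key so a duplicate later in this run is skipped.
                    # The file is no longer rewritten from the stale copy in $body,
                    # which would have dropped entries added earlier in the same run.
                    $body += "`n$key"
                }
            }
        }
    }
}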

#################################################

$start = (get-date).adddays(-1).ToString('M/d/yyyy')
#$start = get-date -format M/d/yyyy
#$start = "1/1/2014"
$end = (get-date).adddays(+1).ToString('M/d/yyyy')
$start2 = get-date -format M.d.yyyy
set-content \\pathtoalogfile\nfs_logs_$start2.txt "Starting Run @ $(get-date)"
Add-PSSnapin VMware.VimAutomation.Core
Set-PowerCLIConfiguration -DefaultVIServerMode Multiple -Confirm:$false
cls
connect-viserver yourhostorvcenternumber1 -User youruseraccount -Password yourpassword
connect-viserver yourhostorvcenternumber2 -User youruseraccount -Password yourpassword

$pattern = "esx.problem.vmfs.nfs.server.disconnect|esx.problem.storage.apd.timeout"
$hostlist = (get-vmhost | sort).name

cls
$log = $null
$hostlist | % {
$psitem
$log += ,@(get-vmhost $psitem | get-vievent -start $start -Finish $end | ? { $psitem.EventTypeID -imatch ".*($pattern).*" })
}
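# Create the history file on the first run so ReadAllText does not throw
If (!(Test-Path 'C:\vievents.txt')) { New-Item 'C:\vievents.txt' -ItemType File | Out-Null }
$body1 = [IO.File]::ReadAllText('C:\vievents.txt').Trim()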
processoutput C:\vievents.txt
$body2 = [IO.File]::ReadAllText('C:\vievents.txt').Trim()
disconnect-viserver * -Confirm:$false

If ($body1 -ne $body2){ sendmail $body2 }
Else { add-content \\pathtoalogfile\nfs_logs_$(get-date -format M.d.yyyy).txt "No NEW Errors as of $(get-date)" }
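
Once the script above has told you which host is affected, the recovery steps described earlier (maintenance mode, then power off and power on the stuck guests) could be sketched roughly as follows. This is only an illustration under a few assumptions: the host name is made up, a Connect-VIServer session already exists, the host sits in a DRS cluster that evacuates guests when it enters maintenance mode, and the guests still powered on there after the wait are the locked ones.

```powershell
# Assumes an existing Connect-VIServer session; the host name is hypothetical.
$badHost = Get-VMHost -Name 'esx01.example.local'

# Ask the host to enter maintenance mode without blocking the console.
$badHost | Set-VMHost -State Maintenance -RunAsync | Out-Null

# Give the cluster some time to migrate guests, then list what is still there.
Start-Sleep -Seconds 300
$stuck = $badHost | Get-VM | ? { $psitem.PowerState -eq 'PoweredOn' }
$stuck | Select-Object Name, PowerState

# Power each stuck guest off and back on; review the list before running this part.
$stuck | % {
    Stop-VM -VM $psitem -Confirm:$false | Out-Null
    Start-VM -VM $psitem | Out-Null
}
```

Stop-VM is a hard power off, not a guest shutdown, so treat the last loop as a last resort and run it only after checking the list it acts on.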
