Sunday, November 13, 2011

Problem accessing CSV from passive Cluster Node

I recently had a perplexing problem where Cluster Shared Volumes in a Hyper-V cluster were not working correctly.  The volumes were only accessible from the node currently owning the volume.  Attempts to access the volumes from any of the other nodes resulted Windows Explorer hanging indefinitely.  Enabling maintenance or redirected mode made no difference.

Event ID 5120 was logged:  Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer available on this node because of 'STATUS_BAD_NETWORK_PATH(c00000be)'. All I/O will temporarily be queued until a path to the volume is re-established.
Event ID 5142 also occurred:  Cluster Shared Volume 'Volume1' ('Cluster Disk 1') is no longer accessible from this cluster node because of error 'ERROR_TIMEOUT(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.

I could ping all nodes over both the Production and Heartbeat network links, and I could access file shares from any node on any node.

The Problem
After much troubleshooting I realised I disabled both File and Print Sharing and Client for Microsoft Networks on the Heartbeat NIC on all nodes.  This is a best practice drummed into me since working on Microsoft Clustering when it was still code-named Wolfpack.

The Resolution
I enabled File and Print Sharing and Client for Microsoft Networks and immediately afterwards all my Cluster Shared Volumes started functioning as expected.

The Explanation
It’s documented in MS KB Article 2008795.  When accessing a CSV volume from a passive (non-coordinator) node, the disk I/O to the owning (coordinator) node is routed through a 'preferred' network adapter and requires SMB be enabled on that network adapter. For SMB connections to work on these network adapters, the aforementioned protocols must be enabled.  Ugh.

1 comment:

  1. Hi,
    You're right but I had same problem with SMB already enabled !
    To solve prob I just restarted DNS then Clusterservice on node with no access and everything worked fine again.
    Hope this could help...