Tuesday, August 13, 2013

Failover Clustering Event 1196 and 1228

In this case a Hyper-V failover cluster was running on Windows Server 2012, and the node hosting the "Cluster Group" started logging the following error events in the System event log:

Event 1228:
Cluster network name resource 'Cluster Name' encountered an error enabling the network name on this node. The reason for the failure was:
'Unable to obtain a logon token'.
The error code was '1326'.
You may take the network name resource offline and online again to retry.


and Event 1196:
Cluster network name resource 'Cluster Name' failed registration of one or more associated DNS name(s) for the following reason:
DNS bad key.
Ensure that the network adapters associated with dependent IP address resources are configured with at least one accessible DNS server.
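
For reference, both failure reasons correspond to documented Windows error codes: 1326 is the Win32 code ERROR_LOGON_FAILURE (unknown user name or bad password), and "DNS bad key" is the text of DNS_ERROR_RCODE_BADKEY (9017). Below is a minimal Python sketch of that lookup. The table and the function name describe_error are made up for illustration and cover only these two codes.

```python
# Hypothetical lookup table; it only covers the two codes seen in this post.
KNOWN_ERRORS = {
    1326: ("ERROR_LOGON_FAILURE", "unknown user name or bad password"),
    9017: ("DNS_ERROR_RCODE_BADKEY", "DNS bad key"),
}


def describe_error(code):
    """Return a short description for a known code, or a fallback string."""
    name, meaning = KNOWN_ERRORS.get(code, ("UNKNOWN", "not in this table"))
    return f"{code} {name}: {meaning}"


print(describe_error(1326))
print(describe_error(9017))
```

Read this way, Event 1228 points at the credentials of the cluster name account rather than at the network or at DNS itself. On a Windows machine, ctypes.FormatError(1326) should return the system's own message text for the same code.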

I moved the "Cluster Group" to another node, but the same events were logged there as well. Live migration of VMs between nodes also failed, although quick migration worked fine. During live migration, the Failover Clustering diagnostic log showed the following error messages:

[RES] Network Name: [NNLIB] LogonUserEx fails for user 'Cluster Name': 1326 (useSecondaryPassword: 0)
[RES] Network Name: [NNLIB] LogonUserEx fails for user 'Cluster Name': 1326 (useSecondaryPassword: 1)
[RES] Network Name: [NNLIB] Logon failed for user 'Cluster Name' (Error 1326), DC \\dc.domain.name, domain domain.name
[RES] Network Name <Cluster Name>: Identity: Obtaining Windows Token for Name: 'Cluster Name', SamName: 'Cluster Name', Type: Singleton, Result: 1326, LastDC: \\dc.domain.name
...
[RES] Network Name <Cluster Name>: Initializing Identity module failed with error 1326
[RHS] Error 1326 from ResourceControl for resource Cluster Name.
[RCM] ResourceControl(NETNAME_GET_VIRTUAL_SERVER_TOKEN) to Cluster Name returned 1326.
[RES] Virtual Machine <Virtual Machine Name>: Live migration of 'Virtual Machine Name' failed.
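
To see quickly which cluster components report the logon failure, the diagnostic log can be filtered for the error code. The following is a rough Python sketch, not a complete log parser: the function name find_logon_failures and the sample lines are assumptions for illustration, and a real cluster log carries additional prefixes (process and thread IDs, timestamps) on each line, which this sketch simply ignores.

```python
import re
from collections import Counter

# First bracketed tag on a line, e.g. "[RES]", "[RHS]" or "[RCM]".
TAG_RE = re.compile(r"\[([A-Z]+)\]")


def find_logon_failures(lines, code="1326"):
    """Return (tag, line) pairs for every line that mentions the error code."""
    code_re = re.compile(rf"\b{re.escape(code)}\b")
    hits = []
    for line in lines:
        line = line.strip()
        if code_re.search(line):
            match = TAG_RE.search(line)
            hits.append((match.group(1) if match else "?", line))
    return hits


# Sample lines copied from the excerpt above.
SAMPLE = """\
[RES] Network Name: [NNLIB] LogonUserEx fails for user 'Cluster Name': 1326 (useSecondaryPassword: 0)
[RHS] Error 1326 from ResourceControl for resource Cluster Name.
[RCM] ResourceControl(NETNAME_GET_VIRTUAL_SERVER_TOKEN) to Cluster Name returned 1326.
[RES] Virtual Machine <Virtual Machine Name>: Live migration of 'Virtual Machine Name' failed.
""".splitlines()

hits = find_logon_failures(SAMPLE)
counts = Counter(tag for tag, _ in hits)
for tag, n in sorted(counts.items()):
    print(f"{tag}: {n} line(s) with error 1326")
```

Running the same filter again after the fix is a cheap way to confirm that the 1326 entries have stopped appearing.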

I checked the permissions on the CNO DNS record and on the CNO AD object, and everything was fine, but somehow the CNO password was out of sync with AD. Here are the steps I took to remediate it:
  • Moved the CNO account to the Computers container
  • Logged on to one of the cluster nodes with an account that had the Reset Password right
  • Simulated multiple failures of the cluster Network Name resource until it entered a permanent failed state
  • Once it was in the failed state, right-clicked the resource and chose Repair under More Actions
The last action resets the CNO password in AD and brings the resource online. Afterwards the CNO DNS record was updated successfully, live migration of VMs started to work, and no more error events were logged on the "Cluster Group" owner.

For more information about the CNO on Windows Server 2012, please see http://blogs.technet.com/b/askcore/archive/2012/09/25/cno-blog-series-increasing-awareness-around-the-cluster-name-object-cno.aspx

 

3 comments:

  1. I seem to be having this same issue in my production environment. Will running through these steps take down my running VMs that are currently clustered?

    Thank you

  2. Nope. Your VMs will continue to work. But before resetting the CNO password in AD, have you tried just taking the Cluster Name resource offline and bringing it back online? Your VMs will not be affected.

  3. After many hours of research I have finally solved the problem!

    By default a cluster will try to register its DNS record with all configured DNS servers, including any secondary DNS server (an ISP's or another specified DNS server). The solution is simple:

    1. REMOVE ALL SPECIFIED DNS SERVERS EXCEPT YOUR LOCAL DNS SERVER.

    2. Open cluster manager, right click on the cluster IP address, and take it offline (this will not affect your cluster roles).

    3. Right click on your cluster and click Repair. Voilà, issue fixed.
