One of our developers came across with a problem they were having with a new app they’d put live on IIS. They were getting a lovely 500 error back about SQL Server connection issues.
So, as usual my first port of call when dealing with connection issues is to have a look through the SQL Server error log. And, once again it turned up the error:
2014-02-20 10:56:29.210 Logon Error: 18456, Severity: 14, State: 11. 2014-02-20 10:56:29.210 Logon Login failed for user 'Domain\Acme-test-web-1$'. Reason: Token-based server access validation failed with an infrastructure error. Check for previous errors. [CLIENT: 251.148.71.16]
Looks promising and simple doesn’t it. The
$ on the end of the Active Directory object name tells us that this is a Machine account. So we have 2 options here, either the server itself is trying to connect, or it’s impersonating another user for the connection (The Kerberos Double Hop issue).
A quick look through the logins on this SQL Server showed that
Domain\Acme-test-web-1$ had a valid login, and also permissions to the database it was trying to connect to.
So now, I moved on to looking at any errors generated in the SQL Server Ring buffers. For this I use this query:
SELECT CONVERT (varchar(30), GETDATE(), 121) as runtime, dateadd (ms, (a.[Record Time] - sys.ms_ticks), GETDATE()) as [Notification_Time], a.* , sys.ms_ticks AS [Current Time] FROM (SELECT x.value('(//Record/Error/ErrorCode)', 'varchar(30)') AS [ErrorCode], x.value('(//Record/Error/CallingAPIName)', 'varchar(255)') AS [CallingAPIName], x.value('(//Record/Error/APIName)', 'varchar(255)') AS [APIName], x.value('(//Record/Error/SPID)', 'int') AS [SPID], x.value('(//Record/@id)', 'bigint') AS [Record Id], x.value('(//Record/@type)', 'varchar(30)') AS [Type], x.value('(//Record/@time)', 'bigint') AS [Record Time] FROM (SELECT CAST (record as xml) FROM sys.dm_os_ring_buffers WHERE ring_buffer_type = 'RING_BUFFER_SECURITY_ERROR') AS R(x)) a CROSS JOIN sys.dm_os_sys_info sys ORDER BY a.[Record Time] ASC
Which I picked up years ago from this MSDN blog post Troubleshooting specific Login Failed error messages
Which returned the following truncated results (also reformatted to fit better):
runtime Notification_Time ErrorCode CallingAPIName 2014-02-20 15:30:47.807 2014-02-20 10:56:03.147 0x534 LookupAccountSidInternal APIName SPID LookupAccountSid 61 RecordID Type Record Time Current Time 69 RING_BUFFER_SECURITY_ERROR 637951566 649096225
Which pointed to a login not being found. As we new the machine account had a working login, we went back to look at the IIS box that was doing the calling. A quick peruse of the Application Pools showed an obvious cause. One of the app pools had been left running as “Application Identity”. This is a built in server level account which is there to allow people to quickly setup application pools. But being a server account it has no permissions off of the box. So when it does request access to a remote resource it delegates via the machine account. So, in our case, this application pool was trying to connect to the SQL Server by delegation, so while the login reported as failing was fine, it was only acting on behalf of an account that didn’t have a valid login.
A new service account was requested, and once live the application pool identity was changed over to that. And the problem disappeared.
Having had a quick BinGle around shows plenty of forum posts and others talkig about this issue, but I didn’t see this specific situation mentioned. I also saw a number of ‘solutions’ talking about adding random accounts to the servers Administrators group, which is almost NEVER the correct solution for a SQL Server login problem