Some tips for a BizTalk catastrophic event

Introduction

Today I witnessed a BizTalk catastrophic event: one of the SQL Server drives at a company I work for failed. While scrambling to find solutions for the errors we were seeing, I realized I could not find a good case study of a catastrophic BizTalk event. It would be nice to have a list of the errors you may see, along with their resolutions. In this post I want to document a few errors I did not know about, mention what was helpful, and describe what you can do to resolve the issues.

One helpful page I found on BizTalk errors is at http://www.codedigest.com/Articles/BizTalk/250_BizTalk_-_Errors_and_Warnings_Causes_and_Solutions.aspx.

Details

We saw quite a few errors today. A couple of them pointed to the catastrophic failure:

The following stored procedure call failed: " { call [dbo].[bts_InsertPredicate_BtsWebServices]( ?, ?, ?, ?, ?, ?, ?, ?)}". SQL Server returned error string: "Warning: Fatal error 823 occurred at <datetime>. Note the error and time, and contact your system administrator.".

The following stored procedure call failed: " { call [dbo].[bts_InsertProperty]( ?, ?, ?, ?, ?)}". SQL Server returned error string: "Connection failure".

There were many other similar errors. Basically, if you see { call [dbo].[bts_<anything>](?*) } in an error, some BizTalk stored procedure is failing, which points at the SQL Server hosting the BizTalk databases rather than at BizTalk itself.
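When the event log is full of these, it helps to see at a glance which BizTalk procedures are failing. Below is a minimal Python sketch, assuming the error text has been exported to a plain string; the function name failing_bts_procs is hypothetical, and the regex only covers the { call [dbo].[bts_...]( ... ) } shape quoted above.

```python
import re

# Matches the ODBC call syntax BizTalk logs, e.g. { call [dbo].[bts_InsertProperty]( ?, ?)}
BTS_CALL = re.compile(r"\{\s*call\s+\[dbo\]\.\[(bts_\w+)\]\s*\(")


def failing_bts_procs(log_text: str) -> list[str]:
    """Return distinct bts_* procedure names in order of first appearance."""
    # dict.fromkeys removes duplicates while keeping insertion order.
    return list(dict.fromkeys(m.group(1) for m in BTS_CALL.finditer(log_text)))


if __name__ == "__main__":
    sample = (
        'The following stored procedure call failed: " { call [dbo].'
        '[bts_InsertPredicate_BtsWebServices]( ?, ?, ?, ?, ?, ?, ?, ?)}".\n'
        'The following stored procedure call failed: " { call [dbo].'
        '[bts_InsertProperty]( ?, ?, ?, ?, ?)}".'
    )
    print(failing_bts_procs(sample))
```

For the two errors quoted above this prints ['bts_InsertPredicate_BtsWebServices', 'bts_InsertProperty'].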

We started investigating at this point. Fatal error 823 is a disk I/O error (SQL Server reports it when an operating system read or write fails) and is documented here: http://technet.microsoft.com/en-us/library/aa337267.aspx. A handy tip for SQL Server errors is to search for "MSSQLSERVER_" plus the error code to find the TechNet article. For us, that meant searching for MSSQLSERVER_823.
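The search tip is easy to automate if you are pasting error strings around. This is a small Python sketch, assuming the SQL Server error number appears as "error <number>" in the message, as it does in the 823 message above; mssql_search_terms is a hypothetical helper name.

```python
import re

# Assumes the number follows the word "error", e.g. "Fatal error 823 occurred at ...".
SQL_ERROR_NUMBER = re.compile(r"\berror\s+(\d+)\b", re.IGNORECASE)


def mssql_search_terms(error_text: str) -> list[str]:
    """Build an "MSSQLSERVER_<code>" search term for each distinct error number found."""
    codes = dict.fromkeys(m.group(1) for m in SQL_ERROR_NUMBER.finditer(error_text))
    return [f"MSSQLSERVER_{code}" for code in codes]


if __name__ == "__main__":
    msg = ("Warning: Fatal error 823 occurred at <datetime>. "
           "Note the error and time, and contact your system administrator.")
    print(mssql_search_terms(msg))
```

For the 823 message this prints ['MSSQLSERVER_823']; words such as "error string" or "the error and time" are skipped because no number follows them.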

Eventually we determined that we needed to restore the BizTalk databases. The following link was really helpful for preparing for the restore process: http://msdn.microsoft.com/en-us/library/cc296638(v=BTS.10).aspx. Our system did not have log shipping enabled (the article assumes it, but it still provides some good details). So I stopped the remaining host instances in the BizTalk Administration Console. Then I had to switch to the Service Control Manager to stop other services.
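If you have several hosts, typing the service names by hand is error-prone. Below is a minimal Python sketch that only generates `net stop` command lines for you to review and run; it assumes the usual Windows service naming of BizTalk host instances, BTSSvc$<HostName>, and the host names and the ENTSSO (Enterprise Single Sign-On) entry in the example are illustrative. Which extra services to stop depends on your installation and on the restore article above.

```python
def stop_commands(host_names: list[str], extra_services: list[str]) -> list[str]:
    """Return `net stop` command lines: host instances first, then other services."""
    # Assumption: each host instance runs as a Windows service named BTSSvc$<HostName>.
    services = [f"BTSSvc${host}" for host in host_names] + list(extra_services)
    # Quote names so they survive being pasted into cmd.exe.
    return [f'net stop "{name}"' for name in services]


if __name__ == "__main__":
    # Hypothetical host names; replace with the hosts defined in your BizTalk group.
    for line in stop_commands(["BizTalkServerApplication", "ReceiveHost"], ["ENTSSO"]):
        print(line)
```

Generating the commands rather than executing them keeps a human in the loop, which matters when the databases are already in a bad state.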

But errors kept getting thrown. The BizTalk applications were still running, so I stopped them. Then I checked the ports; they should have been stopped by this point, but a few were not. I had to manually stop the receive locations, which stopped without complaint. Then, when I tried to stop the send ports, I got this error for every one:

Could not stop Send Port ‘<name>’. Unable to acquire the necessary database session for this operation.  (Microsoft.BizTalk.SnapIn.Framework)

Sometimes I also got this error:

Could not retrieve transport type data for Receive Location '<name>' from config store. Both SSO Servers (Primary='<servername1>' and Backup='<servername2>') failed. Backup server failure: Exception from HRESULT: 0xC0002A0A (Microsoft.BizTalk.SnapIn.Framework)

The BizTalk Administration Console could not get past these errors while the databases were in an inconsistent state. I tried a few things, such as restarting MSDTC or the WMI service (common Administration Console workarounds) and restarting the console itself, but none of them worked. One odd thing is that the console could still connect to the group database server, yet the send port information in the database apparently could not be retrieved.

Update

The above errors occurred because the BizTalk databases had problems; for lack of a better word, they were corrupted. We contacted Microsoft Product Support (shout out to Anzio Breeze), and they helped us resolve the problem. It turned out that a BizTalk backup had restored without errors, but the restored databases were not functioning properly. To diagnose the problem, DBCC CHECKDB was run on BizTalkMsgBoxDb and SSODB. The resolution was to restore an older backup. The older backup took longer to restore, so we suspect that more data was restored than in the previous attempt.
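If you want to run the same integrity check yourself, the sketch below builds the T-SQL batch. This is a minimal Python sketch, not the exact commands Microsoft Support ran: the WITH NO_INFOMSGS, ALL_ERRORMSGS options are my choice to keep the output to actual errors, and checkdb_script is a hypothetical helper. You could save the output to a file and run it with sqlcmd against the server hosting the BizTalk databases.

```python
def checkdb_script(databases: list[str]) -> str:
    """Build a T-SQL batch that runs DBCC CHECKDB against each named database."""
    lines = []
    for db in databases:
        # Pass the name as an N'...' string literal, doubling any embedded quotes.
        literal = "N'" + db.replace("'", "''") + "'"
        lines.append(f"DBCC CHECKDB ({literal}) WITH NO_INFOMSGS, ALL_ERRORMSGS;")
    return "\n".join(lines)


if __name__ == "__main__":
    # The two databases checked in this incident.
    print(checkdb_script(["BizTalkMsgBoxDb", "SSODB"]))
```

Any consistency errors DBCC CHECKDB reports are a strong hint that a restore, possibly from an older backup, is needed.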

If you get SQL Server fatal error 824 (or, as in our case, 823), one possible cause is a corrupted database. Try restoring from a backup, and if the restored databases still misbehave, try an older backup.
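Since I could not find a list of errors and resolutions, here is a start at one, assembled only from what we saw in this incident. It is a minimal Python sketch using plain substring matching; the table contents are my summary of this post rather than official guidance, the "error 824" entry is based on Microsoft's description of 824 as a logical consistency-based I/O error, and triage is a hypothetical name.

```python
# (substring to look for, note from this incident); extend as you hit new errors.
KNOWN_ERRORS = [
    ("error 823",
     "Disk I/O error (search MSSQLSERVER_823); check the drive, then plan a restore."),
    ("error 824",
     "Logical consistency I/O error; a database may be corrupted, so try a restore."),
    ("[dbo].[bts_",
     "A BizTalk stored procedure is failing; investigate the SQL Server behind BizTalk."),
    ("Unable to acquire the necessary database session",
     "Console cannot stop ports; restarting MSDTC, WMI or the console did not help."),
    ("0xC0002A0A",
     "Both SSO servers failed; here it cleared only after restoring an older backup."),
]


def triage(error_text: str) -> list[str]:
    """Return the note for every known signature that appears in the error text."""
    return [note for signature, note in KNOWN_ERRORS if signature in error_text]


if __name__ == "__main__":
    msg = ('The following stored procedure call failed: " { call [dbo].[bts_InsertProperty]'
           '( ?, ?, ?, ?, ?)}". SQL Server returned error string: "Connection failure".')
    for note in triage(msg):
        print(note)
```

Substring matching is deliberately crude (for example, "error 823" would also match a hypothetical "error 8230"), but it is enough for a quick first pass over a flood of event log entries.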

Thanks,
