So, what’s up with MAIL02?

AEMS users whose mailboxes reside on mailbox server MAIL02 will have undoubtedly noticed that their particular mailbox server has been woefully unstable of late, much to everyone’s consternation. The problem is that the store.exe process, which handles interactions with the databases containing the mailboxes, has been crashing sporadically. Through analysis of crash dumps, Microsoft Support has determined that the problem is a known issue, addressed in Exchange Server 2007 Service Pack 2, Rollup 3.

We had been planning on installing Service Pack 2 and the latest rollup at some point this fall since it is a prerequisite for an eventual transition to Exchange 2010. However, this issue has forced our hand, and we are now planning to install Service Pack 2 and Rollup 4 during the maintenance window this coming weekend (or sooner if the Powers-That-Be™ authorize it). I have been knee-deep in testing the new updates to move them through our change management process.

Microsoft has not published details on the nature of the bug, and for very good reason. A nefarious individual could use the bug to launch a denial of service attack on Exchange servers that are not patched up to at least SP2 RU3. Suffice it to say that it involves store.exe not being able to properly digest certain strings of data coming from mobile devices.

We apologize for the inconvenience and ask that everyone bear with us until we get this fixed.

UPDATE [Aug. 24, 2:30PM]: Microsoft’s analysis of the last two crash dumps that we sent in revealed that the user responsible for them was the same one found in the first crash dump analysis, despite the fact that we had disabled ActiveSync access for that user. It turns out that every previous case of this problem that the folks at Microsoft had encountered was due to a mobile device, but the crash analysis only reveals that the method of access was non-MAPI. The user in question happens to be an Entourage user. We’ve cut off all non-MAPI access for that user and provided instructions on accessing his mailbox via Outlook on vDesk.

That having been said, although we haven’t had a store crash since 11:11am, CPU usage on MAIL02 is still pegged. The store crashes have clearly left the server in a degraded state performance-wise. We will likely have to perform a failover and reboot at some point in order to clear it up, possibly after 5:00pm.

UPDATE [Aug. 25, 9:25AM]: Mid-afternoon yesterday, I decided to go ahead and perform an emergency failover of MAIL02 since its performance was in such a degraded condition. Performance has been solid since then, and there have been no further store.exe crashes.

We are proceeding with plans to apply updates this weekend. In fact, the change board meeting to gain approval for the change takes place in about an hour.