AppDomains, AppDomains: Can’t live with ’em… but can live with fewer of them

March 13th, 2011 by Tim Stewart

Recently we noticed that FogBugz was slowwiiiiinng down. Some customers still praised it compared to the competition, but it was definitely no longer as instant as we wanted it to be. We decided to investigate.
We discovered the culprit quickly enough: AppDomains. Yes, those fuzzy wonderful things that keep FogBugz humming for thousands of customers, each with their own Plugins, Extras, BugMonkey scripts, API calls, and whatnot.

The problem started rather innocently.

We want to allow third-party plug-ins to run on FogBugz On Demand, so our developer community can continue to expand and improve FogBugz’s functionality. But to protect your FogBugz instance from being negatively affected by code we don’t directly control, we segregate each plug-in’s operations into its own AppDomain, a discrete area in which plug-ins can operate without affecting one another. We created one AppDomain for each account running third-party plug-ins.
It turns out that IIS and .NET don’t handle hundreds or thousands of AppDomains very well. The marginal performance hit of each additional AppDomain grew with every AppDomain already loaded. The graph below gives an idea of this superlinear growth. With 33% more AppDomains, we saw a 400% increase in response times.
Performance degraded superlinearly with each new AppDomain, and we were adding AppDomains linearly with each new customer (one new AppDomain per customer per plug-in).
We’ve contacted Microsoft to learn why AppDomain scaling is not linear, but it’s rather clear that it isn’t.

Solution

Once we determined what the problem was, we figured this: theoretically, any given plug-in only needs one AppDomain to run across all FogBugz On Demand accounts. We could load each plug-in into a single AppDomain shared by every account, and since there are far fewer plug-ins than there are accounts, the total number of AppDomains would drop dramatically.
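To make the arithmetic concrete, here is a minimal sketch in Python (not FogBugz code, and not .NET) that counts how many isolation domains each layout would need. The account and plug-in names are made up for illustration, and it assumes the old layout created one domain per customer per plug-in, as described above.

```python
def domains_per_account_and_plugin(installs):
    """Old layout: one domain for every distinct (account, plug-in) pair."""
    return len({(account, plugin) for account, plugin in installs})


def domains_per_plugin(installs):
    """New layout: one domain per distinct plug-in, shared by all accounts."""
    return len({plugin for _account, plugin in installs})


# Hypothetical install list: (account, plug-in) pairs, invented for this example.
installs = [
    ("acme", "kanban"),
    ("acme", "time-report"),
    ("globex", "kanban"),
    ("initech", "kanban"),
    ("initech", "time-report"),
]

print(domains_per_account_and_plugin(installs))  # 5
print(domains_per_plugin(installs))              # 2
```

The first count grows with every new customer, while the second only grows when a new plug-in is published.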
The only potential issue was that a value stored in a static variable in the AppDomain would be visible across multiple accounts. This caused quite a bit of concern until one of our engineers pointed out that we already vet On Demand plug-ins for security, with a rigorous review for XSS vulnerabilities. If we’re willing to trust our review process to guard against nasty XSS attacks, why aren’t we willing to review for static variables to prevent data bleedover?
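The bleedover risk is easy to picture with a small analogy. The sketch below is Python rather than .NET, and both plug-in classes are hypothetical; a class-level attribute stands in for a static variable living in a shared AppDomain. It shows roughly what a review would be looking for, and one possible way to fix it by keying state on the account.

```python
class LeakyPlugin:
    # Class-level attribute: one copy shared by every caller,
    # playing the role of a static variable in a shared domain.
    last_viewed_case = None

    def view_case(self, account, case_id):
        LeakyPlugin.last_viewed_case = (account, case_id)

    def recent(self, account):
        # Bug: returns whatever any account viewed last.
        return LeakyPlugin.last_viewed_case


class SaferPlugin:
    def __init__(self):
        # State is keyed by account, so accounts never see each other's data.
        self._last_by_account = {}

    def view_case(self, account, case_id):
        self._last_by_account[account] = case_id

    def recent(self, account):
        return self._last_by_account.get(account)


leaky = LeakyPlugin()
leaky.view_case("acme", 101)
print(leaky.recent("globex"))  # ('acme', 101): acme's data leaks to globex

safer = SaferPlugin()
safer.view_case("acme", 101)
print(safer.recent("globex"))  # None
print(safer.recent("acme"))    # 101
```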
We reviewed all plug-ins available to FogBugz On Demand and found that only one needed changes: a plug-in developed in-house. (It shall remain nameless… but if you wanted to download the source code for old versions of all the plug-ins, unzip it, and pore over it, you could probably figure it out.)

How did we do?

Just to give a sense of the improvement, our web servers each had on the order of 1,000 AppDomains before we deployed the fix. Once we deployed the fix, they each had fewer than 50. Response times, accordingly, got much, much better. Since deploying, we’ve also noticed that our web servers and our backend servers are using much less CPU and memory.
Our next performance attack is client-side.  We’re also improving performance profiling of installed FogBugz so we can be alerted early in testing as to whether a new feature degrades response times.
Please keep an eye on your FogBugz performance.  We hope you’ll be seeing it get faster and faster!