Developing Situational Applications with Web 2.0 Mashups

The evolution of Web sites into dynamic, rich, interactive applications is a true revolution for users. But for ASP.NET developers tasked with building high-performing, scalable applications, it presents major challenges. The features that characterize blogs, wikis, personalized pages, and other data-driven Web 2.0 applications fundamentally change processing, transmission, and rendering workloads, and require new approaches and solutions. In Web 2.0 applications:

•   Content is highly dynamic, with most of the content generated by users. The popularity of individual pages or the application itself can change rapidly, making it extremely difficult to anticipate workload and traffic volume.

•   Pages are personalized and remember who the user is and what he or she did last, but users don't follow a predefined process.

•   Applications literally stitch pages together on the fly, generating the page in the user's browser from multiple sources. Instead of rendering pages via a single trip to a server, applications rely on content from multiple services that may reside on separate virtual or physical servers, which means multiple round trips between the browser and the various content sources. And because the dynamically generated content is dependent on the individual user, the frequency and order of those trips are highly unpredictable.

•   Popular content is apt to be viral, which means that the volume of traffic can jump exponentially with little warning. And the more features and user-generated content you have, the faster it can happen.

Creating an application that can deal with these characteristics for a known number of users is challenging enough. But if your site suddenly becomes hugely popular, the complexity of managing your application increases exponentially. Ultimately, having your application succeed beyond your expectations isn't a bad problem to have. The need for new features and the need to scale to accommodate unpredictable loads, however, put enormous demands on you and your application. You need an architecture that's agile enough to accommodate both growth and sudden surges.

Achieving such an architecture is no simple task. Page rendering in applications that incorporate user-generated content is substantially more complex than in traditional data-driven Web applications. Additionally, in the race to deliver new features as quickly as possible, developers often turn to tools designed to accelerate development, many of which aren't optimized for performance at scale. When you combine more complex development processes with unpredictable use patterns and a rush to get applications to market as quickly as possible, programming for scalability inevitably takes a back seat.

Fundamentally, to ensure optimal performance even under extreme swings in demand, you need a system that can intelligently distribute load and lets you scale individual components of the application as needed. To achieve that, you need to be thinking about two key strategies: granular distribution and specialization.

Architecting More Granular Distribution
The traditional solution to increasing the scalability of an application has been basic distribution - throwing more hardware at the problem and distributing the application load among more servers. The problem with this strategy is that it's only effective if your entire application scales symmetrically, and in a Web 2.0 world, that's extremely unlikely. Your image demands may be rising much faster than your page computation demands, for example, but adding servers and having them all do the same work doesn't take that into account. What you need is a system that distributes intelligently at a more granular level - that's organized to scale individual components of the environment as needed. And that requires both a more intelligent approach to distribution and greater use of specialization.
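To make the idea of scaling components independently concrete, here is a minimal sketch in Python (chosen only for brevity; nothing in it is ASP.NET-specific). The tier names, request rates, and per-server capacities are invented for illustration and are not measurements from any real site.

```python
import math

def servers_needed(requests_per_sec, capacity_per_server):
    """Smallest number of servers that can absorb the given request rate."""
    return math.ceil(requests_per_sec / capacity_per_server)

def scaling_plan(old_demand, new_demand, capacity):
    """Return how many servers to add to each tier, sizing every tier on its own."""
    plan = {}
    for tier in new_demand:
        before = servers_needed(old_demand[tier], capacity[tier])
        after = servers_needed(new_demand[tier], capacity[tier])
        plan[tier] = after - before
    return plan

# Hypothetical numbers: image traffic triples while page traffic barely moves.
capacity = {"pages": 100, "images": 800}      # requests/sec one server can handle
old_demand = {"pages": 300, "images": 800}    # requests/sec before the surge
new_demand = {"pages": 320, "images": 2400}   # requests/sec after the surge

plan = scaling_plan(old_demand, new_demand, capacity)
print(plan)  # {'pages': 1, 'images': 2}
```

Under these assumed numbers, most of the new capacity goes to the image tier, which is the kind of decision a farm of identical, do-everything servers cannot express.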

The key to effective distribution is the ability not only to replicate servers but to manage all of those servers as a group. But the biggest impediment to doing that (and to responding dynamically to rapid changes in demand) is hidden resource affinities. The most common affinity is session, but there are a number of others in ASP.NET. Session affinity cripples the ability to distribute load between servers because a given user must always work (or "stick") with the same server where the session data resides. The theory of distribution is that you can double the number of users you can support by doubling the number of servers. An affinity, like session, undermines that behavior, so doubling the number of servers may only support 50% more users. Over time, that ratio continues to degrade until you get virtually no additional load support for additional servers.
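The effect of session affinity on a growing farm can be illustrated with a toy model. This is a sketch under simple assumptions (1,000 active users, users identified by integers, a farm growing from two servers to four), not a description of how any particular load balancer works.

```python
def sticky_load(active_users, old_count, new_count):
    """Load per server when each user stays pinned to the server holding its session."""
    load = [0] * new_count
    for user in active_users:
        # The assignment was fixed before the farm grew, so new servers get nothing.
        load[user % old_count] += 1
    return load

def free_load(active_users, new_count):
    """Load per server when any server can handle any user's request."""
    load = [0] * new_count
    for position, _user in enumerate(active_users):
        load[position % new_count] += 1  # round-robin across the whole farm
    return load

users = range(1000)
print(sticky_load(users, old_count=2, new_count=4))  # [500, 500, 0, 0]
print(free_load(users, new_count=4))                 # [250, 250, 250, 250]
```

In the sticky case the two new servers sit idle until new sessions arrive, which is why doubling the hardware does not double the users you can support.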

While an application is being developed, developers focus primarily on features and performance, and affinity issues rarely have a high priority. As long as a relatively small number of users are using the application, those affinities don't present a significant problem. When the application grows in popularity and requires more resources, however, these affinities can significantly impair the application's ability to scale. Ultimately, they can make it impossible to load balance effectively, undermining the entire distribution strategy.

To get rid of session affinity, you must move from in-process session state to out-of-process session state. ASP.NET includes out-of-process options for handling session state. Without any additional application coding, you can configure the Web server to store session data in a separate database. However, developers typically avoid this solution because the additional processing tends to sap performance. The two extra trips across the internal network (reading session data from the database at the beginning of each request and writing it back at the end) make an out-of-process session take as much as six times longer than an in-process session - a huge impact on overall application performance.

Fortunately, these out-of-process options aren't the only way to solve the session state affinity problem. One of the great things about ASP.NET is its broad support for third-party tools, components, and services. Session state management, in particular, uses a standard set of interfaces for storing and retrieving data, which means that many steps in the request processing pipeline can be handled by code from third-party vendors and solutions. This opens the door to third-party software and hardware products that address affinity.

Software solutions are available that provide distributed in-memory caching of session state and other workload data, partitioned across a Web server farm. There are also hardware solutions, such as the Strangeloop AS1000 Application Scaling Appliance, which centrally manages session state from an appliance. Because hardware solutions are deployed in-line, between the network load balancer and the application servers, they can manage session information out-of-process without a performance penalty. In Figure 1, you can see where the acceleration appliance sits in the Web farm, so that it can provide out-of-process session data while minimizing performance impact.
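The principle behind all of these out-of-process approaches can be sketched in a few lines. The example below is a Python analogy, not ASP.NET's actual session provider API: the class names `DictStore` and `WebServer` are hypothetical, and a plain dictionary stands in for whatever database, cache, or appliance holds the session data.

```python
class DictStore:
    """A session store with a load/save interface, backed by a plain dictionary."""
    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, state):
        self._data[session_id] = dict(state)

class WebServer:
    """Handles a request by reading session state first and writing it back last."""
    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        state = self.store.load(session_id)       # read at the start of the request
        state["hits"] = state.get("hits", 0) + 1  # the actual work of the request
        self.store.save(session_id, state)        # write at the end of the request
        return state["hits"]

# In-process: each server keeps its own store, so the session does not follow the user.
a, b = WebServer("A", DictStore()), WebServer("B", DictStore())
affinity_hits = [a.handle("user-1"), b.handle("user-1")]
print(affinity_hits)  # [1, 1]

# Out-of-process: both servers share one store, so either can serve the user.
shared = DictStore()
c, d = WebServer("C", shared), WebServer("D", shared)
shared_hits = [c.handle("user-1"), d.handle("user-1")]
print(shared_hits)  # [1, 2]
```

Because the servers depend only on the `load` and `save` calls, the backing store can be swapped without touching request-handling code, which is the property the pluggable session interfaces rely on.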

Specialization
Distributing load more intelligently is the first step to creating a more agile application, but a second, equally important, requirement is specialization. Fundamentally, specialization is the process of taking specific elements that the application reuses and isolating them from other elements. By doing that, you can distribute the workload more evenly and scale individual elements independently, as needed. Three immediate targets to consider for specialization are image handling, encryption, and caching.

IMAGES
Images are fundamentally different from the rest of an ASPX page and are handled by a different part of IIS entirely. So why put the additional load of image handling on servers that are primarily geared toward ASPX processing, when you can move them somewhere else? You can handle images with separate IIS servers inside your data center that are configured and optimized for image retrieval. You can also use third-party image services, such as Akamai, and take image processing out of your environment entirely.

Of course, distributing image management isn't without its challenges. It's code-intensive, and it can make the management of your application more complicated. When you're updating your site, for example, you have to update image servers as well as Web servers.
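The coding side of this specialization often comes down to generating image URLs that point at a different host. The following Python sketch shows one way to centralize that decision; the host names and the extension list are placeholders, and a real site would have to decide how to treat other static assets as well.

```python
from pathlib import PurePosixPath

# Assumed set of extensions to send to the image tier.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif"}

def asset_url(path, page_host="www.example.com", image_host="images.example.com"):
    """Build an absolute URL, sending image requests to the dedicated image host."""
    suffix = PurePosixPath(path).suffix.lower()
    host = image_host if suffix in IMAGE_EXTENSIONS else page_host
    return f"http://{host}/{path.lstrip('/')}"

print(asset_url("/img/logo.PNG"))   # http://images.example.com/img/logo.PNG
print(asset_url("/default.aspx"))   # http://www.example.com/default.aspx
```

Keeping the mapping in one function means that moving images to another server group or a third-party service is a one-line change rather than a site-wide edit.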


ENCRYPTION
SSL encryption is perhaps the most common and straightforward service that lends itself to specialization. If your site includes some pages and services that need to be secured, you can run those pages separately - either through a third-party SSL processor or through dedicated Web servers that are configured to run your SSL pages.

Like image handling, building a specialized SSL service means your code has to point to a different group of servers, which adds development complexity and management effort. Alternatively, as with affinity, there are hardware solutions, such as application acceleration appliances, that can reduce the coding requirement for SSL by handling the handshaking and encryption that would normally have to take place at the server. (See Figure 2.)

In general, like all types of specialization, distributing SSL and image handling has a cost in terms of development complexity and management effort. But these strategies let you distribute each specialized task independently of the others and utilize your hardware more effectively - ultimately making your application more agile.

BROWSER & OUTPUT CACHING
One specialization that can have a huge impact on your ability to scale for growth or for sudden changes in demand is caching of both static and dynamic content. By incorporating browser caching, you can significantly reduce the load on your Web servers. You can also use appliance-based caching solutions that adjust HTTP headers to manage browser caching and provide a more scalable substitute for ASP.NET output caching services.

Of course, caching has its own costs. Both browser caching and output caching are complicated and time-consuming to code, which means you don't want to use them everywhere. Finding the places in your application where caching makes the most sense and would deliver the most value can be difficult. Even more significant, when you add caching you also add new requirements for keeping the cache current. The last thing you want is to be serving data that's no longer accurate. Knowing when the cache is wrong and developing a dynamic strategy to deal with cache expiries is critical. Often, the complexity of addressing these issues leads developers to avoid caching altogether, despite its obvious benefits.
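Both kinds of caching, and the expiry problem that comes with them, can be shown in miniature. The Python sketch below is an illustration only: `browser_cache_headers` and `OutputCache` are hypothetical names, the 30-second lifetime is arbitrary, and ASP.NET's own output cache is configured quite differently.

```python
import time

def browser_cache_headers(max_age_seconds):
    """HTTP response header telling browsers how long they may reuse a response."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}

class OutputCache:
    """Server-side cache of rendered output with a fixed time-to-live per entry."""
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock       # injectable so expiry can be tested without waiting
        self._entries = {}       # key -> (expires_at, rendered_output)

    def get_or_render(self, key, render):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: skip the expensive render
        output = render()
        self._entries[key] = (now + self.ttl, output)
        return output

    def invalidate(self, key):
        """Drop an entry early, for example when the underlying data changes."""
        self._entries.pop(key, None)

cache = OutputCache(ttl_seconds=30)
html = cache.get_or_render("/home", lambda: "<html>rendered once</html>")
print(browser_cache_headers(3600))  # {'Cache-Control': 'public, max-age=3600'}
```

The time-to-live bounds how stale a page can get, while `invalidate` covers the case where you know the data changed; choosing between the two for each page is where most of the effort described above goes.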

DATA CACHING
The most significant affinity in any ASP.NET application is the data. Introducing data caching (separating data into read-only and read/write specializations) is ultimately the highest-impact technique you can use to improve the scalability and agility of your application. Developers with the biggest Web 2.0 sites such as MySpace and Facebook recognize this, and it's the data caching strategies they've employed that have allowed these sites to become hugely successful Web 2.0 applications.

Of course, data caching is probably the most complicated thing you can do in an application. Data, by its nature, wants to be stored in only one place and doesn't distribute well. You have to use a system called multi-master replication to code the "write" components of the application to write to different databases than the "read" components, while allowing read/write components to feed data continually to read-only components. It's incredibly difficult to do. (As a consultant, if a client wants to use data caching, I know I'll have a job for years.) But when you really need to scale to millions of simultaneous users, data caching is the best way to do it.
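The read/write separation itself can be sketched with in-memory dictionaries standing in for databases. This Python example is a deliberately simplified single-writer model with read-only copies; it is not multi-master replication, which also has to reconcile writes arriving at several databases. The class name `ReadWriteRouter` and the explicit `replicate` step are assumptions made for illustration.

```python
import itertools

class ReadWriteRouter:
    """Sends writes to one read/write store and spreads reads over read-only copies."""
    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        self.primary[key] = value

    def read(self, key):
        # Reads never touch the primary, so read load scales with replica count.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)

    def replicate(self):
        """Feed the primary's data to every read-only copy (a full copy, for simplicity)."""
        for replica in self.replicas:
            replica.clear()
            replica.update(self.primary)

router = ReadWriteRouter(replica_count=2)
router.write("profile:42", "Ada")
before = router.read("profile:42")   # None: the write has not reached the replicas yet
router.replicate()
after = router.read("profile:42")    # "Ada"
print(before, after)
```

The gap between `before` and `after` is the window in which readers see stale data, and managing that window is a large part of what makes this technique hard in practice.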

Building the Agile Web 2.0 Application
Actually implementing all of these strategies isn't trivial. But if your application is going to be agile enough to truly embrace Web 2.0 capabilities at scale, intelligent distribution and specialization are essential.

Of course, there's one other vital piece of the puzzle: instrumentation. How do you know which parts of your site are ideal targets for distribution and specialization? Even after you've architected a sound distribution strategy, how do you measure whether it's continuing to meet your needs as they change? You need proper instrumentation in place to tell you when sudden changes are occurring and when your various distribution and specialization strategies should come into play. Without hard information about what's actually happening in the environment, even the most advanced specialization and distribution strategies are operating in the dark.
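As a rough idea of what such instrumentation can look like at its simplest, the Python sketch below accumulates time spent per component and reports the one consuming the most. The `Instrumentation` class and the component names are hypothetical; a production system would more likely rely on existing performance counters and monitoring tools.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class Instrumentation:
    """Accumulates elapsed time and call counts per named component."""
    def __init__(self, clock=time.perf_counter):
        self.clock = clock                  # injectable for deterministic tests
        self.totals = defaultdict(float)    # component -> total seconds
        self.counts = defaultdict(int)      # component -> number of measurements

    @contextmanager
    def measure(self, component):
        start = self.clock()
        try:
            yield
        finally:
            # Record the measurement even if the measured code raised.
            self.totals[component] += self.clock() - start
            self.counts[component] += 1

    def slowest(self):
        """Component with the largest accumulated time so far."""
        return max(self.totals, key=self.totals.get)

monitor = Instrumentation()
with monitor.measure("page_render"):
    sum(range(1000))                        # stand-in for real work
with monitor.measure("session_read"):
    sum(range(10))
print(monitor.slowest(), dict(monitor.counts))
```

Tracking totals per component, rather than per request, points directly at which tier is the next candidate for specialization or additional servers.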

When you combine more intelligent distribution and specialization with up-to-the-minute knowledge of your environment, you can build true agility into your application. More important, you can take full advantage of all of the potential that Web 2.0 has to offer, knowing that no matter what demands the future may hold, your application and environment are prepared to meet them.

© 2008 SYS-CON Media