Abstract
Caching content is an important consideration in maximizing the scalability of most computer applications. HTTP caching uses the technology built into the web itself, in web servers and user agents (browsers), to scale performance. Below I discuss the headers that control HTTP caching on an example website, beginning with the serving of static content and then turning to dynamic content. Special considerations for caching under SSL are noted. (For additional information see the List of HTTP header fields.)
I finish the discussion with recommendations for a particular website--based on my "huh, whot?!" experience after looking at their use of headers.
Caching Static Content
Static resources consist of hard-coded files with extensions such as .htm, .js, .css, .gif, .jpg, etc. When a resource is initially requested by a browser, the raw HTTP/1.1 request looks like this (as seen in Fiddler):
Exhibit 1: Request
GET https://myserver.com/test.htm?aa=myorg&app=myapp HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: myserver.com
Exhibit 2: Response (headers only, body elided)
HTTP/1.1 200 OK
Cache-Control: max-age=10800
Content-Type: text/html
Last-Modified: Mon, 30 Apr 2012 19:21:29 GMT
Accept-Ranges: bytes
ETag: "afa0ca71627cd1:0"
Date: Sat, 02 Jun 2012 17:13:14 GMT
Content-Length: 53751
Server: myserver.com
The response headers from this server show the following. The Cache-Control value (max-age=10800) has been specifically configured in IIS7, in the applicationHost.config (or overridden in the website.config), so that the content expires after 3 hours (10800 seconds). The Last-Modified value corresponds to the Date Modified property of the file on the web server. The ETag value is generated automatically by IIS and identifies the current version of the file; IIS derives it from the file's modification timestamp plus a change number rather than from a hash of the file contents.
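To make the three-hour window concrete, the sketch below computes when the response in Exhibit 2 stops being fresh. It is a minimal illustration, not how any particular browser implements freshness: the function name fresh_until is hypothetical, and it assumes the only relevant directive is max-age and that the Date header marks the start of the window (real caches also account for the Age header and clock skew).

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime


def fresh_until(date_header: str, cache_control: str) -> datetime:
    """Return the instant at which a cached response stops being fresh.

    Simplified: uses only the Date header and the max-age directive.
    """
    max_age = None
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            max_age = int(value)
    if max_age is None:
        raise ValueError("no max-age directive present")
    return parsedate_to_datetime(date_header) + timedelta(seconds=max_age)


# Values taken from Exhibit 2.
expires = fresh_until("Sat, 02 Jun 2012 17:13:14 GMT", "max-age=10800")
print(expires)
```

With the Exhibit 2 values, the result falls three hours after the Date header, at 20:13:14 GMT on the same day.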
During the three-hour max-age window the requested resource is served from the client’s local cache (the default IE setting). If the user presses “refresh” in the browser, the request is sent back to the server. The headers in that case are as follows; notice how the If-Modified-Since and If-None-Match values match the Last-Modified and ETag headers above:
Exhibit 3: Request
GET https://myserver.com/test.htm?aa=myorg&app=myapp HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Accept-Encoding: gzip, deflate
Host: myserver.com
If-Modified-Since: Mon, 30 Apr 2012 19:21:29 GMT
If-None-Match: "afa0ca71627cd1:0"
Connection: Keep-Alive
Because the file hasn’t changed, the server response to that request consists of only headers—this is the complete response—and the client will continue to use the cached copy:
Exhibit 4: Response
HTTP/1.1 304 Not Modified
Cache-Control: max-age=10800
Last-Modified: Mon, 30 Apr 2012 19:21:29 GMT
Accept-Ranges: bytes
ETag: "afa0ca71627cd1:0"
Date: Sat, 02 Jun 2012 17:22:38 GMT
Server: myserver.com
Until the file changes, the ETag and Last-Modified values remain the same. At the end of three hours the client will once again make a request as in Exhibit 3, and the response will be the same as in Exhibit 4, except that the Date header will be updated. The client will update the corresponding Last Checked property on the cached entry (as seen in the Temporary Internet Files pseudo-folder) and continue serving the original copy from the cache.
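The revalidation exchange in Exhibits 3 and 4 can be sketched from the server's point of view as follows. This is a simplified illustration under stated assumptions, not IIS's actual implementation: the function name revalidate is hypothetical, ETags are compared as exact strings (weak validators with a W/ prefix are not handled), and If-None-Match is given precedence over If-Modified-Since when both are present.

```python
from email.utils import parsedate_to_datetime


def revalidate(request_headers: dict, current_etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When If-None-Match is present it decides the outcome on its own.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if "*" in candidates or current_etag in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Unchanged if the file is no newer than the date the client holds.
        unchanged = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)
        return 304 if unchanged else 200

    # No validators sent: this is an unconditional request.
    return 200


# Values taken from Exhibits 2 and 3.
status = revalidate(
    {
        "If-Modified-Since": "Mon, 30 Apr 2012 19:21:29 GMT",
        "If-None-Match": '"afa0ca71627cd1:0"',
    },
    current_etag='"afa0ca71627cd1:0"',
    last_modified="Mon, 30 Apr 2012 19:21:29 GMT",
)
print(status)
```

Once the file is modified, both validators change, the comparison fails, and the server answers with a full 200 response carrying the new ETag and Last-Modified values.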
Applications that serve and cache static content of (usually small) files scale easily to hundreds, thousands and millions of users, even with fairly limited server-side hardware. Small increases in hardware and networking further scale the delivery of static content. Additionally non-SSL applications can make use of intermediate proxy servers that further distribute servicing requests, limiting direct connections to the original source servers.
Static Content: Overriding IIS7
This customer's website has been specifically tuned to cache static resources as served from the web farm. These settings include setting the Cache-Control header for static content. In IIS6 the ETag suffix value had to be forced as well; IIS7 automatically forces the suffix to “:0”. Below is the command line to set the max-age value in IIS7 to three hours:
appcmd.exe set config -section:system.webServer/staticContent /clientCache.cacheControlMaxAge:"03:00:00" /commit:apphost
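The "03:00:00" value in the command is a time span in hours:minutes:seconds form, whereas the header on the wire carries seconds (max-age=10800 in Exhibit 2). The small sketch below shows the conversion; the helper name timespan_to_seconds is hypothetical, and it assumes the plain hh:mm:ss form used here (no day component).

```python
def timespan_to_seconds(value: str) -> int:
    """Convert an 'hh:mm:ss' time span into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# The value configured above, expressed as a Cache-Control header.
header = f"Cache-Control: max-age={timespan_to_seconds('03:00:00')}"
print(header)
```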
Because content from this customer is served under SSL the content is not cached by proxies—it is only cached on the client.
Caching Dynamic Content
Dynamic content consists of web pages or other files or streams of information that are generated (usually) at the time of access by a user, or that change as a result of interaction with the user. Because the content changes, the headers issued with dynamic content typically disable caching. My customer's website emits a typical request-response pair. Notice the Cache-Control header in the response, and the absence of ETag and Last-Modified headers:
Exhibit 6: Request for dynamic content
GET http://myserver.com/test.aspx?aa=myorg&app=myapp HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: myserver.com
Exhibit 7: Response
HTTP/1.1 200 OK
Cache-Control: private, no-store
Content-Type: text/html; charset=utf-8
Date: Sat, 02 Jun 2012 23:47:20 GMT
Content-Length: 4747
Server: myserver.com
Subsequent identical requests receive similar responses. As there is no caching, the client round-trips to the server each time; code is executed to re-create the content, which is streamed back to the client every time, even when the request is identical to prior requests.
Because there is no caching, and because code is executed and the database is accessed for every request, the ability of this application to scale is limited by the capability of the web server to dynamically generate the file (“CPU”) and by the responsiveness and capability of the backing database server and storage.
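To show why the header in Exhibit 7 rules out any reuse, here is a rough sketch of how a cache might read Cache-Control before deciding whether it may keep a response. The function names parse_cache_control and may_store are hypothetical, and the rules are deliberately reduced to the two directives that appear in this article: no-store forbids storage everywhere, and private forbids storage in shared caches such as proxies. Real caches apply more rules than this.

```python
def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control header value into a {directive: argument} mapping."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def may_store(value: str, shared_cache: bool) -> bool:
    """Simplified storage decision based only on no-store and private."""
    directives = parse_cache_control(value)
    if "no-store" in directives:
        return False  # nobody may keep a copy
    if shared_cache and "private" in directives:
        return False  # only the end user's own cache may keep a copy
    return True


# Exhibit 7 (dynamic page) versus Exhibit 2 (static file), as seen by a browser.
print(may_store("private, no-store", shared_cache=False))
print(may_store("max-age=10800", shared_cache=False))
```

Under these simplified rules the dynamic response cannot be stored by the browser or by a proxy, while the static response can be stored by either.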
Dynamic Content: Overridden Header
Looking through the code, I see this website currently forces the Cache-Control header for nearly all dynamic content as noted above. This line of code is typical:
Response.Cache.SetNoStore();
Caching Proxies
As indicated above, non-SSL content can be stored by intermediary servers between the original content provider (web server) and the user (see Caching Proxy for details). Because traffic is encrypted between this server and the user, caching proxies are not available for this application. However, as this website is part of a web farm that supports SSL offloading, there is an opportunity to use a caching proxy on a reverse proxy router, behind the point where SSL is terminated. Fortunately, IIS7 with URL Rewrite and Application Request Routing (ARR) can act as an HTTP caching proxy, and can be trivially implemented on each web server.
The Future is Bright
I'll be working with this customer to improve their strategy for HTTP caching of dynamic content. Upgrading to IIS7 and the rollout of ARR are enabling technologies that position this customer for improvements in scale and uptime in the near term. Separately, it's time to begin implementing a distributed memory cache for intermediate results. Both technologies involve caching, but they work at different layers: the distributed cache addresses the caching of partial results rather than the complete results discussed here.