Abstract
Caching of content is an important consideration in maximizing the scalability of most computer applications. HTTP caching uses the technology built into the Internet itself, in web servers and user agents (browsers), to scale performance. Below I discuss the headers that control HTTP caching on an example website, beginning with the serving of static content and then moving on to the serving of dynamic content. Special considerations for caching under SSL are noted. (For additional information, see the List of HTTP header fields.) I finish the discussion with recommendations for a particular website, based on my "huh, whot?!" experience after looking at their use of headers.
Caching Static Content
Static resources consist of hard-coded files with extensions such as .htm, .js, .css, .gif, .jpg, etc. When a browser first requests a resource, the raw HTTP/1.1 request looks like this (as captured in Fiddler):
Exhibit 1: Request
GET https://myserver.com/test.htm?aa=myorg&app=myapp HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: myserver.com
Exhibit 2: Response (headers only, body elided)
HTTP/1.1 200 OK
Cache-Control: max-age=10800
Content-Type: text/html
Last-Modified: Mon, 30 Apr 2012 19:21:29 GMT
Accept-Ranges: bytes
ETag: "afa0ca71627cd1:0"
Date: Sat, 02 Jun 2012 17:13:14 GMT
Content-Length: 53751
Server: myserver.com
The response headers from this server show the following. The Cache-Control value has been specifically configured (in IIS7) in applicationHost.config (or overridden in the site's web.config) to expire in 3 hours (10800 seconds). The Last-Modified value corresponds to the Date Modified property of the file on the web server, while the ETag value is generated automatically by IIS from the file's last-modified timestamp plus a change-number suffix, rather than from a hash of the file contents.
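As a rough illustration of what max-age=10800 means for the client, the Python sketch below computes when the response in Exhibit 2 stops being fresh. It simplifies the HTTP/1.1 freshness rules: age is measured from the Date header alone, and the Age header, request latency, and clock skew are ignored. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value from a Cache-Control header, or None if absent."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def fresh_until(headers: Dict[str, str]) -> Optional[datetime]:
    """Time at which a cached response goes stale (simplified: Date + max-age)."""
    max_age = max_age_seconds(headers.get("Cache-Control", ""))
    if max_age is None or "Date" not in headers:
        return None
    return parsedate_to_datetime(headers["Date"]) + timedelta(seconds=max_age)


def is_fresh(headers: Dict[str, str], now: datetime) -> bool:
    """True if the cache may answer from its stored copy without contacting the server."""
    expiry = fresh_until(headers)
    return expiry is not None and now < expiry


# Headers copied from Exhibit 2.
exhibit2 = {"Cache-Control": "max-age=10800", "Date": "Sat, 02 Jun 2012 17:13:14 GMT"}
print(fresh_until(exhibit2))  # 2012-06-02 20:13:14+00:00
print(is_fresh(exhibit2, datetime(2012, 6, 2, 17, 22, 38, tzinfo=timezone.utc)))  # True
```

With the Exhibit 2 headers the copy stays fresh until 20:13:14 GMT, so the conditional request in Exhibit 3 (answered at 17:22:38) was triggered by the user's refresh rather than by expiry.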
During the three-hour max-age window the requested resource is served from the client’s local cache (the default IE setting). If the user presses “refresh” in the browser, the browser instead sends a conditional request back to the server. The headers in that case are as follows; note how the If-Modified-Since and If-None-Match values match the Last-Modified and ETag headers above:
Exhibit 3: Request
GET https://myserver.com/test.htm?aa=myorg&app=myapp HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Accept-Encoding: gzip, deflate
Host: myserver.com
If-Modified-Since: Mon, 30 Apr 2012 19:21:29 GMT
If-None-Match: "afa0ca71627cd1:0"
Connection: Keep-Alive
Because the file hasn’t changed, the server’s response to that request consists of headers only (this is the complete response), and the client continues to use its cached copy:
Exhibit 4: Response
HTTP/1.1 304 Not Modified
Cache-Control: max-age=10800
Last-Modified: Mon, 30 Apr 2012 19:21:29 GMT
Accept-Ranges: bytes
ETag: "afa0ca71627cd1:0"
Date: Sat, 02 Jun 2012 17:22:38 GMT
Server: myserver.com
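On the server side, choosing between 304 Not Modified and 200 OK comes down to comparing the validators in the request with the resource's current ones. The Python sketch below assumes a simplified handler (the function name and signature are hypothetical, not IIS internals). It checks If-None-Match first, using weak comparison, and falls back to If-Modified-Since only when If-None-Match is absent, which is the precedence HTTP/1.1 specifies for GET.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping


def _opaque(tag: str) -> str:
    """Drop a weak-validator prefix so tags can be compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get_status(request_headers: Mapping[str, str], etag: str,
                           last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still valid, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        tags = {_opaque(t.strip()) for t in if_none_match.split(",")}
        return 304 if "*" in tags or _opaque(etag) in tags else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # an unparseable date means the condition is ignored
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so truncate before comparing.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200


# Validators from Exhibit 2 and the conditional request from Exhibit 3.
current_etag = '"afa0ca71627cd1:0"'
current_last_modified = datetime(2012, 4, 30, 19, 21, 29, tzinfo=timezone.utc)
exhibit3 = {
    "If-Modified-Since": "Mon, 30 Apr 2012 19:21:29 GMT",
    "If-None-Match": '"afa0ca71627cd1:0"',
}
print(conditional_get_status(exhibit3, current_etag, current_last_modified))  # 304
```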
Until the file changes, the calculated ETag and Last-Modified values remain the same.
At the end of three hours the client will once again make a request as in Exhibit 3. If the file is still unchanged, the response will be the same as in Exhibit 4, except that the Date header will be updated. The client then updates the Last Checked property on the cached entry (as seen in the Temporary Internet Files pseudo-folder) and continues serving the original copy from the cache.
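The stability of the ETag follows from how IIS7 appears to build it. Decoding the ETag in Exhibit 2 suggests that it is the file's last-modified time as a Windows FILETIME, written out byte by byte (least significant byte first) in unpadded hexadecimal, followed by the “:0” suffix. This is an inference from the exhibit, not documented IIS behavior, and the function below is a hypothetical reconstruction. The FILETIME value used is the one that reproduces the exhibit's ETag under that assumption; it corresponds to 19:21:29.6 GMT, consistent with the Last-Modified header.

```python
from datetime import datetime, timedelta, timezone

# A Windows FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a FILETIME to an aware datetime (sub-microsecond ticks are dropped)."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def iis_style_etag(filetime: int, change_number: int = 0) -> str:
    """Hypothetical reconstruction of an IIS7 static-file ETag.

    ASSUMPTION (inferred from Exhibit 2, not documented): each byte of the
    little-endian FILETIME is printed as unpadded hex, then ':' and a change number.
    """
    hex_bytes = "".join(format(b, "x") for b in filetime.to_bytes(8, "little"))
    return '"{}:{}"'.format(hex_bytes, change_number)


# FILETIME that reproduces the Exhibit 2 ETag under the assumption above.
last_write = 0x01CD270671CAA0AF
print(iis_style_etag(last_write))        # "afa0ca71627cd1:0"
print(filetime_to_datetime(last_write))  # 2012-04-30 19:21:29.600427+00:00
```

If this reading is right, the ETag depends only on the timestamp: touching a file without changing its contents yields a new ETag, and identical copies with different timestamps on different servers in a farm yield different ETags.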
Applications that serve and cache static content (usually small files) scale easily to hundreds, thousands, and even millions of users on fairly limited server-side hardware. Modest increases in hardware and network capacity scale the delivery of static content further. Additionally, non-SSL applications can make use of intermediate proxy servers that share the work of servicing requests, limiting direct connections to the origin servers.
Static Content: Overriding IIS7
This customer's website has been specifically tuned to cache static resources served from the web farm. These settings include setting the Cache-Control header for static content. In IIS6 the ETag suffix value had to be forced as well; IIS7 automatically forces the suffix to “:0”. Below is the command line to set the max-age value in IIS7 to three hours:
appcmd.exe set config -section:system.webServer/staticContent /clientCache.cacheControlMode:"UseMaxAge" /clientCache.cacheControlMaxAge:"03:00:00" /commit:apphost
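The cacheControlMaxAge attribute takes a .NET TimeSpan string, while the Cache-Control header carries whole seconds. The small Python helper below is hypothetical and handles only the “[d.]hh:mm:ss” form without fractional seconds; it shows that “03:00:00” corresponds to the max-age=10800 seen in Exhibit 2.

```python
import re


def timespan_to_max_age(timespan: str) -> int:
    """Convert a TimeSpan string such as "03:00:00" or "7.00:00:00" to seconds.

    Simplified: only the [d.]hh:mm:ss form is accepted; fractional seconds are not.
    """
    match = re.fullmatch(r"(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})", timespan)
    if match is None:
        raise ValueError("unsupported TimeSpan format: " + repr(timespan))
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print("Cache-Control: max-age={}".format(timespan_to_max_age("03:00:00")))
# Cache-Control: max-age=10800
```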
Because this customer's content is served under SSL, it is not cached by proxies; it is cached only on the client.
Caching Dynamic Content
Dynamic content consists of web pages, files, or streams of information that are generated (usually) at the time of access by a user, or that change as a result of interaction with the user. Because the content changes, the headers issued with dynamic content typically disable caching. My customer's website emits a typical request-response pair; notice the Cache-Control header in the response, and the absence of ETag and Last-Modified headers:
Exhibit 6: Request for dynamic content
GET http://myserver.com/test.aspx?aa=myorg&app=myapp HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
Host: myserver.com
Exhibit 7: Response
HTTP/1.1 200 OK
Cache-Control: private, no-store
Content-Type: text/html; charset=utf-8
Date: Sat, 02 Jun 2012 23:47:20 GMT
Content-Length: 4747
Server: myserver.com
Subsequent identical requests receive similar responses. Because nothing is cached, the client makes a round trip to the server each time, and code is executed to re-create the content and stream it back to the client on every request, even when the request is identical to prior requests.
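The reason every request round-trips is the Cache-Control value in Exhibit 7. The Python sketch below applies a simplified version of the HTTP/1.1 storage rules: no-store forbids any cache from storing the response, and private forbids shared caches such as proxies from storing it. It deliberately ignores other conditions (Authorization headers, the field-name form of private, Vary, quoted commas, and so on); the function names are hypothetical.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into lower-cased directive names and optional arguments."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def may_store(cache_control: str, shared_cache: bool) -> bool:
    """Simplified storability check: no-store blocks every cache, private blocks shared ones."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        return False
    if shared_cache and "private" in directives:
        return False
    return True


print(may_store("private, no-store", shared_cache=False))  # False: the browser may not keep it
print(may_store("max-age=10800", shared_cache=True))       # True: a (non-SSL) proxy could keep it
```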
Because there is no caching, and because code is executed and the database is accessed for every request, this application's ability to scale is limited by the web server's capacity to generate content dynamically (“CPU”) and by the responsiveness and capacity of the backing database server and storage.
Dynamic Content: Overridden Header
Looking through the code, I see that this website currently forces the Cache-Control header for nearly all dynamic content, as noted above. This line of code is typical:
Response.Cache.SetNoStore();
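For readers outside ASP.NET, the handler below is a rough Python (WSGI) analogue that emits the same caching headers as Exhibit 7. In ASP.NET the default cacheability is private, so SetNoStore() adds no-store to it, which is consistent with the private, no-store value in the exhibit. The handler, its body, and the call_app helper are hypothetical and exist only to make the headers visible; note that, like the exhibit, the response carries no ETag or Last-Modified validator.

```python
from typing import Dict, List, Tuple
from wsgiref.util import setup_testing_defaults


def dynamic_page(environ, start_response):
    """Hypothetical handler that regenerates its body and forbids caching, like Exhibit 7."""
    body = "<html><body>Generated for {}</body></html>".format(environ["PATH_INFO"]).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Cache-Control", "private, no-store"),  # same value as Exhibit 7
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path: str = "/test.aspx") -> Tuple[str, Dict[str, str], bytes]:
    """Invoke a WSGI app in-process and collect its status, headers, and body."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status: str, headers: List[Tuple[str, str]], exc_info=None):
        captured["status"], captured["headers"] = status, dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(dynamic_page)
print(status, headers["Cache-Control"])  # 200 OK private, no-store
```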