Wednesday, January 4, 2012

IIS Log File Analysis Breadcrumbs


One of the web applications I work on has significant seasonal variability, with significantly more load experienced during the Holiday season than at other times of the year. Having just past the peak season, now is time to look back and check logs to spot areas to improve or refactor in anticipation of peak loads expected in the next year. This particular web farm (IIS on Windows 2003 server, and Windows 2008R2) has no centralized logging, and the files have to be fetched from another network. Plus, the individual log files can be quite large, so even on a relatively fast network it is time consuming and tedious to fetch the files. I've not spent much time closely scrutinizing web server logs, so my skills are limited and/or rusty. That's the big picture...let's see if I can break this down to a problem statement:
How to analyse and monitor website activity real-time, centralized, ad hoc, with triggers, for least effort and cost?
Glad to have gotten that question off my chest...because i'm a small-picture guy i need to think about what I can do within my scope of control, with fewest change requirements to existing systems, no budget and limited time.  Right, here's my list initial list of steps to begin surveying these logs:
  1. Grab a couple logs (copy them to my desktop machine)
  2. Inspect the logs (doh! they're too big to view in NotePad...huh, wot?!)
  3. Look for interesting patterns
  4. Come up a scalable solution for analysing tens-to-hundreds of log files
Luckily Microsoft's excellent Excel application is quite happy loading huge files.  I haven't tested Excel's limits lately, but it's got no problem at all loading the 5 gigabyte files I'm looking at today.  Right...so my desktop machine is running Win2008R2, has 24GB of RAM, and a nice fat SSD--all of this makes loading/parsing big fat files in Excel fairly pain free.
Visually scanning the hundreds of thousands of rows quickly reveals some of our clients are misbehaving and spamming us with unnecessary requests. Great to know...add that to the research-and-fix list.
Next, I know that Excel has a nice "text-to-columns" button in the Data ribbon that quickly parses the IIS log files into nice columns.  And filtering is quickly accomplished using the Filter button.  Now I can filter the list to look at particular source IP's, user agents, and query string values, etc. All of this is simple enough...and I'm ready to consider how to scale this up to gigabytes-per-day of log files:
  1. Identify all the servers and paths to the target website log files
  2. Write a batch file to copy those files across the network to my local machine
  3. Learn how to use Microsoft's Log Parser 2.2
  4. Learn how to use Lizard Lab's useful Log Parser Lizard GUI application
None of that took too long--and as long as I limit my queries to one day then the queries complete fairly quickly.  Questions I had were:
  1. Do transactions times increase with load? [no, or not by much]
  2. Do a significant number of transactions block/take long times to complete? [no, few appear to be blocking, those taking longer than a couple seconds appear to be byte-size related--that means clients with low bandwidth. Consider ways to minimize size of responses]
  3. Is our caching strategy (caching headers) working correctly? [thank goodness, yes]
Other than the misbehaving clients that I identified by visually scanning the log files, the parser tools made it easy to satisfy the high-level questions, so it was well worth the few hours required to get up to speed using them. I'll try and find time to share the specific Log Parser queries. Shucks, I'll have to rediscover them next year if I don't post them.  That should be motivation enough to extend this article at some point. 

2 comments:

  1. Be cautious of using IIS logs time-taken field for time critical analysis.

    Time-taken is based on network time so if you have a dialup or mobile connection they requests will take longer to be served.

    ReplyDelete
  2. Rovastar,
    Thanks for pointing that out. This KB article elaborates further (http://support.microsoft.com/kb/944884):

    Note The value in the time-taken field does not include network time if one of the following conditions is true:•The response size is less than or equal to 2 KB, and the response size is from memory.
    •TCP buffering is used. Applications that use HTTPAPI.dll can set the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag to enable TCP buffering on Windows Server 2003 Service Pack 1 and later. This allows the server to send all of the response data to the client without having to wait for the client’s corresponding acknowledgements.

    ReplyDelete