One of the web applications I work on has significant seasonal variability, with far more load during the holiday season than at other times of the year. Having just passed the peak season, now is the time to look back through the logs and spot areas to improve or refactor before next year's peak. This particular web farm (IIS on Windows Server 2003 and Windows Server 2008 R2) has no centralized logging, and the files have to be fetched from another network. On top of that, the individual log files can be quite large, so even on a relatively fast network fetching them is time-consuming and tedious. I've not spent much time closely scrutinizing web server logs, so my skills are limited and/or rusty. That's the big picture...let's see if I can break this down to a problem statement:
How to analyse and monitor website activity in real time, centralized, ad hoc, with triggers, for the least effort and cost? Glad to have gotten that question off my chest...because I'm a small-picture guy, I need to think about what I can do within my scope of control, with the fewest changes to existing systems, no budget, and limited time. Right, here's my initial list of steps to begin surveying these logs:
- Grab a couple logs (copy them to my desktop machine)
- Inspect the logs (doh! they're too big to view in NotePad...huh, wot?!)
- Look for interesting patterns
- Come up with a scalable solution for analysing tens to hundreds of log files
Visually scanning the hundreds of thousands of rows quickly reveals some of our clients are misbehaving and spamming us with unnecessary requests. Great to know...add that to the research-and-fix list.
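For anyone who wants a scripted version of that tally (roughly what the text-to-columns-and-filter routine described below does by hand), here is a minimal Python sketch. It assumes the logs use IIS's W3C extended format with a #Fields header that includes c-ip; the file path is made up for illustration:

```python
# Minimal sketch: count requests per client IP in one IIS W3C log file.
# Assumes the log carries a "#Fields:" directive; the path is illustrative.
from collections import Counter

def requests_per_client(log_path):
    hits = Counter()
    fields = []
    with open(log_path, encoding="ascii", errors="replace") as f:
        for line in f:
            if line.startswith("#Fields:"):
                fields = line.split()[1:]      # e.g. date time c-ip cs-method cs-uri-stem ...
            elif line.strip() and not line.startswith("#"):
                row = dict(zip(fields, line.split()))
                hits[row.get("c-ip", "unknown")] += 1
    return hits

if __name__ == "__main__":
    for ip, count in requests_per_client(r"C:\logs\u_ex.log").most_common(10):
        print(f"{count:>8}  {ip}")
```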
Next, I know that Excel has a nice "text-to-columns" button on the Data ribbon that quickly parses the IIS log files into tidy columns, and filtering is quickly accomplished with the Filter button. Now I can filter the list to look at particular source IPs, user agents, query string values, etc. All of this is simple enough...and I'm ready to consider how to scale it up to gigabytes of log files per day:
- Identify all the servers and paths to the target website log files
- Write a batch file to copy those files across the network to my local machine (a rough version of this step is sketched after this list)
- Learn how to use Microsoft's Log Parser 2.2
- Learn how to use Lizard Labs' useful Log Parser Lizard GUI application
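For the collection step, a batch file with xcopy would do; here is a rough Python equivalent, just to keep the examples in one language. The server names, the administrative-share path, and the u_exYYMMDD.log daily naming are illustrative assumptions, not details taken from this farm:

```python
# Rough sketch: pull yesterday's IIS log from each web server's log share
# into a local working folder. Server names and paths are assumptions.
import shutil
from datetime import date, timedelta
from pathlib import Path

SERVERS = ["WEB01", "WEB02", "WEB03"]                       # hypothetical farm members
LOG_SHARE = r"\\{server}\c$\inetpub\logs\LogFiles\W3SVC1"   # adjust per server / site ID
DEST = Path(r"C:\logwork")

def collect_logs(day=None):
    day = day or (date.today() - timedelta(days=1))
    name = f"u_ex{day:%y%m%d}.log"                           # default IIS daily log name
    DEST.mkdir(exist_ok=True)
    for server in SERVERS:
        src = Path(LOG_SHARE.format(server=server)) / name
        if src.exists():
            shutil.copy2(src, DEST / f"{server}_{name}")     # prefix with server name
        else:
            print(f"missing: {src}")

if __name__ == "__main__":
    collect_logs()
```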
With the tooling in place, here are the questions I wanted answered (findings in brackets; one way to tabulate them is sketched after this list):
- Do transaction times increase with load? [no, or not by much]
- Do a significant number of transactions block or take a long time to complete? [no, few appear to be blocking; those taking longer than a couple of seconds appear to be related to response byte size--that means clients with low bandwidth. Consider ways to minimize the size of responses]
- Is our caching strategy (caching headers) working correctly? [thank goodness, yes]
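The answers above presumably came from Log Parser and Excel, but as a rough illustration of the same kind of tabulation in Python over the copied logs (the C:\logwork folder, the two-second "slow" threshold, and the 100 KB "large response" cut-off are my own assumptions):

```python
# Rough sketch: summarise time-taken across all collected logs and check
# whether slow requests line up with large responses (low-bandwidth clients).
# Folder name and thresholds are assumptions, not values from the post.
import statistics
from pathlib import Path

def iter_rows(log_path):
    """Yield each request line as a dict keyed by the #Fields header."""
    fields = []
    with open(log_path, encoding="ascii", errors="replace") as f:
        for line in f:
            if line.startswith("#Fields:"):
                fields = line.split()[1:]
            elif line.strip() and not line.startswith("#"):
                yield dict(zip(fields, line.split()))

def summarise(log_dir=r"C:\logwork", slow_ms=2000, big_bytes=100_000):
    times, slow_big, slow_small = [], 0, 0
    for path in Path(log_dir).glob("*.log"):
        for row in iter_rows(path):
            try:
                taken = int(row["time-taken"])      # milliseconds in IIS logs
                size = int(row.get("sc-bytes", 0))  # response size, if that field is logged
            except (KeyError, ValueError):
                continue
            times.append(taken)
            if taken >= slow_ms:
                if size >= big_bytes:
                    slow_big += 1
                else:
                    slow_small += 1
    if len(times) > 1:
        print("median ms:", statistics.median(times))
        print("95th percentile ms:", statistics.quantiles(times, n=20)[-1])
    print("slow & large:", slow_big, " slow & small:", slow_small)

if __name__ == "__main__":
    summarise()
```

As the comment below points out, time-taken generally includes network time, so treat these numbers as end-to-end figures rather than pure server time.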
Be cautious of using the IIS log's time-taken field for time-critical analysis.
Time-taken includes network time, so if a client is on a dial-up or mobile connection the requests will take longer to be served.
Rovastar,
Thanks for pointing that out. This KB article elaborates further (http://support.microsoft.com/kb/944884):
Note: The value in the time-taken field does not include network time if one of the following conditions is true:
- The response size is less than or equal to 2 KB, and the response is from memory.
- TCP buffering is used. Applications that use HTTPAPI.dll can set the HTTP_SEND_RESPONSE_FLAG_BUFFER_DATA flag to enable TCP buffering on Windows Server 2003 Service Pack 1 and later. This allows the server to send all of the response data to the client without having to wait for the client's corresponding acknowledgements.