Sunday, April 27, 2008

Some collected articles on monitoring server performance - original sources lost

To make sure that you have enough memory for IIS, you should monitor the following counters:
  • File Cache Hits
  • File Cache Hits %
  • File Cache Misses
  • File Cache Flushes

By monitoring the cache hits and misses, you can determine whether IIS is serving files from the cache or has to go to disk for them. Keep in mind that the IIS file cache can use up to 4 GB of RAM for caching.
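As a rough sketch of how these counters relate, the hit percentage can be derived from the raw hit and miss totals. The function name and the sample values are illustrative assumptions, not part of any IIS or PerfMon API.

```python
def cache_hit_percent(hits: int, misses: int) -> float:
    """Return hits as a percentage of all cache lookups (hits + misses)."""
    total = hits + misses
    if total == 0:
        return 0.0  # no lookups yet, so there is nothing to measure
    return 100.0 * hits / total


# Hypothetical totals read from the File Cache Hits / File Cache Misses counters
print(cache_hit_percent(hits=900, misses=100))  # 90.0
```

A low percentage over a long interval suggests that many requests are being served from disk rather than from the cache.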


Monitoring and Tuning Your Server



Suggestions for Optimizing Memory Usage

Servers running IIS 5.0, like other high-performance file servers, benefit from ample physical memory. Generally, the more memory you add, the more the servers use and the better they perform. IIS 5.0 requires a minimum of 64 MB of memory; at least 128 MB is recommended. If you are running memory-intensive applications, your server could require a much larger amount of memory to run optimally (for example, most of the servers that service the microsoft.com Web site have at least 512 MB of memory).

Adding RAM to your system is not the only option, however. Here are a few suggestions for optimizing memory performance without adding memory:

Improve Data Organization. Keep related Web files on the same logical partitions of a disk. Keeping files together improves the performance of the File System Cache. Also, defragment your disks. Even well-organized files take more time to retrieve if they are fragmented.

Try Disk Mirroring or Striping. The optimum configuration is to have enough physical memory to hold all static Web pages. However, if pages must be retrieved from disk, use mirroring or striping to make reading from disk sets faster. In some cases, a caching disk controller may help.

Replace or Convert CGI Applications. CGI applications use much more processor time and memory space than equivalent ASP or ISAPI applications. For more information about ASP, ISAPI, and CGI applications, see Web Applications.

Enlarge Paging Files. Add paging files and increase the size of the ones you have. The Windows 2000 operating system creates one paging file on the system disk, but you can also create a new paging file on each logical partition of each disk.

Retime the IIS Object Cache. Consider lengthening the period that an unused object can remain in the cache (use the ObjectCacheTTL setting in the registry, as mentioned earlier in this section, to accomplish this).

Change the Balance of the File System Cache to the IIS 5.0 Working Set. By default, servers running the Windows 2000 operating system are configured to give preference to the File System Cache over the working sets of processes when allocating memory space. Although IIS 5.0-based servers benefit from a large File System Cache, the setting Maximize Throughput for File Sharing often causes the IIS 5.0 pageable code to be written to disk, which results in lengthy processing delays. To avoid these processing delays, set Server properties to the Maximize data throughput for network applications option.

To change Server properties

1. On the desktop, open My Computer and select Network and Dial-up Connections.

2. Right-click Local Area Connection and open its property sheet.

3. Select File and Printer Sharing for Microsoft Networks and select Properties.

4. On the Server Optimization property sheet, select Maximize data throughput for network applications.

Limit Connections. If your server doesn't have enough memory, limiting the number of connections on the server might help alleviate the shortage, because some physical memory (about 10 KB per connection) is consumed by the data structures the system uses to keep track of connections.

To control the number of current connections

1. In the IIS snap-in, right-click a site, then choose Properties and select the Web Site tab.

2. Select the Limited To check box in the Connections panel. Type into the field the maximum number of connections you want to allow.
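To get a feel for what a connection limit buys, the "about 10 KB per connection" figure quoted above can be turned into a quick estimate. This is a minimal sketch: the function names and the sample numbers are hypothetical, and the real per-connection cost will vary.

```python
BYTES_PER_CONNECTION = 10 * 1024  # "about 10 KB per connection", per the text above


def connection_memory_mb(connections: int) -> float:
    """Estimate the physical memory, in MB, used to track the given connections."""
    return connections * BYTES_PER_CONNECTION / (1024 * 1024)


def max_connections_for(budget_mb: float) -> int:
    """Estimate how many connections fit in a memory budget given in MB."""
    return int(budget_mb * 1024 * 1024 // BYTES_PER_CONNECTION)


print(connection_memory_mb(10_000))  # 97.65625
print(max_connections_for(50))       # 5120
```

The second function gives a starting value for the Limited To field if you know how much memory you are willing to spend on connection tracking.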

Eliminate Unnecessary Features. You can also disable the performance boost for applications in the foreground. In addition, at times when you are not actively checking performance, you can disable performance-related logging in order to squeeze a bit more performance from your server.

Using PerfMon to Monitor the File System Cache

There are several counters in the Memory and Cache performance objects that you can use to monitor the size and effectiveness of the File System Cache. Table 5.3 lists these counters.

Table 5.3 Counters for Monitoring the File System Cache

Counter

Indicates

Memory\ Cache bytes

The size of the cache, in bytes. This counter displays the last observed value; it is not an average.

Memory\ Cache faults/sec

How often data sought in the File System Cache is not found there. The count includes faults for data found elsewhere in memory, as well as faults that require disk operations to retrieve the requested data.
This counter displays the number of faults, regardless of the number of pages retrieved in response to the fault.

Cache\ Copy Reads/sec

The frequency of reads from pages of the File System Cache that involve a memory copy of the data from the cache to the application's buffer. This is a method used by the LAN Redirector, the LAN Server (for small items), and the disk file systems.

Cache\ Fast Reads/sec

The frequency of reads from the File System Cache that bypass the installed file system and retrieve the data directly from the cache. Normally, file I/O requests invoke the appropriate file system to retrieve data from a file. However, this path permits direct retrieval of data from the cache without file system involvement, if the data is in the cache. Even if the data is not in the cache, one invocation of the file system is avoided.

Cache\ MDL Reads/sec

How often the system attempts to read large blocks of data from the cache.
Memory Descriptor List (MDL) Reads are read operations in which the system uses a list of the physical address of each page to help it find the page.
MDL Reads are often used to retrieve cached Web pages and FTP files.

Cache\ Pin Reads/sec

How often the system attempts to read recently accessed blocks of data from the cache. This counter is more accurate for ASP content than the MDL Reads/sec counter is.
Pin counters display reads of cache data that is held because it has just been read or written. They reflect cache data that is used repeatedly.

Cache\ MDL Read Hits %

How often attempts to find large sections of data in the cache are successful.
You can use the Cache\ MDL Read Hits % counter to calculate the percentage of MDL misses. Misses are likely to result in disk I/O.

Cache\ Pin Read Hits %

How often attempts to find recently accessed sections of data in the cache are successful. This counter is more accurate for ASP content than the MDL Read Hits % counter is.
You can use the Cache\ Pin Read Hits % counter to calculate the percentage of misses. Misses are likely to result in disk I/O. Pin counters display reads of cache data that is held because it has just been read or written. They reflect cache data that is used repeatedly.

Cache\ Data Maps/sec

How often pages are mapped into the cache from elsewhere in physical memory or from disk.
To measure the percentage of data maps from elsewhere in physical memory, use Cache\ Data Map Hits %. 100 minus the value of Cache\ Data Map Hits % is the percentage of data maps retrieved from disk.

Cache\ Read Aheads/sec

A measure of sequential reading from the cache. When the system detects sequential reading, it anticipates future reads and reads larger blocks of data. The read ahead counters are a useful measure of how effectively an application uses the cache.

Memory\ Page Faults/sec

Hard and soft faults in the working set of the process. This counter displays the number of faults, without regard for the number of pages retrieved in response to the fault.

Memory\ Page Reads/sec
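The hit-percentage counters in the table above (MDL Read Hits %, Pin Read Hits %, Data Map Hits %) all convert to a miss percentage in the same way: subtract from 100. A small sketch, with a hypothetical function name and sample reading:

```python
def miss_percent(hit_percent: float) -> float:
    """Convert a 'Hits %' counter reading into the matching miss percentage."""
    if not 0.0 <= hit_percent <= 100.0:
        raise ValueError("hit_percent must be between 0 and 100")
    return 100.0 - hit_percent


# Hypothetical reading of Cache\ Data Map Hits %
print(miss_percent(92.5))  # 7.5
```

For Data Map Hits %, the result is the share of data maps that had to be retrieved from disk.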

Table 5.1 Counters for Monitoring the IIS 5.0 Working Set

Counter

Indicates

Computername\ Memory\ Available Bytes

The amount of physical memory remaining and available for use, in bytes. This counter displays the amount of memory not currently used by the system or by running processes. It displays the last observed value, not an average.
The operating system attempts to prevent this value from falling below 4 MB. It often trims the working sets of processes to maintain the 4 MB minimum available memory.

Computername\ Process\ Working Set: Inetinfo

Size of the working set of the process, in bytes. This counter displays the last observed value, not an average over time.

Computername\ Process\ Page Faults/sec: Inetinfo

Hard and soft faults in the working set of the process.

Computername\ Memory\ Page Faults/sec

Hard and soft faults for all working sets running on the system.

Computername\ Memory\ Page Reads/sec

Hard page faults. This counter displays the number of times the disk is read to satisfy page faults. It displays the number of read operations, regardless of the number of pages read in each operation.
A sustained rate of 5 reads/sec or more can indicate a memory shortage.

Computername\ Memory\ Pages Input/sec

One measure of the cost of page faults. This counter displays the number of pages read to satisfy page faults. One page is faulted at a time, but the system can read multiple pages ahead to prevent further hard faults.

Hard faults in the working sets of processes and in the File System Cache.
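The "sustained rate of 5 reads/sec or more" guideline for Page Reads/sec can be checked against a logged series of samples. The text does not say how long "sustained" is, so the sketch below takes the required number of consecutive samples as a parameter (`min_run`, an assumption); the sample values are made up.

```python
from typing import Sequence


def is_sustained(samples: Sequence[float], threshold: float, min_run: int) -> bool:
    """Return True if at least min_run consecutive samples are >= threshold."""
    run = 0
    for value in samples:
        # Extend the current run, or reset it when a sample drops below the threshold
        run = run + 1 if value >= threshold else 0
        if run >= min_run:
            return True
    return False


# Hypothetical Memory\ Page Reads/sec samples, one per logging interval
page_reads = [1, 6, 7, 5, 9, 2]
print(is_sustained(page_reads, threshold=5, min_run=4))  # True
```

Requiring a run of consecutive samples keeps a single spike from being reported as a memory shortage.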

Table 5.2 Counters for Monitoring the IIS Object Cache

Counter

Indicates

Internet Information Services Global\ Cache Hits
Internet Information Services Global\ Cache Misses
Internet Information Services Global\ Cache Hits %

A measure of the efficiency of the IIS Object Cache. These counters demonstrate how often data sought in the IIS Object Cache is found.
Internet Information Services Global\ Cache Misses indicates how often the system must search elsewhere in memory or on disk to satisfy a request.
The first two of these counters (Cache Hits and Cache Misses) display totals since the service was started. Internet Information Services Global\ Cache Hits % displays an instantaneous value, not an average over time.

Internet Information Services Global\ Cache Flushes

How many times an object was deleted from the IIS Object Cache, either because it timed out, or because the object changed.

Internet Information Services Global\ Objects
Internet Information Services Global\ Directory Listings

The total number of objects currently stored in the IIS Object Cache.
Directory Listings is a subset of the Objects counter.
At any given time, the difference between the total number of objects and the number of Directory Listings is equal to the number of other objects stored in the cache. The Directory Listings counter is most important to servers running the FTP service.


Using PerfMon to Monitor Processor Activity

To monitor your server's processors, use PerfMon to log data from the counters listed in Table 5.4:

Table 5.4 Counters for Processor Activity Monitoring

Counter

Indicates

System\ Processor Queue Length

Threads waiting for processor time. If this value exceeds 2 for a sustained period of time, the processor may be bottlenecked.

Processor\ % Processor Time (Total instance)

The sum of processor use on each processor.

Processor\ % Processor Time

Processor use on each processor (#0, #1, and so on). In a multiprocessor server, this counter reveals unequal distribution of processor load.

Processor\ % Privileged Time

Proportion of the processor's time spent in privileged mode. In the Windows 2000 operating system, only privileged mode code has direct access to hardware and to all memory in the system. The Windows 2000 Executive runs in privileged mode. Application threads can be switched to privileged mode to run operating system services.

Processor\ % User Time

Proportion of the processor's time spent in user mode. User mode is the processor mode in which applications like IIS 5.0 services run.

Process\ % Processor Time

The processor use attributable to each process, either for a particular process or for the total for all processes. (These are shown in the list of instances.)
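On a multiprocessor server, one simple way to spot the unequal load distribution mentioned for the per-processor % Processor Time counter is to look at the gap between the busiest and the idlest processor. This is only a sketch: the function name and the readings are hypothetical, and the text gives no threshold for how large a gap is a problem.

```python
from typing import Sequence


def load_spread(per_cpu_percent: Sequence[float]) -> float:
    """Return the gap, in percentage points, between the busiest and idlest processor."""
    if not per_cpu_percent:
        raise ValueError("need at least one processor reading")
    return max(per_cpu_percent) - min(per_cpu_percent)


# Hypothetical % Processor Time readings for instances #0 to #3
print(load_spread([85.0, 20.0, 25.0, 30.0]))  # 65.0
```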

1.1 Server

CPU:

Processor\% Processor Time\_Total - just a handy idea of how 'loaded' the server is at any given time.

Processor\% Processor Time\_Instance - just a handy idea of how 'loaded' any particular CPU is at any given time.

System\Processor Queue Length - the number of ready threads queued and waiting for time on the CPU. There is a single queue for processor time even on computers with multiple processors, so divide this value by the number of processors servicing the workload. A sustained queue of fewer than 10 threads per processor is normally acceptable, depending on workload.
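The per-processor rule of thumb for the queue length can be sketched as follows. The limit of 10 comes from the paragraph above; the function names and sample values are illustrative assumptions.

```python
QUEUE_LIMIT_PER_PROCESSOR = 10  # rule-of-thumb value from the paragraph above


def queue_per_processor(queue_length: float, processors: int) -> float:
    """Divide the single system-wide processor queue across the processors."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return queue_length / processors


def queue_is_acceptable(queue_length: float, processors: int) -> bool:
    """Return True if the per-processor queue is below the rule-of-thumb limit."""
    return queue_per_processor(queue_length, processors) < QUEUE_LIMIT_PER_PROCESSOR


# Hypothetical reading: 24 queued threads on a 4-processor server
print(queue_per_processor(24, 4))  # 6.0
print(queue_is_acceptable(24, 4))  # True
```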

Processor(_Total)\Interrupts/sec
An indirect indicator of the activity of hardware devices that generate interrupts, such as the system clock, the mouse, disk drivers, data communication lines, network interface cards, and other peripheral devices.

Memory:

Process (All processes)\Working Set
the set of recently touched memory pages for all processes. If free memory in the computer is above a threshold, pages are left in the Working Set of a process even if they are not in use. When free memory falls below a threshold, pages are trimmed from Working Sets. If they are needed they will then be soft-faulted back into the Working Set before leaving main memory.

Memory\Pages/sec
the rate at which pages are read from or written to disk to resolve hard page faults. This is a primary indicator of the kinds of faults that cause system-wide delays. It includes pages retrieved to satisfy faults in the file system cache.

Memory\Page Reads/sec – The rate at which the disk is read to resolve hard page faults. It cannot be used in isolation, but it should be less than 20% of the I/O throughput capacity.

Process\Working Set\_Total (or per specific process) - this basically shows how much memory is in the working set, or currently allocated RAM.

Memory\Available MBytes - amount of free RAM available to be used by new processes.

Memory\Pages Input/Sec - The best indicator of whether you are memory-bound, this counter shows the rate at which pages are read from disk to resolve hard page faults. In other words, the number of times the system was forced to retrieve something from disk that should have been in RAM. Occasional spikes are fine, but this should generally flat line at zero.

Memory\% Committed Bytes In Use – The percentage of the total of main memory and paging file size that is currently committed.
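One way to read this counter is as committed memory divided by the commit limit, where the commit limit is roughly main memory plus paging file size. The sketch below assumes the two inputs come from the Memory\Committed Bytes and Memory\Commit Limit counters, which the text above does not mention; the function name and values are hypothetical.

```python
def committed_percent(committed_bytes: int, commit_limit_bytes: int) -> float:
    """Return committed memory as a percentage of the commit limit."""
    if commit_limit_bytes <= 0:
        raise ValueError("commit_limit_bytes must be positive")
    return 100.0 * committed_bytes / commit_limit_bytes


GIB = 1024 ** 3
# Hypothetical server: 3 GB committed against a 4 GB limit (RAM plus paging file)
print(committed_percent(3 * GIB, 4 * GIB))  # 75.0
```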

Disk:

Available Disk Space - Self explanatory

PhysicalDisk\% Disk Time
The percentage of elapsed time that the selected disk drive was busy servicing read or write requests.

PhysicalDisk\Disk Bytes/sec\_Total (or per disk) - shows the number of bytes per second being written to or read from the disk.

PhysicalDisk\Current Disk Queue Length\driveletter - this is probably the single most valuable counter to watch. It shows how many read or write requests are waiting to execute to the disk. For single disks, it should idle at 2-3 or lower, with occasional spikes being okay. For RAID arrays, divide by the number of active spindles in the array; again try for 2-3 or lower. Because a shortage of RAM will tend to beat on the disk, look closely at the Memory\Pages Input/Sec counter if disk queue lengths are high.

\%Idle Time

Shows the percentage of elapsed time during the sample interval that the selected disk drive was idle

The recommended counter for measuring disk utilisation

\ Avg. Disk Queue Length

Shows the average number of both read and write requests that were queued for the selected disk during the sample interval

As a guide, a disk bottleneck may be identified when the average disk queue length is consistently greater than 2 × the number of physical disks and %Idle Time is consistently less than 20%
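The guide above combines two counters, so it is easy to express as a single check. In this sketch "consistently" is taken to mean "in every logged sample", which is an assumption; the function name and the sample data are hypothetical.

```python
from typing import Sequence


def disk_bottleneck(avg_queue_lengths: Sequence[float],
                    idle_percents: Sequence[float],
                    physical_disks: int) -> bool:
    """Apply the guide: queue > 2 x disks and idle time < 20% in every sample."""
    if not avg_queue_lengths or not idle_percents:
        return False  # no data, so nothing can be concluded
    queue_limit = 2 * physical_disks
    queue_high = all(q > queue_limit for q in avg_queue_lengths)
    idle_low = all(i < 20.0 for i in idle_percents)
    return queue_high and idle_low


# Hypothetical samples for a volume backed by 2 physical disks
print(disk_bottleneck([5, 6, 7], [10, 5, 15], physical_disks=2))  # True
print(disk_bottleneck([5, 3, 7], [10, 5, 15], physical_disks=2))  # False
```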

\ Avg. Disk sec/Transfer

Average response time across the disk subsystem in seconds

Includes all subsystem layers, e.g. device driver layer, I/O bus and I/O channel

Includes queuing time at these layers

Does not pinpoint where delays are occurring

\% Free Space

Shows the percentage of the total usable space on the selected disk that is free

As a guide for NTFS volumes, usable capacity is exhausted when this counter reaches 15%

\ Free Megabytes

Shows the unallocated space, in megabytes, on the disk

Should be employed with the previous counter in order to assess disk space capacity
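The two free-space figures above can also be computed outside PerfMon. This sketch uses Python's standard `shutil.disk_usage`; the 15% value is the NTFS guide quoted above, and the helper names are illustrative assumptions.

```python
import shutil

NTFS_FREE_SPACE_FLOOR = 15.0  # percent, the guide value quoted above


def percent_free(total_bytes: int, free_bytes: int) -> float:
    """Return free space as a percentage of total usable space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def free_megabytes(free_bytes: int) -> float:
    """Return free space in megabytes."""
    return free_bytes / (1024 * 1024)


usage = shutil.disk_usage(".")  # the volume holding the current directory
pct = percent_free(usage.total, usage.free)
print(f"{pct:.1f}% free, {free_megabytes(usage.free):.0f} MB free")
if pct <= NTFS_FREE_SPACE_FLOOR:
    print("At or below the 15% guide value")
```

Reporting both numbers together matters because 15% of a very large volume can still be a lot of space, while 15% of a small one may not be.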

Network:

Network Interface\Bytes Total/Sec\nic name - Measures the rate at which bytes are sent and received.

Network Interface\Output Queue Length\nic name – the number of packets in the queue waiting to be sent. If there is a sustained average of more than two packets in the queue, you should be looking to resolve a network bottleneck.

Network Interface\Packets Received Errors\nic name - packet errors that kept the TCP/IP stack from delivering packets to higher layers. This value should stay low.
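The Output Queue Length guideline above (a sustained average of more than two packets) can be checked over a window of logged samples. This is a minimal sketch: the function name, the window length, and the sample values are assumptions.

```python
from statistics import mean
from typing import Sequence


def output_queue_congested(samples: Sequence[float], limit: float = 2.0) -> bool:
    """Return True if the average queue length over the window exceeds the limit."""
    if not samples:
        return False  # no data, so nothing can be concluded
    return mean(samples) > limit


# Hypothetical Output Queue Length samples for one NIC
print(output_queue_congested([0, 3, 4, 5]))  # True
print(output_queue_congested([0, 1, 0, 2]))  # False
```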

1.2 Network SNMP counters

Packet drop rates

Router CPU – CPU utilisation on router

Router memory – Memory utilisation on router

Latency – Measure of time taken for traffic to be sent and return from a given point. High latency may indicate congestion.

Errors – Measure of data packets dropped by the network; may indicate malfunctioning equipment.

Link Up/Link Down – Change in port status

Up time – Router availability

% availability – Availability over time
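As a sketch of how % availability relates to up time, availability over a reporting period is the time the device was up divided by the length of the period. The function name and the figures are hypothetical.

```python
def availability_percent(uptime_seconds: float, period_seconds: float) -> float:
    """Return the share of the reporting period during which the device was up."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return 100.0 * uptime_seconds / period_seconds


# Hypothetical 30-day period with one hour of downtime
period = 30 * 24 * 3600
downtime = 3600
print(round(availability_percent(period - downtime, period), 2))  # 99.86
```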
