Wednesday, October 19, 2005

Essential performance forecasting, part 2: I/O

Craig Shallahamer

As I wrote in the first part of this series, forecasting Oracle performance is absolutely essential for every DBA to understand and perform. When performance begins to degrade, it's the DBA who hears about it, and it's the DBA who's supposed to fix it. Fortunately, low precision forecasting can be done very quickly and it is a great way to get started forecasting Oracle performance. This time, I'll focus on I/O performance forecasting.

The key metrics we want to forecast are utilization, queue time, and response time. With only these three metrics, as a DBA you can perform all sorts of low precision what-if scenarios. To derive the values, you essentially need 3 things:

a few simple formulas
some basic operating system statistics
some basic Oracle statistics

Modern I/O subsystems can be extremely difficult to forecast. Just as with Oracle, there is batching, caching, and a host of complex algorithms centered around optimizing performance. While these features are great for performance, they make intricate forecast models very complex. This may seem like a problem, but actually it's not. I have found that by keeping the level of detail and complexity at a consistently lower level (i.e., less detail), overall system I/O forecasts are typically more than adequate.

At a very basic level, an I/O subsystem is modeled differently than a CPU subsystem. A CPU subsystem routes all transactions into a single queue, and all CPUs feed off of this single queue. This is why, with a mature operating system, any one CPU should be just as busy as the next. If you have had I/O performance problems, you know the situation is very different.

In contrast to a CPU subsystem, each I/O device has its own queue. A transaction cannot simply be routed to any device. It must go specifically where the data it needs resides or where it has been told to write a specific piece of data. This is why each device needs its own queue and why some I/O queues are longer than others. This is also why balancing I/O between devices is still the number one I/O subsystem bottleneck solution.

Today an I/O device can mean just about anything. It could be a single physical disk, a disk partition, a RAID array, or some combination of these. The key when forecasting I/O is that whatever you call a "device" stays a device throughout the entire forecast. If a device is a five-disk RAID array, then make sure that whenever two devices are referenced, everyone involved understands they are actually two RAID arrays, each with five physical disks. If your device definition is consistent, you'll avoid many problems.

The forecasting formulas we'll use below assume the I/O load is perfectly distributed across all devices. While today's I/O subsystems do a fantastic job of distributing I/O activity, many DBAs do not. I have found that while an array's disk activity is nearly perfectly balanced, the activity from one array to the next may not be very well balanced. Hint: If an I/O device is not very active (utilization less than 5%), do not count it as a device. It is better to be conservative than aggressive when forecasting.

Before you are inundated with formulas, it's important to understand some definitions and recognize their symbols.

S : Time to service one workload unit. This is known as the service time or service demand. It is how long it takes a device to service a single transaction. For example, 1.5 seconds per transaction or 1.5 sec/trx. For simplicity's sake, this value will be derived.

U : Utilization or device busyness. Commonly shown as a percentage, and that's how it works in our formulas. For example, in the formula it should be something like 75% or 0.75, but not 75. This value can be gathered from either sar or iostat.

λ : Workload arrival rate. This is how many transactions enter the system per unit of time. For example, 150 transactions each second or 150 trx/sec. When working with Oracle, there are many possible statistics that can be used for the "transaction" arrival rate. For simplicity's sake, this value will be derived and will refer to the general workload.

M : Number of devices. You can get this from the sar or iostat report. Be careful not to count both a disk and a disk's partition, resulting in a double count.

W : Wait time, more commonly called queue time. This is how long a transaction must wait before it begins to be serviced. For simplicity's sake, this value will be derived.

R : Response time. This is how long it takes for a single transaction to complete. This includes both the service time and any queue/wait time. This will be gathered from the sar or iostat command (details below).

The I/O formulas for calculating averages are as follows:

U = ( S λ ) / M (1)

R = S / (1 - U) (2)

R = S + W (3)
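The three formulas translate almost directly into code. The following is a minimal sketch, not a definitive implementation; the function names and the sample numbers (5 ms/trx, 0.5 trx/ms, 5 devices) are made up for illustration and do not come from any real system.

```python
def utilization(service_time, arrival_rate, devices):
    """Formula (1): U = (S * lambda) / M."""
    return (service_time * arrival_rate) / devices


def response_time(service_time, util):
    """Formula (2): R = S / (1 - U). Only meaningful while 0 <= U < 1."""
    if not 0 <= util < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time / (1 - util)


def queue_time(response, service_time):
    """Formula (3) rearranged: W = R - S."""
    return response - service_time


# Illustrative inputs only: S = 5 ms/trx, lambda = 0.5 trx/ms, M = 5 devices
u = utilization(5.0, 0.5, 5)
r = response_time(5.0, u)
w = queue_time(r, 5.0)
print(u, r, w)  # 0.5 10.0 5.0
```

Note that U must be expressed as a fraction (0.5, not 50), just as described in the definitions above.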

Before we dive into real-life examples, let's check these formulas out by doing some thought experiments.

Thought experiment 1. Using formula (1), if the arrival rate doubles, so will the utilization.

Thought experiment 2. Using formula (1), if we used slower disks, the service time (S) would increase, and therefore the utilization would also increase.

Thought experiment 3. Using formula (2), if we used faster devices, the service time would decrease, then the response time would also decrease.

Thought experiment 4. Using formula (2), if the device utilization decreased, the denominator would increase, which would cause the response time to decrease.

Thought experiment 5. Using formula (3), if we used faster devices, the service time would decrease, and therefore the response time would also decrease.
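The thought experiments can also be checked numerically. This is a small sketch using a hypothetical baseline (5 ms/trx, 0.4 trx/ms, 5 devices); the helper names are assumptions chosen for this example.

```python
import math


def utilization(s, lam, m):
    # Formula (1): U = (S * lambda) / M
    return s * lam / m


def response_time(s, u):
    # Formula (2): R = S / (1 - U), valid while U < 1
    return s / (1 - u)


# Hypothetical baseline: S = 5 ms/trx, lambda = 0.4 trx/ms, M = 5 devices
s, lam, m = 5.0, 0.4, 5
u = utilization(s, lam, m)

# 1: doubling the arrival rate doubles the utilization
exp1 = math.isclose(utilization(s, 2 * lam, m), 2 * u)
# 2: slower disks (larger S) raise the utilization
exp2 = utilization(s * 1.5, lam, m) > u
# 3: faster devices (smaller S) lower the response time at a fixed utilization
exp3 = response_time(s * 0.5, u) < response_time(s, u)
# 4: lower utilization lowers the response time
exp4 = response_time(s, u / 2) < response_time(s, u)
# 5: R = S + W, so with W held fixed a smaller S gives a smaller R
w = response_time(s, u) - s
exp5 = (s * 0.5 + w) < (s + w)

print(exp1, exp2, exp3, exp4, exp5)
```

Experiments 3 and 5 hold the utilization or the wait time fixed. In practice a faster device also lowers U through formula (1), so the real improvement in response time would be larger than these isolated checks suggest.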

While gathering I/O subsystem data is simple, the actual meaning of the data and how to apply it to our formulas is not so trivial. One approach, which is fine for low precision forecasting like this, is to gather only the response time, the utilization, and the number of devices. From these values, we can derive the arrival rate and service time.
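Rearranging formulas (2) and (1) gives S = R (1 - U) and λ = U M / S. The sketch below applies this to a single device; the function names and the sample readings (10 ms response time, 50% busy) are hypothetical.

```python
def derive_service_time(response_time, utilization):
    # From formula (2): R = S / (1 - U)  =>  S = R * (1 - U)
    return response_time * (1 - utilization)


def derive_arrival_rate(utilization, service_time, devices=1):
    # From formula (1): U = (S * lambda) / M  =>  lambda = U * M / S
    return utilization * devices / service_time


# Hypothetical readings for one device: R = 10 ms, U = 50%
r_ms, u = 10.0, 0.50
s_ms = derive_service_time(r_ms, u)   # ms per transaction
lam = derive_arrival_rate(u, s_ms)    # transactions per ms
print(s_ms, lam)  # 5.0 0.1
```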

Gathering device utilization is very simple as both sar -d and iostat clearly label these columns. However, gathering response time is not that simple. What iostat labels as service time is more appropriately the response time. Response time from sar -d is what you would expect, the service time plus the wait time. (For details, see "System Performance Tuning" by Musumeci and Loukides.)

There are many different ways we can forecast I/O subsystem activity. We could forecast at the device level or perhaps at the summary level. While detail level forecasting provides a plethora of numerical data, forecasting at the summary level allows us to easily communicate different configuration scenarios both numerically and graphically. For this article, I will present one way to consolidate all devices into a single representative device.

Capacity planners call this process of consolidating or summarizing "aggregation." While there are many ways to aggregate, the better the aggregation, the more precise and reliable your forecasts will be. For this example, our aggregation objective is to derive a single service time representing all devices and also the total system arrival rate. The total system arrival rate is trivial; it's just the sum of all the arrivals. Based upon the table below, the total arrival rate is 0.34 trx/ms.

To aggregate the service time, we should weight the average device service time based upon each respective device's arrival rate. But for simplicity and space, we will simply use the average service time across all devices. Based upon the table below, the average service time is 4.84 ms/trx.
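To make the difference between the simple and the weighted average concrete, here is a sketch of both aggregations. The device names and readings are invented for illustration and are not the article's data; the 5% cutoff follows the earlier hint about ignoring nearly idle devices.

```python
def derive(u, r_ms):
    """Derive per-device service time and arrival rate (M = 1)."""
    s = r_ms * (1 - u)   # S = R * (1 - U)
    lam = u / s          # lambda = U / S for a single device
    return s, lam


# (name, utilization, response time in ms): made-up values, not real data
devices = [
    ("d1", 0.40, 8.0),
    ("d2", 0.30, 7.0),
    ("d3", 0.50, 10.0),
    ("d4", 0.02, 4.0),   # nearly idle, so it is not counted as a device
]

# Keep only devices that are at least 5% busy
active = [(name, *derive(u, r)) for name, u, r in devices if u >= 0.05]

total_lambda = sum(lam for _, _, lam in active)
simple_s = sum(s for _, s, _ in active) / len(active)
weighted_s = sum(s * lam for _, s, lam in active) / total_lambda

print(f"devices counted: {len(active)}")
print(f"total arrival rate: {total_lambda:.3f} trx/ms")
print(f"simple average S:   {simple_s:.2f} ms/trx")
print(f"weighted average S: {weighted_s:.2f} ms/trx")
```

When the devices have similar service times, as in this made-up data, the two averages land close together, which is why the simple average is often good enough for a low precision forecast.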

Armed with the number of devices (5), the average service time (4.84 ms/trx), and the system arrival rate (0.34 trx/ms), we are ready to forecast!

Example 1. Let's say the I/O workload is expected to increase 20% each quarter and you need to know when the I/O subsystem will need to be upgraded. To answer the classic question, "When will we run out of gas?", we will forecast the average queue/wait time, response time, and utilization. The table below shows the raw forecast values.

Here's an example of the calculations with the arrival rate increased by 80% (arrival rate 0.71 trx/ms).

U = ( S λ ) / M = ( 4.84*0.71 ) / 5 = 0.69

R = S / (1 - U) = 4.84 / ( 1 - 0.69 ) = 15.46

W = R - S = 15.46 - 4.84 = 10.62
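The same calculation is easy to repeat for a whole range of workloads. The sketch below uses the values from this example (S = 4.84 ms/trx, M = 5, baseline λ = 0.34 trx/ms). The forecast function and its names are hypothetical, and the sweep assumes the 20% growth compounds each quarter, which is only one possible reading of the growth scenario.

```python
def forecast(s, lam, m):
    """Return (U, R, W); R and W are None once the devices are saturated."""
    u = s * lam / m            # formula (1)
    if u >= 1:
        return u, None, None   # formula (2) no longer applies
    r = s / (1 - u)            # formula (2)
    return u, r, r - s         # formula (3) rearranged: W = R - S


S_MS, M, BASE_LAMBDA = 4.84, 5, 0.34

# The worked example: arrival rate of 0.71 trx/ms
u71, r71, w71 = forecast(S_MS, 0.71, M)
print(f"U={u71:.2f}  R={r71:.2f} ms  W={w71:.2f} ms")

# Assumption: the workload grows 20% per quarter, compounding
rows = []
for quarter in range(8):
    lam = BASE_LAMBDA * 1.20 ** quarter
    u, r, w = forecast(S_MS, lam, M)
    rows.append((quarter, u, r, w))
    if r is None:
        print(f"Q{quarter}: lambda={lam:.2f}  U={u:.2f}  saturated")
    else:
        print(f"Q{quarter}: lambda={lam:.2f}  U={u:.2f}  R={r:.2f}  W={w:.2f}")
```

The results for 0.71 trx/ms can differ from the hand calculation in the second decimal place, depending on how the inputs were rounded.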

So what's the answer to our question? Technically speaking, the system will operate with a 120% workload increase. But stating that in front of management is what I would call a "career decision." Looking closely at the forecasted utilization, the wait time, and the response time, you can see that once the utilization goes over 57%, the wait time skyrockets! Take a look at the resulting classic response time graph below.
