Question: Am I Recalculating the Lower and Upper Limits Correctly?

Reader question:

“When you are recalculating the average because you see a shift, does the average from your previous calculation only take into account those values up to the new average? (not sure if this makes sense so I will ramble on a little bit more) Usually my average is calculated with 30 values…but if I see a shift before those 30 values appear, and I recalculate the average, should I only average the, let’s say, 10 values I have collected? Or should I still have 30?”

My answer:

Yes, I would go back and calculate the average and limits to include just those 17 data points in that timeframe. That clearly seems like a “different system” than the timeframes before and after that shift.

I wouldn’t include a data point in more than one calculation over time. You don’t need 30 data points if the timeframe for that system was only 17. You’re on the right track in how you’re thinking through this. “Only take into account those values up to the new average” seems correct.
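As a rough illustration of that idea, here is a minimal Python sketch that calculates the average and limits separately for each timeframe, so that no data point is used in more than one calculation. The data, the function names, and the `shift_starts` input (the positions where a new system appears to begin) are all hypothetical; the limit formula is the 3 * MR-bar / 1.128 version used elsewhere on this page.

```python
from statistics import mean

def xmr_limits(values):
    """Return (average, lower limit, upper limit) for one timeframe.

    Needs at least two points, since a moving range compares neighbors.
    """
    avg = mean(values)
    # Moving ranges: absolute differences between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 3 * mean(moving_ranges) / 1.128
    return avg, avg - spread, avg + spread

def limits_by_segment(values, shift_starts):
    """Calculate limits for each segment; each point lands in exactly one segment.

    shift_starts is a hypothetical input: the indexes where a shift begins,
    as judged by whoever is reading the chart.
    """
    bounds = [0, *shift_starts, len(values)]
    return [xmr_limits(values[start:end]) for start, end in zip(bounds, bounds[1:])]

# Made-up metric that shifts upward at the seventh point (index 6).
data = [10, 12, 11, 13, 10, 12, 20, 22, 21, 23, 20, 22]
for avg, lower, upper in limits_by_segment(data, [6]):
    print(f"average={avg:.2f}  limits={lower:.2f} to {upper:.2f}")
```

Note that the moving range spanning the shift itself (12 to 20 here) is left out of both calculations, because each segment only compares its own neighboring points.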

Question: Why Use 25 Data Points for the Average & Limits in the Process Behavior Charts Template?

A question:

I have a question about the formulas for columns C & E in your template. Is there a reason they’re set to only calculate the average on the first 25 data points as opposed to all of them?

My answer: Thanks for your question. It’s somewhat arbitrary, but 25 data points is good for a baseline. You certainly can use more data points, but only if it’s part of the same predictable system (with no shifts in performance).

There are diminishing returns from using more than 25 data points in the baseline.

I’d encourage you to experiment with a data set that’s stable and predictable over time. Try 10, 15, 20, 25, 30 data points for the baseline average and limits and see how much the “voice of the process” changes (if at all).
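One way to run that experiment outside of a spreadsheet is sketched below in Python. The data set is randomly generated as a stand-in for a stable metric, and the function name and seed are arbitrary choices for illustration, not part of the template.

```python
import random
from statistics import mean

def baseline_limits(values, n):
    """Average and limits calculated from the first n data points only."""
    baseline = values[:n]
    avg = mean(baseline)
    mr_bar = mean(abs(b - a) for a, b in zip(baseline, baseline[1:]))
    spread = 3 * mr_bar / 1.128
    return avg, avg - spread, avg + spread

# Made-up "stable and predictable" metric: random noise around 50.
rng = random.Random(7)  # fixed seed so the run is repeatable
stable = [rng.gauss(50, 5) for _ in range(30)]

for n in (10, 15, 20, 25, 30):
    avg, lower, upper = baseline_limits(stable, n)
    print(f"n={n:2d}  average={avg:6.2f}  limits={lower:6.2f} to {upper:6.2f}")
```

Swapping in a real data set for `stable` shows how much (or how little) the average and limits move as the baseline grows.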

Question: Does the System Need to be “In Control” to Get Started With Process Behavior Charts?

A question:

Hi Mark — long time reader, first time sending a question…. I love your recent emphasis on eliminating “red/green” analyses, and moving toward more statistically sound tools like control charts. I’ve seen your slides and webinars on creating them.

But one question has bugged me. When selecting data on which to calculate control limits, you suggest (in one deck I saw) at least 20 data points. Does that implicitly assume that the data represents a “stable system”, or at least a stable time within a larger system? If there’s any undiagnosed special variation in that initial data set, are the resulting upper and lower control limits really valid?

Put another way, what considerations should we use when selecting the initial data set? Thanks so much!

My reply:

Thanks for checking out my work and for your question. Great question.

The Process Behavior Chart, when we create it, will answer the question of “Is this a predictable system over the timeframe of the baseline data?”

It doesn’t make any assumptions about “the larger system” (before or after the baseline period).

Don Wheeler’s work has shown that the limits are still valid even if the baseline is not a predictable system. The presence of signals tells us that we have work to do to eliminate the causes of those signals, in order to create an improved and now-predictable system.

See this article — (myth 4)

“Myth Four: It has been said that the process must be operating in control before you can place the data on a process behavior chart”

As to considerations for choosing the initial data set… I generally would use the 20 most recent data points for the purposes of creating baseline average and limits. You can experiment with a data set to see if using 15 or 20 or 25 or 30 data points makes a big difference (it might not).

I’d be careful if there’s a known system change that occurred during that time frame (although the PBC might show, unfortunately, that the change didn’t lead to a signal in the chart).

Hope that helps.

Question: Do I Need Normally Distributed Data to Use a Process Behavior Chart?

Question: To have a valid control chart (or “Process Behavior Chart”), don’t I need data that’s normally distributed? In other words, a “bell curve”?

Short Answer: No

No, your data doesn’t have to fit a normal distribution or any distribution. The PBC methodology is robust for real world data.

You can read more from Donald J. Wheeler, Ph.D. about this.

“…three-sigma limits will filter out virtually all of the routine variation regardless of the shape of the [distribution].”

Three sigma limits are calculated in the PBC methodology, using the formulas of:

  • Lower Limit = Average – 3 * MR-bar / 1.128
  • Upper Limit =  Average + 3 * MR-bar / 1.128

There’s no need to analyze the distribution of your data and there’s certainly no need to transform the data.

Wheeler also points out:

“… symmetric, three-sigma limits work with skewed data.”
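For readers who want to see this for themselves, here is a rough Python simulation (not a proof). It generates heavily right-skewed, made-up data from an exponential distribution, calculates the limits with the formulas above, and counts the share of points that fall inside them. The seed, sample size, and function name are arbitrary assumptions.

```python
import random
from statistics import mean

def natural_process_limits(values):
    """Lower and Upper Limits: Average -/+ 3 * MR-bar / 1.128."""
    avg = mean(values)
    mr_bar = mean(abs(b - a) for a, b in zip(values, values[1:]))
    spread = 3 * mr_bar / 1.128
    return avg - spread, avg + spread

rng = random.Random(42)  # fixed seed so the run is repeatable
# Exponential data is strongly skewed, nothing like a bell curve.
skewed = [rng.expovariate(1.0) for _ in range(2000)]

lower, upper = natural_process_limits(skewed)
inside = sum(lower <= v <= upper for v in skewed) / len(skewed)
print(f"limits: {lower:.2f} to {upper:.2f}")
print(f"share of points inside the limits: {inside:.1%}")
```

For this distribution the share should land somewhere around 97%, though the exact figure depends on the random draw. No transformation or distribution fitting is involved.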

For more, read this article by Donald J. Wheeler, Ph.D.:

Myths About Process Behavior Charts

“Myth One: It has been said that the data must be normally distributed before they can be placed on a process behavior chart.”

“Shewhart then went on to note that having a symmetric, bell-shaped histogram is neither a prerequisite for the use of a process behavior chart, nor is it a consequence of having a predictable process.”

Here is another Wheeler article:

The Normality Myth

In part:

“The oldest myth about process behavior charts is the myth that they require “normally distributed data.” If you have ever heard this idea, or if you have ever taught this to others, then you need to read this article.

While this myth dates back to 1935, and while Walter Shewhart exposed this idea as a myth in 1938, it continually reappears in various forms even today.”

I hope that helps.

Question: Can the Lower Limit Be Below Zero (Negative)?

Question from a reader:

As we know sometimes when we calculate the Natural Process Limits, the Lower Limit is negative.  In some measures, that’s not a practical value, like in the example below (where we set the limit to zero).  Therefore we made the Lower Limit = 0.

A couple things when looking at this:

  1. to the eye, the limits don’t look symmetrical – will these create confusion for people?
  2. application of the rules, particularly the rules that state if it’s closer to the limit than the median.  With this artificial limit, might we create a risk of a false signal?

Thoughts on situations like these?


Good question… it can create confusion to have asymmetrical limits… and it can cause confusion to have a negative limit for measures that can’t be negative (like infection rates, defects, falls, new customers, etc).

We have the same problem when there is an Upper Limit that’s calculated to be greater than 100% when the measure can’t be more than 100%.

I’ve done it both ways…

  • manually setting the limit to zero or
  • let the limit be negative and display as negative

The downside of tweaking the limit that way is that you could really only rely on “Rule 2” signals, looking for eight or more consecutive data points below the average. (If there’s a data point right at zero when the Lower Limit has been set to zero, it wouldn’t be a signal.)
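The two options and that trade-off can be sketched in a few lines of Python. The function names and the calculated limit of -1.2 are hypothetical, chosen only to illustrate a count that cannot go negative.

```python
def display_lower_limit(calculated_lower, floor=0.0):
    """Option 1: show the limit at the floor when the calculated value is below it."""
    return max(calculated_lower, floor)

def is_below_lower_limit(value, lower_limit):
    """A point counts as a signal only when it falls strictly below the limit."""
    return value < lower_limit

calculated = -1.2  # hypothetical calculated Lower Limit
floored = display_lower_limit(calculated)

print(floored)                               # 0.0
print(is_below_lower_limit(0, floored))      # False: a point at zero is not a signal
print(is_below_lower_limit(0, calculated))   # False: zero is above the negative limit too
```

Either way, a count that cannot be negative can never fall below the Lower Limit, which is why signals on the low side have to come from runs of points rather than from a single point.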

Here is an example with a negative limit shown (a count of weekly book sales, which cannot be a negative number):

My suggestion would be to try both methods (or survey people) to get a sense of which is least confusing in general.

Hope that helps…

Question: Is it Appropriate to Plot Averages on a Process Behavior Chart?

I’ve gotten this question a few times recently, basically asking if it’s OK to plot metrics like these on a Process Behavior Chart (PBC):

  • Weekly average emergency patient waiting time
  • Monthly average lost sick days
  • Daily median waiting time for clinic patients

The concern gets expressed in terms of, “I was taught it was dangerous to take an average of averages.” See here for an example of that math dilemma, which is worth paying attention to in other circumstances. We don’t need to worry about it with PBCs.

In a PBC, we are plotting a central line that’s usually the average of the data points that we are analyzing. So, it’s an average of averages, but it’s less problematic in this context.

I asked Donald J. Wheeler, PhD about this and he replied:

“The advantage of the XmR chart is that the variation is characterized after the data have been transformed.
Thus, we can transform the data in any way that makes sense in the context, and then place these values on an XmR chart.”

PBC is another name for the XmR chart method.

In his excellent textbook Making Sense of Data, Wheeler plots weekly averages with no warnings about that being problematic, as shown here:

I did an experiment with a made up data set.

The data consist of individual patient waiting times. The only thought I put into it was that waiting times might get longer as the day goes on, so I built in some of that.

Here are the X chart and the MR chart:

When I look at the PBC of individual waiting times, it’s “predictable” (or “in control”) with an average of 31.42. The limits are quite wide, but I don’t see any signals. I had 7 consecutive points above the average (dumb luck in how I entered the data). I could see a scenario where, for example, afternoon waiting times are always longer than morning waiting times, so we could perhaps see daily “shifts” in the metric.

I then plotted the average waiting time for each of these 5 days (using a minimal number of data points to test these charts with admittedly minimal effort to start).

The average of the averages is 33.23. Not a huge difference.

The PBC for the daily averages is also predictable, with narrower limits, as I’d expect.

I think it’s fine to plot a series of averages. What matters most is how we interpret the PBCs — to avoid overreacting to every up and down in the data, for example.
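A similar experiment can be sketched in Python. The waiting times below are invented for illustration (they are not the data set described above), and the variable names are arbitrary. The sketch shows that the overall average and the average of the daily averages differ a little when days have different numbers of patients, and that the daily averages can still be charted with the usual limit formula.

```python
from statistics import mean

# Made-up waiting times in minutes, grouped by day; days have different patient counts.
days = [
    [20, 30, 40],
    [25, 35],
    [30, 40, 50, 40],
    [20, 40],
    [30, 30, 45],
]

all_waits = [wait for day in days for wait in day]
daily_averages = [mean(day) for day in days]

overall_average = mean(all_waits)          # average of individual patients
central_line = mean(daily_averages)        # average of the daily averages

# Limits for the chart of daily averages, using the moving ranges of those averages.
mr_bar = mean(abs(b - a) for a, b in zip(daily_averages, daily_averages[1:]))
spread = 3 * mr_bar / 1.128

print(f"overall average of individual waits: {overall_average:.2f}")
print(f"average of daily averages:           {central_line:.2f}")
print(f"limits for daily averages: {central_line - spread:.2f} to {central_line + spread:.2f}")
```

Five points is far too few for a useful baseline; the small size just keeps the arithmetic easy to check by hand.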

Question: Why Are There Different Ways of Expressing the Formulas for the Limits?

Here’s a good question from a reader:

In your KaiNexus webinar on PBCs you use:

LCL = Mean – 3*(MR bar)/1.128

and

UCL = Mean + 3*(MR bar)/1.128

When 3.0/1.128~=2.66

I use: Ave(x) +/- (2.66 × Ave(mR))

Those are just different ways of expressing the same formula.

“Mean” and “Ave(x)” are two different notations for the average of the metric over the baseline period.

“MR bar” and “Ave(mR)” are two different notations for the average of the moving range values.

The formula with 2.66 is a bit simpler, mathematically.

But, using the 3 (divided by the statistical constant of 1.128) means that the 3 is used to represent the estimate of “3 sigma” limits.

3/1.128 = 2.66 like you said…. which simplifies the formula slightly but sometimes confuses people into thinking it’s “2.66 Sigma” limits. But, again, 3/1.128 helps give us the proper estimate of 3 sigma limits… it’s not 2.66 sigma.
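A quick arithmetic check in Python shows that the two forms agree to within rounding. The average and the average moving range below are hypothetical numbers picked for the example.

```python
average = 8.6   # hypothetical baseline average
mr_bar = 3.0    # hypothetical average moving range

scaling = 3 / 1.128                         # the "3 sigma" form of the constant
upper_long = average + 3 * mr_bar / 1.128   # Mean + 3 * (MR bar) / 1.128
upper_short = average + 2.66 * mr_bar       # Ave(x) + 2.66 * Ave(mR)

print(round(scaling, 4))       # 2.6596
print(round(upper_long, 2))    # 16.58
print(round(upper_short, 2))   # 16.58
```

The 2.66 is just 3 / 1.128 rounded to two decimals, so any difference between the two results shows up only in the third decimal place or beyond.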

I hope that clarifies things…

Read more about that “scaling factor” or statistical constant.

Question: Why are the Limits so Wide?

A reader shared a Process Behavior Chart with me and asked this:

“On the Falls_Harm tab, the limits seem a bit far away from the mean.”

As I wrote back, the Lower and Upper Natural Process Limits are calculated based on the amount of point-to-point variation that exists in the metric. We use the “Average Moving Range” as an input into the formulas:

Upper or Lower Limit = Average +/- 3 * Average Moving Range / 1.128

From his spreadsheet:

The limits “are what they are.” They are the “voice of the process,” if you will.

The PBC tells us that we have a “predictable system” that will continue fluctuating around an average of 8.6 falls per month. The lower and upper limits are 0.4 and 16.8, so we’d expect future months to fall within that range unless we can improve the system in some way.

The goal is 7 falls per month, but the PBC tells us that the system is not capable of hitting that goal each and every month. It will happen some months, but even the month of just 2 falls is not a “signal” of system improvement. It’s all random fluctuation, where a stable system is producing variable results.

We can work to improve the system in ways that:

  1. Reduce the average
  2. Reduce month-to-month variation — this would lead to narrower limits around the average

The best evidence of significant changes to the system would be:

  1. A month with zero falls
  2. Eight consecutive months with 8 falls or fewer
  3. 3 out of 4 (or 3 consecutive months) of 4 falls or fewer

Any of those events are very unlikely to be due to continued random fluctuation.
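Those three kinds of evidence correspond to three detection rules, which can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are made up, the example months are hypothetical, and the average and limits (8.6, 0.4, and 16.8) are the ones quoted above.

```python
def rule1(values, lower, upper):
    """Indexes of points outside the Lower or Upper Limit."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]

def rule2(values, average, run=8):
    """Indexes where a run of `run` or more consecutive points on one side of the average is reached."""
    hits, streak, side = [], 0, 0
    for i, v in enumerate(values):
        current = (v > average) - (v < average)  # +1 above, -1 below, 0 on the line
        if current != 0 and current == side:
            streak += 1
        else:
            streak = 1 if current != 0 else 0
            side = current
        if streak >= run:
            hits.append(i)
    return hits

def rule3(values, average, lower, upper):
    """Indexes ending a window of 4 points where at least 3 are closer to the same limit than to the average."""
    upper_mid = (average + upper) / 2
    lower_mid = (average + lower) / 2
    hits = []
    for i in range(3, len(values)):
        window = values[i - 3 : i + 1]
        if sum(v > upper_mid for v in window) >= 3 or sum(v < lower_mid for v in window) >= 3:
            hits.append(i)
    return hits

average, lower, upper = 8.6, 0.4, 16.8  # values quoted in the text above

print(rule1([9, 12, 0, 7], lower, upper))             # [2]: the month with zero falls
print(rule2([8, 7, 6, 8, 5, 7, 8, 6], average))       # [7]: eighth consecutive month below 8.6
print(rule3([4, 3, 9, 2], average, lower, upper))     # [3]: three of four months at 4 or fewer
```

The halfway point between the average (8.6) and the Lower Limit (0.4) is 4.5, which is where "4 falls or fewer" comes from in the third item of the list.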

If I were in the reader’s shoes, I’d suggest using all 25 data points as baseline… which leads to this:

The overall average is lower when we use all 25 data points for the baseline… about 7 (the same as the goal).

The lower limit is calculated to be negative… so this now suggests that a month with zero falls is possible (if unlikely) in the current predictable system.

Using 25 data points instead of 12 provides limits that are a bit more accurate… but the high level voice of the process is the same.

In either scenario, there is one Moving Range value that’s just near the Upper Range Limit:

That corresponds with a month-to-month drop from 13 falls to 3 falls. In the 25-data-point baseline, that one MR point is 10.1 and the Upper Range Limit is exactly 10.1.
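The formula for the Upper Range Limit is not spelled out on this page. In the XmR method it is commonly calculated as 3.268 times the average moving range, and the Python sketch below assumes that constant. The monthly counts are made up (they only borrow the 13-to-3 drop from the example), and the function names are hypothetical.

```python
from statistics import mean

def moving_ranges(values):
    """Absolute point-to-point differences."""
    return [abs(b - a) for a, b in zip(values, values[1:])]

def upper_range_limit(ranges, constant=3.268):
    """Upper Range Limit, assuming the commonly used constant of 3.268."""
    return constant * mean(ranges)

# Made-up monthly fall counts that include a drop from 13 to 3.
falls = [8, 10, 7, 13, 3, 6, 9, 7]

ranges = moving_ranges(falls)
url = upper_range_limit(ranges)
flagged = [i for i, mr in enumerate(ranges) if mr > url]

print(ranges)          # [2, 3, 6, 10, 3, 3, 2]
print(round(url, 2))
print(flagged)         # indexes of moving ranges above the Upper Range Limit
```

A moving range above the Upper Range Limit would point to an unusually large jump between two consecutive data points.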

If there’s ANY point there worth investigating, it might be worth looking for a difference in the system and work processes from November 2017 to December 2017, but I wouldn’t expect to find any “special cause,” especially with that being so far in the past. That drop from 13 to 3 probably represents the limit of how much month-to-month variation we’d expect to see with the random fluctuation in that predictable system.

Question: Why are There so Many Data Points Outside of the Limits?

I got a message from a reader of my book Measures of Success: React Less, Lead Better, Improve More.

The questions were:

“I entered data from Press Ganey patient satisfaction responses (12-month running totals) and since the numbers were so close to each other, the PBC showed a bunch of data points outside the upper/lower process limits.

When I changed the limits to +- 3 sigma, this issue disappeared.  Have you run into this issue before?  Can you help me to understand the difference and its impact on analysis using the PBC?”

Read more and see my answer and analysis here:

[Updated] Reader Question: Why are There so Many Data Points Outside of the Limits?