Medical error quality blog - Archive

This blog contains selected rants and other items on all aspects of the effort to reduce medical errors.

Important Note - See my WordPress blog, as I have stopped posting here. The WordPress blog is current.

8/04/08 - CLSI EP22 EP23 Review - Update


EP22 was created as a means to use risk management to allow manufacturers to recommend the frequency of external quality control run by clinical laboratories. This was the so-called option 4. Options 1-3 were part of the original CMS proposal to allow clinical laboratories to reduce the frequency of external quality control to once a month (provided certain conditions were met).

EP23 was the clinical laboratory follow-on document to EP22.

Here’s my take on these two documents.


1.       Manufacturers won’t provide the information as suggested by EP22. (This information consists of experiments to demonstrate the efficacy of internal control measures.) It would be a lot of work (i.e., cost) and there’s no regulatory requirement to do so. Moreover, if this information were provided, it would be labeling, which would require FDA review. It is not clear that FDA has accepted this review task.


Update on 8/4/08 - During a CLSI session at the AACC meeting in Washington, Alberto Gutierrez from the FDA gave a presentation. Afterwards, I asked him if FDA would review the material about internal control experiments that manufacturers might present as part of the package insert. He said that FDA would review this material, but from what was said it seemed that the review would be superficial and that only egregious problems would be flagged.

2.       Clinical laboratory staff does not have the expertise to review this information, were it provided. This does not mean that clinical laboratory staff is incapable of reviewing it – they could acquire the expertise – it just seems unlikely.


3.       Should manufacturers provide this information and clinical laboratory staff review it, there would be no benefit with respect to improving QC. This is illustrated by an example in EP22 where the failure mode of “incorrect results due to low volume sample” is examined. After presenting the results of an experiment to show how an internal system control works, the user control measure is to “ensure that adequate volume of sample is presented to instrument.” But clinical laboratory staff would (or should) do this anyway. They don’t need EP22 and EP23 to know that one should follow the manufacturer’s instructions and to refrain from doing something stupid.


In clinical chemistry, risk management is “in.” But there are signs that its popularity is already starting to wane. This is unfortunate, as there is a great opportunity to use risk management tools to reduce both the risk and occurrence of laboratory errors. But one must focus not just on potential system errors, as EP22 and EP23 do, but on human errors as well.

6/17/08 - Reading Quality Digest can be dangerous to your health


In the June 2008 issue of Quality Digest, there is an article by Jay Arthur entitled “Statistical Process Control for Healthcare”. After the usual boilerplate introduction, something caught my eye; namely, the so-called good news that there is “inexpensive Excel based software to create control charts …”. This made me go to the end of the article where, sure enough, the author just happens to sell such software. This may have been a good place for the author to introduce the term bias.

To understand a more serious problem with this article, consider a hospital process; namely analyzing blood glucose in a hospital laboratory. Because such a process has error, quality control samples are run. Say such a control has a target value of 100 mg/dL.  The values of the quality control samples are plotted by SPC software and rules are formulated. If the glucose control value is too high or too low, the process is said to be out of control and action is taken.

Now, Mr. Arthur is trying to push SPC software not for a process but for errors in the process. For example, he uses the infection rate in a hospital. But the infection error rate is not a process that one wants to control. Of course one does not want it to become worse, but its target is zero.

A more useful example than the hypothetical one provided by Mr. Arthur was published recently (1). Here, the authors were faced with an undesirable hospital infection error rate and set out to observe where errors occurred in the process of placing central lines. They then provided control measures and continued to track the error rate, which was reduced to zero. This is not SPC! It is much more like a FRACAS (Failure Reporting And Corrective Action System).

In another part of the article, Mr. Arthur suggests that “never events” can be tracked by SPC. Never events – a list of 28 such events has been put forth by the National Quality Forum – have, as implied, targets of zero. One such event is wrong site surgery. One should use something like FMEA (Failure Mode Effects Analysis) to reduce the risk of such events. It is silly to suggest SPC software for never events.


1.       An Intervention to Decrease Catheter-Related Bloodstream Infections in the ICU. Pronovost P, Needham D, Berenholtz S, Sinopoli D, Chu H, Cosgrove S, Sexton B, Hyzy R, Welsh R, Roth G, Bander J, Kepros J, Goeschel C. N Engl J Med 355:2725, December 28, 2006.

6/5/08 - Westgard Quality Control Workshop – Part 3


I just returned from the Westgard Quality Control Workshop, where I was a speaker, and have a few blogs’ worth of comments – this is the third.

EQC – Equivalent Quality Control

This is the CMS proposal (1) to allow clinical laboratories to reduce the frequency of quality control from twice per day to once a month given that 10 days of running QC shows no values that are out (and given some other conditions).

Let’s try to construct a hypothesis on which to base such a recommendation. For example:

given any possible error condition that could be detected by external quality control, internal quality control would detect the same error 100% of the time.

This is about the best I can think of, which would result in the recommendation:

Stop running external quality control.

What does running 10 days of external QC with no out-of-control results show? The answer is nothing. During these 10 days, either there were no errors or, if there were errors, external QC was not able to detect them. (It is possible that internal QC detected errors during these 10 days.) In fact, this experiment is guaranteed to be meaningless. To see this, one must realize that internal QC is always “on” and precedes external QC. If external QC were redundant to internal QC for an error, internal QC would detect the error and either shut down the system or prevent the result (here, the external QC sample) from being reported, so external QC would never see the error. However, one can get different information by running external QC for a longer period: if internal QC misses an error but external QC detects it, then one has proved that external QC is not redundant to internal QC. This was shown to me (2) as out-of-control results, across a range of assays, occurring 1 to 10 times per year, where these were real problems. Since controls are run twice per day, the number of affected patient samples is larger than the number of out-of-control results.

So a lab that reduces external QC to once a month is risking an even larger number of bad patient results, which is made worse since the clinician has probably acted on the erroneous results.

Rather than do the experiment suggested by CMS, a lab can simply examine its external QC records for a sufficient length of time.


1.       To review, see: See

2.       Personal communication from Greg Miller of Virginia Commonwealth University


6/5/08 - Westgard Quality Control Workshop – Part 2


I just returned from the Westgard Quality Control Workshop, where I was a speaker, and have a few blogs’ worth of comments – this is the second.

How does one determine acceptable risk?

This was one of the questions asked by a participant – are there any guidelines? I also commented recently that, in spite of all the talk about risk management and putting in place control measures until one has acceptable risk, no one knows what acceptable risk means. Here are some more thoughts on this.

There are different risks (1). These can be enumerated. These include:

perception – complaints from either hospital or non hospital staff

performance – traditional quality, including errors that can affect patient safety

financial – errors that threaten the financial health of the service including lawsuits

regulatory – errors that threaten the accreditation status of the service

So first, one must say which risk one has in mind. One can envision an acceptable regulatory risk (we always pass inspections) but an unacceptable patient safety risk.  Note also, that the risks are not necessarily unique. One can have a patient safety failure with or without a lawsuit.

Assume the risk in question is the performance risk and specifically about patient safety. The Cadillac version of assessing risk would be to perform a quantitative fault tree and arrive at a numerical probability of patient risk. This is unlikely and one would probably have a qualitative assessment. Whether the assessment is quantitative or qualitative, this still hasn’t answered the acceptability question.

The problem is there is no easy answer to this question. If one had unlimited funds, one could lower the risk to whatever level was desired, but funds are limited by the economic healthcare policy of the laboratory’s country (2). So one answer for acceptable risk is how this economic policy is translated into regulations (i.e., one follows existing regulations and passes inspections). Yet, this is only a quasi-legal way of stating acceptable risk.


I suggest that risk be assessed by traditional means (FMEA, fault tree), which include a Pareto chart or table to rank the risks. Then, if one allocates the available money across control measures (mitigations) in an optimal, portfolio-like way, one has an acceptable risk under the imposed financial constraints.
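
As a rough illustration of the portfolio idea, the sketch below picks the set of control measures that gives the largest total risk reduction without exceeding a budget. The measure names, costs, and risk reduction scores are hypothetical, and treating risk reductions as simply additive is a simplifying assumption rather than an established method.

```python
from itertools import combinations


def best_portfolio(measures, budget):
    """Return (names, total_cost, total_reduction) for the best affordable subset.

    measures: list of (name, cost, risk_reduction) tuples (hypothetical inputs).
    Assumes risk reductions are additive, which is a simplification.
    """
    best = ((), 0, 0)
    # Exhaustive search is adequate for the handful of measures at the top
    # of a Pareto table.
    for size in range(1, len(measures) + 1):
        for combo in combinations(measures, size):
            cost = sum(m[1] for m in combo)
            reduction = sum(m[2] for m in combo)
            if cost <= budget and reduction > best[2]:
                best = (tuple(m[0] for m in combo), cost, reduction)
    return best


# Hypothetical control measures: (name, cost, risk reduction score)
measures = [
    ("barcode double check", 5, 10),
    ("extra operator training", 4, 7),
    ("confirmatory reference assay", 3, 6),
]
names, cost, reduction = best_portfolio(measures, budget=7)
print(names, cost, reduction)
```

With these made-up numbers, the two cheaper measures together beat the single most effective measure, which is the kind of trade-off a ranked list alone does not show.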

portfolio analysis


1.       Managing risk in hospitals using integrated Fault Trees / FMECAs. Jan S. Krouwer, AACC Press, Washington DC, 2004.

2.       See

6/5/08 - Westgard Quality Control Workshop – Part 1


I just returned from the Westgard Quality Control Workshop, where I was a speaker, and have a few blogs’ worth of comments – this is the first.

What’s Missing from Clinical Laboratory Inspections

At the Westgard Workshop, most of the participants were from clinical laboratories and I was impressed with how smart these people are. I also got a sense of a tremendous regulatory burden. From the CAP CD that I obtained at the Workshop:

      The mission statement of the CAP Laboratory Accreditation Program is:

“The CAP Laboratory Accreditation Program improves patient safety by advancing the quality of pathology and laboratory services through education and standard setting, and ensuring laboratories meet or exceed regulatory requirements.”

I have had mixed feelings about inspections that certify quality and have previously reported my experience with an industry quality program – ISO 9001 (1).

Here’s my assessment of clinical laboratory inspections to certify laboratories. It would seem that the premise of these inspections is that ensuring specific policies and procedures are in place and executed, as proven largely by documentation, guarantees high quality. So what’s missing? As far as I can tell – and it is very difficult to read through these materials – there is no measurement of error rates. Without such measurements, quality is unknown.


The regulatory bodies would describe a list of errors and their associated severities. The severities would be given numerical values, as in the VA hospital system, which uses 1-4. Every clinical laboratory would record each error (failure mode) that occurs in its laboratory, its severity, and its frequency (the default frequency is of course 1). It would multiply frequency x severity for each unique error (failure mode), add these up, and get a rate by dividing by the number of tests reported per year.
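
The calculation just described can be sketched in a few lines. The failure modes, counts, and test volume below are made up for illustration, and the assumption that 4 is the most severe level on the 1-4 scale is mine.

```python
from dataclasses import dataclass


@dataclass
class ErrorRecord:
    failure_mode: str
    severity: int   # 1 to 4; 4 assumed to be the most severe
    frequency: int  # number of occurrences during the year


def severity_weighted_error_rate(records, tests_per_year):
    """Sum frequency x severity over all failure modes, divided by test volume."""
    if tests_per_year <= 0:
        raise ValueError("tests_per_year must be positive")
    weighted_total = sum(r.frequency * r.severity for r in records)
    return weighted_total / tests_per_year


# Hypothetical yearly log for one laboratory
records = [
    ErrorRecord("patient sample mix-up", severity=4, frequency=3),
    ErrorRecord("hemolyzed specimen reported", severity=2, frequency=25),
]
rate = severity_weighted_error_rate(records, tests_per_year=1_000_000)
print(f"{rate * 1e6:.0f} severity-weighted errors per million tests")
```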

Failing to count errors would be a serious violation.

This would be the start of a new premise for the regulatory bodies. Measure quality – if it’s unacceptable, the clinical laboratory would suggest and implement process changes. It’s a simple closed loop process. With emphasis on measurement, reliance on documentation should decrease and inspections should be less burdensome.

closed loop


1.       Krouwer JS. ISO 9001 has had no effect on quality in the in-vitro medical diagnostics industry. Accred. Qual. Assur. 2004;9:39-43

5/4/08 - Acceptable Risk – Easy to talk about, but no one knows what it means


Standards about risk management always talk about “acceptable risk.” This is a qualitative term. Unfortunately, for much of healthcare there is no matching quantitative assessment or goal. Consider two examples.



Qualitative: Precision is acceptable. Quantitative: CV is 8% and goal is 10%.

Qualitative: Residual risk is acceptable. Quantitative: (none).




It is possible to estimate the probability of a severe adverse event and to have an associated goal for such a probability but no one in healthcare does this. So one will see things like, “with this mitigation we have reduced the risk of the adverse event to an acceptable level” but the reality is no one knows what this really means.

5/3/08 - Never Events – Never a meaningful goal


There has been considerable discussion about the National Quality Forum’s so-called 28 never events (1). Here are some problems with this concept.

Never is a poor goal – Adverse events can be considered within a risk management program. Risk is the combination of two items – severity and probability of occurrence. By their selection, one can gather that severity is high for the 28 events. However, probability can never be zero. Consider a simple example. The likelihood of performing wrong site surgery is X. One performs a double check to prevent wrong site surgery. Now the likelihood is 0.0001X. But the double check can fail. So one can perform a triple check. Now the probability is much lower but it is still not zero. And so on. Working with probabilities (as in fault trees) is one way to see that probabilities are never zero, nor is risk.
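
The double check and triple check argument can be written as a small calculation. This is a minimal sketch that assumes each check fails independently with the 0.0001 miss probability implied by the example above; the baseline value for X is hypothetical.

```python
def residual_probability(base_probability, check_miss_probabilities):
    """Probability that an error occurs and slips past every check.

    Assumes the checks fail independently of one another.
    """
    p = base_probability
    for miss in check_miss_probabilities:
        p *= miss  # the error survives only if this check also misses it
    return p


x = 1e-3  # hypothetical baseline likelihood of wrong site surgery
double_check = residual_probability(x, [1e-4])
triple_check = residual_probability(x, [1e-4, 1e-4])
print(double_check, triple_check)
```

Each added check multiplies the probability by a small number, so the result shrinks quickly but remains a positive number as long as every check has some chance of failing.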

28 goals are too many – If one wants to manage anything, one needs a limited number of goals. There is no reason why one can’t combine events to give a single goal – the overall risk of an adverse event.

“largely preventable” is not the same as preventable – In the NQF site, the never events are said to be largely preventable. The problems with this are obvious.


1.       See

3/19/08 - Alternatives to Six Sigma


This entry continues where the entry (Six Sigma can be dangerous to your health) left off. Given the problems with six sigma, what are some solutions to estimate the quality of an assay, using hCG as an example assay?

First, when total analytical error is calculated to estimate the values in zones A-C in an error grid, one should use conservative methods such as the empirical distributions suggested by the CLSI EP21A method, and where no data are deleted. Let’s say a clinical laboratory has done this evaluation with 40 patient samples for a new and reference method and found no results in zone C for an hCG assay. What can one conclude? Although there are 0% of the values in zone C, the upper limit of the 95% confidence interval is 7.2%. This means that for every million hCG results performed, up to 72,000 results could be in zone C. This is not very comforting and these types of evaluations don’t prove much, although one knows that the 7.2% rate is unlikely (because if this rate occurred, it would be noticed).
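
The 7.2% figure is consistent with the exact one-sided 95% upper bound for zero events in n independent samples, 1 - 0.05^(1/n). Whether that is the method used for the number above is an assumption; the sketch below simply shows that it reproduces the value.

```python
def upper_bound_zero_events(n, confidence=0.95):
    """Exact one-sided upper confidence bound on a proportion when 0 of n
    independent samples show the event."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


upper = upper_bound_zero_events(40)
print(f"{upper:.1%}")                 # 7.2%
print(round(upper * 1_000_000, -3))   # results per million, to the nearest thousand
```

The same function shows how slowly the bound tightens: the sample size must grow roughly tenfold to bring the upper limit down by a factor of ten.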

FMEA is an approach that will provide an answer to the quality question but in its complete form, it requires considerable effort. To complete a FMEA analysis, one has to postulate all possible reasons why a result could fall into zone C. To get an idea of what is involved, take two possible failure modes, HAMA interference and a patient sample mix-up.

HAMA interference – To estimate the likelihood of a zone C result from HAMA interference, one needs to know the level of HAMA that will cause erroneous results in the assay and the probability of such levels in the population being sampled. Contacting the manufacturer might give one the level of HAMA to watch out for – I am not familiar with data about the distribution of HAMA in patient samples. Yet, one knows HAMA interference occurs (Clinical Chemistry. 2001;47:1332-1333).  

Patient sample mix-up – There are some data for patient sample mix-ups (Archives of Pathology and Laboratory Medicine: Vol. 130, No. 11, pp. 1662–1668). However, it seems that these cases are caught within the laboratory. One would need to determine how many cases actually are not caught within the laboratory. One could then model the likelihood of a zone C result by sampling from the empirical distribution of hCG results that are observed in the lab to see the likelihood of a mix-up causing a zone C result.

Because there are so many existing data in a clinical laboratory, one may also have the opportunity to perform FRACAS types of analyses. That is, in addition to modeling probabilities, one could use existing data to count actual failures.

One must then continue:

  • with each other possible failure mode, calculate the probability of zone C results
  • calculate the overall probability of zone C results (from all failure modes) and determine if that risk is acceptable
    • special software is typically used to perform these calculations
  • construct a Pareto table if the overall probability of zone C results is too high and
  • propose control measures to lower the overall risk to an acceptable level
    • the control measures must of course be affordable
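
The combination and ranking steps in the list above can be sketched as follows. The per-failure-mode probabilities are placeholders, not estimates, and combining them as independent events is an assumption; a full fault tree analysis would handle dependencies that this ignores.

```python
def overall_probability(mode_probabilities):
    """Probability that at least one failure mode produces a zone C result,
    assuming the failure modes are independent."""
    p_none = 1.0
    for p in mode_probabilities.values():
        p_none *= 1.0 - p
    return 1.0 - p_none


def pareto_table(mode_probabilities):
    """Rows of (failure mode, probability, cumulative share), largest first."""
    total = sum(mode_probabilities.values())
    ranked = sorted(mode_probabilities.items(), key=lambda item: item[1], reverse=True)
    rows, cumulative = [], 0.0
    for name, p in ranked:
        cumulative += p
        share = cumulative / total if total > 0 else 0.0
        rows.append((name, p, share))
    return rows


# Placeholder probabilities of a zone C result per reported hCG result
modes = {
    "HAMA interference": 2e-5,
    "patient sample mix-up": 5e-5,
    "other failure modes (placeholder)": 1e-5,
}
print(f"overall: {overall_probability(modes):.2e}")
for name, p, share in pareto_table(modes):
    print(f"{name:35s} {p:.1e} {share:6.1%}")
```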

At this point, one can get the idea that this level of effort is out of reach for clinical laboratories since the level of expertise and work needed just to estimate the likelihood of a zone C result is huge. Even if a clinical laboratory could perform this task, it makes no sense to require every clinical laboratory to do so.

One possibility is to have a standards group tackle such a task, although this too has limitations as was shown for a (universal) control measure to prevent wrong site surgery.

Another possibility is to perhaps leverage resources beyond the clinical laboratory. For example, one could insist that before treatment for trophoblastic carcinoma, an hCG result should be confirmed either by performing a reference assay or perhaps by treating the sample and rerunning it. This requires an interaction between the clinical laboratory and clinicians.

So there are no easy answers to preventing severe, low frequency failures (that cause patient harm), but as discussed before, coming up with a sigma estimate for an hCG assay is also not the answer. Nor is doing nothing.

3/15/08 - Jan gets an award


I recently spoke at the Quality in the Spotlight conference in Antwerp, Belgium and gratefully acknowledge being awarded the Westgard Quality Award. This award was presented by Jim Westgard himself. The Quality in the Spotlight conference is a two day conference in Antwerp, devoted each year to a quality theme. This year’s theme was quality tools. I spoke about FMEA on each of the two days. It wasn’t until the second day of the conference that I realized that some of the other presentations were bothering me – perhaps I had a case of brain jetlag. This is an interactive conference so had I been quicker I would have presented my concerns to the speakers. But this did not happen so my concerns are in the previous entry to this blog. Prof. Dr. Jean-Claude Libeer, who founded the conference and also spoke about me with respect to the award, said that it was my blog which impressed people. So perhaps my previous entry could be taken as an acceptance speech.

On the second day, per instructions, I attempted to do a “workshop”. This is in quotes because I had to involve the audience but was only given one hour. Had I to do this again, I would have given an award to one lady, who answered some of the questions I posed to the audience. One example – name a case of at risk behavior that you have experienced. Answer, a technician, who had trouble getting a barcode on a patient sample to register, scanned the barcode from another patient. So perhaps this is also an illustration of the need to perform a FMEA on a control measure (what can go wrong with implementing barcodes).

Another highlight of my trip was spending three days in Amsterdam and hearing that in spite of frequent mistakes, my Dutch is begrijpelijk (understandable).

3/13/08 - Six Sigma can be dangerous to your health


At a recent conference, there were several presentations about six sigma for clinical laboratory assays. To recall, sigma is calculated as Sigma = (TEa – bias)/CV where

TEa is the total allowable error
Bias is the inaccuracy of the measurement procedure
CV is the imprecision of the measurement procedure
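
For readers who want to see the arithmetic, here is a minimal sketch of the sigma calculation. It assumes TEa, bias, and CV are all expressed in percent, and it takes the absolute value of bias so that a negative bias does not inflate the result; the example numbers are hypothetical.

```python
def sigma_metric(tea, bias, cv):
    """Sigma = (TEa - |bias|) / CV, with all three values in percent."""
    if cv <= 0:
        raise ValueError("cv must be positive")
    return (tea - abs(bias)) / cv


# Hypothetical assay: TEa 10%, bias 2%, CV 2%
print(sigma_metric(tea=10.0, bias=2.0, cv=2.0))  # 4.0
```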

The problem with six sigma is that it’s taken as a sole measure of quality – that is, if you have a high sigma value (greater than 6) then your assay is assured of high quality. The rest of this entry explains why this is wrong.

First, TEa (total allowable error) is often specifically called out as medically acceptable limits. One need only read the ISO 15197 standard for glucose to see this connection. I have previously commented about this standard. The implied meaning of medically acceptable limits is shown below.

figure 1

This is simply not the real world. Taguchi long ago specified a more realistic quadratic model of worth, which is shown below, superimposed on the original figure but in green.

figure 2

Thus points A and B are similar in bias and are similar in causing (or not causing) medically unacceptable results. It is also likely then that if point A is ok, then so is point B. It is only when one gets far away from these limits that one is almost certain to have results that can cause harm. This is shown below with point C.

figure 3

This can also be expressed as an error grid such as those for glucose. So the “sigma” calculations really only express the zone A region (grey) where 95% or more of the results should be. Zone B (white) can contain up to 5% of the results and zone C (dark grey) should contain no results. The error grid contains more information since each set of limits is different for each concentration. An error grid is shown below, taken from FDA guidance. In the guidance, WM is the test method and CM is the reference method. (In the document WM=waiver method and CM=comparative method).

figure 4

So the problem is that sigma only accounts for zone A, but patients are harmed by values in zone C!

Now one might argue that there is nevertheless a relationship between sigma and the three zones, meaning that high sigma values are unlikely to have values in zone C and low sigma values are likely to have such values. This is also not true. Here is why.

1.       Often incorrect models are used to assess total error – see here.

2.       In estimating bias and CV, outliers – the very values that cause harm - are often thrown out.

3.       All sigma calculations are based on the assumption that the data are normally distributed. Most data do not fulfill this criterion. This means that often there are more frequent values in the tails of the distribution (again, this is zone C) than expected by calculations based on the normal distribution.

4.       And maybe the biggest reason of all, values can occur in zone C that have nothing to do with the analytical process. If there is a patient sample mix-up, this can occur and these values are excluded (when detected) from virtually all analytical evaluations.
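
Point 3 in the list above can be made concrete by comparing tail probabilities for a normal distribution with those of a heavier-tailed distribution that has the same standard deviation. The Laplace distribution is used here only because its tail has a simple closed form; it is an illustrative choice of mine, not a claim about how assay errors are actually distributed.

```python
import math


def normal_two_sided_tail(k):
    """P(|X - mean| > k standard deviations) for a normal distribution."""
    return math.erfc(k / math.sqrt(2.0))


def laplace_two_sided_tail(k):
    """Same probability for a Laplace distribution with the same standard
    deviation (scale b = sd / sqrt(2), so the tail is exp(-k * sqrt(2)))."""
    return math.exp(-k * math.sqrt(2.0))


for k in (2, 4, 6):
    print(k, f"{normal_two_sided_tail(k):.2e}", f"{laplace_two_sided_tail(k):.2e}")
```

Far from the mean, the heavier-tailed distribution puts orders of magnitude more results in the tails than the normal calculation predicts, even though both have the same CV.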

Think of it this way. If a loved one suffered medical harm, due in part to an erroneous lab result, would it make you feel better to know that the assay had a high sigma value? And would you associate that assay with quality?

I will comment on how one can address these issues in a future entry.

3/3/08 - At risk behavior


I am involved in risk management standards for clinical laboratories, where the focus has been on understanding how manufacturers’ devices can fail and how a clinical laboratory can put in place control measures to prevent these failures from causing harm.

My concern with these standards is that there is not enough emphasis given to the clinical laboratory’s own source of error – its people. Among problems related to human errors are cognitive errors, non cognitive errors, reckless behavior, and at risk behavior – the topic of this entry.

At risk behavior is behavior that increases risk where risk is not recognized, or is mistakenly believed to be justified. Anyone who manages people must have had the experience of hearing (perhaps secondhand) “I don’t think that’s necessary and I’m not going to do it.” And of course, parents are familiar with at risk behavior practiced by their children.

An example of healthcare at risk behavior is reusing syringes. This occurred recently at an endoscopy clinic in Nevada and has affected up to 40,000 people. In reading the patient empowerment blog, one learns about other cases of reused syringes. In a case in Long Island, the physician reused syringes only for the same patient, but the syringes were used with multi-dose vials and these vials were used across patients.

In the recent case of reducing central line infections, Dr. Peter Pronovost observed that of the steps associated with placing a central line, in a third of patients, doctors skipped at least one step. Whereas some of this could be attributed to non cognitive errors (slips), it could also be associated with at risk behavior. The control measure that worked here was a double check step, whereby another healthcare provider would check to make sure each step was followed.

Discovering at risk behavior may not be easy, hence it needs to be on one’s radar screen.

2/14/08 - Should one focus on a failure in a procedure or the outcome of such a failure?


Withholding payment for adverse events is a financial incentive to promote patient safety. Whether this incentive makes financial sense is something I will comment on later or perhaps not at all. For now, my comments are about the policy as it recently appeared (1).



The authors suggest the following criteria to withhold payment.

·         Evidence demonstrates that the bulk of the adverse events in question can be prevented by widespread adoption of achievable practices.

·         The events can be measured accurately, in a way that is auditable.

·         The events resulted in clinically significant patient harm.

·         It is possible, through chart review, to differentiate the adverse events that began in the hospital from those that were “present on admission” (POA).

The problem is with the third bullet and can perhaps be illustrated by the following figure.


In this figure FMEA events are shown by the dashed line.  The red dashed line is before FMEA. The green dashed line shows that after a successful FMEA, risk of failures has been reduced. FRACAS events are shown by the solid lines. The green line shows a reduction in the failure rate after FRACAS.

Keep in mind, for the dashed lines (FMEA), no failures have occurred, while for the solid lines, failures have occurred.

Now the policy defines a failure as an adverse patient outcome. One can view outcomes as the end of  an event cascade as in the next figure.

error cascade

Assume that event C is an adverse patient outcome. According to the policy, payment is withheld only when event C is observed. In the first figure, the relevant concern area is shown by the ellipse as it is assumed that these are all high severity (severe patient harm) events.

This policy therefore excludes the following cases:

All FMEA events. That is, a procedure with a correctable high risk will be excluded from this policy because the event has not yet occurred. Consider the case of the Duke transplant error (2), before it happened. One can infer that this was a high risk procedure that would have benefited from a FMEA. In essence, this policy waits for disasters to happen.

All near miss events. Consider the case of the patient who had an MRI (3). Blood pressure monitor tubing had to be disconnected for the MRI. After the procedure, the tubing was incorrectly connected to an IV line. Before air was delivered from the automated blood pressure monitor, a family member noticed that things didn’t look right and contacted a nurse, who corrected the problem. Thus, there was no adverse event.

All defective procedures that don’t result in severe patient harm. Consider a healthcare worker who violates hospital policy (at risk behavior according to Marx (4)), which results in a patient fall. In this case, the fall results in a minor injury.  This is an important case because the policy fails to properly reflect risk management principles.

For a procedure that has a problem (e.g., a failed event), one has to classify the severity of the failed event and its probability (FMEA) or frequency of occurrence (FRACAS). The severity is classified not necessarily by the failed event but by the effect of the failed event. The effect is itself an event and can be a spectrum of severities. In the case of a patient fall, there is a distribution of harm associated with the fall event – some falls will result in severe harm, some will result in minor harm. Traditionally, in risk management, if severe harm is possible, then severity is associated with severe harm, even if the probability of severe harm is low. In this sense, severity is equated with potential outcome, regardless of whether that specific outcome has occurred.

One also has to classify the probability (FMEA) or frequency of occurrence (FRACAS) of the event. Here, assuming FMEA, one could choose between the probability of the failed event or the probability of the effect of the event (the adverse outcome). It is recommended to use the probability of the failed event, not the probability of the effect of the event. This is because one usually has control over the failed event and does not have control over the effect of the event.

Example: If a clinical laboratory provides a clinician with an erroneous result and the effect of that could be patient harm, the event is classified as severe. The probability is the probability of the erroneous result, not the probability of patient harm, because patient harm is outside the control of the clinical laboratory (the clinician might not act on the result, might suspect it is erroneous and request it to be repeated, and so on).


This policy will miss many quality issues and deviates from traditional risk management.


  1. Wachter RM, Foster NE, Dudley RA. Medicare’s Decision to Withhold Payment for Hospital Errors: The Devil Is in the Details. The Joint Commission Journal on Quality and Patient Safety 2008;34:116-123, see
  2. See
  3. See
  4. Marx, D. Patient Safety and the “Just Culture”: A Primer for Health Care Executives

1/24/08 - Software Verification and Validation


In spending two sessions with groups of people who verify and validate medical device software, I got the impression that most effort is spent on testing code (to the requirements that exist). In part, I based this assessment on the number of questions (e.g., interest by the audience) when code testing was discussed vs. examining requirements. Yet, in reviewing recalls, and from my experience in the IVD industry, I suspect that most errors are caused by wrong requirements (see figure).



code requirements

This makes me recall some definitions.

Bug – A coding error that prevents the software from meeting its stated requirement. A divide by zero error is a bug, but if the denominator can never be zero, this bug will never be a failure. Never be zero means the value can never be zero without a code logic statement such as If X <> 0, then … If the code logic statement were present, there would be no divide by zero bug.

Failure – Any deviation from customer expectations. This rather liberal statement is similar to the general definition of quality by ASQ. Each failure must be evaluated by the software / product development team to decide whether they agree that it is a failure, and of course deviations can also have non-software causes.

Example – A home glucose meter produces a value over 500 mg/dL. The meter displays ERR1. This is a requirements error. It is known that the value is too high (it could be 501 or 1,000). The meter should say something like HIGH.
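To make the distinction concrete, here is a minimal sketch of the glucose meter example. The function names and the display format are hypothetical. Only the 500 mg/dL limit and the ERR1 and HIGH messages come from the example above.

```python
UPPER_LIMIT_MG_DL = 500  # limit taken from the example above


def display_as_specified(value_mg_dl):
    """Meets the (wrong) requirement: an over-range value shows an error code."""
    if value_mg_dl > UPPER_LIMIT_MG_DL:
        return "ERR1"
    return f"{value_mg_dl:.0f} mg/dL"


def display_revised(value_mg_dl):
    """Revised requirement: tell the user what is known, that the value is high."""
    if value_mg_dl > UPPER_LIMIT_MG_DL:
        return "HIGH"
    return f"{value_mg_dl:.0f} mg/dL"


print(display_as_specified(650))  # code is bug free, yet the user is not helped
print(display_revised(650))
```

The first function has no bug, since it does exactly what the requirement says. Testing the code against the requirement would pass, which is why this kind of error is found only by examining the requirement itself.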

1/4/08 FMEA vs. FRACAS


I have previously compared FMEA and FRACAS, here. Another simple difference is:

(Successful) FMEA reduces risk.

(Successful) FRACAS reduces failure rates.

Now, one often hears about successful FMEAs. In my experience, these are not FMEAs, they are examples of FRACAS. An example is here. How can one tell that this is FRACAS and not FMEA? It’s simple - what is described is the reduction of a too high failure rate to a lower rate. With FMEA, the failure rate is zero – the event has not happened. What one does is to reduce the risk of this potential failure, from some amount to a lower amount. This is perhaps one of the reasons one does not hear too much about FMEA successes. As I said before, to say that something that has never happened is now even less likely to happen (due to FMEA) just isn’t too exciting.

To reduce failure rates is a good thing and it is not a big deal to call this FMEA when it is FRACAS. However, it is simple to use the correct terms and if one doesn’t one might wind up neglecting to perform FMEA when it's needed.

1/1/2008 A Different Animal


I have spent my career in industry in R&D in a quality role. As I continue to interact with people that deal with quality in the in vitro diagnostics industry, I get the impression that most of these people are not from R&D but rather from regulatory affairs. What’s the difference? My perception is that regulatory affairs professionals focus more on compliance – I have focused on measuring things. Compliance is often assessed through audits with documentation a large part of audits. Measuring things forces activities to focus on improving the metric of interest. Documentation is of less importance.

What’s another difference? Whenever I write an article for publication on quality, it’s reviewed by regulatory affairs professionals. I can tell by the comments (e.g., they disagree with most of what I say). R&D people agree with me.


12/9/2007 - Frequency of QC in the clinical laboratory


Kent Dooley has written an interesting essay, which is here. One of the points he makes is that not all clinical laboratory errors result in patient harm because clinicians will not always act on the erroneous result. So if an assay result doesn’t agree with other clinical data, the clinician may suspect the result might be wrong and ask to have it repeated. Dooley suggests that the minimum QC frequency should follow the time course for the likelihood of a clinician requesting a repeat sample, so that upon repeat, if the result had been in error, the new result will be correct (because now QC has been run).

Now, I am unencumbered by the knowledge and experience of working in a lab but my view of things is somewhat different. It seems to me that there are several error/detection/recovery possibilities as shown in the figure below.

Error Detection Recovery

The problem with waiting for a clinician (or for that matter a patient) to question a result before running QC is that it doesn’t take advantage of the purpose of QC, which is shown below.


That is, one runs the assay and at some time QC. If the QC is ok, then the results are released to the clinician. If not, one troubleshoots the assay including possibly rerunning patient samples. Using this scheme, QC frequency should not be determined by a retest time course but rather by the turn-around-time requirement for the assay.

Now if the clinician requests the assay to be repeated, and QC had already been run, it is unlikely that running a second QC will detect anything. QC has limitations in its ability to detect error (see figure below). Random biases and random patient interferences will not be detected by QC.

QC properties

This figure came from previous considerations about equivalent QC, which are here, and here.

Besides suspecting assay error, many assay results are repeated because a condition is being monitored. Delta checks are a type of QC that is performed on these samples to determine whether the difference between results is expected. Exactly how the clinical laboratory could act on the knowledge that the clinician suspects that something is wrong with the assay result is a topic for clinical laboratorians to answer.

12/07/2007 - Central lines and FRACAS


One hears of FRACAS success stories (like the one below) and FMEA failure stories (like the wrong blood type organs transplanted at Duke). A reason one doesn’t hear of FMEA success stories is that to say that something that has never happened is now even less likely to happen (due to FMEA) just isn’t too exciting. FMEA success stories are often not cases of FMEA, they are FRACAS, since rate improvements are discussed. FRACAS failures – we tried something, it didn’t work – are not very interesting.

A recent article in The New Yorker (1) provides an example of a FRACAS success story.

In the article, there is no mention of FRACAS but many of the steps were followed. The issue was an infection rate in central lines that was too high. It is important that one can measure this rate. One knows how many central lines are used, infections manifest themselves, and their cause can be determined by culturing the lines. Some undercounting is possible but the rate seems fairly reliable.

The man behind the work, Dr. Peter Pronovost, first observed events for a month within the context of the process of placing central lines (e.g., process mapping). Errors in the process steps were identified. Since these steps were simple, such as washing hands, one could partly view these errors as non cognitive errors. This suggests a control measure such as a double check to prevent such “slips”. Actually, besides slips, there may have been some at-risk behavior (2). This is behavior that increases risk where risk is not recognized, or is mistakenly believed to be justified. The main control measure used was a checklist, with the addition of having nurses double check to see that the checklist steps were properly done. Then the rate was measured again and found to be considerably lower. All of this was published (3).

It was mentioned that an alternative control measure had been tried; namely, using central lines coated with antimicrobials. This expensive control measure failed to provide a substantial reduction in infection rates. This illustrates that one must be open minded when selecting control measures. There is sometimes a bias towards fixing the “system” (e.g., with coated lines) rather than fixing a people issue (which often implies blame). Dr. Pronovost implemented some system control measures by getting the manufacturer of central lines to include drapes and chlorhexidine – items that should have been available at the bedside but often were not.

Another big part of this story is ongoing resistance towards implementing this control measure more widely, even after it has been shown to be effective and low cost. Any control measure can be viewed as a standard and standards are not very popular. People will argue “but our situation is different”, “ICUs are too complicated for standards”, and so on. Financial incentives (or disincentives) for standards (e.g., P4P) loom. Dr. Gawande goes on to say how complicated things are in an ICU, yet that is precisely where standards helped. A similar situation happened in anesthesiology in the late 70s and early 80s. (Here, critical incident analysis was used and is basically the same as FRACAS.) The error rate was too high, effective control measures were developed, and widespread implementation of the control measures took considerable effort. You can read about that story here.


1.       Gawande A. Annals of Medicine. The checklist. The New Yorker, Dec. 7th issue, 2007, see here (don’t know how long this link will work).

2.       Marx, D. Patient Safety and the “Just Culture”: A Primer for Health Care Executives

3.       Pronovost P. et al. An Intervention to Decrease Catheter-Related Bloodstream Infections in the ICU. N Engl J Med 2006;355:2725-32.

11/28/2007 - ISO 14971 authors, expertise, and potential conflicts of interest


I have questioned the elevated status of ISO standards claimed by some. Often, people justify this status by asserting that ISO standards are prepared by a consensus of experts. This entry explores three topics related to this assertion:

·        ISO authorship

·        Expertise of authors

·        Potential conflicts of interest for authors

The membership of an ISO committee

If you have an ISO document – I have the latest version of ISO 14971 – one thing to notice is that there is no list of authors nor even a list of the committee members. I don’t understand why it is the policy of ISO to hide this information, nor could I find such an explanation (or list of members).

Note that CLSI (formerly NCCLS) has in each standard a list of authors and subcommittee members, advisors, and observers (as well as area committee members).

What does it take to be an expert?

A simple if not flip answer to this is to be on an ISO committee, since by assertion, all committee members are experts. Of course, for ISO committees, one cannot form an opinion, since membership is unknown outside of the committee.

Potential conflicts of interest

Here are some opinions about conflict of interest regarding ISO membership (given that I don’t have a clue who the authors are). To understand conflict of interest concerns, it is helpful to understand that ISO documents have quasi regulatory status. As such, organizations can be divided into two groups: regulatory providers, and regulatory consumers (see

Manufacturers – The membership from this (regulatory consumer) group is often filled with regulatory affairs professionals. Their potential conflict of interest is to shape the documents to favor ease of compliance. They favor horizontal over vertical documents (see

Clinical laboratory or hospital professionals – Although this group would not seem to have a vested interest, one can ask how many of these people serve as consultants for industry. If a standard is written for the clinical laboratory or elsewhere in the hospital, then this group has the same regulatory consumer potential conflict of interest as the manufacturer.

Regulators – As a regulatory provider group, the potential conflict of interest is the healthcare economics policy in place by the current administration.

Consultants – This group often has a high potential conflict of interest since some consultants make their living by helping companies comply with ISO standards.

Trade associations – This group is the voice of manufacturers and if represented on an ISO group has the same potential conflict of interest as for manufacturers, but with the added concern that trade groups are skilled in organizing manufacturers.

Note that for CLSI, any prospective member must fill out a conflict of interest statement. I am unaware of anyone ever being turned away from membership due to the conflict of interest statements.

11/21/2007 - ISO 14971 and Residual Risk


The last entry was about FMEA goals, yet, the word “goal” isn’t in ISO 14971. Maybe “goal” suffered the same fate as the word “mitigation” – banned from ISO. There is an implied goal in ISO 14971 - the residual risk must be acceptable. To recall, residual risk is the risk that remains after control measures have been taken. Here’s where things get a little tricky.

In cases where the residual risk is unacceptable, one is supposed to perform a risk benefit analysis to determine if benefits of the medical procedure performed by the device outweigh any possible residual risk.

To frame this discussion, consider two types of residual risk:



1.       A residual risk from a known issue, such as an interference, where eliminating this risk is not “practical”

2.       The overall residual risk from unknown issues. A certain amount of effort is used to search for risks (e.g., through FMEA, FTA, and FRACAS). At some point, more effort is considered not practical. Note: One can look at FDA recalls to see that unknown risks are often found in released products and lead to recalls (1).

Use of the word practical in ISO 14971 implies that in some cases, risk reduction is too expensive. This is not meant to be pejorative since everyone has limited resources.

In most cases in the standard, the risk benefit analysis is positioned as an analysis of the medical device’s clinical benefit to the patient vs. its risk. But ISO 14971 does point out an additional frame for the discussion.

“Those involved in making risk/benefit judgments have a responsibility to understand and take into account the technical, clinical, regulatory, economic, sociological and political context of their risk management decisions.”

To understand the issue, consider Type 1 diabetes as an example with the medical procedure being use of a home glucose meter. Because of risks 1 and 2 above, the glucose meter will fail and provide an erroneous result, albeit rarely. This is the current status and it is clear the benefit of the home glucose meter outweighs the risk (e.g., ADA recommendations to test for glucose). Yet, if one conducts a thought experiment and starts raising the frequency of (all) home glucose meter failures, simple decision analysis (2) still warrants use of the device. That is, measuring glucose, even if it occasionally (e.g., more often than rarely) gives an erroneous result, is better (clinically) than not measuring it.

If a company is working on a home glucose meter that provides an erroneous result too often (e.g., compared to existing meters), they will keep developing the meter until its failure rate is competitive. That is, there is a hierarchy of requirements for release for sale and often the competitive requirements (features needed to sell the product – including quality) are more stringent than any medical need or regulatory requirement (3).

Would you pay 2.5 million dollars to go to Cleveland?

Richard Fogoros suggests that there is a limit to what we can spend for healthcare (4). To make this point, he says that if a plane could be built that made most crashes survivable, most people would not pay the astronomical ticket price.

So regulators could require lower failure rates (less risk), causing companies to invest more, which would result in higher healthcare prices, but this is not done because it is unaffordable. Hence the level of risk allowed is usually driven by competition. This is risk management but it is not the clinical benefit risk analysis described in ISO 14971 – it is financial risk management.


1.       See

2.       Krouwer JS. Assay Development and Evaluation: A Manufacturer’s Perspective, AACC Press, Washington DC, 2002, Chapter 3.

3.       Krouwer JS. Assay Development and Evaluation: A Manufacturer’s Perspective, AACC Press, Washington DC, 2002, pp 38-39.

4.       Fogoros RN. Fixing American Healthcare. Publish or Perish Press, Pittsburgh, 2007.


11/17/2007 - FMEA goals in healthcare


FMEA is now a common risk management tool used in healthcare. Here’s a quick test. If the words “minimal cut set” and "Petri net" don’t mean anything to you, then you probably don’t have a quantitative FMEA goal. The rest of this entry explains some things to know about goals.

A quantitative goal must also be measurable and realistic. For example, a goal for imprecision (reproducibility) for a clinical laboratory sodium assay might be 4% CV. One can measure this goal using a variety of experiments including those defined by standards such as the CLSI standard EP5-A2.
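As a rough sketch of what “measurable” means here, the following compares an observed CV with the 4% goal. The replicate values are made up, and this simple calculation is not the EP5 protocol, which uses a multi-day, multi-run design.

```python
import statistics

GOAL_CV_PERCENT = 4.0  # the example goal from the text


def percent_cv(values):
    """Coefficient of variation in percent: 100 * sample SD / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical replicate sodium results in mmol/L.
replicates = [138.0, 141.0, 140.0, 139.0, 142.0]

cv = percent_cv(replicates)
meets_goal = cv <= GOAL_CV_PERCENT
print(f"CV = {cv:.2f}%, meets goal: {meets_goal}")
```

The point is only that the goal is stated as a number and the experiment yields a number that can be compared with it.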

FMEA deals with risk. Some common pitfalls about risk goals are:

·         A goal that an event should never happen. For example, the NQF (National Quality Forum) implies such by talking about “never events.” Risk is probabilistic and can never be zero. It is possible that an estimated risk is so low that in lay terms, it may be said to never be possible to occur but this lay usage is different from a formal quantitative assessment.

·         Too many goals. The NQF has a list of 28 “never events.” Virtually all of these cause serious patient harm. A goal could be restated in terms of patient harm, as the combination of risk from any of the 28 events.

·         The Institute for Healthcare Improvement (IHI) implies goals in terms of evaluating the RPN (risk priority number) before and after implementing control measures. Some problems here are:

o   One may improve this metric by reducing the risk of less severe events (without reducing risk of severe events)

o   A severe risk with the lowest (categorical) probability of occurrence may be ignored as a candidate for improvement, since its RPN won’t change, but there still may be a way to lower risk (and still have the same (categorical) probability of occurrence rank).
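The two RPN problems above can be shown with a small sketch. It assumes the common form of the RPN as the product of severity, occurrence, and detection ranks on 1 to 10 scales. The failure modes, the ranks, and the probability bins in occurrence_rank are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # assumed scale: 1 (minor) to 10 (severe)
    occurrence: int  # assumed scale: 1 (rare) to 10 (frequent)
    detection: int   # assumed scale: 1 (easily detected) to 10 (undetectable)

    @property
    def rpn(self):
        return self.severity * self.occurrence * self.detection


def total_rpn(modes):
    return sum(mode.rpn for mode in modes)


def occurrence_rank(probability):
    """Hypothetical binning of a probability into a categorical rank."""
    if probability < 1e-3:
        return 1
    if probability < 1e-1:
        return 5
    return 10


before = [FailureMode("severe, rare", 10, 1, 5),
          FailureMode("minor, frequent", 3, 6, 5)]
after = [FailureMode("severe, rare", 10, 1, 5),
         FailureMode("minor, frequent", 3, 2, 5)]

# Problem 1: the total improves although the severe event is untouched.
print(total_rpn(before), total_rpn(after), before[0].rpn, after[0].rpn)

# Problem 2: a tenfold risk reduction is invisible within the lowest rank.
print(occurrence_rank(1e-4), occurrence_rank(1e-5))
```

In this made-up case the summed RPN drops from 140 to 80 while the severe failure mode stays at 50, and lowering the probability of the severe event from 1 in 10,000 to 1 in 100,000 leaves its rank, and so its RPN, unchanged.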

Quantitative FMEA goals are possible and are used in the nuclear power industry although fault trees are used instead of FMEAs. Quantitative fault trees are evaluated among other ways using “minimal cut sets” and "Petri nets."
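For readers who have not met minimal cut sets, here is a minimal sketch of how they give a number. A minimal cut set is a smallest combination of basic events that causes the top event. Assuming independent basic events, the top event probability follows by inclusion-exclusion over the cut sets. The events and probabilities below are hypothetical, and real fault tree software uses more efficient methods.

```python
from itertools import combinations
from math import prod


def top_event_probability(cut_sets, basic_event_prob):
    """Exact P(top event) from minimal cut sets, assuming independent events.

    Inclusion-exclusion: add the probability of each cut set, subtract the
    probability of each pair occurring together, add back triples, and so on.
    """
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for combo in combinations(cut_sets, k):
            events = frozenset().union(*combo)  # all events that must occur
            total += sign * prod(basic_event_prob[e] for e in events)
    return total


# Hypothetical tree: top event occurs if (A and B) occur, or if C occurs.
probabilities = {"A": 0.01, "B": 0.02, "C": 0.001}
minimal_cut_sets = [frozenset({"A", "B"}), frozenset({"C"})]

p_top = top_event_probability(minimal_cut_sets, probabilities)
print(p_top)
```

A quantitative goal is then a statement such as “the top event probability must be below some value,” which can be checked against the computed number.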

A reasonable non quantitative goal for FMEA is to learn more about potential failure modes. However, one should realize that it is difficult to assess how much is learned.

It is easy to have a quantitative FRACAS goal because it is easy to measure failure rates from observed failures, before and after implementing control measures.
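By contrast, the FRACAS arithmetic is trivial. A minimal sketch with made-up counts and a made-up goal:

```python
def failure_rate(failures, opportunities):
    """Observed failure rate: failures divided by opportunities for failure."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return failures / opportunities


GOAL_RATE = 0.005  # hypothetical goal: at most 5 failures per 1,000

# Hypothetical counts before and after implementing control measures.
rate_before = failure_rate(12, 1000)
rate_after = failure_rate(3, 1000)

relative_reduction = (rate_before - rate_after) / rate_before
goal_met = rate_after <= GOAL_RATE
print(rate_before, rate_after, relative_reduction, goal_met)
```

Nothing has to be modeled. One counts observed failures, divides, and compares with the goal.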

11/10/2007 - Why FRACAS is important for medical device manufacturers


I have commented before that FMEA (and FTA) are used to prevent potential errors and that FRACAS is used to prevent the recurrence of observed errors. FRACAS is easier than FMEA, FTA because for FRACAS:

·         no modeling is required with respect to enumerating the possible failure modes (errors) – one simply observes the errors

·         one can easily calculate a failure rate, which can also help  predict when a failure rate goal will be achieved

From a user’s perspective (e.g., medical device customer), it is of course more important to prevent errors than to prevent their recurrence (e.g., no melt down vs. preventing another melt down). However, if FRACAS is completed before release for sale, then the FRACAS activity of preventing the recurrence of observed errors is also preventing potential errors from the user’s perspective, because (again, from the user’s perspective) the clock is at zero – no errors have occurred yet because the system hasn’t been used. This is summarized in the following table.


Tool | Before release for sale: errors are | Before release for sale: control measures used to | After release for sale: effect of tool
FMEA, FTA | Potential | Prevent potential errors | Errors prevented
FRACAS | Observed | Prevent recurrence of errors | Errors prevented

This does not mean that FMEA, FTA should be dropped. If a potential error has never been observed, one still must be sure that adequate control measures are in place.

So FRACAS is part of risk management in spite of the fact that it is not mentioned in ISO 14971.


FMEA – Failure Mode Effects Analysis
FTA – Fault Tree Analysis
FRACAS – Failure Reporting And Corrective Action System
Failure Mode - Error

11/6/2007 - Some ISO 14971 risk control measures won’t reduce risk


The previous entry dealt with some limitations of the ISO risk management standard for medical devices – ISO 14971. This entry covers one of the limitations in more detail.

ISO 14971 fails to embrace the error – detection – recovery scheme, since it omits recovery. To see the problem, consider a clinical laboratory example in which a serum sample is analyzed for potassium.

Error – As the specimen is processed, some error occurs (OK, I am not that good at making up errors), which hemolyzes the specimen. If the cause of the error is known, then steps might be taken to minimize or eliminate it.

Detection – A technician visually examines the specimen before it is analyzed. The hemolyzed specimen is detected.

Recovery – The technician does not analyze the specimen and notifies the appropriate party to get another specimen. The end result depends on the turn-around-time requirement after re-assay.

If the turn-around-time requirement is met, no effect of the original error is observed

If the turn-around-time requirement is not met, the effect of the original error is a delayed result.

In either of the above cases, the error – detection – recovery scheme has prevented an erroneous result as the effect of the original error. (OK, one could get an erroneous result in the new specimen).
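The scheme can be written as a small decision function. This is only a sketch of the logic described above: the function name and the wording of the outcomes are mine, and it ignores the possibility of a new error in the repeat specimen.

```python
def outcome(error_occurred, detected, recovered, turnaround_met):
    """Effect seen by the clinician under the error - detection - recovery scheme."""
    if not error_occurred:
        return "no error"
    if detected and recovered:
        # A successful recovery prevents the erroneous result; at worst it is late.
        return "no observed effect" if turnaround_met else "delayed result"
    # Detection without recovery is no better than no detection at all.
    return "effect of error not prevented"


print(outcome(True, True, True, True))    # hemolysis caught, new specimen in time
print(outcome(True, True, True, False))   # caught, but the new result is late
print(outcome(True, True, False, True))   # caught, but nobody acted on it
print(outcome(True, False, False, True))  # never caught
```

The last two calls give the same answer, which is why leaving recovery out of the scheme matters.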

Whereas recovery in this case seems trivial, what if, just as the technician is ready to perform the recovery, he/she gets called away and never performs the recovery? There is a well known example of a failed recovery where the error was that the incorrect leg was scheduled to be amputated – the error was detected – but the recovery failed. Although the correct leg was identified in the operating room schedule (successful detection), there were multiple operating rooms and not all schedules were corrected (failed recovery) (1).

Where recovery becomes even more of an issue is when detection and recovery are located in different organizations. This is actually a common occurrence. For example, manufacturers detect a problem (this could be an official recall) and it is up to the hospital or clinical laboratory to follow the manufacturer’s recommendation as to the recovery (e.g., discard that lot of reagent).

In the risk management standard ISO 14971, a recommended control measure presents the opportunity for a failed recovery. ISO 14971 provides a hierarchy of risk control measures (mitigations), which in order of preference are:

1.       Eliminate the error

2.       Detect the error

3.       Inform the user of the error possibility (e.g., state a limitation of the procedure)

Number 3 is really part of detection (e.g., the detection is communicated). Number 3 is also commonly used for interfering substances for in-vitro diagnostic assays. This error is the stepchild for diagnostic assays. For example, I once surveyed a year’s worth of Clinical Chemistry assay performance complaints and found that interferences were the main complaint (2). One can speculate how this happened. A clinician realized that some treatment or patient status was inconsistent with a laboratory result, the laboratory investigated, and the assay result was found to be incorrect with an interfering substance as the cause of the erroneous result.

So consider the risk control measure for an assay whereby the manufacturer lists 10 substances that may interfere with the assay. How can the clinical laboratory “recover” using this knowledge (e.g., detection)? They can’t. To determine the concentration level of ten substances in every specimen is impractical (too expensive). So to review this situation:

1.       Eliminate the error – the manufacturer has tried, but failed. Ten substances still interfere (at or above certain concentrations)

2.       Detect the error – the only “detection” possible is to inform the clinical laboratory. Note that all other common detection methods (external quality control, internal algorithms) fail.

3.       Recovery – The clinical laboratory cannot perform a recovery

One should realize that whereas this is an undesirable state, it may be the best possible way of doing things given the economic constraints. As stated in the previous entry, the manufacturer is doing the right thing (as are regulators and the clinical laboratory).

However, the problem is that ISO 14971 would have us believe that all risk is now at an acceptable level, which is not the case. The erroneous result is likely to occur, after which a cause is likely to be found since the manufacturer has stated a list of possible interfering substances.

Also, as in the previous entry, patient awareness needs to be added to the mix as a significant way to prevent patient harm.


1.       Scott D. Preventing medical mistakes. RN 2000;63:60-64.

2.       Krouwer JS. Estimating Total Analytical Error and Its Sources: Techniques to Improve Method Evaluation. Arch Pathol Lab Med 1992;116:726-731.

11/4/2007 - Improvement is needed for risk management guidance for in vitro medical devices


When either a manufacturer or a clinical laboratory performs risk management, it is implied in the risk management standard ISO 14971 (and other literature) that risk management (1-4):

·         Identifies any product component or process step that has unacceptable risk

·         Through mitigations, reduces all remaining risk to an acceptable level

The purpose of this entry is to show that this doesn’t always happen and to suggest what to do about it.

Note 1: in order to understand ISO 14971, you need to learn ISO speak (“globally harmonized terminology”). For example, there are no lab “test results” or “assay results” - these are called “examination results.”

Note 2: ISO 14971 is intended for manufacturers. The section about risk management for clinical laboratories is based on my discussions with clinical laboratory directors.

The problem frame – ISO 14971 has a figure (H.1, page 61), which shows that there are three possibilities to prevent harm to the patient – the medical device manufacturer, the clinical laboratory, and the physician. ISO 14971 describes a mitigation* as either a way to prevent or detect an error. ISO fails to include recovery (5), which is a serious omission.

risk cascade

* I use here the word “mitigation” but should point out that mitigation has been banned from ISO speak and isn’t in ISO 14971.

An example problem – hCG (human chorionic gonadotropin) is an assay used to test for pregnancy. Such assays are subject to interferences, with HAMA (human anti-mouse antibody) a common example. In one case, a woman with an elevated hCG was diagnosed as having cancer and underwent chemotherapy, hysterectomy, and partial removal of one lung (6). Eventually, it was determined that she did not have cancer and all of the hCG assay results were incorrect due to HAMA interference – her actual hCG was not elevated. Cole studied this problem and found that it has occurred multiple times (7).

Manufacturer – One of the most difficult problems for manufacturers to overcome is lack of analytical specificity. This means that for many assays, a few results will be way off due to substances in the specimen that interfere with the assay. The fact that the rate of occurrence of this error is low is good, but as seen above, the consequences can result in severe harm to the patient. It is standard practice for manufacturers to accept the small rate of erroneous results and deal with the issue by stating these limitations in the product labeling (the package insert).

ISO 14971 provides the use of stating limitations as one method – albeit the least desirable method – of risk reduction (H.4.1.c p70).

In the case of HAMA and other interferences, this warning is of little value to the laboratory since a laboratory has no information as to which specimens have HAMA or other interferences and it would be prohibitively expensive to try to determine this information (e.g., the recovery will fail). (I once had roof rack straps for my car which had a warning on the package – “stop every 25 miles to make sure the straps are secure”).

Clinical Laboratory – It was a surprise to me to learn from some clinical laboratory directors that:

·         They know that occasional erroneous hCG results are reported to clinicians, which ultimately causes patient harm

·         There is a quality control possibility to test a specimen for HAMA interferences by diluting it and rerunning it, but this is rejected as too expensive

·         Thus, clinical laboratory directors recognize the risk as unacceptable, but live with it

Analysis – The manufacturer is doing the right thing. If they could economically develop an assay without interferences, they would. Regulators who approve the assay are doing the right thing. Rejecting the assay would cause more harm to patients due to the lack of information of no assay result than the harm caused by a small number of erroneous results. The clinical laboratory directors are doing the right thing. If they reran too many samples, their costs would be too high and the laboratory would go out of business (more likely the laboratory director would be fired first and the rerunning process stopped).

The manufacturer notification of limitations, while necessary and conforming to ISO 14971, is ineffective to prevent risk. The clinical laboratory either does nothing to prevent risk or could potentially do the same thing as the manufacturer – issue a warning about potential interferences in the assay report to physicians.

Proposed Solutions – Recognize the problem. The current status quo of the risk management scheme is that after risk management has been performed there is no issue, which is wrong. Issuing limitations that are ineffective in reducing risk must be so acknowledged. The outcome of this risk management task for either the manufacturer or the clinical laboratory must result in the HAMA event as an undesirable* risk. It should be acknowledged that it is a work in progress to come up with a method – which must be economical – which reduces this risk to an acceptable level.

*Use of the term unacceptable risk makes no sense, since no one would tolerate unacceptable risk. Hence, a risk management program could through mitigations reduce previously unacceptable risk events to some combination of acceptable risk events and undesirable risk events.

The role of the physician and patient – I will leave the role of the physician to someone else. I suggest that the ISO figure above is wrong. It should have one more cascade; namely, the possibility for the patient to detect and recover from a problem and if this fails, then harm will occur. One should not discount patients as being not knowledgeable enough.  Through the use of the Internet, there is a growing movement for patients to take more control of their health. This includes assessing laboratory results which are playing an increasing role in medical decision making (for one example see reference 8).  So as part of a risk management program, one should include the patient.


1.       ISO 14971

2.       Can’t afford to buy ISO 14971? Then read summaries in Ref. 2-4



5.       See Figure 4 in Krouwer, JS. An Improved Failure Mode Effects Analysis for Hospitals. Archives of Pathology and Laboratory Medicine: Vol. 128, No. 6, pp. 663–667. See

6.       Sainato, D. How labs can minimize the risk of false positive results. Clin Lab News 2001;27:6-8.

7.       Cole LA, Rinne KM, Shahabi S, and Omrani A. False-Positive hCG Assay Results Leading to Unnecessary Surgery and Chemotherapy and Needless Occurrences of Diabetes and Coma. Clinical Chemistry. 1999;45:313-314.



11/1/2007 - Who made ISO king


I have been working on a CLSI (Clinical and Laboratory Standards Institute) standard on risk management. A preliminary version is available. This version needs revision and is getting it. As part of this process, comments are received and addressed using a consensus process. Having seen a few of the comments, one of them bothers me, not so much about the issue raised but the justification supplied, which is that the CLSI document deviates from the ISO standard on risk management – 14971. So this blog entry questions whether ISO documents should be taken as gospel.

I have commented before on a specific ISO document – 9001. The title of my article says it all – “ISO 9001 has had no effect on quality in the in vitro medical diagnostics industry.”

ISO 14971 states things without providing any justification. There is a bibliography at the end but no links from the text to the bibliography. The document is not peer reviewed, although it undergoes its own consensus process. One is basically supposed to take ISO 14971 as correct because it is “based on an international group of experts”. I put the preceding phrase in quotes because anyone serving on an ISO committee is automatically conferred expert status (this is true for CLSI committees as well).

So perhaps it is not even iconoclastic to question an ISO document, and one should certainly not suppress an idea because it deviates from an ISO document.

10/28/2007 - FDA Classes


Bob had a comment about my previous FRACAS post, which reminds me of something. In his comment, he refers to FDA device classes and says that Class II devices do not require as much rigor. FDA classes can cause some confusion because there are two types of classes - device classes and recall classes.

Device classes are: class I, class II, or class III. It is class III that requires the most data and can “present a potential, unreasonable risk of illness or injury.”

Recall classes are also class I, class II, or class III. It is class I that is the most dangerous type of incident, one that “predictably could cause serious health problems or death.”

Can one get a class I recall for anything other than a class III device? I don’t know the answer to this question but to a company, it is somewhat beside the point. Recalls are expensive, regardless of what device class they belong to or what the FDA requires for data, and are to be avoided (e.g., using tools such as FRACAS).

10/25/2007 Fixing American Healthcare – A Review

Fixing American Healthcare

My review of this book is from the perspective of a healthcare consumer and also as a consultant to the medical device industry – I have no expertise in healthcare economics. In fact, the topic itself was initially of no interest to me – I figure we’re all going to get screwed and so someone talking about net present values of capitation expenditures would be a real snoozer. However, in this day and age of blogs, I came across the Covert Rationing Blog and found myself repeatedly coming back to this blog. Dr. Fogoros, aka DrRich, has a clear and entertaining writing style and made this topic interesting on his blog, so I bought the book. I was not disappointed.

The organization of this book is well thought out. The first 50 or so pages (out of slightly over 300) function as a summary of much of the analysis, after which people can either abandon ship or read on. I found Dr. Fogoros’s GUTH – grand unification theory about healthcare – to be quite compelling and also easy to understand. GUTH divides healthcare into four quadrants: all four combinations of centralized vs. individual, and low quality vs. high quality. In this summary part, there is a description of an investor session from 2000 which Dr. Fogoros attended. Here, Jim Clark (founder of Netscape) discussed his then latest venture – WebMD. I could have benefitted from Dr. Fogoros’s insight as to why WebMD would fail in its original concept, as I was one of the naive investors (fortunately only dabbling in this one). Simplifying insurers’ transaction costs and procedures was Jim Clark’s pitch, but the insurers did not want this simplification as their goal was to take money in but make it as complicated as possible to pay out for claims.

In the rest of the book, Dr. Fogoros supplies more details. What is so compelling to me is that when Dr. Fogoros exposes the forces at play, everything falls into place. There are no evil people, just people doing what they do best within the rules of society. So a football player that smashes his opponent on the field is cheered – off the field, the same behavior would land him in jail. In this book, the relevant players are like football players making hits on the field – they are not portrayed as evil.

Some of the discussions that were of interest: everything about money, the whole idea of covert (vs. open) healthcare rationing, the principle (that America refuses to abandon) that there can be no limits to healthcare, the destruction of the doctor patient relationship, the history and way HMOs work, why eliminating fraud won’t solve the healthcare cost problem, randomized clinical trials.

Two major groups are discussed as trying to control healthcare: the “Gekkonians”, who believe that market forces will reduce cost, and the “Wonkonians”, who believe regulation can lower cost, largely by decreasing fraud.

Dr. Fogoros has an engaging writing style. It is as if he is telling us a story; subtle humor is present but the book is not a joke-a-thon. One example: to illustrate the importance of cost in solutions, he says that one could do a lot more to make a plane crash survivable, but would you pay 2.5 million dollars for a ticket to Cleveland? Dr. Fogoros relays a chilling account of his own run-in with regulators, an experience that would make most people think of retirement. Thankfully for us, one reaction of his was to become an expert in the topic and write this book.

My somewhat cynical view of healthcare insurance has been that you pay expensive premiums for many years, at some time develop a serious illness, and then your policy is abruptly cancelled. Does Fixing American Healthcare simply play to my previous bias? Perhaps, but one should know that I complain about everything I encounter if I find a problem. Often, these complaints are published and thus, they are peer reviewed complaints about peer reviewed articles (the one that I am most proud of refers to the most cited publication in The Lancet). I do complain about a point made in Fixing American Healthcare. But it is a tiny point and does not detract from the main message of the book.

One of the values of this book is that it espouses the values of transparency, and just as importantly explains healthcare so that it is transparent. Transparency is the enemy of those with hidden agendas. I remember the resistance to unit pricing in food stores – some characterized it as too confusing, but its value was simplifying things. 

Of course, for Dr. Fogoros to point out problems is important, but what one also wants is proposed solutions. There is a preview of the solution in the section on clinical trials – openly ration healthcare and provide services to those who need it most. As one gets into these final sections about solutions, everything made sense to me, but I must admit, I need to reread these sections and since this will take some time, I thought it was important to provide this partial review, because this book is so important.

Overall, this book is fabulous and I learned a lot. It deserves to sell out of its first printing. For subsequent printings, ok, one final complaint - larger print would be nice.

10/21/2007 FRACAS? – Never heard of it


I just got back from co-presenting a short course on medical device software verification and validation for AAMI. One of the topics that I discussed was the use of FRACAS to improve software reliability.

One of the first questions I asked was, has anyone heard of FRACAS? Only one person raised their hand.

I also asked – has anyone ever heard of software reliability growth management? Again, only one person raised their hand – in this case a different person. The rest of this entry is to try to explain these results.

Google returned 38,900 hits for the phrase “software reliability growth.” I assume that adding the word management did not make a difference. So people who have the responsibility to validate software for medical devices have not heard of (at least in this sample) a technique that is used by some and is written about. It’s not surprising that a similar result was found for FRACAS, which is really used to reduce errors from all sources – not just software. Here are my reasons regarding the lack of knowledge about FRACAS in the medical device industry.

1.       FRACAS is not required by the FDA. We live in a regulated world, where often, the prime quality goal of an organization is to stay out of trouble with regulators. This is an understandable goal and makes sense – the problem is that other important goals may be neglected and quality practices may be limited to those prescribed by the FDA. Product recalls, including those that have caused harm, occur for approved products and not only at companies that get warning letters.

2.       Whereas reliability techniques associated  with preventing potential errors (FMEA) and preventing recurrence of observed errors (FRACAS) are both used in military programs, only FMEA seems to have made it into healthcare. In this course, most people raised their hand when asked – have you heard of FMEA? My take on this is that there is a bias towards FMEA, because it is associated with preventing potential errors. The notion that one can get anything useful by observing errors has been overlooked.

3.       This failure to recognize what has proved useful elsewhere (such as the defense industry) is perpetuated by various groups. For example, if one looks through the 2007 version of the ISO 14971 standard on risk management, there is not a single reference to FRACAS. The same result was found using “Search” on the websites of the Institute for Healthcare Improvement, National Quality Forum, and Leapfrog. Even using CAPA as a search term yielded no results.

It's time to realize that during product development, observing errors and implementing corrective actions all before product release is a form of risk management.

10/21/2007 - Near Miss

William Marella writes about near misses in Patient Safety and Healthcare. Much of what he says makes sense but overall, the article itself is a near miss. Here’s why.




Mr. Marella reports that most hospitals follow regulators’ recommendations about reporting only about adverse events and not near misses. To understand the problem with this (beyond what Mr. Marella discusses), let’s look at FRACAS (Failure Reporting And Corrective Action System). With FRACAS, the steps are as follows (I’ve added emphasis as italics):

1.       Observe and report on all errors.

2.       Classify each error as to its severity and frequency of occurrence.

3.       Construct a Pareto chart.

4.       Implement corrective actions for the items at the top of the Pareto chart.

5.       Measure progress as an overall (e.g., combined) error rate.

6.       Continue steps 1-5 until the error rate goal is met.
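As a rough illustration of steps 2, 3 and 5, the sketch below ranks a handful of error reports into a Pareto ordering and computes one combined error rate. The error categories, the 1 to 4 severity scale, and the use of severity times frequency as the ranking score are all assumptions made for this example; they are not part of any formal FRACAS definition.

```python
from collections import Counter

# Hypothetical error reports: (error type, severity weight).
# Categories and the 1-4 severity scale are illustrative assumptions.
reports = [
    ("mislabeled specimen", 3),
    ("hemolyzed sample", 2),
    ("mislabeled specimen", 3),
    ("delayed result", 1),
    ("hemolyzed sample", 2),
    ("mislabeled specimen", 3),
    ("wrong patient", 4),
]

def pareto(reports):
    """Steps 2-3: rank error types by severity x frequency, highest first."""
    counts = Counter(name for name, _ in reports)
    severity = {name: sev for name, sev in reports}
    scored = [(name, counts[name] * severity[name]) for name in counts]
    return sorted(scored, key=lambda item: item[1], reverse=True)

def overall_error_rate(reports, opportunities):
    """Step 5: one combined error rate across all error types."""
    return len(reports) / opportunities

ranking = pareto(reports)
for name, score in ranking:
    print(f"{name:22s} {score}")
print(f"overall error rate: {overall_error_rate(reports, 1000):.4f}")
```

Corrective actions (step 4) would then target the items at the top of the ranking, and the loop repeats until the combined rate meets its goal.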

So an immediate problem with what’s being done is that the Pareto chart of step #3 is being handed down from regulators rather than constructed from each hospital’s own reported errors, and one can question the origin of this Pareto. Moreover, as Mr. Marella correctly points out, this Pareto chart is about adverse outcomes, not events in the process. To understand why this is a problem, consider the following chart about errors:

[Chart: Error, Detection, Recovery]

When errors occur, there is an opportunity for them to be detected. If detection (and recovery) are successful, a more serious error event has been prevented. So in this chart, error event A, when either undetected or detected but followed by a failed recovery, leads to error event B. If the same steps occur, error event B leads to error event C, with each higher letter having a more severe consequence. As a real example of this, there was the national news story of the Mexican teenage girl who came to the US for a heart-lung transplant. Organs of the wrong blood type were selected (error event A) – this error was undetected and these unsuitable organs were transplanted (error event B). The correct reason that the patient’s health declined was detected but the recovery failed and the patient died (error event C).
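The cascade described above can be put into numbers with a small sketch. It assumes that each transition (A to B, B to C) is avoided only when detection and recovery both succeed, and that the stages are independent. The function names and the probabilities are made up for illustration.

```python
def escalation_probability(p_detect, p_recover):
    """Probability that an error event escalates to the next, more severe event.

    Escalation is avoided only when detection AND recovery both succeed.
    """
    return 1.0 - p_detect * p_recover

def cascade_probability(stages):
    """Probability of reaching the final event, given the first error occurred.

    `stages` is a list of (p_detect, p_recover) pairs, one per transition
    (A -> B, B -> C, ...). Stages are assumed to be independent.
    """
    p = 1.0
    for p_detect, p_recover in stages:
        p *= escalation_probability(p_detect, p_recover)
    return p

# Illustrative numbers only, not taken from any real process.
stages = [(0.90, 0.95), (0.80, 0.50)]
print(f"P(event C | event A) = {cascade_probability(stages):.3f}")
```

Under this simple model, improving either detection or recovery at any stage lowers the chance of reaching the most severe event.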

Let’s consider detection in more detail. In planned detection, a (detection) step is part of the process. So, in a clinical laboratory, a specimen is examined to see if it is adequate. For example, a serum sample that is red has been hemolyzed and will give an erroneous potassium result, so detection results in this sample not being analyzed – at least not for potassium. This causes a “delayed result” error rather than sending an erroneous result to the clinician, which is more serious. Typically, detection steps are optimized so that they are more or less guaranteed to be effective. In some cases, people have gone overboard – in one report, the average number of detection steps to assess if the surgery site is correct is 12 – this is too many.

However, a salient feature of a near miss is accidental detection. This unplanned detection signifies that there is a problem with the process that requires correction. There is of course no guarantee that accidental detection will occur the next time and it is likely that it won’t occur, so typically, when accidental detection occurs, severity is associated with the more serious event, as if the detection did not occur. The corrective action may be to create a planned detection step or to make other changes to the process. This also points out the problem with regulators constructing their own Pareto. By not collecting all errors and then classifying them, high severity errors (near misses) will be neglected. So basically, steps #1 and #2 in a FRACAS have been omitted.

Another problem is the failure to construct an overall metric and measure it.

Some things to know about error rates

  1. One should track only one (or in some cases a few) error rates.
  2. The (overall) error rate goal should not be zero.
  3. Resources are limited. One can only implement a limited number of mitigations.

The National Quality Forum (NQF) has identified 28 adverse events to be tracked, the so called “never events”. There is no way that one can establish allowable rates for each of these events, and a “never event” implies an allowable rate of zero, which is meaningless. For those who object to an error rate goal that is not zero, one must understand that one is working with probabilities.

For example, say one must have a blood gas result. Assume that one knows that the failure rate of a blood gas instrument is, on average, once every 3 months, and when it fails, the blood gas system will be unavailable for one day. Say this failure rate is too frequent. One can address this by having 2, 3, or as many blood gas instruments as one wants – or can afford – with failure now occurring only when all blood gas instruments fail simultaneously. But no matter how many blood gas instruments one has, the estimated rate of failure is never zero, although it can be made low enough to be acceptable and perhaps so low that it can be assumed “never” to occur – although there is a big difference between the “never” used by the NQF and the estimated probability of failure. In fact, the difference between a calculated rate that is greater than zero but possible to occur in one’s lifetime and a calculated rate that translates to “never” could be a substantial difference in cost.

The blood gas example uses redundancy to prevent error. The wrong site surgery example above uses detection, which is of course much cheaper than buying additional instruments. Each mitigation has its own cost. Computerized physician order entry is an expensive mitigation to prevent medication errors due to illegible handwriting. Financially, all of this reduces to a kind of portfolio analysis. One must select from a basket of mitigations an optimal set to achieve the lowest possible overall error rate at an affordable cost.
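The arithmetic behind the blood gas example can be sketched as follows. The sketch assumes that instrument failures are independent and that "once every 3 months" means roughly one down day in 91. Both are simplifying assumptions, and real instruments can fail together (shared power, shared reagent lots).

```python
def fraction_unavailable(n_instruments, days_between_failures=91.0, days_down=1.0):
    """Approximate fraction of time that every instrument is down at once.

    Assumes independent failures; the defaults reflect the example of one
    failure about every 3 months with one day of downtime.
    """
    per_instrument = days_down / days_between_failures
    return per_instrument ** n_instruments

for n in range(1, 5):
    p = fraction_unavailable(n)
    print(f"{n} instrument(s): unavailable {p:.2e} of the time, "
          f"about {365 * p:.6f} days per year")
```

Each added instrument shrinks the estimate by a factor of about 91, yet the value never reaches zero, which is the point made above about "never" events.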

This (portfolio) analysis only makes sense if one is combining errors. If error A causes patient death or serious injury and error B does the same, and there are many more such events, one can combine these errors to arrive at a single error rate for all error events that cause patient death or serious injury. This is similar to financial analysis, whereby there is one “bottom line”, the profitability of the overall business – individual product lines are combined to arrive at one number.
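A toy version of the portfolio idea is sketched below: given a budget, try every affordable combination of mitigations and keep the one with the lowest combined error rate. The mitigation names, costs, and effects are invented, and the model in which each mitigation multiplies the combined error rate by a fixed factor is an assumption made only to keep the example short.

```python
from itertools import combinations

# Hypothetical mitigations: name -> (cost in dollars, fraction of the
# combined error rate that remains after the mitigation is applied).
mitigations = {
    "second blood gas instrument": (40_000, 0.50),
    "extra site check": (5_000, 0.80),
    "computerized order entry": (200_000, 0.30),
    "barcode wristbands": (30_000, 0.60),
}

def best_portfolio(mitigations, baseline_rate, budget):
    """Exhaustively search affordable subsets for the lowest combined rate.

    Returns (chosen names, resulting error rate, total cost).
    """
    best = ((), baseline_rate, 0)
    names = list(mitigations)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            cost = sum(mitigations[name][0] for name in subset)
            if cost > budget:
                continue
            rate = baseline_rate
            for name in subset:
                rate *= mitigations[name][1]
            if rate < best[1]:
                best = (subset, rate, cost)
    return best

chosen, rate, cost = best_portfolio(mitigations, baseline_rate=0.01, budget=80_000)
print(chosen, f"rate={rate:.4f}", f"cost={cost}")
```

With only a few candidate mitigations an exhaustive search is fine; a real analysis would need measured effects rather than guessed factors.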

10/14/2007 - The "Axiom of Industry" applied to healthcare

One of the most interesting blogs that I have come across is the Covert Rationing Blog. The author, DrRich (Richard N. Fogoros, MD) has written a book, “Fixing American Healthcare”, which I am in the process of reading. So far, it is a fabulous book, and I am learning a lot. I did take exception to a point that was made on DrRich’s blog and follow up on that here, based on getting to that section in his book.

His “axiom of industry” is that standardization of an industrial process reduces cost and improves outcomes. This industrial idea is being applied to healthcare. DrRich gives an example where standardization applied to healthcare works (hip replacement) and one where it doesn’t work (congestive heart failure - CHF). The reasons he provides – although not exactly so stated – are that for hip replacement, one has a high state of knowledge, and for CHF, one has an intermediate state of knowledge and when the state of knowledge is not high enough, standardization will not work.

This is where DrRich needs to continue with his industrial analogy. There are many processes in industry with a high state of knowledge as well as processes with an intermediate state of knowledge. Yes, in industry, one standardizes processes with a high state of knowledge, but this does not happen when the state of knowledge is inadequate.  Here, one uses a variety of approaches, including trial and error; that is, observing errors and then applying corrective actions. FRACAS (Failure Reporting And Corrective Action System) is a formal name for this method and believe it or not the acronym TAAF (Test Analyze And Fix) is also used. Whereas observing errors and then fixing them is not often admitted by quality managers as the method used, it is at times the best method to improve a process.

In healthcare, this method is often used as well. As patients, we are aware of the physician saying, let’s try treatment XYZ and see what happens, implying that if the treatment doesn’t work (an incorrect treatment decision) another treatment will be tried. If this actually happens and the second treatment works, one might not be happy but it is possible that the physician nevertheless followed a reasonable course of action. Moreover, for a disease condition one is not always in a “standardization” or “trial and error” situation. One often uses a mixture of the two. And, there is always the possibility that the state of knowledge for a disease may increase at some point to allow for standardization. I previously commented that standardization of a process that is not ready is likely to lock in unknown errors.

The other point that DrRich makes is that patients are not widgets. The implication is a little ominous here, namely, that morally deficient industrial managers, given the chance, would discard patients as readily as widgets. I commented before that one is optimizing a process – the correct analogy is to throw out an incorrect treatment – not a patient. Moreover, widgets are usually thought of as low cost items. No one considers a patient as low value. So here the analogy must be between patients and high cost widgets (of which there are many). In industry, as in medicine, losing (discarding) a high cost item is not good.

One needs to ask how many medical conditions are amenable to standardization (e.g., have a high state of knowledge). Covert rationing may well be responsible for patients being treated as widgets, including misapplying industrial processes, but these processes themselves can be applied to healthcare to benefit patients, although they will not solve healthcare costs.

One Comment
Dr. Richard Fogoros responds

10/5/07 - The problem with Joint Commission requirement to perform a FMEA and a suggestion on how to fix it

The Joint Commission, which accredits hospitals, requires each member hospital to select at least one high risk process per year and perform proactive risk assessment on it (requirement LD.5.2). Typically, FMEA* is used to satisfy this requirement. The problems with this requirement are:





1.       Everyone knows something about risk management (e.g., skiing down that slope is too risky) but few people know how to properly conduct a FMEA. It is unlikely and impractical to require every hospital to acquire this expertise.

2.       To adequately perform a FMEA requires a significant effort besides having knowledge about FMEA techniques. Typically, one adds a fault tree to the FMEA and quantifies the fault tree. The two prior blog contributions describe issues when failing to quantify risks. To quantify risk of each process step requires data and modeling, not just a qualitative judgment.

3.       It is unlikely that each hospital will obtain the commitment to adequately staff a risk management activity – moreover, one can question whether Joint Commission inspectors have the knowledge to adequately evaluate each hospital’s results.

4.       All of the above will result in hospitals performing an activity to achieve a checkmark in a box, rather than actually reducing risk.

What makes more sense is to consider hospital processes as similar and to have standard groups perform a FMEA for each process. The results could then inform guidelines. This suggestion is also not without problems, which are listed as follows:

1.       A lot of people hate guidelines, so acceptance may be difficult. Some will argue that each hospital is different. To counter this, one could suggest that the hospital start with the guideline and adapt it to their process. This would be a manageable task.

2.       There is no guarantee that the guideline developed is the right one.

3.       Any guideline cannot guarantee freedom from errors – the guidelines themselves may not be 100% effective. Moreover, guidelines may cause one to relax vigilance about errors as in – "we’re following the guideline".

Examining an actual example, wrong site surgery has undergone a standards approach. The Joint Commission studied this error and came up with the Universal Protocol, which hospitals are required to follow. One issue is a report (1) that cites that in a set of hospitals there are on average 12 redundant checks to prevent wrong site surgery. This indicates that something has gone wrong. Perhaps, with quantification of risk, one could show that 12 checks is too many. The report also shows that the Universal Protocol would have been unable to prevent all wrong site surgeries (the study included surgeries performed before the Universal Protocol was required). This also highlights the need to maintain a FRACAS (Failure Reporting And Corrective Action System) to deal with observed errors. This too would benefit by being done nationally. The data collection part S.544 (2) is already law. What is needed is the complete FRACAS approach to this data.


1.       Mary R. Kwaan, MD, MPH; David M. Studdert, LLB, ScD; Michael J. Zinner, MD; Atul A. Gawande, MD, MPH. Incidence, Patterns, and Prevention of Wrong-Site Surgery. Arch Surg. 2006;141:353-358.

2.       See:

*Dr. Krouwer has a software product for hospitals that performs FMEA. He is looking to move on from this project and will consider offers to buy the source code.

10/03/07 - Medical Error FMEA Risk Grids – why they are a problem II

This blog entry summarizes the previous entry, Medical Error FMEA Risk Grids – why they are a problem.

1.       Risk grids are presented whereby each cell is severity by probability of occurrence.

2.       In the VA risk grid, the remote by catastrophic entry is problematic because the remote definition is not infrequent enough (when coupled with catastrophic events) and this cell’s risk is labeled as “ok.”

a.       Although this would be solved by adding another probability of occurrence row with a lower probability of occurrence, the problem would still remain if one does not quantify probabilities*.

3.       The risk grids are often called semi-quantitative.  This is not really true as often, no measurements or data are taken to justify the location of events with respect to probability of occurrence.

4.       No matter how many mitigations are put in place, the risk of an adverse event is never zero.

a.       But one can lower the risk through mitigations so that the likelihood of occurrence is so low that it is acceptable. Hence, there must always be an “ok” cell, even for catastrophic events. In any case, one can’t keep on adding mitigations forever, because resources are limited.

5.       Without quantifying probability of occurrence, one is in danger of accepting risk as “ok” when it is not low enough.

6.       Quantifying probabilities for all events within a process is a massive amount of work.

*Example of not quantifying probabilities. At a FMEA meeting, regarding a specific event, “I think the likelihood of that event is going to be real low. Everyone agree?, … Yeah”

9/28/07 - Medical Error FMEA Risk Grids – why they are a problem

FMEA risk grids are presented as a small spreadsheet. As an example, the VA HFMEA “scoring matrix” is shown below.


























This table is similar to those in the ISO standard on risk management, 14971. The idea is to classify all potential errors in a process as to their severity and probability of occurrence. Each potential error event will fall in one of the grid cells. Events that fall in the yellow cells are unacceptable and require action.

So what’s wrong with this? On the face of it, there is nothing wrong – it is a standard practice from other industries. The problems emerge as one looks into the details. Each of the row and column headings is defined by the VA as follows (for the sake of brevity, severity is limited to patient outcomes only):


Frequent - Likely to occur immediately or within a short period (may happen several times in one year)

Occasional - Probably will occur (may happen several times in 1 to 2 years)

Uncommon - Possible to occur (may happen sometime in 2 to 5 years)

Remote - Unlikely to occur (may happen sometime in 5 to 30 years)




Catastrophic - Death or major permanent loss of function (sensory, motor, physiologic, or intellectual), suicide, rape, hemolytic transfusion reaction, surgery/procedure on the wrong patient or wrong body part, infant abduction or infant discharge to the wrong family

Major - Permanent lessening of bodily functioning (sensory, motor, physiologic, or intellectual), disfigurement, surgical intervention required, increased length of stay for 3 or more patients, increased level of care for 3 or more patients

Moderate - Increased length of stay or increased level of care for 1 or 2 patients

Minor - No injury, nor increased length of stay nor increased level of care


Now the problems can be seen:


If one focuses on catastrophic errors, almost all of them will be located in the remote cell – none will be in the frequent cell. The implication of the VA table is that one does not have to further examine a process that has remote catastrophic errors but this is clearly wrong. There are many examples of catastrophic errors whereby it is desired that they occur much less often than once in 5 to 30 years. An example is the tragic case of the Mexican teenage girl, who was given organs with the wrong blood type and later died, a story which made the national news.
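One way to see why "remote" is not infrequent enough is to scale the definition up. The sketch below assumes that the "5 to 30 years" interval applies to a single facility and then asks how many events per year that implies across a group of facilities. The figure of 1000 facilities is purely hypothetical.

```python
def expected_events_per_year(years_between_events, n_facilities):
    """Expected yearly count if each facility averages one event
    every `years_between_events` years."""
    return n_facilities / years_between_events

N_FACILITIES = 1000  # hypothetical; not a figure from the VA or from this post

most_frequent = expected_events_per_year(5, N_FACILITIES)
least_frequent = expected_events_per_year(30, N_FACILITIES)
print(f"'Remote' across {N_FACILITIES} facilities: "
      f"{least_frequent:.0f} to {most_frequent:.0f} events per year")
```

Under these assumptions a "remote" catastrophic event would still be expected dozens of times a year across the group, which is hard to describe as acceptable.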


To understand what needs to be done, one must examine a potential error in more detail. For simplicity of this explanation, let’s eliminate human error and focus on machines. In the operating room, blood gas results are needed, so this hospital has a blood gas lab nearby with a blood gas system. But there is a possibility that this blood gas system will fail and blood gas results will be unavailable. One has reliable data from the manufacturer about the frequency of a blood gas system failing. The mitigation is to have a second blood gas system. Now, blood gas results will be available unless both blood gas systems fail simultaneously. So now the probability of unavailability of blood gas results is lower, but it is not zero. The hospital can keep on going and put in place as many blood gas systems as it sees fit, with each additional system lowering the probability of this adverse event occurring. From the standpoint of a risk grid, one will eventually arrive at a cell that has acceptable risk.


The point is that for many (all?) catastrophic errors, the desired goal is to never have them. Because one is dealing with probabilities, never is unattainable, so one must put in place mitigations which lower the probability to an acceptable level.


This also means that for each potential catastrophic error, one must quantify the probability of occurrence before and after any mitigations and this is the problem – this quantification is a monumental task (especially for human errors) and until it is tackled, one will have FMEAs performed to satisfy regulatory requirements, but with little improvement in reducing risk.


9/27/07 Myths of EQC (Equivalent Quality Control)

CMS has established EQC (Equivalent Quality Control) as a way for clinical laboratories to reduce the frequency with which they perform quality control, provided they meet certain guidelines. I have previously commented on the problems with this: see my 4/20/07 blog “Beware of Equivalent Quality Control” and also an AACC expert session.

The purpose of this entry is to deal with the fact that although the expert session dealt with myths of EQC – these myths persist in comments by CMS and by people who are preparing CLSI documents about risk management. Hence, I will repeat here some of the myths.



1.       Internal QC is new – What is meant by internal QC is algorithms and associated hardware to detect and prevent incorrect lab results. Internal QC has been around since the days of SMAC. Whereas it is always being improved, it is not new. “Internal QC is new” is often used as a justification for implementing EQC as in … because modern analyzers are now using new, sophisticated …

2.       External QC is redundant to internal QC – It is often implied that one is justified in reducing external QC because it is redundant to internal QC. This is not always true. As an example, say an algorithm looks at the response to determine if the sample is too noisy. If so, the sample will be rejected. But algorithms such as these do not work 100% of the time. If the algorithm fails on a calibration, the calibration will go through and all subsequent samples could have a shift. External QC is different and when run will likely detect the shift. An example of redundancy is to have five blood gas systems in the laboratory. If one system fails, the other systems can be used. (See also #5, #6).

3.       External QC doesn’t work for unit use systems – Here, it is implied that each sample run is completely unique in unit use systems so that external QC can only inform about one sample. This is not true. Unit use systems are manufactured and the manufacturing process can have drift and bad lots, so that a batch of unit use devices is bad. External QC will detect this condition.

4.       Internal QC always works (is 100% effective) – See #2 – internal QC often has the properties of a medical test – there are some false positives and false negatives. One can see that this point is missed in writings about internal QC. Thus, if one has data from an internal QC experiment, such as success was achieved in 100 out of 100 tries, one must realize that whereas the point estimate is 100% (effective), the lower limit of the confidence interval is below 100%. One has not proved 100% effectiveness. The value of external QC is that it uses a different mechanism and can catch errors that internal QC misses.

5.       Because one performs FMEA and other risk management tools, one has thought of everything – Of course, there will be no associated internal QC for a failure that no one has thought of. But external QC does not require knowledge of failures in order to work. One need only review the list of FDA recalls to see that manufacturers have not thought of all the ways a system can fail. (See also #2, #6).

6.       No one makes human errors – In reviewing the list of FDA recalls, some errors are human errors – such as releasing a lot of reagent that has failed. Once again, external QC can catch these errors.

There will be many conditions where internal QC catches errors that would be missed by external QC, but there is no scientific evidence that one can reduce the frequency of external QC without increasing the risk of medical errors.
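The remark in #4 about "100 out of 100" can be made concrete. The following is a minimal sketch, assuming an exact (Clopper-Pearson) two-sided interval is an acceptable way to express the uncertainty; the function name and the 95% default are illustrative choices, not something taken from EP22 or any other document. When every one of n trials succeeds, the exact lower limit reduces to (alpha/2) raised to the power 1/n.

```python
def lower_bound_all_successes(n, confidence=0.95):
    """Exact (Clopper-Pearson) two-sided lower confidence limit for a
    success proportion when all n trials succeeded.

    The limit p_L solves p_L**n = alpha/2, so p_L = (alpha/2)**(1/n).
    """
    if n <= 0:
        raise ValueError("n must be a positive number of trials")
    alpha = 1.0 - confidence
    return (alpha / 2.0) ** (1.0 / n)


# Hypothetical experiment sizes, each with zero observed failures.
for trials in (20, 100, 1000):
    limit = lower_bound_all_successes(trials)
    print(f"{trials} of {trials} successes: effectiveness could be as low as {100 * limit:.1f}%")
```

Under these assumptions, 100 successes in 100 tries is still compatible with a true effectiveness of about 96.4%, which is why the point estimate alone does not demonstrate that internal QC always works.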

9/16/07 - A Blog (from someone else) Worth Reading


I have ranted about pay for performance (P4P). I recommend two blog contributions by DrRich.

The first has similar ideas to mine – probably one reason I like it so much. The second is about healthcare rationing and the financial aspects of P4P and is also very interesting reading.

However, I take issue with DrRich’s comparison between widgets and patients in his first blog.

“P4P also relies on the Axiom of Industry - that the standardization of any process both improves quality and reduces cost. As DrRich has described elsewhere, the Axiom of Industry does not hold when the process involves actual human patients. This is because patients are not widgets. (While everyone agrees that patients are not widgets, the implication of this fact seems to have escaped many: What happens to the individual widget on an assembly line is immaterial - discarding even a high percentage of proto-widgets may be fine - as long as the ones that come out the other end are of sufficiently high quality as to yield the optimal price point in the market. Patients not being widgets, in theory we are supposed to care about what happens to the individual patient during the process.) Nonetheless, invoking the Axiom of Industry - equating reduced cost to improved quality - allows the central authorities to choose “quality measures” in their P4P efforts that will primarily reduce cost, and then to claim that their primary concern is for quality.”

There are several problems with this comparison. In a diagnostic process for patients, one would not throw out patients as implied by DrRich. One throws out (tentative) diagnoses that no longer meet evidence as it is collected. That is, one is dealing with the process of diagnosis (or the process of producing widgets). So this could mean that P4P would force one to accept an incorrect diagnosis which would harm a patient.

But the main issue is not whether one is dealing with patients or widgets but the state of knowledge one has (for either process). When the state of knowledge is high, then standardization* is appropriate. On DrRich’s site, a comment by bev M.D. reminds us that standardization works well for the process of transfusing blood. When the state of knowledge is not high enough, standardization does not work as well and other methods are needed and used. In reliability engineering, when the state of knowledge is insufficient, FMEA (Failure Mode Effects Analysis - a modeling method) is unable to predict all of the ways a process can fail, and design errors will occur. Standardizing such a process would lock in design errors. Therefore FRACAS (Failure Reporting And Corrective Action System) is used, which is a “data-driven process improvement” that corrects observed failures so they will not recur. These reliability engineering concepts are being applied to medicine and particularly medical errors.

*P4P could be viewed as a measure of compliance to standardization.

9/06/07 - Third time’s a charm


There is an essay here on Bland Altman plots. I had submitted this essay to two journals, but it was rejected by both. Therefore, I put the essay on this web site. Since I wish to refer to this essay in a publication, I tried again for publication and this time the essay was accepted and will appear in Statistics in Medicine. Because of this, I will remove the essay from here in the near future. The title of the publication is: "Why Bland Altman plots should use X, not (Y+X)/2, when X is a reference method".

The Statistics in Medicine online version is at:

8/22/07 - Broken Record Department


A recent Letter in Clinical Chemistry (2007;53:1715-1716) is another in the series that advocates using the right model for assay analytical performance (see the essay). Unfortunately, I missed an error in the proofs. The sentence “Because neither controls nor pooled samples are used in a proficiency survey, random patient interferences cannot be estimated.” should be “Because either controls or pooled samples are used in a proficiency survey, random patient interferences cannot be estimated.”

This Letter is about creatinine and the authors replied (2007;53:1716-1717). All of this is quite similar to an earlier Letter about glucose in Clinical Chemistry (2001;47:1329-30) and the authors’ reply (2001;47:1330-31).

Using the right model creates some experimental difficulties, but it is still worth it.

8/21/07 - CAPA versus FRACAS

CAPA – Corrective Action Preventative Action
FRACAS – Failure Reporting And Corrective Action System
FMEA – Failure Mode Effects Analysis



Unlike FMEA, some (probably many) people have never heard of FRACAS. When I was explaining FRACAS to some people, someone said “oh that’s CAPA, we do that now.” Although CAPA and FRACAS share features, there are key differences.

Timing – FRACAS deals with products before release for sale, and CAPA with products after release for sale. FRACAS, however, can also be continued after products are released.

Responsibility – In medical device companies, FRACAS is usually conducted by R&D and CAPA by service and manufacturing. This may not sound like a big difference, but it is. For example, service is more concerned with keeping customers happy than with corrective action.

Data source – In CAPA, there are two data sources, (customer) complaints and (manufacturing) nonconformities. This sets up the possibility for two CAPAs, which may not talk to each other; namely, a CAPA in manufacturing to deal with nonconformities and a CAPA by service to deal with customer complaints. In a FRACAS that is conducted before release for sale, the data source is “events”. An event is an observed action that has the potential to cause harm, increased cost, a return, a complaint, and so on. Note that not all events will lead to complaints. For example, a clinician may disregard an erroneous result and not complain about it.

Metrics – While anything is possible, the reliability growth management metrics associated with FRACAS are almost never used with CAPA.

Regulations – FDA requires medical device companies to have procedures in place to address nonconformities and complaints. This is traditionally handled by CAPA. There is no requirement for FRACAS.

To sum up, is CAPA the same as FRACAS? No, not by a long shot.

7/23/07 - Rick Miller


I was sorry to hear that Rick Miller passed away. I knew Rick since the late 70s when we were both at Technicon Instruments. Rick was in a quality role then, where he always put the interests of customers first. We both left Technicon, but I continued to see Rick at NCCLS (now CLSI) meetings. Rick was the chairholder of the CLSI subcommittee on uncertainty intervals. Rick asked me to become a member of that subcommittee, in spite of the fact that he knew I was against establishing uncertainty intervals for clinical laboratories using GUM (the ISO document on uncertainty intervals). He wanted me on the committee for my opposing opinions. I was impressed that he did this – it was the right thing to do, but not the easy thing to do. I think others would have taken the easy route – not Rick. I enjoyed being a part of that subcommittee, under Rick’s leadership – it was one of the most open subcommittee experiences I had, whereby the many different opinions were all allowed to be heard. It will be difficult to find a replacement for Rick on this subcommittee.



7/23/07 - A little math or some things you can do with regression equations


Given a regression equation for method comparison data, here are some simple things one can do, besides the usual. In what follows, Y is defined as the new method and X as the reference method.

If the slope is greater than 1.0 and the intercept is greater than 0, Y will never equal X when X is positive; Y will always be greater than X.

If the slope is less than 1.0 and the intercept is less than 0, Y will never equal X when X is positive; Y will always be less than X.

If in an Excel file, the slope is in cell A2 and the intercept in cell A3, the point where Y=X is given by the cell formula: = A3/(1-A2).
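For readers who do not work in Excel, the same calculation can be sketched in Python. This is only an illustration: the function names are made up here, and the slope, intercept, and sodium concentrations are hypothetical values, not results from any real method comparison. The second function computes the percent bias of Y relative to X at a chosen concentration.

```python
def crossover_point(slope, intercept):
    """Return the X at which Y = slope * X + intercept equals X.

    Same as the spreadsheet formula intercept / (1 - slope).
    Returns None when slope is exactly 1 (the lines are parallel).
    """
    if slope == 1.0:
        return None
    return intercept / (1.0 - slope)


def percent_bias(slope, intercept, x):
    """Percent bias of the new method (Y) relative to the reference (X) at x."""
    y = slope * x + intercept
    return 100.0 * (y - x) / x


# Hypothetical regression for a sodium assay (mmol/L).
slope, intercept = 1.02, -2.0
print("Y = X at X =", round(crossover_point(slope, intercept), 1))
for sodium in (120.0, 140.0, 160.0):
    print(f"X = {sodium}: bias = {percent_bias(slope, intercept, sodium):.2f}%")
```

A negative crossover point corresponds to the two cases described above: with a slope above 1.0 and a positive intercept (or a slope below 1.0 and a negative intercept), the lines meet only at a negative X, so Y never equals X over the positive range.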

One can also prepare a table of biases for relevant regions. For example, for a sodium assay,






[Table not recovered: percent bias (Pct. Bias) for relevant sodium concentration regions.]

7/04/07 Patient safety – it’s not embryonic

I had occasion – thanks to a helpful reference librarian at the Lamar Soutter Library at U. Mass. Medical School - to read an entire issue of Clinical Chemistry and Laboratory Medicine devoted to “laboratory medicine and patient safety.”

One of the first things that struck me is that so many articles started out with a reference to the Institute of Medicine report on patient safety (1). Hmmm, seems like one of my articles started out this way too (2). I’m getting tired of the use of this reference – in most cases it’s just boilerplate so that’s ok – but sometimes it’s not. For example, in a section that follows the reference, Donaldson says (3)

“Many adverse event detection systems are embryonic, particularly in the effective analysis of risks and hazards.”

 This makes one think that we are just getting started with tools and techniques to reduce preventable medical errors. This neglects the anesthesiology story.

Back in the 70s, anesthesiology had a high preventable medical error rate. Yet, without an Institute of Medicine report or regulations, a group at Massachusetts General Hospital studied why this error rate was so high (4-5), using techniques from aviation. So even 30 years ago, these techniques were not embryonic, they had just not been applied effectively to anesthesiology. Shortly after this initial work, prevention strategies were developed. The only outside event that occurred was a 20/20 television show about the dangers of anesthesiology that aired in 1982 and undoubtedly helped in more widespread implementation of the prevention strategies.


1.       Kohn LT, Corrigan JM, Donaldson MS, editors. To err is human: building a safer health system. Washington, DC: Institute of Medicine, National Academy Press, 2000.

2.       Krouwer JS Recommendation to treat continuous variable errors like attribute errors. Clin Chem Lab Med 2006;44(7):797–798.

3.       Donaldson L Foreword Clin Chem Lab Med 2007;45(6):697–699.

4.       See;jsessionid=GKGJw17GTqY0NMY8mN6RndvWspLF7n2SstK4FbQr2w2xwF7wTyJh!-9948752!181195628!8091!-1

5.       Cooper JB, Newbower RS, Long CD, McPeek B: Preventable anesthesia mishaps: A study of human factors. ANESTHESIOLOGY 1978; 49:399-406. An online version of Paper 5 can be found at

5/22/07 Why I'm still mad after four years

I don’t brood over things, but recently I had occasion to revisit something which still bothers me. The CLSI (formerly NCCLS) document EP11, which is about uniformity of performance claims, was a controversial document, which I previously discussed. There was pressure from CLSI management to cancel this document, which resulted in a CLSI strategy conference for Evaluation Protocols in 2003.

This conference was facilitated by Ed Kimmelman, who in my opinion did a poor job. Everything had to go through him which was then reissued by him, often in a garbled form. I called this a filter (scroll down to:  5/14/06 – Beware of the filter) not a facilitator.  This was a lost opportunity but what made me mad was a report circulated to the attendees – 30 or so influential people in the field – in which Kimmelman made the following statement on page 1 under “Background”:

"NCCLS management has the belief that the process of developing evaluation protocols within the NCCLS consensus process can be improved.

In recent years it has become apparent that there has been difficulty moving certain evaluation protocols through the consensus process due to a number of reasons, including

- dissatisfaction with the content of such protocols,
- dissatisfaction with the constituency representation on the Evaluation Protocols Area Committee, and
- perceived failure of NCCLS and area committee management to meet their responsibilities under the NCCLS Administrative Procedures"

The item that most irritated me is the last one (highlighted in yellow in the original), although the rest of the statements are also not true. So basically, my management – I was the area committee chairholder – was being questioned. Therefore, I put together some facts about – and have now updated – document status when I was chairholder (1999-2004), before and after.

Evaluation Protocol Project Activity: 1999-2004

Document – Last pre 1999 Action | Formal title | Status 1999 | Status 2004 | After 2004 | 2002 Core: Ave. Sales / year
EP5T – 1992 | Evaluation of Precision Performance of Clinical Chemistry Devices | | EP5A2 published 2004 | |
EP6P – 1986 | Evaluation of the Linearity of Quantitative Analytical Methods | No action for 13 years! | EP6A published 2003 | |
EP7P – 1986 | Interference Testing in Clinical Chemistry | No action for 13 years! | published 2002 | |
EP9A – 1995 | Method Comparison and Bias Estimation Using Patient Samples | No action for 7 years | published 2002 | |
EP10A – 1998 | Preliminary Evaluation of Quantitative Clinical Laboratory Methods | | EP10 A2 published 2002 | EP10 A3 published 2006 |
EP11P – 1995 | Uniform Description of Claims for in Vitro Diagnostic Tests | No action for 8 years | EP11 cancelled by Board 2003 | |
EP12 Project approved – 1986 | User Protocol for Evaluation of Qualitative Test Performance | No action for 13 years! | published 2002 | |
EP13R – 1995 | Laboratory Statistics - Standard Deviation | | | |
EP14 Project approved - ? | Evaluation of Matrix Effects | | published 2001 | published 2005 |
EP15P – 1998 | User Demonstration of Performance for Precision and Accuracy | | published 2001 | published 2006 |
EP17 Project (see note) | Protocols for Determination of Limits of Quantitation | | published 2004 | |
EP18 Project approved - ? | Quality Management for Unit-Use Testing | | published 2002 | |
EP19 Project approved – 1997 | A Framework for NCCLS Evaluation Protocols | | published 2002 | |
EP20 Project approved – 1998 | Quality Goals for Acceptable Performance and Threshold Criteria for Outliers | | cancelled 2003 | |
EP21 Project approved – 1998 | Total Error for Clinical Laboratory Methods | | published 2003 | | No Data


EP10 - I was the chairholder of EP10A, EP10A2 and EP10A3.
EP17 - an earlier version had been cancelled in 2000, EP17 was re-approved 2001
EP19 - This was published as a "P" document in 2000, then as a "R" (Report)
EP20 - The chairholder resigned in 2001, a new subcommittee was started 2002, the project was cancelled by the area committee.
EP21 - I proposed and led EP21.
The last column refers to a financial tracking system, I put in place for all CLSI documents.
Core refers to sales - 1: most sales, 2: moderate sales, 3: least sales.

Of the 14 projects for which action was expected – a chairholder serves for six years – every document was either advanced or cancelled during my term. This included three documents for which no action had taken place for 13 years each (i.e., more than two chairholder terms)!

I called the head of NCCLS and requested an apology in writing to be distributed to correct the Kimmelman statement. After a heated exchange, I was later told that I might not serve my final year as chairholder and the next area committee meeting might be cancelled. Well, neither of those things happened. The area committee took place with the president and president-elect in attendance, who tried to get me again to cancel EP11. I didn’t – EP11 was advanced and the Board cancelled it.

Dan Tholen, the vice-chairholder, who often supported me in writing, was slated to replace me but was not invited by NCCLS to be chairholder. I think that NCCLS reasoned that they could put up with me for one more year to finish my term, but putting up with a new chairholder with similar views for an additional six years was to be avoided.

So in all, there are battles within organizations to get one’s way – that’s not the issue. What bothered me was that I was attacked publicly for poor performance and this was not true. I never got my apology – the other people all won awards.

These days, I still contribute to CLSI. I gave a recent talk at their annual forum, am the chairholder of an existing project (EP18) and have just proposed a new project.

4/20/07 Beware of Equivalent Quality Control

I attended the 2007 CLSI forum, which took place on April 20th in Baltimore. I got to speak about Evaluation Protocols. There were several highlights of the conference, one of which was to hear Neill Carey speak about Evaluation Protocols. His presentation is so clear that it is easy to see why his workshop at the AACC annual meeting is so popular. Another highlight was to hear Marina Kondratovich and Kristen Meier, both from the FDA, describe some of the statistical inadequacies that they encounter when reviewing FDA submissions.

I was also struck by the presentation of Judy Yost from CMS. She gave an update about equivalent (or alternative) quality control (EQC), which is a program which allows clinical laboratories under certain circumstances to reduce the frequency of external quality control to once a month. I presented an AACC session to show that this policy is not supported scientifically.

Ms. Yost ignored anything that I and others have said regarding EQC and went on to say that EQC has been a success story for clinical laboratories that have been using it, meaning that the inspection process for these laboratories has not uncovered any problems related to the reduced frequency of external quality control. This makes me think of an analogy. If someone took out the airbags in their car and stopped wearing seat belts and didn’t get into an accident, they might claim that one doesn’t need airbags or seatbelts because they have had no injuries without their use.

Changing the frequency of external quality control changes the risk of adverse events. Ms. Yost’s assertion that EQC is working in clinical laboratories that are using it because of successful inspections does not inform about the change in risk. Jim Westgard got up and questioned Ms. Yost about the lack of scientific basis of EQC. Okay, I have some differences with some of Jim Westgard’s writings, but not only am I in agreement with him on this issue, I applaud him for getting up and asking these questions. It’s the right thing to do and demonstrates leadership.

Two years ago, a new CLSI subcommittee was formed to provide a scientific basis for manufacturers’ recommendations for external quality control. Although the work of this subcommittee is still in progress, the scope of the subcommittee has been changed – it will not provide guidance for manufacturers’ recommendations for external quality control. This means that there will continue to be no scientific basis for EQC.

CLSI has many valuable Evaluation Protocol standards about analytical performance. They have an opportunity to develop and promote standards in risk management. These are sorely needed.

4/14/07 - Not good advice on how to conduct FMEA

I had occasion to view a presentation about FMEA, presented at the 2006 CLSI forum. It may be viewed at There are some serious issues with this advice on how to perform a FMEA, which can be summarized as follows.

Detection is listed as an item to be classified (added to severity and probability of occurrence). I have advised against this previously.

The RPN (risk priority number) is examined after mitigations have been put in place. See this essay, as to why this can cause problems.

And perhaps worst of all, patient safety events and potential non patient safety events are in the same classification scheme. For example,  10 = injury or death, 9 = regulatory non compliance. This means that in a Pareto chart, one could be worrying about documentation issues more so than killing someone – sorry but that’s a fact.

Severity = 10, probability of occurrence = 1, detection = 5, RPN = 50
Severity = 9, probability of occurrence = 8, detection = 5, RPN = 360
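The arithmetic behind these two lines can be sketched as follows, assuming the usual convention that the RPN is the product of the three scores. The class and field names are illustrative only; the scores are the ones quoted above.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int     # 10 = injury or death, 9 = regulatory non compliance
    occurrence: int   # probability of occurrence score
    detection: int    # detection score

    @property
    def rpn(self):
        # Risk priority number: product of the three scores.
        return self.severity * self.occurrence * self.detection


modes = [
    FailureMode("injury or death", severity=10, occurrence=1, detection=5),
    FailureMode("regulatory non compliance", severity=9, occurrence=8, detection=5),
]

# A Pareto chart orders failure modes from highest to lowest RPN.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.name}: RPN = {mode.rpn}")
```

Sorted this way, the documentation issue (RPN = 360) sits above the potentially fatal event (RPN = 50), which is the problem with putting both kinds of events on one severity scale.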

I covered this in detail in my book, Managing risk in hospitals.

4/13/07 - You get what you ask for

I have written before about the difference between horizontal and vertical standards. ISO/TC212 produces standards for the clinical laboratory. The following came from a talk by Dr. Stinshoff, who has headed the ISO/TC212 effort. The red highlights are from Dr. Stinshoff.

“ISO/TC 212 Strategies:

– Select new projects using the breadth and depth of the expertise gathered in ISO/TC 212; focus on horizontal standards; address topics that are generally applicable to all IVD devices; and, limit the activities of ISO/TC 212 to a level that corresponds to the resources that are available (time and funds of the delegates).

– Assign high preference to standards for developed technologies; assign high preference to performance-oriented standards; take the potential cost of implementation of a standard into consideration; and, solicit New Work Item ideas only according to perceived needs, which should be fully explained and supported by evidence.

– Globalize regional standards that have a global impact”


What is meant by performance-oriented standards?
“ISO Standardisation
Performance vs. Prescriptive Standards:

Whenever possible, requirements shall be expressed in terms of performance rather than design or descriptive characteristics. This approach leaves maximum freedom to technical development….

(Excerpt of Clause 4.2, ISO/IEC Directives, Part 2, 2004)”

So one reason ISO/TC212 produces horizontal standards is because that is their strategy.

4/5/2007 European and US clinical laboratory quality

I am somewhat skeptical about the statement in a recent Westgard essay which suggests that Europeans who use ISO 15189 to help with accreditation are more likely to improve quality in their laboratories than US laboratories, which just try to meet minimum CLIA standards. ISO 15189 is much like ISO 9001, which is used for businesses. I have previously written that ISO 9001 certification plays no role in improving quality for diagnostic companies (1). As an example of ISO 15189 guidance – albeit in the version I have, which is from 2002 – under the section “Resolution of complaints”, ISO 15189 says the laboratory should have a policy and procedures for the resolution of complaints. In ISO 17025, which is a similar standard, virtually the identical passage occurs.

Westgard mentions that clinical laboratories need a way to estimate uncertainty that is more practical than the ISO GUM standard and mentions a CLSI subcommittee which is working on this. A more practical way is unlikely. I was on that subcommittee. I didn’t want to participate at first, since I don’t agree that clinical laboratories should estimate uncertainty according to GUM (2). However, the chairholder wanted me for my contrarian stance, so I joined. I must say that I enjoyed being on the subcommittee, which had a lot of smart people and an open dialog. However, I was unable to convince anyone of my point of view and therefore resigned, because it would make no sense to be an author of both this document and reference 2. The last version of this document I saw was 80 pages long (half of it an Appendix) with many equations. This version will not be understood by most (any?) clinical laboratories. However, there is a CLSI document that allows one to estimate uncertainty intervals easily, EP21A, although not according to GUM.

What is needed to improve clinical laboratory quality anywhere? Policies that emphasize measuring error rates such as FRACAS (3).


  1. Krouwer JS: ISO 9001 has had no effect on quality in the in-vitro medical diagnostics industry. Accred. Qual. Assur. 2004;9:39-43.
  2. Krouwer JS: A Critique of the GUM Method of Estimating and Reporting Uncertainty in Diagnostic Assays Clin. Chem., 49:1818 -1821 (2003).
  3. Krouwer JS: Using a Learning Curve Approach to Reduce Laboratory Error, Accred. Qual. Assur., 7: 461-467 (2002).

3/30/2007 Better automation for clinical chemistry

I first heard Martin Hinckley speak at an AACC forum. That talk was published in Clinical Chemistry, 1997;43:873-879.

A new article is available at (I suspect this link will work for a limited time).

This article deals with automation and how it has not lived up to the expectation that it would greatly improve quality. Hinckley offers some interesting advice regarding how to improve the implementation of automation.

2/18/2007 - You're either part of the problem or part of the solution

Westgard bemoans the current process of establishing performance claims for assays.  He states that

“There is one major fault with this approach [precision, accuracy, linear range, reference range(s), etc.]. Manufacturers do not make any claim that a method or test system provides the quality needed for medical application of the test results, i.e., FDA clearance does not require a claim for quality! To do so, a manufacturer would need to state a quality specification, e.g., the allowable total error, the maximum allowable SD, or the maximum allowable bias, then demonstrate that the new method or test system has less error than specified by those allowable limits of error.”

You’re either part of the problem or part of the solution. In this case, Westgard is part of the problem. His suggestion of allowable total error as stated above sounds good, but as I have pointed out many times,

  • Westgard’s maximum allowable total error is for a specified percentage of results – often 95% - which allows for too many results to fail to meet clinical needs

  • Westgard’s suggested testing procedures as described by his quality control rules fail to include all contributions to total error

Thus, 5% of a million results means that there could be 50,000 medically unacceptable results – that’s not quality. When one tests with control samples, one cannot detect interferences, which are often a source of important clinical laboratory errors, so all of Westgard’s quality control algorithms for total error are meaningless – they inform about only a subset of total error.

Things are improving. In the FDA draft guidance for CLIA waiver applications, FDA requires use of error grids (such as those in use for glucose) and demonstration of lack of erroneous results as defined in those error grids in addition to total allowable error. Many of my essays stress the need to go beyond total allowable error – as used by Westgard – and to put in place procedures to estimate erroneous results (1).


  1.  Jan S. Krouwer: Recommendation to treat continuous variable errors like attribute errors. Clin Chem Lab Med 2006;44(7):797–798.

2/06/07 FMEA made simple

Sometimes, something is presented that is so clear that it is worth reproducing. This is FMEA in a nutshell from the following source.


1/24/07 Whooping Cough and False Positives

There has been a recent incident that has set the quality folks abuzz. As reported in The New York Times, a hospital treated a number of its workers for whooping cough, due to a positive test for that condition. It was later determined that no one had whooping cough – all of the test results were false positives. In a standards committee, I cited this article as an example of why it is important to perform a FMEA (Failure Mode and Effects Analysis) as there has been some resistance that FMEAs are too complicated for hospital laboratories.

Westgard cited the Times article as a reason to stress the need for method validation skills. I agree with most of what he says although I suggest that in addition to performing a method validation, one must also consider pre- and post-analytical issues - a reason to perform FMEA.

However, I disagree with one of Westgard’s points:

“Finally, there are those damned statistics that get in the way of a practical understanding of experimental results. As evidence of this problem, Clinical Chemistry (the International Journal of Molecular Diagnostics and Laboratory Medicine) recommends that authors utilize the Bland-Altman approach (difference plot with t-test statistics) for analyzing method comparison data, in spite of the fact that regression techniques are usually much more informative, particularly in identifying proportional analytical errors that invalidate the error estimates from t-test analysis. Evidently, laboratory scientists are not sophisticated enough to understand and utilize regression analysis correctly. That again speaks to the inadequacy of our education and training programs and the lack of proper guidance in the validation of molecular tests, even by a leading international journal.”

This advice is incorrect. Total error is more informative than regression and a better first step in assessing assay performance. Proportional error does not invalidate the t-test. Among the methods for assessing total error are:




CLSI** Standard

Model:  combine systematic and random errors

Can discard outliers, model can be wrong, only accounts for 95% of results, specs are for components

Westgard, Peterson, others

None, but components based on EP5 and EP9

Model: GUM

Can discard outliers, model can be wrong, very complicated, only accounts for 95% of results


Clin. Chem. 51

Bland Altman

Can discard outliers?, normal data assumption can mislead, only accounts for 95% of results

Bland Altman


Mountain Plot

Need a lot of data

Krouwer, Monti, Lynch


*Or champions **Clinical and laboratory standards institute

If total error is unacceptable, further analysis may be warranted, such as regression.
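As an illustration of one of the methods listed above, here is a minimal sketch of how the points of a mountain plot (a folded empirical cumulative distribution of the differences) might be computed. It assumes paired results from a new method and a reference method; the function name, the (rank - 0.5)/n percentile convention, and the sample data are assumptions made for this example and are not taken from EP21 or any other standard.

```python
def mountain_plot_points(new_method, reference):
    """Return (difference, folded percentile) pairs for a mountain plot.

    Differences are ranked, each is given a percentile, and percentiles
    above 50 are folded (100 - percentile) so the curve peaks at the median.
    """
    if len(new_method) != len(reference) or not new_method:
        raise ValueError("need two non-empty lists of equal length")
    diffs = sorted(y - x for y, x in zip(new_method, reference))
    n = len(diffs)
    points = []
    for rank, diff in enumerate(diffs, start=1):
        percentile = 100.0 * (rank - 0.5) / n  # assumed plotting position
        folded = percentile if percentile <= 50.0 else 100.0 - percentile
        points.append((diff, folded))
    return points


# Hypothetical paired results (same units for both methods).
new_results = [101.0, 99.0, 103.0, 98.0, 100.0]
ref_results = [100.0, 100.0, 100.0, 100.0, 100.0]
for diff, folded in mountain_plot_points(new_results, ref_results):
    print(f"difference {diff:+.1f}: folded percentile {folded:.0f}")
```

No distributional assumption is involved and no results are discarded, but the tails of the plot are only as trustworthy as the number of samples behind them, which is why the method needs a lot of data.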

10/22/06 - Potential Risks and Real Problems

I’ve never been that good at making up potential problems – so here’s a real one: “Roche Diagnostics Announces Nationwide Recall on Medical Device Used to Determine Blood Clotting Time” see - .

It’s not my intention to criticize Roche as problems occur across all companies. What’s more of interest is to illustrate some observations of standards groups, particularly as there is one standard in development (CLSI EP 22) which attempts to allow manufacturers to reduce QC (quality control) frequency of diagnostic assays by using risk management techniques.

Although (as an advisor to the group) I have not been to any of the face-to-face meetings, I have participated in EP22 teleconferences and have been at many other CLSI face-to-face meetings. Such meetings are often held in hotel conference rooms, where there are comfortable chairs, an ample supply of fresh coffee, croissants, and the promise of lunch around the corner. As one relaxes, one might hear how through fault tree analysis, the potential problem ABC has mitigation XYZ applied to it so that it will never occur.

Yet in the real world, things are not that simple (or slow paced). Using the formal language of risk management, one must also consider

  • have all potential problems been identified
  • are all mitigations 100% effective – many are not
  • what about the softer, non technical issues that hardly ever appear in a fault tree regarding staff, materials, and the working environment

One of the advantages of quality control is that, for some problems, it will expose the problem regardless of whether one knows its source. Ironically, as a consultant I have heard, during discussions of a problem that has occurred, "that shouldn't be an issue if customers run QC".
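The points above can be put into rough numbers. The sketch below is only an illustration under stated assumptions: the failure modes, their probabilities, the mitigation effectiveness values, and the QC detection rate are all hypothetical, and failure modes are treated as independent. It shows that a mitigation that is less than 100% effective leaves residual risk, that a failure mode nobody identified has no mitigation at all, and that QC reduces both because it does not need to know the source.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    p_occurrence: float              # hypothetical probability per run
    mitigation_effectiveness: float  # 0.0 for a mode nobody identified


def residual_probability(mode, qc_detection=0.0):
    """Probability the failure occurs, escapes its mitigation, and escapes QC."""
    return (mode.p_occurrence
            * (1 - mode.mitigation_effectiveness)
            * (1 - qc_detection))


def total_residual(modes, qc_detection=0.0):
    """Probability that at least one failure gets through (independence assumed)."""
    none_get_through = 1.0
    for mode in modes:
        none_get_through *= 1 - residual_probability(mode, qc_detection)
    return 1 - none_get_through


# Hypothetical numbers, for illustration only.
modes = [
    FailureMode("identified, mitigation 99% effective", 0.01, 0.99),
    FailureMode("identified, mitigation 80% effective", 0.005, 0.80),
    FailureMode("never identified in the fault tree", 0.001, 0.0),
]
print(total_residual(modes))                    # risk management alone
print(total_residual(modes, qc_detection=0.9))  # with QC that catches 90%
```

In this made-up example the unidentified failure mode contributes as much residual risk as the poorly mitigated one, even though it is five times less likely to occur.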

6/23/06 - Medical diagnostics industry participates in fake news

You may (or may not) be aware that some news stories aired by television news stations are provided by companies and the news station fails to disclose this. Hence, this is often referred to as “fake news”. The medical diagnostic industry participates in fake news. For a segment on allergy testing provided by Quest Diagnostics and aired by KABC-7 (Los Angeles), go here.

6/11/06 - 'Sick' Sigma and zero defects


Two recent articles in Quality Digest received a lot of comments. The articles are here:

Sick Sigma

Zero Defects

Here are my comments (1).

Re: Sick Sigma

Dr. Burns questions the origin of the 1.5 drift that is part of a six sigma process and pretty much implies that having a 1.5 sigma bias is intended as part of six sigma. Does he really believe that someone will build in a 1.5 sigma bias into their product to conform to six sigma? He then goes on to talk about control limits. This is irrelevant with respect to attribute defects in many cases, such as in medical errors. One can have attribute control limits, say to control the proportion of bad pixels in an LCD screen. But for wrong site surgery, control limits make no sense since the allowable number of defects is zero. Yet, one can still count defects and relate them to six sigma terms, and improve the process until the observed defect rate is zero. For more thoughts, go to


Dr Tony Burns responds

Unfortunately there are hundreds of thousands of people who do build in the 1.5 sigma bias. There are millions of people using 3.4 ppm based on the 1.5 drift. A quick google search revealed a quarter of a million sites promoting averages unavoidably drifting by 1.5 sigma. It all started with a theoretical error by Mikel Harry, that no one bothered to check. The erroneous theoretical drift over 24 hours then became an empirical "long term" drift. The 1.5 drift is nonsense. It has set the quality world backwards by many years. World class quality can only be "on target with minimum variation".

You are mistaken. Control limits do apply to attribute data. Attribute control charts are of 4 main types, "pn", "p", "c", "u". I suggest that you read any basic text on Statistical Process Control (SPC), such as "Analysis of Control and Variation" by John McConnell. Before speaking to your clients, I would strongly suggest reading more advanced texts such as Don Wheeler's "Understanding SPC" and "Advanced Topics in SPC". In the example that you quote, either a pn, p or an XmR chart might be used, depending on the details of the situation. The texts mentioned above will describe how to calculate and draw the appropriate control limits.

Jan replies

Thanks for your reply.

 As for people actually building in the bias, this is not my experience. I’m not sure how you can come to your conclusion from a Google search.

 Yes, you are right about control charts for attribute data. Perhaps I got carried away. But my point was based on reading your article. Let me state it differently. If one is producing LCD screens, one can set up attribute charts to control the number (or proportion) of bad pixels in a screen. But in my area of interest – medical errors – one does not do this. For example, the proportion of wrong site surgeries has been estimated at 0.00085%. This is not an acceptable rate (the acceptable rate is of course zero), so there is no control chart that one can set up, as one does not wish to control to an acceptable level of defects (e.g., >0). One continues to measure the rate and improve the process until one is observing a rate of zero. (After which, one still measures but does not change the process.) Six Sigma is sometimes used as a benchmark in medical error opinion articles. That is, one would rather have a six sigma than a three sigma process since fewer medical errors are implied. But for serious medical errors, a six sigma process is unacceptable.

 As it turns out, I am not a fan of Six Sigma and I am suspicious of all of these people who have no experience analyzing data all of a sudden becoming experts (black belts).
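For readers who have not seen an attribute chart, here is a minimal sketch of the "p" chart mentioned in this exchange, using the usual three sigma limits around the average proportion defective. The function name and the counts are hypothetical. It assumes equal subgroup sizes.

```python
from math import sqrt


def p_chart_limits(defective_counts, sample_size):
    """Return (p_bar, lower control limit, upper control limit).

    defective_counts: number of defective items in each subgroup.
    sample_size: items inspected per subgroup (assumed constant).
    Limits are p_bar +/- 3 * sqrt(p_bar * (1 - p_bar) / n), clamped to [0, 1].
    """
    p_bar = sum(defective_counts) / (len(defective_counts) * sample_size)
    half_width = 3 * sqrt(p_bar * (1 - p_bar) / sample_size)
    return p_bar, max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)


# Hypothetical inspection data: defective units per 100 inspected.
print(p_chart_limits([4, 6, 5, 5], sample_size=100))
```

With a proportion as small as the wrong site surgery estimate quoted above, the lower limit is clamped at zero for any realistic subgroup size, so the chart can only ever signal on the high side.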

Dr Tony Burns responds

Regarding six sigma's 1.5 sigma bias, perhaps I should explain further. Anyone who is using six sigma tables to calculate a "sigma level" for a process is making the assumption of a 1.5 sigma bias. Anyone quoting 3.4 DPMO has assumed the erroneous 1.5 sigma drift in averages. Not only is the bias fallacious, but the assumption of process normality used in six sigma tables is grossly in error, as is using counts at the extreme tail of any distribution as an estimator of the distribution's dispersion (sigma). I have drafted a second paper, "Tail Wagging Its Dog", that has been submitted to Quality Digest, which describes the latter in more detail. You may wish to read more in the various papers referenced at our site

Attribute control charts can fortunately still come to the rescue, even with rare events such as you suggest. Chapter 11.9 "Advanced topics in SPC" gives a lovely example of how to use an XmR chart for this purpose. Zero wrong site surgeries may be a desirable target but chaos, human error and variation will inevitably occur, even in the most ideal system.

Comparisons such as "... rather have a six sigma than a three sigma process since ..." are meaningless. Six sigma relates to the specification, that is, the voice of the customer. Three sigma relates to the voice of the process. The specification may be set at any level you wish, four, five, six, seven sigma, whatever. The voice of the process is always three sigma.

I won't get started on black belts. I feel quite sorry for these unfortunate people who are grabbed from the shop floor and expected to become overnight statisticians and process magicians. It baffles me how they can be expected to understand Students-T and Box Wilson experimental design, when they clearly don't understand even the meaning of sigma.

Jan replies

Well, I don’t belong to the “anyone” group, lol cause I don’t assume a 1.5 sigma bias when I calculate DPMO, but I get it and agree with you about the origin of the 1.5 sigma bias. This has always been mysterious to me – the 1.5 bias and Normal assumptions, because one can always calculate DPMO without knowing anything about Six Sigma (or assumptions about the data), yet one can relate DPMO to the defects expressed by a table (e.g., 6 sigma = 3.4, 5 sigma=I forgot the number, etc.). So in this sense, six sigma is just a level of desirability with six sigma = very good, five sigma = pretty good, etc.
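The two steps discussed here can be separated in a few lines. The sketch below is an illustration, not anyone's official method, and the function names are mine. Calculating DPMO is plain counting and needs no assumptions. Converting DPMO to a "sigma level" is where the Normal assumption and the 1.5 sigma shift come in, which is why 3.4 DPMO corresponds to about 4.5 standard deviations on a Normal curve and only becomes "six sigma" once 1.5 is added.

```python
from statistics import NormalDist


def dpmo(defects, opportunities):
    """Defects per million opportunities: counting only, no assumptions."""
    return 1_000_000 * defects / opportunities


def sigma_level(dpmo_value, shift=1.5):
    """Convert DPMO to a sigma level the way the usual tables do.

    Assumes a Normal distribution with one-sided tail area dpmo/1e6,
    then adds the conventional (and disputed) 1.5 sigma shift.
    Pass shift=0 to see the unshifted Normal quantile.
    """
    if not 0 < dpmo_value < 1_000_000:
        raise ValueError("DPMO must be strictly between 0 and 1,000,000")
    z = NormalDist().inv_cdf(1 - dpmo_value / 1_000_000)
    return z + shift


print(sigma_level(3.4))           # close to 6 with the shift
print(sigma_level(3.4, shift=0))  # close to 4.5 without it

# The wrong site surgery estimate quoted earlier, 0.00085%, is 8.5 DPMO.
print(sigma_level(dpmo(85, 10_000_000)))
```

By this table arithmetic the wrong site surgery estimate is already close to a "six sigma" process, which is the point made earlier: for serious medical errors that label says nothing about acceptability.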

My other point was mainly to express the fact that one does not use quality control rules for a process whose desired defect level is zero. I realize that defects still may occur. So one uses risk analysis tools such as fault trees and calculates probability of failure events. If the probability of a failure event is low enough (the goal is never zero), then one can have both acceptable risk and zero defects (zero not theoretically, but zero for practical purposes – e.g., one failure event in the next million years).
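As a rough sketch of the fault tree calculation referred to here: with independent basic events, an AND gate multiplies probabilities and an OR gate combines them as one minus the product of the complements. The tree and every probability below are hypothetical, chosen only to show how a top event probability can be very small without being zero.

```python
from math import prod


def and_gate(*probabilities):
    """All inputs must occur (independent events assumed)."""
    return prod(probabilities)


def or_gate(*probabilities):
    """At least one input occurs (independent events assumed)."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical tree: the wrong site gets into the plan through either of
# two routes, AND both downstream checks fail to catch it.
wrong_site_planned = or_gate(1e-4, 5e-5)
both_checks_fail = and_gate(1e-2, 1e-2)
top_event = and_gate(wrong_site_planned, both_checks_fail)
print(top_event)
```

The independence assumption is the weak spot. The softer issues (staff, materials, working environment) tend to make several checks fail together, and then the product of small numbers overstates how safe the process is.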

I will look at your web site and thanks for stimulating me to think about things more.

Dr Tony Burns responds

Being able to compare processes by quoting a "sigma level" is appealing to management, however it simply doesn't work. For example, consider comparing two processes, one of which has a histogram skewed to the left and the other to the right. They may both appear "very good" with an assumption of normality, however one might be far more poorly controlled than the other. The situation is even worse because defects counts and "sigma levels" give no indication of process capability. A process can only be "capable" (of producing product or service within specification) that is, "very good", if it is "in-control". If the process average is drifting, as assumed in six sigma, the process is out of control and therefore unpredictable. An unpredictable process is certainly not a "very good" thing. Only control charts and histograms can give this information.

Thank you for your final comment. My aim is to encourage people to question the accepted norms (no pun intended) rather than to accept them blindly.

Re: Zero Defects

Mr. Crosby would have us believe that a zero defects program is inexpensive, especially compared to six sigma. Well, I remember spending 5 days in Corning, NY as part of Total Quality training (which included zero defects, a la Crosby). Everyone in our company (Corning Medical, Medfield, MA) received training in quality. The cost of this program across Corning must have been substantial.

Mr. Crosby also talks about the zero defects concept as “Work right the first time and every time” and the performance standard is that “No defects are acceptable.” This gives one the impression that without this program, inept engineers and production staff are creating poor quality products and if only they had this quality training ... . Well, things aren’t that simple. The number of defects in a design relates to the state of knowledge of the technology. No defects are possible when the state of knowledge is high. However, for many systems, the state of knowledge is not high enough to design a product with no defects. A common and efficient development process is then to go through a test and improvement loop until the number of problems reaches an acceptable level, and this number is not zero. For more thoughts, go to


Well I realize that in my comments about Sick Sigma I talk about processes whose defect rate should be zero (e.g., wrong site surgery) and in the comments about "Zero Defects" I talk about processes with allowable defects rates greater than zero (e.g., proportion of bad pixels in an LCD screen) but that's the real world.


1. Sick sigma comments amended after an email from Dr. Burns, the article's author, since I had provided comments to the journal in which his article appeared.

5/15/06 - Not a member of the club

A Wall Street Journal article discusses the role of the New England Journal of Medicine in the Vioxx affair (1). An aspect of the article that caught my attention was the attempt by a pharmacist, Dr. Jennifer Hrachovec, to make known the dangers of Vioxx.

She first tried to do this during a radio call in show which had as one of its guests, Jeffrey Drazen, the editor of the New England Journal of Medicine. He blew off her comments.

She next submitted a Letter to the editor to the New England Journal of Medicine. It was rejected.

Finally, she was able to get a Letter published in JAMA, the Journal of the American Medical Association.

I can relate to this sequence of events and suggest that part of the problem is that however relevant and correct a person is on an issue, the person’s issue may not be taken seriously if that person is not “a member of the club”. Journals such as the New England Journal of Medicine have so many submissions that they are always looking for ways to reject papers. I suspect that one criterion used is simply the status of the person submitting the paper. Fortunately, Dr. Hrachovec persisted. For me, when someone blows off my comments, it is a source of motivation, and I have had my share of rejected Letters.


1. Bitter Pill: How the New England Journal Missed Warning Signs on Vioxx. David Armstrong, Wall Street Journal, May 15, 2006, page A1.

5/14/06 – Beware of the filter


Facilitators play an important role in quality activities. For example, they often lead training and brainstorming sessions. Brainstorming is a key part of FMEA (Failure Mode Effects Analysis) and fault trees. Whereas this blog entry is not meant to be a summary of what makes a good facilitator, I was recently reminded of a problem with some facilitators; namely that of the filter.

Filters are people who, while serving as facilitators, feel compelled to have all information go through them. The filter then re-releases the information but in a changed form. That is, whatever was originally submitted to the filter becomes changed into a form that the facilitator understands (which may or may not be what the person who had the original idea meant). In some cases the facilitator changes the way the information is presented by rewriting it or restating it (e.g., out loud). The latter must be familiar as one often hears, “now let me make sure I understand what you’ve said, you mean that, …., “.

There is nothing wrong with the concept of a filter, since in principle, a filter could make ideas more clear and if nothing else ensures that an idea is understood as intended. Whereas this is often useful - sometimes essential - between two people, the danger is when the filter is used in a group setting and the filter makes ideas less clear, changes ideas, or omits ideas.

I recall a CLSI (formerly NCCLS) strategy session a few years ago. I had prepared a list of issues which the facilitator rewrote. My list had already been read by the head of the organization with a few minor changes so this rewrite by the facilitator seemed completely unnecessary and more importantly, it failed to capture the issues as clearly as I had and at the same time dropped some issues. So the strategy session took place without the right list of issues and during the strategy session, all material went through the facilitator, as in “now let me make sure I understand what you’ve said, you mean that, …., “. The facilitator of course also wrote up the results of the meeting. In all, this was a lost opportunity, largely driven by a filter.

So a better way is for the facilitator to assemble all ideas through a consensus process. The final product may have some editing for readability but without the effects of a filter.

4/30/06 - Translocational research – Excuse me?


From a perspective on the AHRQ web site, Dr. Robert Wachter wrote in September of 2005: “It strikes me that much of the progress that we have made in the patient safety field over the past decade reflects a different kind of translational research: the translation not of basic research discoveries into clinical applications, but of insights and practices from non–health care fields into health care. To highlight the movement from non-medical fields into medicine, I propose that we call this translocational research.” (1).

I submitted the following comment about this perspective, but it got translocated into the recycle bin.

Dr. Wachter proposes that the mechanism of transferring practices from non healthcare fields into healthcare be called "translocational research". Yet, this practice already has a name: "technology transfer." If one puts the words "technology transfer" (with quotes) into the Google search engine, one gets over 32 million hits. Much has been written on the subject (2).

As someone who has spent time transferring engineering reliability tools into healthcare, I note whenever possible, the terminology should remain the same so that it is understandable to practitioners in the original field. Hence, I don't see the need for inventing a new term.


  1. See
  2. Davenport H and Prusak L. Working knowledge: How organizations manage what they know. Cambridge, MA: Harvard Business School Press, 1998. 

4/29/06A - “I so anoint you”

Don Berwick, the president of the Institute for Healthcare Improvement, has referred to one of his coworkers several times as “Tom Nolan, one of the leading quality-improvement scholars of our time” (1-2). Now as for leading quality improvement scholars, I’ve heard of Deming, Juran, Crosby, and Taguchi but until I came across references 1 and 2, I hadn’t heard of Tom Nolan. You be the judge (3).


  1. Berwick DM. Errors Today and Errors Tomorrow. N Engl J Med 2003;348:2570-2572.
  2. Editorial in The Washington Post, July 29, 2003.
  3. See,

4/29/06B - “I so anoint myself”

“The following list presents 10 persons who have made a significant impact on the IVD industry.” This is how the magazine IVD Technology begins and then gives a short description of each of the 10 people (1). Two of the people listed in the top 10 happen to be on the editorial advisory board of IVD Technology (2). Hmmm…..

About half of the editorial advisory board are in regulatory affairs and four of the top 10 are also in regulatory affairs (including the two above). In case you’re wondering, Leonard Skeggs, the inventor of the auto-analyzer didn’t make the list! OK, to be fair, the text also says “Efforts were made to ensure that this list reflects contributions in both the regulatory and scientific areas.”, but the title and first sentence are misleading.


  1. Top 10 Persons in the IVD Industry IVD Technology April 2005, see  
  2. See,

4/29/06C - “No reaction”

In November of 1998, I was invited to attend the chairholder’s council of NCCLS (now called CLSI). This is a meeting of the leaders of the committees that produce clinical laboratory standards. During the meeting, NCCLS started a quality initiative kicked off with a keynote speech and rationale for the program by David Nevalainen, listed at the time as from the Abbott Quality Institute (1). He presented a quality system quite similar to ISO 9000. I commented at the presentation that in my experience, ISO 9000 (upon which the NCCLS quality system is based) has had virtually no impact on quality in industry. (I believe this is still true). There was no reaction to my comment.

One year later, I was attending the November 1999 chairholder’s council. In the lobby of the hotel, I was reading the Wall Street Journal when I noticed that one of the top stories was about an FDA fine for Abbott quality problems. Abbott was fined 100 million dollars and ordered to stop selling certain assays (2). When I tried to point out to NCCLS senior management the connection among the NCCLS quality system, Nevalainen, and Abbott, I got no reaction.


  1. See, for example:   
  2. Abbott to pay $100 million in fine to U.S. The Wall Street Journal, November 3, 1999.

4/29/06D - “If it isn’t in ISO, it doesn’t exist”


There is a CLSI subcommittee that deals with risk management. One of the European participants had trouble with the word mitigation as in the term “risk mitigation.” It was pointed out that the ISO standard on risk management 14971 does not contain the term risk mitigation primarily because of translation difficulties and therefore, the CLSI standard should not use this term.

Now this translation problem baffles me as ISO standards are in English. Moreover if one does a search in Google for risk mitigation, one will get over 4 million hits.