Study Design & Probability · April 18, 2026 · 5 min read

Q-Bank Breakdown: PPV & NPV — Why Every Answer Choice Matters

A clinical vignette on PPV & NPV: the correct answer explained, then each distractor systematically addressed. Tag: Biostatistics > Study Design & Probability.

You just missed a question on PPV/NPV, and the explanation says “depends on prevalence” — but the distractors all sounded kind of true. That’s exactly why these biostats items are sneaky: they’re not testing whether you can recite definitions, they’re testing whether you know which probability you’re being asked for and which variables actually move it.



The Vignette (Q-bank style)

A new rapid antigen test is used to screen for Disease X in an outpatient clinic.

  • Sensitivity: 90%
  • Specificity: 90%
  • Prevalence of Disease X in this clinic population: 1%

A patient’s test returns positive.

Question: What is the approximate positive predictive value (PPV) of this test in this population?

Answer choices

A. 9%
B. 50%
C. 90%
D. 99%
E. PPV cannot be determined without knowing the sample size


Step 1: Translate the Ask (what are they really asking?)

They gave you:

  • sensitivity and specificity (test characteristics)
  • prevalence (pretest probability in the population)
  • a positive test result

They are asking:

  • PPV = P(Disease | Test+)

This is a post-test probability conditioned on a positive test.


Step 2: Solve It Fast (2×2 table method)

Assume a population of 10,000 (this makes the 1% prevalence a clean 100 patients).

  • Prevalence 1% → 100 truly diseased, 9,900 not diseased

Now apply sens/spec:

  • Sensitivity 90% → True positives (TP) = 90% of 100 = 90

  • False negatives (FN) = 10

  • Specificity 90% → True negatives (TN) = 90% of 9,900 = 8,910

  • False positives (FP) = 990

Now compute PPV:

PPV = TP / (TP + FP) = 90 / (90 + 990) = 90 / 1080 ≈ 0.083 = 8.3%

So the best answer is A, 9% (8.3% rounded to the nearest choice).
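The 2×2 table arithmetic above can be sketched in a few lines of Python (the function name and variable names are mine, purely illustrative):

```python
# Sketch of the 2x2 table method: split the population by prevalence,
# then apply sensitivity and specificity to fill the four cells.
def two_by_two(population, prevalence, sensitivity, specificity):
    diseased = population * prevalence
    healthy = population - diseased
    tp = sensitivity * diseased   # true positives
    fn = diseased - tp            # false negatives
    tn = specificity * healthy    # true negatives
    fp = healthy - tn             # false positives
    return tp, fp, tn, fn

tp, fp, tn, fn = two_by_two(10_000, 0.01, 0.90, 0.90)
ppv = tp / (tp + fp)
print(round(tp), round(fp), round(ppv, 3))  # → 90 990 0.083
```

Any convenient population size works; 10,000 just keeps every cell a whole number.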


Why This Happens: The “False Positive Avalanche” in Low Prevalence

Even with a “pretty good” specificity (90%), if the disease is rare, the non-diseased group is massive, so a small false-positive rate creates many false positives.

High-yield takeaway:

  • Low prevalence → PPV drops, NPV rises (holding sens/spec constant)
  • High prevalence → PPV rises, NPV drops
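You can watch this takeaway play out numerically. A quick sweep (a sketch, using the standard Bayes forms of PPV and NPV; the helper name is mine) holds sensitivity and specificity at 90% and varies only prevalence:

```python
# Hold sens/spec fixed at 90% and vary prevalence:
# PPV climbs from ~8% to 90% while NPV falls.
def ppv_npv(prev, sens=0.90, spec=0.90):
    ppv = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
    npv = (spec * (1 - prev)) / (spec * (1 - prev) + (1 - sens) * prev)
    return ppv, npv

for prev in (0.01, 0.10, 0.50):
    ppv, npv = ppv_npv(prev)
    print(f"prev={prev:.2f}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

At 1% prevalence PPV is about 8%, at 10% it is 50%, and at 50% it reaches 90%, while NPV moves the opposite direction.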

The High-Yield Formulas (know what moves what)

  • Sensitivity: P(T+ | D) = TP / (TP + FN)
  • Specificity: P(T− | ¬D) = TN / (TN + FP)
  • PPV: P(D | T+) = TP / (TP + FP)
  • NPV: P(¬D | T−) = TN / (TN + FN)

Also useful:

  • Prevalence (pretest probability): (TP + FN) / Total
  • False positive rate: 1 − specificity
  • False negative rate: 1 − sensitivity
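All of these formulas fall out of the same four cell counts. A small helper (a sketch; names are mine) computes every metric from raw TP/FP/TN/FN, using the vignette's numbers as a check:

```python
# Compute all the high-yield metrics from raw 2x2 cell counts.
def metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),   # conditions on disease status
        "specificity": tn / (tn + fp),   # conditions on disease status
        "ppv": tp / (tp + fp),           # conditions on test result
        "npv": tn / (tn + fn),           # conditions on test result
        "prevalence": (tp + fn) / total,
    }

print(metrics(90, 990, 8910, 10))
```

Note how the denominators encode the conditioning: sensitivity/specificity sum down a disease column, PPV/NPV sum across a test-result row.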

Distractor Autopsy: Why Every Wrong Answer Is Wrong (and tempting)

A. 9% — Correct

This reflects the low prevalence. With many more healthy people than sick people, most positives are false positives, even when specificity is decent.


B. 50% — Tempting if you “average” sensitivity and specificity

Students often see 90%/90% and assume “coin-flip-ish errors don’t happen,” or they mentally anchor to 50% as a generic probability.

Why it’s wrong:

  • PPV is not the average of sensitivity and specificity.
  • PPV depends strongly on prevalence.

Quick check:

  • We found FP (990) massively outweigh TP (90) → PPV can’t be anywhere near 50%.

C. 90% — Classic confusion: mixing up PPV with sensitivity

This choice is attractive because sensitivity is 90%, and people incorrectly think:

  • “If test is positive, 90% chance disease.”

But sensitivity is:

  • P(T+ | D) (probability the test is positive, given disease)

PPV is:

  • P(D | T+) (probability of disease, given a positive test)

High-yield phrase:

  • Sensitivity and specificity condition on disease status.
  • PPV and NPV condition on test result.

D. 99% — Classic confusion: mixing up PPV with specificity (or NPV)

Specificity is 90%, not 99%, but 99% often appears as a “very certain” distractor when prevalence is low.

What 99% does resemble here:

  • With low prevalence, NPV can get very high (often >99%) if sensitivity is decent.

If the question had asked NPV, you’d compute:

NPV = TN / (TN + FN) = 8910 / (8910 + 10) ≈ 99.9%

So 99% is wrong for PPV, but it’s a clue that you might be mixing up which post-test probability they want.


E. PPV cannot be determined without knowing the sample size — Wrong, but reveals a key concept

Sample size is irrelevant to PPV if you know the prevalence, sensitivity, and specificity. You can pick any convenient denominator (like 10,000) because PPV is a ratio.

What you do need:

  • Prevalence (or pretest probability)
  • Sensitivity and specificity

High-yield caveat:

  • If prevalence is not given, then yes — you can’t compute PPV/NPV from sens/spec alone.
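This sample-size independence is easy to demonstrate: PPV written via Bayes' rule contains no population size at all, and any 2×2 denominator gives the same ratio (a sketch; function names are mine):

```python
# Choice E check: PPV depends on sens, spec, and prevalence -- not on N.
def ppv_bayes(sens, spec, prev):
    return (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))

def ppv_from_table(n, sens, spec, prev):
    tp = sens * prev * n              # true positives in a population of n
    fp = (1 - spec) * (1 - prev) * n  # false positives in a population of n
    return tp / (tp + fp)             # n cancels out of the ratio

for n in (1_000, 10_000, 1_000_000):
    print(n, round(ppv_from_table(n, 0.90, 0.90, 0.01), 4))  # same PPV every time
```

The n in numerator and denominator cancels, which is exactly why picking a "fake population of 10,000" is legitimate.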

Study Design & Probability Tie-In (how USMLE likes to frame this)

USMLE often embeds PPV/NPV in real-world decision-making:

Screening vs diagnostic testing

  • Screening tests are frequently used in low-prevalence populations → PPV can be surprisingly low.
  • A positive screening test often requires a confirmatory test with higher specificity (to reduce false positives).

Bayes principle (conceptual)

You’re updating from pretest probability (prevalence) to post-test probability (PPV/NPV). You don’t need full Bayes math on exam day if you can do the 2×2 table quickly.


Rapid-Fire High-Yield Rules (memorize these)

  • Prevalence increases → PPV increases, NPV decreases
  • Prevalence decreases → PPV decreases, NPV increases
  • Sensitivity rules out (SnNout): highly sensitive test, negative result helps rule out disease
  • Specificity rules in (SpPin): highly specific test, positive result helps rule in disease
  • False positives explode when disease is rare, even with “good” specificity

Your Test-Day Playbook (30 seconds)

  1. Identify whether you need P(D | T+) (PPV) or P(¬D | T−) (NPV).
  2. Choose a fake population size (usually 10,000).
  3. Apply prevalence → split diseased vs not diseased.
  4. Apply sensitivity/specificity → fill TP, FP, TN, FN.
  5. Compute the asked ratio.

That’s it — and it immunizes you against almost every distractor.