Power and sample size questions are sneaky because they're not really about memorizing one definition; they're about knowing which knob to turn (effect size, α, power, variability, allocation ratio) and predicting what happens to Type I/II error and confidence intervals. On test day, every answer choice is basically a different "knob," and your job is to recognize what it does.
Tag: Biostatistics > Study Design & Probability
The Vignette (Q-bank style)
A randomized controlled trial tests whether a new antiplatelet drug reduces 30-day mortality after acute MI compared with standard therapy. Investigators expect mortality to decrease from 10% to 8% (absolute risk reduction 2%). They plan a two-sided test with α = 0.05 and want 80% power. However, after enrolling 600 patients total, an interim analysis shows no statistically significant difference. The investigators suspect the trial was underpowered.
Which of the following changes would most increase the power of this study (while keeping α at 0.05)?
A. Decrease the significance level to α = 0.01
B. Increase the sample size
C. Use a two-sided test instead of a one-sided test
D. Increase the variability (standard deviation) of the outcome
E. Decrease the effect size the study is designed to detect
Step 1/2 Mindset: What “Power” Really Means
Power is the probability of detecting a true effect when it exists:
- Power = 1 − β
- β = probability of Type II error (false negative)
Power increases when you:
- Increase sample size (n)
- Increase effect size (bigger difference between groups)
- Increase α (more tolerant of Type I error)
- Decrease variability (less noise)
- Use a one-sided test (if justified and direction is pre-specified)
A classic memory hook: Power goes up with “more signal” or “less noise.”
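To make that concrete, here is a minimal simulation sketch (ours, not from any question bank): it replays the vignette's trial many times under the assumed true effect and counts how often a standard two-proportion z-test rejects. The replication count and seed are arbitrary illustrative choices.

```python
# Minimal Monte Carlo sketch of power: replay the vignette's trial many
# times under the assumed true effect (10% vs 8% mortality) and count how
# often a two-sided two-proportion z-test rejects at alpha = 0.05.
# n_sims and seed are arbitrary illustrative choices.
import numpy as np
from scipy.stats import norm

def simulated_power(p1, p2, n_per_group, alpha=0.05, n_sims=10_000, seed=0):
    rng = np.random.default_rng(seed)
    z_crit = norm.ppf(1 - alpha / 2)          # two-sided critical value
    rejections = 0
    for _ in range(n_sims):
        x1 = rng.binomial(n_per_group, p1)    # deaths on standard therapy
        x2 = rng.binomial(n_per_group, p2)    # deaths on the new drug
        p_pool = (x1 + x2) / (2 * n_per_group)
        se = np.sqrt(p_pool * (1 - p_pool) * 2 / n_per_group)
        if se > 0 and abs(x1 - x2) / n_per_group / se > z_crit:
            rejections += 1
    return rejections / n_sims

# 600 patients total = 300 per group, as in the vignette
print(simulated_power(0.10, 0.08, 300))   # roughly 0.13: badly underpowered
```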
Correct Answer: B. Increase the sample size
If you want more power without changing α, the most straightforward fix is increasing n.
Why increasing n increases power
Bigger sample sizes:
- Reduce standard error (tighter sampling distribution)
- Narrow confidence intervals
- Make it easier to detect a true difference (even if modest)
For many common tests, the standard error scales as SE ∝ σ/√n, so halving the standard error requires quadrupling the sample size. To meaningfully increase power you often need a substantial increase in sample size (doubling n does not double power, but it helps).
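As a hedged back-of-the-envelope check on the vignette's numbers, here is the standard normal-approximation sample-size formula for comparing two proportions (the function name is a made-up helper):

```python
# Hedged closed-form sketch: required n per group for a two-sided
# two-proportion z-test, from the standard normal-approximation formula
#   n = (z_{1-alpha/2} + z_{power})^2 * (p1*q1 + p2*q2) / (p1 - p2)^2
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    z_a = norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_b = norm.ppf(power)           # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * variance / (p1 - p2) ** 2

print(round(n_per_group(0.10, 0.08)))   # ~3210 per group (~6420 total),
                                        # far beyond the 600 actually enrolled
```

Under these standard assumptions, the trial needed roughly ten times the patients it enrolled, which is why the interim analysis came up empty.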
Now Eliminate the Distractors (Why Every Answer Choice Matters)
A. Decrease the significance level to α = 0.01 ❌
Lowering α makes it harder to call a result significant.
- α ↓ → Type I error decreases
- But the rejection threshold becomes more stringent → β increases → power decreases
High-yield: α and β trade off (holding n and effect size constant). If you demand stronger evidence (smaller α), you'll miss more true effects.
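A quick numerical sketch of that tradeoff, using the usual normal approximation to power (ignoring the negligible opposite tail) with the vignette's 300 patients per group:

```python
# Normal-approximation sketch of the alpha knob: approximate two-sided power
# as Phi(|delta|/SE - z_{1-alpha/2}). n = 300 per group = the vignette's 600 total.
from math import sqrt
from scipy.stats import norm

def approx_power(p1, p2, n, alpha):
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
    return norm.cdf(abs(p1 - p2) / se - norm.ppf(1 - alpha / 2))

print(approx_power(0.10, 0.08, 300, alpha=0.05))   # ~0.13
print(approx_power(0.10, 0.08, 300, alpha=0.01))   # ~0.04: stricter alpha, less power
```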
C. Use a two-sided test instead of a one-sided test ❌
This is backwards: two-sided tests reduce power compared to one-sided tests (all else equal).
- Two-sided splits α across both tails (e.g., 0.025 each)
- Requires a more extreme test statistic to reject
High-yield nuance: A one-sided test can increase power only if:
- The direction is pre-specified before data collection
- Effects in the opposite direction are clinically irrelevant or implausible
Otherwise, one-sided testing is considered methodologically shady.
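The same approximation as above makes the one- vs two-sided gap explicit:

```python
# Same normal approximation, comparing tails at the same alpha
# (n = 300 per group, alpha = 0.05).
from math import sqrt
from scipy.stats import norm

p1, p2, n, alpha = 0.10, 0.08, 300, 0.05
z_effect = abs(p1 - p2) / sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)

print(norm.cdf(z_effect - norm.ppf(1 - alpha / 2)))   # two-sided: ~0.13
print(norm.cdf(z_effect - norm.ppf(1 - alpha)))       # one-sided: ~0.21
```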
D. Increase the variability (standard deviation) of the outcome ❌
More variability = more noise = harder to see signal.
- Variability ↑ → standard error ↑
- Confidence intervals widen
- Test statistic shrinks (on average) → power decreases
USMLE tie-in: Anything that increases “spread” (heterogeneous population, unreliable measurement, poor adherence) generally reduces power.
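For a continuous outcome, the same knob is easy to demonstrate with statsmodels' t-test power calculator (assuming statsmodels is installed); the mean difference and SDs below are invented purely for illustration:

```python
# Variability knob for a continuous outcome via statsmodels' t-test power.
# Cohen's d = (mean difference) / SD, so a larger SD shrinks d and power.
# The mean difference (5 units) and SDs are made-up illustrative numbers.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
diff = 5.0
for sd in (10.0, 15.0, 20.0):
    d = diff / sd
    pw = analysis.power(effect_size=d, nobs1=100, alpha=0.05)
    print(f"SD={sd:4.0f}  d={d:.2f}  power={pw:.2f}")   # power falls as SD rises
```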
E. Decrease the effect size the study is designed to detect ❌
Smaller effect sizes are harder to detect.
- Effect size ↓ → groups become more similar
- You need a larger sample to detect the smaller difference
- If n stays fixed, power decreases
High-yield framing:
- If the true effect is small, you can still get high power, but only with a big n, as the quick calculation below shows.
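Here is that calculation, reusing the closed-form two-proportion sketch from earlier:

```python
# How required n balloons as the detectable difference shrinks, using the
# same closed-form two-proportion formula (alpha = 0.05, 80% power,
# baseline mortality fixed at 10%).
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

for p2 in (0.05, 0.07, 0.08, 0.09):
    print(f"10% -> {p2:.0%}: ~{n_per_group(0.10, p2):,.0f} per group")
# roughly 432, 1,353, 3,210, and 13,500 per group, respectively
```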
Rapid-Fire High-Yield Table: What Happens to Power?
| Change | Type I error (α) | Type II error (β) | Power (1 − β) |
|---|---|---|---|
| Increase n | — | ↓ | ↑ |
| Increase α | ↑ | ↓ | ↑ |
| Decrease α | ↓ | ↑ | ↓ |
| Increase effect size | — | ↓ | ↑ |
| Decrease effect size | — | ↑ | ↓ |
| Increase variability (SD) | — | ↑ | ↓ |
| Decrease variability (SD) | — | ↓ | ↑ |
| One-sided vs two-sided (same α) | — | ↓ | ↑ (one-sided) |
(“—” = not directly changed by that manipulation)
How This Shows Up on USMLE: The Classic Traps
Trap 1: Confusing power with p-value
- Power is planned before the study (design stage).
- p-value is calculated after data collection (analysis stage).
Low power often leads to:
- “Negative study” even when a true effect exists
- Wider CIs that include clinically important effects
Trap 2: Thinking a nonsignificant result means “no effect”
A nonsignificant result may reflect:
- True lack of effect or
- Underpowered study (small n, small effect size, high variability)
High-yield language: “The study may have failed to detect a difference” ≠ “There is no difference.”
Trap 3: Mixing up absolute vs relative effect size
Sample size needs balloon when:
- Baseline event rate is low
- Absolute risk reduction is small (e.g., 2%)
In the vignette, 10% → 8% is a modest absolute change, often requiring a large n.
A Quick “Knob-Turning” Strategy for Answer Choices
When you see power/sample size answers, translate each option into one of these knobs:
- n (sample size)
- α (significance threshold)
- Effect size (difference between groups)
- Variability (SD / measurement noise)
- One- vs two-sided hypothesis testing
Then ask: does this create more signal, less noise, or easier rejection of H₀?
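If you want to test these intuitions yourself, a short library-based sketch (again assuming statsmodels is available) turns each answer choice into a one-line change:

```python
# Knob-by-knob sketch: treat each answer choice as a one-line change to the
# vignette's trial (10% vs 8% mortality, 300 per group). Cohen's h serves as
# the standardized effect size for two proportions.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

solver = NormalIndPower()
h = proportion_effectsize(0.10, 0.08)   # Cohen's h, ~0.07

base = dict(effect_size=h, nobs1=300, alpha=0.05, alternative='two-sided')
print("baseline        :", solver.power(**base))                      # ~0.13
print("B: n -> 3200    :", solver.power(**{**base, 'nobs1': 3200}))   # ~0.80
print("A: alpha -> 0.01:", solver.power(**{**base, 'alpha': 0.01}))   # ~0.04
print("one-sided       :", solver.power(**{**base, 'alternative': 'larger'}))  # ~0.21
```

Only choice B moves power in the right direction by enough to matter, which is the whole point of the question.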
Takeaway Cheat Sheet (What You Should Remember)
- Power = 1 − β
- To increase power (with α fixed): increasing n is the cleanest, most defensible move
- Lower α → lower power
- Two-sided tests → lower power than one-sided (given the same α)
- Higher variability → lower power
- Smaller effect size → lower power (unless you increase n)