Module 5 · Chapter 5

A/B testing your sequences

9 min read

Every outreach campaign is built on assumptions. You assume your subject line is compelling, your opening hook is relevant, your CTA is clear, and your timing is right. A/B testing replaces assumptions with evidence. It is the discipline that transforms good outreach into great outreach over time.

This chapter covers what to test, in what order, the sample sizes you need for valid results, how to recognize statistical significance, and how to build an iterative testing culture that compounds improvements week over week.

The testing hierarchy: what to test first

Not all elements of your outreach are equally impactful. Testing the wrong thing wastes time and volume. Here is the order in which you should test, ranked by impact on your reply rate:

1. Subject lines (impact: very high)

The subject line determines whether your email gets opened. No amount of brilliant copy matters if the email sits unread. Subject line tests give you the fastest, most measurable feedback.

What to test within subject lines:

  • Length: Short (2-4 words) vs. medium (5-8 words). Shorter subject lines tend to outperform, but this varies by audience.
  • Personalization: Including the prospect's name, company, or role vs. generic. Test whether personalization in the subject lifts opens.
  • Question vs. statement: "Quick question about [topic]" vs. "[Company] + [your company]." Questions tend to generate curiosity.
  • Specificity: Vague ("Idea for you") vs. specific ("Cutting [metric] by 30% at [Company]"). Specificity usually wins.

2. Opening lines (impact: high)

The first line of your email appears in the preview pane alongside the subject line. Together, they form the one-two punch that determines whether the prospect reads further. Test different opening approaches:

  • Personalized observation vs. direct value statement. "Noticed your team just launched X" vs. "Most [role] teams waste 10 hours/week on Y."
  • Pain-focused vs. opportunity-focused. "Struggling with X?" vs. "There's a faster way to do X."
  • Question vs. statement. Leading with a question engages differently than leading with an assertion.

3. Call to action (impact: high)

The CTA is where many otherwise good emails fail. Test the ask itself:

  • Meeting request vs. question. "Free for 15 minutes this week?" vs. "Is this a priority for your team right now?"
  • Specific time vs. open-ended. "How about Thursday at 2pm?" vs. "When works for you?"
  • Calendar link vs. no link. Some audiences respond well to calendar links; others find them presumptuous.
  • Low commitment vs. direct. "Want me to send over a quick summary?" vs. "Let's hop on a call."

4. Email length (impact: moderate)

Test short (50-75 words) against medium (100-150 words). In most B2B contexts, shorter emails outperform longer ones, but this is not universal. Complex or technical products sometimes benefit from slightly more detail.

5. Send timing (impact: moderate)

Test different days of the week and times of day. Conventional wisdom favors Tuesday through Thursday mornings, but your specific audience may differ. Test morning (8-10am) vs. early afternoon (1-3pm) vs. late afternoon (4-6pm).

Key insight

Test one variable at a time. If you change the subject line, opening, and CTA simultaneously, you cannot determine which change drove the result. Discipline in isolation is what makes A/B testing valuable.

Sample sizes and statistical significance

The most common mistake in outreach A/B testing is declaring a winner too early. If variant A gets 3 replies out of 20 sends and variant B gets 1 reply out of 20, that is not a meaningful difference — it is random noise.

Minimum sample sizes

For outreach A/B tests, you need larger samples than you might think because response rates are relatively low (typically 2-10%). Here are practical minimums:

  • 200+ sends per variant for open-rate tests
  • 500+ sends per variant for reply-rate tests
  • 90% target confidence level

For subject line tests (measuring open rate): You need at least 200 sends per variant. Open rates are higher than reply rates, so differences are easier to detect with smaller samples.

For body copy and CTA tests (measuring reply rate): You need at least 500 sends per variant, ideally more. With a 5% baseline reply rate, it takes significant volume to distinguish a real improvement from chance.

For send timing tests: You need at least 300 sends per variant, and they need to run over multiple weeks to account for weekly variation.
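These minimums come from standard power calculations. As a rough sketch, here is the textbook two-proportion sample-size formula in Python, assuming a two-sided test at 90% confidence and 80% power (the z-values 1.645 and 0.84 are standard statistical constants, not figures from this chapter):

```python
import math

def sends_per_variant(baseline, target, z_alpha=1.645, z_beta=0.84):
    """Approximate sends needed per variant to detect a lift from
    `baseline` to `target` reply rate (90% confidence, 80% power)."""
    variance = baseline * (1 - baseline) + target * (1 - target)
    delta = target - baseline
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Detecting a doubling of a 5% reply rate takes a few hundred sends...
print(sends_per_variant(0.05, 0.10))
# ...but a subtler lift (5% to 7.5%) takes over a thousand, which is
# why 500+ per variant is a floor, not a ceiling.
print(sends_per_variant(0.05, 0.075))
```

The takeaway: the smaller the improvement you hope to detect, the more volume you need, and reply-rate lifts in outreach are usually modest.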

Watch out

If you send fewer than 100 emails per variant, do not draw conclusions from the results. Treat any patterns you see as hypotheses to validate with a larger sample, not as facts to act on.

Understanding statistical significance

Statistical significance tells you the probability that the difference between your variants is real rather than random. For outreach testing, aim for 90% confidence — this is a practical threshold that balances accuracy with the realities of limited send volume.

Many outreach platforms include built-in significance calculators. If yours does not, there are free online tools that let you input your send count and response count for each variant and get a confidence level. Do not skip this step — gut feelings about which variant "seems better" are notoriously unreliable.
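If you want to sanity-check results yourself, the standard approach behind most of those calculators is a two-proportion z-test. Here is a minimal Python sketch; applied to the 3-of-20 vs. 1-of-20 example from earlier in this chapter, it comes out far below the 90% threshold:

```python
import math

def ab_confidence(sends_a, replies_a, sends_b, replies_b):
    """Two-proportion z-test: returns the confidence (0-1) that the
    observed difference between two variants is real, not noise."""
    p_a = replies_a / sends_a
    p_b = replies_b / sends_b
    # Pooled rate under the assumption of no real difference.
    pooled = (replies_a + replies_b) / (sends_a + sends_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    z = abs(p_a - p_b) / se
    # Two-tailed p-value from the normal CDF; confidence = 1 - p.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
    return 1 - p_value

print(ab_confidence(20, 3, 20, 1))   # well under the 0.90 threshold
```

With only 20 sends per variant, a 15% vs. 5% reply rate yields roughly 70% confidence: nowhere near enough to call a winner.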

The testing process: a repeatable cycle

Effective A/B testing is not a one-time event. It is a continuous cycle of hypothesis, test, analysis, and iteration. Here is the process:

  • Step 1: Identify the weakest link. Look at your funnel metrics. Low open rate? Test subject lines. Good opens but low replies? Test body copy or CTA. Focus on the biggest bottleneck first.
  • Step 2: Form a hypothesis. "I believe a question-based subject line will increase open rates by 10% because our audience is curiosity-driven." Be specific about what you expect and why.
  • Step 3: Create your variants. Write the control (your current best) and the challenger (your hypothesis). Change only one element.
  • Step 4: Split your audience randomly. Ensure the test and control groups are comparable. Do not put all your best prospects in one variant — randomize the split.
  • Step 5: Run the test to completion. Do not peek and call a winner early. Let the full sample send before analyzing results.
  • Step 6: Analyze with statistical rigor. Check significance. Record the results in a testing log regardless of outcome — losses are as informative as wins.
  • Step 7: Implement and iterate. Roll out the winner as your new control. Then test the next variable.
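Step 4's random split is worth getting right: assigning variants by list order or alphabetically can quietly bias one group. A minimal sketch, using integer IDs as a stand-in for real prospect records:

```python
import random

def split_test_groups(prospects, seed=None):
    """Randomly split a prospect list into two comparable halves
    for an A/B test. Shuffling first removes any ordering bias
    (e.g. a list sorted by company size or sign-up date)."""
    pool = list(prospects)
    random.Random(seed).shuffle(pool)
    midpoint = len(pool) // 2
    return pool[:midpoint], pool[midpoint:]

variant_a, variant_b = split_test_groups(range(1000), seed=42)
print(len(variant_a), len(variant_b))   # 500 500
```

Passing a seed makes the split reproducible, which helps when you later want to audit which prospects landed in which variant.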

How to iterate: the compounding effect

The real power of A/B testing is compounding. A 10% improvement in open rates, followed by a 15% improvement in reply rates, followed by a 10% improvement in conversion to meetings, yields a cumulative improvement of roughly 40% — from three modest wins.
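The arithmetic behind that claim is multiplicative, not additive, because each stage of the funnel feeds the next:

```python
# Three modest wins compound multiplicatively across the funnel:
# open rate, reply rate, and meeting conversion.
lifts = [0.10, 0.15, 0.10]
cumulative = 1.0
for lift in lifts:
    cumulative *= 1 + lift
print(f"{cumulative - 1:.1%}")   # roughly a 40% overall lift
```

Note that simply adding the percentages (10 + 15 + 10 = 35) understates the effect; compounding is what makes steady, small wins so valuable.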

Top-performing outreach teams run tests continuously. They maintain a testing backlog and prioritize tests by expected impact. Their campaigns improve every week, not every quarter.

"The team that runs 50 tests per year will always outperform the team that writes one 'perfect' email and runs it for six months. Outreach is an iteration game, not a perfection game."

Common A/B testing mistakes

  • Testing too many things at once: You cannot learn what works if you change five variables simultaneously. One variable per test, always.
  • Calling winners too early: Wait for statistical significance. Premature conclusions lead to false optimizations that can actually hurt performance.
  • Testing trivial differences: Testing "Hi [Name]" vs. "Hey [Name]" is unlikely to move the needle. Test meaningful differences in approach, angle, or format.
  • Not documenting results: If you do not record what you tested, what the results were, and what you learned, you will repeat tests and waste volume. Maintain a simple testing log.
  • Ignoring segment differences: A subject line that works for CTOs may fail with marketing directors. When possible, analyze test results by segment to uncover these differences.

A/B testing is the engine that keeps your outreach improving over time. Build the habit early, maintain discipline around sample sizes and statistical rigor, and you will steadily outpace competitors who rely on guesswork. The next chapter covers the final piece of the sequences puzzle: knowing when to stop and when to re-engage.