
Property-Based Testing and Modern E2E: Hypothesis + Playwright

Testing has always been one of those things that feels like it should be straightforward but somehow never is. You write a few unit tests, maybe throw in some integration tests, and call it a day. Then your app breaks in production in ways you never imagined, and you're left wondering what went wrong.

The problem isn't that we're bad at testing. It's that traditional testing approaches have fundamental limitations that we've just accepted as normal. But what if there was a better way?

The Cracks in Traditional Testing

Most of us learned testing the same way: write tests that check if specific inputs produce expected outputs. If you're building an addition function, you test add(2, 3) returns 5. If you're testing a login form, you try valid credentials and invalid ones. This approach works, but it has a blind spot the size of Texas.
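To make that concrete, here's the example-based style in miniature (the add function is a toy, purely for illustration):

def add(a, b):
    return a + b

def test_add_examples():
    # Each assertion covers exactly one input someone thought to write down.
    assert add(2, 3) == 5
    assert add(-1, 1) == 0
    assert add(0, 0) == 0

Every case passes or fails on its own, and nothing beyond the handful of hand-picked inputs is ever checked.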

The issue is that we can only test the scenarios we think to test. Our imagination becomes the limiting factor. We might test happy paths and a few edge cases, but there are always scenarios lurking in the corners that we never considered. Real users have an uncanny ability to find these scenarios.

For end-to-end testing, the problems multiply. Tests become flaky, they're slow to run, and maintaining them feels like a full-time job. Many teams just give up on E2E testing entirely, which is understandable but unfortunate.

Enter the New Guard

Two tools have fundamentally changed how I approach testing: Playwright for end-to-end testing and Hypothesis for property-based testing. Together, they form a testing stack that actually delivers on the promise of catching bugs before users do.

Let's start with Hypothesis, because it solves the imagination problem I mentioned earlier.

Property-Based Testing with Hypothesis

Instead of checking hand-picked examples, Hypothesis lets you describe the properties your code should have, then generates hundreds of test cases automatically to probe them. Rather than asserting that add(2, 3) returns 5, you might test that addition is commutative: add(a, b) == add(b, a) for any valid inputs.

Here's what this looks like in practice:

from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_is_idempotent(lst):
    """Sorting a list twice should be the same as sorting it once"""
    assert sorted(sorted(lst)) == sorted(lst)

@given(st.lists(st.integers()))
def test_sort_preserves_length(lst):
    """Sorting should never change the length"""
    assert len(sorted(lst)) == len(lst)

When you run these tests, Hypothesis will generate random lists of integers and verify these properties hold. But here's the clever part: when it finds a failing case, it automatically minimizes the input to the smallest possible example that reproduces the bug.
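Hypothesis calls this process shrinking. Here's a sketch using a deliberately flawed helper (hypothetical, for illustration) that only removes the first matching element:

from hypothesis import given, strategies as st

def remove_all(items, value):
    # Flawed on purpose: list.remove() only deletes the first match.
    items = list(items)
    if value in items:
        items.remove(value)
    return items

@given(st.lists(st.integers()), st.integers())
def test_removed_value_is_gone(items, value):
    """After removing a value, no copies of it should remain."""
    assert value not in remove_all(items, value)

Hypothesis will stumble onto some random failing input, then whittle it down and report a counterexample like items=[0, 0], value=0: the smallest case that still exposes the bug, and usually the easiest one to debug.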

I've seen Hypothesis catch bugs that would have taken months to surface naturally. It found a race condition in a caching system by generating a specific sequence of operations that happened to expose the issue. It discovered integer overflow bugs by generating numbers just large enough to cause problems. These aren't bugs you typically think to test for.

End-to-End Testing with Playwright

For E2E testing, Playwright has solved most of the problems that made this type of testing painful. Tests run fast because Playwright can run multiple browsers in parallel. They're more reliable because Playwright automatically waits for elements to be ready. And they're easier to debug because you can see exactly what happened when something goes wrong.

The developer experience is genuinely pleasant. You can record tests by just using your app normally, then Playwright generates the test code. You can run tests in headed mode to watch them execute. When tests fail, you get screenshots, videos, and detailed traces showing exactly what happened.

import { test, expect } from '@playwright/test';

test('user can complete checkout flow', async ({ page }) => {
  await page.goto('/products');
  await page.click('[data-testid="add-to-cart"]');
  await page.click('[data-testid="cart"]');
  await page.fill('[data-testid="email"]', 'test@example.com');
  await page.click('[data-testid="checkout"]');
  
  await expect(page.locator('[data-testid="success"]')).toBeVisible();
});

The tests read like natural language, and Playwright handles all the timing and synchronization issues that used to make E2E tests fragile.

Division of Labor

Here's the thing about Hypothesis and Playwright: they're not really tools you use together in the direct sense. They operate at completely different layers of your system and will likely live in separate test suites, run at different stages of your CI pipeline, and maybe even be written by different people on your team.

Hypothesis shines when you're testing the complex logic that powers your application. Think pricing algorithms, data transformations, parsing logic, or mathematical computations. These are the places where edge cases can hide and cause real damage. Hypothesis will run as part of your unit or integration test suite, probably early in your CI process.
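A pattern that covers a surprising amount of that territory is the round-trip property: serialize a value, parse it back, and check that nothing was lost. Here's a minimal sketch using the standard library's json module; substitute your own encoder and parser:

import json

from hypothesis import given, strategies as st

# Flat documents with string keys and integer values survive a JSON
# round trip exactly, so plain equality is a fair check here.
documents = st.dictionaries(st.text(), st.integers())

@given(documents)
def test_json_round_trip(doc):
    assert json.loads(json.dumps(doc)) == doc

The same shape of test works for any parse/serialize pair you own, and it tends to flush out escaping and unicode-handling bugs quickly.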

Playwright, on the other hand, is all about the user experience. It's testing that your login form actually works, that users can complete a purchase, that your responsive design doesn't break on mobile. These tests run later in your pipeline, often against a deployed environment, and they're focused on workflows rather than individual functions.

What's powerful about having both is the coverage they give you in combination. You get confidence that your core logic handles weird inputs correctly, and you also know that your users can actually accomplish what they came to do.

Take an e-commerce discount system as an example. Hypothesis would fuzz your discount calculation function with edge cases: negative quantities, percentage discounts over 100%, expired coupon codes, currency conversion edge cases. Meanwhile, Playwright would verify that a user can actually apply a discount code during checkout and see the correct total. Same feature, different failure modes, both critical.
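On the Hypothesis side, that might look something like the sketch below; apply_discount and its clamping rule are hypothetical stand-ins for whatever your pricing code actually does:

from decimal import Decimal

from hypothesis import given, strategies as st

def apply_discount(total, percent):
    # Hypothetical implementation: clamp the discount into a sane range.
    percent = max(Decimal("0"), min(Decimal("100"), percent))
    return total - total * percent / Decimal("100")

totals = st.decimals(min_value=Decimal("0.00"), max_value=Decimal("100000.00"), places=2)
percents = st.decimals(min_value=Decimal("-50"), max_value=Decimal("200"), places=2)

@given(totals, percents)
def test_discounted_total_stays_within_bounds(total, percent):
    discounted = apply_discount(total, percent)
    # Whatever the inputs, we should never charge a negative amount
    # or more than the undiscounted total.
    assert Decimal("0") <= discounted <= total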

Getting Started

The nice thing about both tools is that you can adopt them incrementally and independently. Pick whichever problem is more pressing for your team right now.

If you're dealing with complex algorithms or data processing, start with Hypothesis. Find one function that's critical but tricky to test thoroughly, and write property-based tests for it. You'll probably find bugs you didn't know existed.

If you're spending too much time manually testing user flows or dealing with flaky E2E tests, try Playwright. Pick your most important user journey and automate it. The improved reliability alone will save you time.
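If your team is already in Python for Hypothesis, note that Playwright ships an official Python binding too. With the pytest-playwright plugin, a first journey test might look something like this; the URL and selectors are placeholders:

# Assumes: pip install pytest-playwright && playwright install
from playwright.sync_api import Page, expect

def test_user_can_log_in(page: Page):
    # pytest-playwright injects the `page` fixture automatically.
    page.goto("https://staging.example.com/login")
    page.fill('[data-testid="email"]', "test@example.com")
    page.fill('[data-testid="password"]', "correct-horse-battery-staple")
    page.click('[data-testid="submit"]')

    # expect() retries until the assertion passes or times out, which is
    # a big part of what makes these tests resistant to flakiness.
    expect(page.locator('[data-testid="welcome-banner"]')).to_be_visible()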

Don't feel like you need to adopt both at once or integrate them somehow. They solve different problems, and that's exactly why they're both valuable. Good testing isn't about having one perfect tool; it's about having the right tool for each type of problem you face.

The goal isn't perfect test coverage or zero bugs. It's building confidence that your code works correctly and will continue working as you make changes. These tools get you much closer to that goal than traditional approaches ever could.

Testing doesn't have to be an afterthought or a necessary evil. With the right tools, it becomes a powerful way to understand your code better and ship with confidence. Most of the bugs that actually hurt are either logic errors in your core code or breakdowns in user-facing workflows, and between these two tools you have both covered. That's what good testing should feel like.