Nov 26, 2025

How I Went from Zero Tests to Comprehensive Dagster Test Coverage

The journey from no tests to data-driven testing with pytest fixtures, mock resources, and proper asset validation.

I’m going to be honest - the first version of this pipeline had zero tests. Not a single one.

The pipeline worked. It ran in production. It processed real data. But every time I made a change, I held my breath and hoped nothing broke. That’s not how you build production data infrastructure.

The test overhaul happened over several sessions. Each one added a new layer of confidence. The git history tells the story: “test overhaul, test asset patients” → “test overhaul, test asset billing” → “test overhaul, test utils” → “test and doc improvements” → “code qa and tests round 2” → “data driven testing for billing data”.

By the end, I had comprehensive test coverage with patterns I could replicate for every new asset.

Session 1: The Patient Asset Tests

The first breakthrough was realizing I didn’t need to mock everything. I could create real test data with fixtures and call the asset functions directly.

I started with the patient assets because they’re simpler than billing - fewer transformations, clearer logic:

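import pandas as pd
import pytest

# Project imports (patients, generate_dummy_patients_data, MockContext,
# MockFirebirdResource) are omitted here.

@pytest.fixture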
def dummy_patients_df():
    """Generate dummy patient data for tests."""
    return generate_dummy_patients_data(num_records=25)

@pytest.fixture
def mock_context():
    """Create a mock Dagster context."""
    return MockContext()

class TestPatients:
    """Tests for the patients asset."""

    def test_patients_returns_dataframe(self, mock_context, dummy_patients_df):
        """Test that patients returns a pandas DataFrame."""
        mock_firebird = MockFirebirdResource(dummy_patients_df)
        result = patients(mock_context, mock_firebird)
        assert isinstance(result, pd.DataFrame)

    def test_patients_has_expected_columns(self, mock_context, dummy_patients_df):
        """Test that the returned DataFrame has expected columns."""
        mock_firebird = MockFirebirdResource(dummy_patients_df)
        result = patients(mock_context, mock_firebird)
        expected_columns = ["PATIENT_NR", "FIRST_NAME", "LAST_NAME", "DATE_OF_BIRTH"]
        for col in expected_columns:
            assert col in result.columns

The pattern was simple: create a fixture with test data, create a mock resource that returns that data, call the asset function, and verify the output.

No database required. No complex mocking. Just asset functions called directly on in-memory test data.

Session 2: Scaling to Billing Assets

Billing assets were more complex. They had transformations, validations, grouping logic. But the same pattern worked:

class TestProcessedBillingData:
    """Tests for the processed_billing_data asset."""

    def test_processed_billing_data_adds_hash(self, mock_context, dummy_billing_df):
        """Test that processing adds PATIENT_NR_HASH column."""
        result = processed_billing_data(mock_context, dummy_billing_df)
        assert "PATIENT_NR_HASH" in result.columns

    def test_processed_billing_data_preserves_rows(self, mock_context, dummy_billing_df):
        """Test that processing preserves all rows."""
        result = processed_billing_data(mock_context, dummy_billing_df)
        assert len(result) == len(dummy_billing_df)

The key was testing one thing at a time. Does the processing add a hash column? Does it preserve the row count? Does it validate the schema? Each question gets its own small, focused test that is easy to understand and maintain.

Session 3: Test Utils

Once I had asset tests working, I realized the utility functions needed tests too. These functions do the actual work - hashing columns, validating data, grouping records.

The utilities were perfect for traditional unit tests:

def test_hash_column():
    """Test that hash_column adds a hashed column."""
    df = pd.DataFrame({"col1": ["a", "b", "c"]})
    result = hash_column(df, column_name="col1")
    assert "col1_HASH" in result.columns
    assert len(result) == len(df)

def test_process_billing_data():
    """Test that process_billing_data adds required columns."""
    billing_df = generate_dummy_billing_data(10)
    result = process_billing_data(billing_df)
    assert "YEAR" in result.columns
    assert "MONTH" in result.columns

Testing utilities separately from assets meant I could verify the low-level logic works before testing the high-level orchestration.
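The hash test above checks only that a new column appears. As a rough sketch of a stricter check, one might also assert that hashing is deterministic and hides the raw value. The post doesn’t show how hash_column is implemented, so hash_values below is a hypothetical stand-in: it assumes SHA-256 and works on a plain list instead of a DataFrame so it runs with the standard library alone.

```python
import hashlib


def hash_values(values, salt=""):
    """Hypothetical stand-in for hash_column: SHA-256 hex digest per value."""
    return [
        hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
        for value in values
    ]


def test_hash_is_deterministic():
    # The same input must always map to the same hash.
    assert hash_values(["P001", "P002"]) == hash_values(["P001", "P002"])


def test_hash_distinguishes_values_and_hides_them():
    hashed = hash_values(["P001", "P002"])
    # Different inputs give different digests ...
    assert hashed[0] != hashed[1]
    # ... and the raw identifier does not appear in the output.
    assert "P001" not in hashed[0]
    # A SHA-256 hex digest is always 64 characters long.
    assert all(len(h) == 64 for h in hashed)
```

Determinism matters here because a hashed patient number is only useful as a join key if the same patient always gets the same hash.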

The Data-Driven Testing Breakthrough

The real game-changer was parametrized tests. Instead of writing separate tests for each edge case, I could write one test with multiple data scenarios:

@pytest.mark.parametrize("num_records", [0, 1, 10, 100])
def test_billing_data_handles_various_sizes(mock_context, num_records):
    """Test that billing_data handles datasets of various sizes."""
    dummy_df = generate_dummy_billing_data(num_records)
    mock_firebird = MockFirebirdResource(dummy_df)
    result = billing_data(mock_context, mock_firebird)
    assert len(result) == num_records

@pytest.mark.parametrize("source", ["INPATIENT", "OUTPATIENT", "PHARMACY"])
def test_billing_data_by_group_splits_correctly(mock_context, source):
    """Test that billing data is correctly split by SOURCE."""
    billing_df = pd.DataFrame({
        "PATIENT_NR": ["P001", "P002", "P003"],
        "SOURCE": [source, source, "OTHER"],
        "AMOUNT": [100, 200, 300],
    })
    processed_df = process_billing_data(billing_df)
    result = billing_data_by_group(mock_context, processed_df)
    assert source in result
    assert len(result[source]) >= 2

Parametrized tests made it easy to verify edge cases - empty datasets, single-row datasets, datasets with specific source values. The test code stayed clean while coverage expanded.

The Mock Resource Pattern

The biggest architectural improvement was standardizing the mock resource pattern. Instead of creating ad-hoc mocks for each test, I built reusable mock resources that mirror production:

class MockFirebirdResource:
    """Mock Firebird database resource for testing."""

    def __init__(self, test_data: pd.DataFrame):
        self.test_data = test_data

    def get_connection(self):
        return MockConnection(self.test_data)

class MockConnection:
    """Mock database connection for testing."""

    def __init__(self, test_data: pd.DataFrame):
        self.test_data = test_data

    def cursor(self):
        return MockCursor(self.test_data)

    def close(self):
        pass

The mock resource implements the same interface as the real resource. Tests use the mock, production uses the real one. The asset code doesn’t know or care which it’s using.
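MockCursor is referenced above but not shown. Here is a minimal sketch of what such a cursor could look like, assuming the asset code only uses the DB-API style execute, fetchall, and description. It also assumes the test data arrives as a list of row dicts (for example from DataFrame.to_dict("records")) rather than the DataFrame the post passes in. The BILLING table name is made up for the demonstration.

```python
class MockCursor:
    """Hypothetical cursor that ignores the SQL and replays canned rows."""

    def __init__(self, records):
        self._records = list(records)
        self.executed = []        # every SQL string passed to execute()
        self.description = None   # filled in by execute(), as in DB-API 2.0

    def execute(self, sql, params=None):
        self.executed.append(sql)
        columns = list(self._records[0]) if self._records else []
        # DB-API describes each column as a 7-item sequence; only the name is set.
        self.description = [
            (name, None, None, None, None, None, None) for name in columns
        ]

    def fetchall(self):
        return [tuple(row.values()) for row in self._records]

    def close(self):
        pass


cursor = MockCursor([{"PATIENT_NR": "P001", "AMOUNT": 100}])
cursor.execute("SELECT * FROM BILLING")
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
```

Recording the executed SQL in a list is optional, but it lets a test assert which query an asset issued without parsing anything.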

This pattern scaled beautifully when I started the medallion migration. The MockFirebirdParquetResource follows the exact same pattern - it reads from parquet files instead of a real database, but the interface is identical.

The Mock Context Evolution

Early tests had a simple MockContext:

from unittest.mock import MagicMock

class MockContext:
    """Mock Dagster context for testing."""

    def __init__(self):
        self._metadata = {}

    @property
    def log(self):
        return MagicMock()

    def add_output_metadata(self, metadata):
        self._metadata.update(metadata)
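One limitation of this early version, as written: the log property builds a new MagicMock on every access, so a test cannot look back at what the asset logged. A short demonstration follows, with a hypothetical CachedLogContext variant that creates the mock once. Both class names are illustrative and not from the original code.

```python
from unittest.mock import MagicMock


class EarlyContext:
    """Same shape as the early MockContext: a fresh mock per access."""

    @property
    def log(self):
        return MagicMock()


class CachedLogContext:
    """Hypothetical variant: one shared mock, so calls can be inspected."""

    def __init__(self):
        self.log = MagicMock()


early = EarlyContext()
early.log.info("loaded 25 rows")
# The call was recorded on a mock that has already been discarded.
early_saw_call = early.log.info.called

cached = CachedLogContext()
cached.log.info("loaded 25 rows")
# Raises AssertionError unless info() was called exactly once with this argument.
cached.log.info.assert_called_once_with("loaded 25 rows")
```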

Later, when I started testing the medallion architecture, I needed more functionality:

class MockContext:
    """Mock Dagster context for testing."""

    def __init__(self):
        self.run_id = "test-run-123"
        self._metadata = {}
        self._logs = []

    @property
    def log(self):
        return self

    def info(self, msg):
        self._logs.append(("info", msg))

    def add_output_metadata(self, metadata):
        self._metadata.update(metadata)

Now the mock context captures logs and provides a deterministic run_id for testing. I can verify that assets log the right messages and include the right metadata.
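To make that concrete, here is a sketch of a test that checks both the log line and the metadata. MockContext is copied from the listing above so the example runs on its own. row_count_asset is a hypothetical asset invented for illustration, not one from the pipeline.

```python
class MockContext:
    """Copy of the later MockContext, repeated so this example runs alone."""

    def __init__(self):
        self.run_id = "test-run-123"
        self._metadata = {}
        self._logs = []

    @property
    def log(self):
        return self

    def info(self, msg):
        self._logs.append(("info", msg))

    def add_output_metadata(self, metadata):
        self._metadata.update(metadata)


def row_count_asset(context, rows):
    """Hypothetical asset: logs what it received and reports a row count."""
    context.log.info(f"run {context.run_id}: received {len(rows)} rows")
    context.add_output_metadata({"num_rows": len(rows)})
    return rows


def test_asset_logs_and_reports_metadata():
    context = MockContext()
    result = row_count_asset(context, [{"PATIENT_NR": "P001"}, {"PATIENT_NR": "P002"}])

    assert len(result) == 2
    assert context._logs == [("info", "run test-run-123: received 2 rows")]
    assert context._metadata == {"num_rows": 2}
```

Because only info is defined, an asset that calls context.log.warning(...) would raise AttributeError against this mock. Adding warning and error methods of the same shape would cover those cases.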

What Good Test Coverage Feels Like

The difference is night and day. Before tests, every change was risky. Now I can refactor with confidence.

Adding a new asset? There’s a clear pattern to follow. Create fixtures for test data, write tests for the asset function, verify the output. The pattern repeats.

Refactoring transformation logic? The tests catch regressions immediately. I refactored the patient ID hashing logic twice. Each time the tests failed first, I fixed the code, and the tests passed again.

Migrating from Pandas to Polars? The test structure stayed the same. I just needed to update the assertions to work with Polars DataFrames instead of Pandas.

The test suite runs in seconds. No database setup. No network calls. Just fast, reliable verification that the code does what it’s supposed to do.

The Test File Structure

The test organization mirrors the source code:

healthcare_ETL/
├── assets/
│   ├── assets_billing_data.py
│   ├── assets_patients.py
│   └── assets_geriatric_assessment.py
└── utils/
    ├── database_utils.py
    ├── pandas_utils.py
    └── dagster_utils.py

healthcare_ETL_tests/
├── test_assets/
│   ├── test_assets_billing.py
│   ├── test_assets_patients.py
│   └── test_assets_geriatric_assessment.py
└── test_utils/
    ├── test_database_utils.py
    ├── test_pandas_utils.py
    └── test_dagster_utils.py

Each source file has a corresponding test file. When I create a new asset, I create its test file at the same time. It’s just part of the development workflow now.
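A mirrored layout like this can also be checked mechanically. The sketch below is one possible helper, not part of the project: missing_test_files and its strict naming rule (a module at package/module.py is covered by test_package/test_module.py) are assumptions. The demonstration builds a throwaway directory tree so nothing touches a real repository.

```python
from pathlib import Path
import tempfile


def missing_test_files(src_root: Path, test_root: Path) -> list[str]:
    """Return expected test paths (relative to test_root) that do not exist.

    Assumed convention: <src_root>/<pkg>/<module>.py is covered by
    <test_root>/test_<pkg>/test_<module>.py.
    """
    missing = []
    for source in sorted(src_root.glob("*/*.py")):
        if source.name == "__init__.py":
            continue
        expected = Path(f"test_{source.parent.name}") / f"test_{source.name}"
        if not (test_root / expected).exists():
            missing.append(expected.as_posix())
    return missing


# Demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "healthcare_ETL"
    tests = Path(tmp) / "healthcare_ETL_tests"
    (src / "assets").mkdir(parents=True)
    (tests / "test_assets").mkdir(parents=True)
    (src / "assets" / "assets_patients.py").touch()
    (src / "assets" / "assets_billing_data.py").touch()
    (tests / "test_assets" / "test_assets_patients.py").touch()

    report = missing_test_files(src, tests)
```

Under this strict rule, assets_billing_data.py maps to test_assets_billing_data.py, so the shorter test_assets_billing.py in the listing above would be flagged unless the rule allowed abbreviated names.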

Lessons Learned

Start with the simplest asset. I started with patients instead of billing. It gave me confidence before tackling the complex stuff.

Fixtures are your friends. pytest fixtures make test setup clean and reusable. Don’t repeat yourself - create fixtures for common test data.

Test one thing at a time. Small, focused tests are easier to understand and maintain than big tests that verify everything.

Mock at the right level. Don’t mock individual functions - mock resources. It aligns with Dagster’s architecture and makes tests more robust.

Parametrize edge cases. Instead of writing separate tests for empty data, single-row data, and multi-row data, use @pytest.mark.parametrize and test them all with one function.

Make test data realistic. Use Faker to generate fake patient names, dates, and amounts. Varied, plausible values catch edge cases you wouldn’t think of with simple test data like ["a", "b", "c"].
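Random test data is only useful if a failing run can be reproduced. Here is a rough sketch of a seeded generator, using the standard library’s random module as a stand-in for Faker so it runs without extra dependencies. The function name echoes generate_dummy_patients_data from earlier, but the body, the name pools, and the seed parameter are all assumptions.

```python
import random
from datetime import date, timedelta

# Hypothetical name pools; a Faker-based version would draw from much larger ones.
FIRST_NAMES = ["Anna", "José", "Li", "Zoë", "Mohammed"]
LAST_NAMES = ["Müller", "O'Brien", "van der Berg", "Nguyen", "Smith"]


def generate_dummy_patients(num_records, seed=0):
    """Return num_records patient dicts; the same seed yields the same data."""
    rng = random.Random(seed)  # private generator, leaves global random state alone
    start = date(1930, 1, 1)
    return [
        {
            "PATIENT_NR": f"P{i:04d}",
            "FIRST_NAME": rng.choice(FIRST_NAMES),
            "LAST_NAME": rng.choice(LAST_NAMES),
            "DATE_OF_BIRTH": start + timedelta(days=rng.randrange(30000)),
        }
        for i in range(1, num_records + 1)
    ]


patients_a = generate_dummy_patients(25, seed=42)
patients_b = generate_dummy_patients(25, seed=42)
```

Faker has its own seeding mechanism (Faker.seed), which serves the same purpose when the real library is used.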

The test overhaul took several sessions. But it was worth every minute. Now when I make changes, I know immediately if something breaks. That’s how you build production data infrastructure.
