Testing Guidelines#

This document explains how to write and run tests in the Haniwers project.

Quick Reference#

Three types of tests:

  • Unit tests - Fast (<1s), test one thing, always pass (default)

  • ⚠️ Local tests - Slow or probabilistic, marked with @pytest.mark.local

  • 🚀 Benchmarks - Performance measurements, marked with @pytest.mark.benchmark

When to Write Each Type#

Unit Tests (Default) ✅#

Write unit tests when:

  • Testing functionality correctness (does it work?)

  • Test runs in <1 second

  • Test is deterministic (same result every time)

  • Test verifies behavior, not performance

Examples:

  • Does load_events() return a list of MockEvent objects?

  • Does readline() return bytes?

  • Does close() set is_open=False?

  • Does field generation produce values in valid ranges?

Characteristics:

  • Fast execution (<1 second per test)

  • No randomness or statistical checks

  • No large datasets (prefer 10-100 rows)

  • Runs on every code change (CI/CD)

Local Tests (@pytest.mark.local) ⚠️#

Mark as local when:

  • Test takes >1 second to run

  • Test uses randomness (might occasionally fail)

  • Test requires large datasets (>1000 rows)

  • Test measures timing (might fail on slow CI)

  • Test requires external resources (memory profiling tools)

Examples:

  • Loading 10K rows to verify performance

  • Statistical distribution tests (might fail 1 in 1000 runs)

  • Boundary value tests requiring 2000+ samples

  • Stress tests with 1000+ events

Characteristics:

  • Slow execution (>1 second)

  • May use statistical assertions

  • May occasionally fail due to randomness

  • Skipped by default, run with pytest -m local

Benchmarks (@pytest.mark.benchmark) 🚀#

Use benchmarks when:

  • Specifically measuring performance (timing, speed, throughput)

  • Testing large-scale data (1000+ events)

  • Stress testing system limits

  • Profiling code execution

  • Comparing performance of different implementations

Examples:

  • RandomMocker at 100x speed (test_random_mocker_high_speed)

  • Generating 1000 events for stress testing (test_random_mocker_stress_test)

  • Measuring playback speed accuracy (test_mocker_speed_adjustment)

  • Testing timing variation bounds (test_mocker_jitter)

Characteristics:

  • Measures performance, not functionality

  • May include explicit timing assertions

  • Runs separately from functional tests

  • Skipped by default with task test:integrations

  • Grouped in separate benchmark test suite

Current benchmarks in project:

  • 5 benchmark tests in tests/v1/integrations/daq/

  • Performance measurement focus

  • Marked with @pytest.mark.benchmark

  • Can be run with task test:benchmark

How to Run Tests#

Quick Commands (Using Taskfile)#

# Run fast unit tests only (default)
task test

# Run all v1 tests (unit + integrations, skips benchmarks)
task test:full

# Run integration tests (skips benchmarks)
task test:integrations

# Run only benchmark tests (performance, timing, stress tests)
task test:benchmark

# Run only mocker unit tests
poetry run pytest tests/v1/unit/daq/mocker/

# Run only local tests
task test:local

# Run all tests including local and benchmark
task test:all

# Show performance profile (slowest 20 tests)
task test:perf:mocker

# Show all registered markers
task test:markers

Direct pytest Commands#

# Run fast unit tests (default - skips local tests)
pytest

# Run all tests including slow ones
pytest -m ""

# Run only local tests
pytest -m local

# Run only benchmark tests
pytest -m benchmark tests/v1/integrations/

# Run integration tests (skips benchmarks)
pytest -m "not benchmark" tests/v1/integrations/

# Run specific test file
pytest tests/v1/unit/daq/mocker/test_mocker.py

# Run with performance profiling
pytest tests/v1/unit/daq/mocker/ --durations=20

# Run with coverage
pytest tests/v1/ --cov=haniwers.v1 --cov-report=term

How to Mark Tests#

Marking a test as local#

import pytest

@pytest.mark.local
def test_large_dataset():
    """Test with 10K rows - marked local because it's slow.

    This test takes ~5 seconds to run, so we mark it as local-only.
    It still tests important functionality (loading large files) but
    doesn't need to run on every code change.

    Why marked local:
        - Slow: Takes >1 second to run
        - Resource-intensive: Generates 10K events
    """
    # Generate large dataset
    events = [generate_event() for _ in range(10000)]

    # Test functionality
    assert len(events) == 10000
    assert all(isinstance(e, MockEvent) for e in events)

Marking with multiple markers#

@pytest.mark.local
@pytest.mark.slow
@pytest.mark.statistical
def test_randomness_quality():
    """Test randomness distribution - marked local because statistical.

    Why marked local:
        - Slow: Takes >1 second with 1000 samples
        - Statistical: Might fail 1 in 1000 runs due to random variation
    """
    # Generate random samples
    samples = [generate_random_value() for _ in range(1000)]

    # Statistical assertion (might occasionally fail)
    mean = sum(samples) / len(samples)
    assert 45 < mean < 55  # Expect mean ~50 with some tolerance

Best Practices#

1. Keep unit tests fast#

Bad (slow test):

def test_processing():
    # Processes 10K rows just to test basic functionality
    data = load_large_file(10000)
    result = process(data)
    assert result is not None

Good (fast test):

def test_processing():
    # Tests with minimal data (10 rows)
    data = load_sample_data(10)
    result = process(data)
    assert len(result) == 10
    assert all(isinstance(r, ProcessedData) for r in result)

Rationale: If your test is slow, ask “can I test this with less data?”

2. Test behavior, not performance#

Bad (performance test in unit suite):

def test_load_speed():
    """This test only checks speed, not correctness."""
    start = time.time()
    data = load_events(path, 10000)
    duration = time.time() - start
    assert duration < 5.0  # Only tests speed

Good (functional test):

def test_load_correctness():
    """This test checks correctness with small dataset."""
    data = load_events(path, 100)
    assert len(data) == 100
    assert all(isinstance(e, MockEvent) for e in data)
    assert data[0].deltaT == 1.0  # Verifies behavior

Rationale: Unit tests verify correctness, not speed. Performance tests belong in benchmarks with proper statistical analysis.

3. Add clear docstrings#

Bad (no explanation):

@pytest.mark.local
def test_stress():
    # No explanation of why it's marked local
    ...

Good (clear explanation):

@pytest.mark.local
def test_stress_with_10k_events():
    """Stress test with 10K events - marked local because it's slow.

    Why marked local:
        - Slow: Takes ~5 seconds to run
        - Resource-intensive: Generates 10K events

    This test validates that the system can handle large workloads
    without crashes or memory leaks.
    """
    ...

Rationale: Future developers need to understand why a test is marked local.

4. Prefer small datasets#

Guideline for sample sizes:

  • Range checks: 30-50 samples sufficient

  • Basic functionality: 10-100 samples

  • Statistical tests: 200-500 samples (mark as local if >1s)

  • Stress tests: 1000+ samples (always mark as local)

Statistical rule of thumb: For simple range checks, 30-50 samples give 95% confidence. Going to 1000 adds negligible confidence at 20x the cost.

Pytest Markers Reference#

Custom Markers (Haniwers Project)#

@pytest.mark.local
# Tests that should only run locally (slow, probabilistic, or resource-intensive)

@pytest.mark.benchmark
# Tests that measure performance, timing, or stress characteristics (timing-sensitive)
# Skipped by default with task test:integrations
# Run with: task test:benchmark or pytest -m benchmark

@pytest.mark.slow
# Informational: Tests that take >1 second (usually also marked local)

@pytest.mark.statistical
# Informational: Tests involving randomness (usually also marked local)

Built-in Markers (pytest)#

@pytest.mark.skip(reason="...")
# Skip this test unconditionally

@pytest.mark.skipif(condition, reason="...")
# Skip this test if condition is True

@pytest.mark.parametrize("arg", [1, 2, 3])
# Run this test multiple times with different parameters

Configuration#

Pytest marker configuration is in pyproject.toml:

[tool.pytest.ini_options]
markers = [
    "local: tests that should only run locally (slow, probabilistic, or resource-intensive)",
    "slow: tests that take more than 1 second to run (informational)",
    "statistical: tests that involve randomness and may occasionally fail (informational)",
]

# Skip local-only tests by default
addopts = [
    "-m", "not local",
]

Test Organization#

Tests follow the source code structure:

src/haniwers/v1/daq/mocker.py     # Source module
tests/v1/unit/daq/mocker/          # Test directory
├── test_mocker.py                 # Tests for Mocker class
├── test_random_mocker.py          # Tests for RandomMocker class
└── test_load_events.py            # Tests for load_events() function

Naming conventions:

  • Test files: test_<module_name>.py

  • Test classes: Test<ClassName><Aspect>

  • Test functions: test_<what_it_tests>

Example:

# tests/v1/unit/daq/mocker/test_mocker.py

class TestMockerInitialization:
    """Test Mocker initialization behavior."""

    def test_init_with_csv_path(self):
        """Mocker should initialize with valid CSV path."""
        ...

    def test_init_without_csv_path_raises_error(self):
        """Mocker should raise ValueError if csv_path is None."""
        ...

CI/CD Behavior#

Default CI run:

pytest  # Runs only fast unit tests (skips local and benchmark tests)

Result:

  • Fast feedback (~10 seconds for unit tests)

  • Reliable (no flaky tests or timing-sensitive tests)

  • Runs on every commit/PR

Integration tests (without benchmarks):

task test:integrations  # Skips benchmark tests
# or
pytest tests/v1/integrations/ -m "not benchmark"

Result:

  • Quick functional integration tests (~25 seconds)

  • Validates multi-component workflows

  • Skips performance-focused tests

Local development (all tests):

# Before committing - run everything
pytest -m ""  # Run all tests including local and benchmark

# Or using Taskfile
task test:all

Optional: Performance analysis

# Run benchmarks separately (if investigating performance)
task test:benchmark

# Or show performance profile of all tests
pytest --durations=20

Common Questions#

Q: My test is slow but tests important functionality. What should I do?#

A: Mark it as local and add a docstring explaining why:

@pytest.mark.local
def test_important_but_slow():
    """Tests critical edge case - marked local because it's slow.

    Why marked local:
        - Slow: Takes 3 seconds due to large dataset
        - Important: Tests rare but critical edge case

    This test ensures correctness for boundary condition X.
    """
    ...

Q: Should I widen timing tolerances or mark as local?#

A: Both approaches are valid:

  1. Widen tolerances if testing behavior (e.g., “no double sleep”):

    def test_no_double_sleep():
        # We care that it sleeps ONCE, not TWICE
        # Exact timing doesn't matter (0.6-2.0s is fine)
        assert 0.6 < duration < 2.0
    
  2. Mark as local if testing precise timing:

    @pytest.mark.local
    def test_precise_timing():
        # This test needs precise timing measurements
        assert 0.95 < duration < 1.05
    

Q: How do I know if a test should be local or benchmark?#

Decision tree:

  1. Does it measure performance? (timing, speed, throughput, stress)

    • Mark as @pytest.mark.benchmark

  2. Does it take >1 second?

    • If performance-focused → Mark as @pytest.mark.benchmark

    • If functionality-focused → Mark as @pytest.mark.local

  3. Does it use randomness? → Mark as @pytest.mark.local

  4. Does it use >1000 rows/events?

    • If stress testing → Mark as @pytest.mark.benchmark

    • If data validation → Mark as @pytest.mark.local

  5. Otherwise → Keep as unit test

Key distinction:

  • Benchmark: “Does the system run fast enough?” (performance)

  • Local: “Does the system work correctly?” (functionality)

  • Unit: “Does this function work?” (fast, deterministic)

Q: Can I run only specific tests with markers?#

A: Yes, combine markers with file/test selection:

# Run only local tests in mocker suite
pytest tests/v1/unit/daq/mocker/ -m local

# Run specific local test
pytest tests/v1/unit/daq/mocker/test_random_mocker.py::test_value_distribution

# Run local tests matching pattern
pytest -m local -k "distribution"

# Run only benchmark tests
pytest tests/v1/integrations/ -m benchmark

# Run benchmark test matching pattern
pytest tests/v1/integrations/ -m benchmark -k "stress"

# Run everything EXCEPT benchmark tests
pytest tests/v1/integrations/ -m "not benchmark"

# Run benchmarks with verbose output
pytest tests/v1/integrations/ -m benchmark -v

Further Reading#

Summary#

Golden rules:

  1. Keep unit tests fast (<1 second)

  2. Test behavior, not performance

  3. Add clear docstrings to marked tests

  4. Prefer small datasets (10-100 rows)

  5. Separate performance tests from functional tests

When in doubt:

  • If it measures performance/timing → mark as @pytest.mark.benchmark

  • If it’s slow (>1s) and functional → mark as @pytest.mark.local

  • If it’s random → mark as @pytest.mark.local

  • If it tests correctness → unit test

  • If it tests system limits/stress → mark as @pytest.mark.benchmark

Test command cheat sheet:

task test              # Fast unit tests (default)
task test:integrations # Functional integration tests (no benchmarks)
task test:benchmark    # Performance/timing/stress tests only
task test:all          # Everything including local and benchmarks

Happy testing! 🎉