Testing Guidelines#
This document explains how to write and run tests in the Haniwers project.
Quick Reference#
Three types of tests:
✅ Unit tests - Fast (<1s), test one thing, always pass (default)
⚠️ Local tests - Slow or probabilistic, marked with
@pytest.mark.local🚀 Benchmarks - Performance measurements, marked with
@pytest.mark.benchmark
When to Write Each Type#
Unit Tests (Default) ✅#
Write unit tests when:
Testing functionality correctness (does it work?)
Test runs in <1 second
Test is deterministic (same result every time)
Test verifies behavior, not performance
Examples:
Does
load_events()return a list of MockEvent objects?Does
readline()return bytes?Does
close()setis_open=False?Does field generation produce values in valid ranges?
Characteristics:
Fast execution (<1 second per test)
No randomness or statistical checks
No large datasets (prefer 10-100 rows)
Runs on every code change (CI/CD)
Local Tests (@pytest.mark.local) ⚠️#
Mark as local when:
Test takes >1 second to run
Test uses randomness (might occasionally fail)
Test requires large datasets (>1000 rows)
Test measures timing (might fail on slow CI)
Test requires external resources (memory profiling tools)
Examples:
Loading 10K rows to verify performance
Statistical distribution tests (might fail 1 in 1000 runs)
Boundary value tests requiring 2000+ samples
Stress tests with 1000+ events
Characteristics:
Slow execution (>1 second)
May use statistical assertions
May occasionally fail due to randomness
Skipped by default, run with
pytest -m local
Benchmarks (@pytest.mark.benchmark) 🚀#
Use benchmarks when:
Specifically measuring performance (timing, speed, throughput)
Testing large-scale data (1000+ events)
Stress testing system limits
Profiling code execution
Comparing performance of different implementations
Examples:
RandomMocker at 100x speed (test_random_mocker_high_speed)
Generating 1000 events for stress testing (test_random_mocker_stress_test)
Measuring playback speed accuracy (test_mocker_speed_adjustment)
Testing timing variation bounds (test_mocker_jitter)
Characteristics:
Measures performance, not functionality
May include explicit timing assertions
Runs separately from functional tests
Skipped by default with
task test:integrationsGrouped in separate benchmark test suite
Current benchmarks in project:
5 benchmark tests in
tests/v1/integrations/daq/Performance measurement focus
Marked with
@pytest.mark.benchmarkCan be run with
task test:benchmark
How to Run Tests#
Quick Commands (Using Taskfile)#
# Run fast unit tests only (default)
task test
# Run all v1 tests (unit + integrations, skips benchmarks)
task test:full
# Run integration tests (skips benchmarks)
task test:integrations
# Run only benchmark tests (performance, timing, stress tests)
task test:benchmark
# Run only mocker unit tests
poetry run pytest tests/v1/unit/daq/mocker/
# Run only local tests
task test:local
# Run all tests including local and benchmark
task test:all
# Show performance profile (slowest 20 tests)
task test:perf:mocker
# Show all registered markers
task test:markers
Direct pytest Commands#
# Run fast unit tests (default - skips local tests)
pytest
# Run all tests including slow ones
pytest -m ""
# Run only local tests
pytest -m local
# Run only benchmark tests
pytest -m benchmark tests/v1/integrations/
# Run integration tests (skips benchmarks)
pytest -m "not benchmark" tests/v1/integrations/
# Run specific test file
pytest tests/v1/unit/daq/mocker/test_mocker.py
# Run with performance profiling
pytest tests/v1/unit/daq/mocker/ --durations=20
# Run with coverage
pytest tests/v1/ --cov=haniwers.v1 --cov-report=term
How to Mark Tests#
Marking a test as local#
import pytest
@pytest.mark.local
def test_large_dataset():
"""Test with 10K rows - marked local because it's slow.
This test takes ~5 seconds to run, so we mark it as local-only.
It still tests important functionality (loading large files) but
doesn't need to run on every code change.
Why marked local:
- Slow: Takes >1 second to run
- Resource-intensive: Generates 10K events
"""
# Generate large dataset
events = [generate_event() for _ in range(10000)]
# Test functionality
assert len(events) == 10000
assert all(isinstance(e, MockEvent) for e in events)
Marking with multiple markers#
@pytest.mark.local
@pytest.mark.slow
@pytest.mark.statistical
def test_randomness_quality():
"""Test randomness distribution - marked local because statistical.
Why marked local:
- Slow: Takes >1 second with 1000 samples
- Statistical: Might fail 1 in 1000 runs due to random variation
"""
# Generate random samples
samples = [generate_random_value() for _ in range(1000)]
# Statistical assertion (might occasionally fail)
mean = sum(samples) / len(samples)
assert 45 < mean < 55 # Expect mean ~50 with some tolerance
Best Practices#
1. Keep unit tests fast#
Bad (slow test):
def test_processing():
# Processes 10K rows just to test basic functionality
data = load_large_file(10000)
result = process(data)
assert result is not None
Good (fast test):
def test_processing():
# Tests with minimal data (10 rows)
data = load_sample_data(10)
result = process(data)
assert len(result) == 10
assert all(isinstance(r, ProcessedData) for r in result)
Rationale: If your test is slow, ask “can I test this with less data?”
2. Test behavior, not performance#
Bad (performance test in unit suite):
def test_load_speed():
"""This test only checks speed, not correctness."""
start = time.time()
data = load_events(path, 10000)
duration = time.time() - start
assert duration < 5.0 # Only tests speed
Good (functional test):
def test_load_correctness():
"""This test checks correctness with small dataset."""
data = load_events(path, 100)
assert len(data) == 100
assert all(isinstance(e, MockEvent) for e in data)
assert data[0].deltaT == 1.0 # Verifies behavior
Rationale: Unit tests verify correctness, not speed. Performance tests belong in benchmarks with proper statistical analysis.
3. Add clear docstrings#
Bad (no explanation):
@pytest.mark.local
def test_stress():
# No explanation of why it's marked local
...
Good (clear explanation):
@pytest.mark.local
def test_stress_with_10k_events():
"""Stress test with 10K events - marked local because it's slow.
Why marked local:
- Slow: Takes ~5 seconds to run
- Resource-intensive: Generates 10K events
This test validates that the system can handle large workloads
without crashes or memory leaks.
"""
...
Rationale: Future developers need to understand why a test is marked local.
4. Prefer small datasets#
Guideline for sample sizes:
Range checks: 30-50 samples sufficient
Basic functionality: 10-100 samples
Statistical tests: 200-500 samples (mark as local if >1s)
Stress tests: 1000+ samples (always mark as local)
Statistical rule of thumb: For simple range checks, 30-50 samples give 95% confidence. Going to 1000 adds negligible confidence at 20x the cost.
Pytest Markers Reference#
Custom Markers (Haniwers Project)#
@pytest.mark.local
# Tests that should only run locally (slow, probabilistic, or resource-intensive)
@pytest.mark.benchmark
# Tests that measure performance, timing, or stress characteristics (timing-sensitive)
# Skipped by default with task test:integrations
# Run with: task test:benchmark or pytest -m benchmark
@pytest.mark.slow
# Informational: Tests that take >1 second (usually also marked local)
@pytest.mark.statistical
# Informational: Tests involving randomness (usually also marked local)
Built-in Markers (pytest)#
@pytest.mark.skip(reason="...")
# Skip this test unconditionally
@pytest.mark.skipif(condition, reason="...")
# Skip this test if condition is True
@pytest.mark.parametrize("arg", [1, 2, 3])
# Run this test multiple times with different parameters
Configuration#
Pytest marker configuration is in pyproject.toml:
[tool.pytest.ini_options]
markers = [
"local: tests that should only run locally (slow, probabilistic, or resource-intensive)",
"slow: tests that take more than 1 second to run (informational)",
"statistical: tests that involve randomness and may occasionally fail (informational)",
]
# Skip local-only tests by default
addopts = [
"-m", "not local",
]
Test Organization#
Tests follow the source code structure:
src/haniwers/v1/daq/mocker.py # Source module
tests/v1/unit/daq/mocker/ # Test directory
├── test_mocker.py # Tests for Mocker class
├── test_random_mocker.py # Tests for RandomMocker class
└── test_load_events.py # Tests for load_events() function
Naming conventions:
Test files:
test_<module_name>.pyTest classes:
Test<ClassName><Aspect>Test functions:
test_<what_it_tests>
Example:
# tests/v1/unit/daq/mocker/test_mocker.py
class TestMockerInitialization:
"""Test Mocker initialization behavior."""
def test_init_with_csv_path(self):
"""Mocker should initialize with valid CSV path."""
...
def test_init_without_csv_path_raises_error(self):
"""Mocker should raise ValueError if csv_path is None."""
...
CI/CD Behavior#
Default CI run:
pytest # Runs only fast unit tests (skips local and benchmark tests)
Result:
Fast feedback (~10 seconds for unit tests)
Reliable (no flaky tests or timing-sensitive tests)
Runs on every commit/PR
Integration tests (without benchmarks):
task test:integrations # Skips benchmark tests
# or
pytest tests/v1/integrations/ -m "not benchmark"
Result:
Quick functional integration tests (~25 seconds)
Validates multi-component workflows
Skips performance-focused tests
Local development (all tests):
# Before committing - run everything
pytest -m "" # Run all tests including local and benchmark
# Or using Taskfile
task test:all
Optional: Performance analysis
# Run benchmarks separately (if investigating performance)
task test:benchmark
# Or show performance profile of all tests
pytest --durations=20
Common Questions#
Q: My test is slow but tests important functionality. What should I do?#
A: Mark it as local and add a docstring explaining why:
@pytest.mark.local
def test_important_but_slow():
"""Tests critical edge case - marked local because it's slow.
Why marked local:
- Slow: Takes 3 seconds due to large dataset
- Important: Tests rare but critical edge case
This test ensures correctness for boundary condition X.
"""
...
Q: Should I widen timing tolerances or mark as local?#
A: Both approaches are valid:
Widen tolerances if testing behavior (e.g., “no double sleep”):
def test_no_double_sleep(): # We care that it sleeps ONCE, not TWICE # Exact timing doesn't matter (0.6-2.0s is fine) assert 0.6 < duration < 2.0
Mark as local if testing precise timing:
@pytest.mark.local def test_precise_timing(): # This test needs precise timing measurements assert 0.95 < duration < 1.05
Q: How do I know if a test should be local or benchmark?#
Decision tree:
Does it measure performance? (timing, speed, throughput, stress)
Mark as
@pytest.mark.benchmark
Does it take >1 second?
If performance-focused → Mark as
@pytest.mark.benchmarkIf functionality-focused → Mark as
@pytest.mark.local
Does it use randomness? → Mark as
@pytest.mark.localDoes it use >1000 rows/events?
If stress testing → Mark as
@pytest.mark.benchmarkIf data validation → Mark as
@pytest.mark.local
Otherwise → Keep as unit test
Key distinction:
Benchmark: “Does the system run fast enough?” (performance)
Local: “Does the system work correctly?” (functionality)
Unit: “Does this function work?” (fast, deterministic)
Q: Can I run only specific tests with markers?#
A: Yes, combine markers with file/test selection:
# Run only local tests in mocker suite
pytest tests/v1/unit/daq/mocker/ -m local
# Run specific local test
pytest tests/v1/unit/daq/mocker/test_random_mocker.py::test_value_distribution
# Run local tests matching pattern
pytest -m local -k "distribution"
# Run only benchmark tests
pytest tests/v1/integrations/ -m benchmark
# Run benchmark test matching pattern
pytest tests/v1/integrations/ -m benchmark -k "stress"
# Run everything EXCEPT benchmark tests
pytest tests/v1/integrations/ -m "not benchmark"
# Run benchmarks with verbose output
pytest tests/v1/integrations/ -m benchmark -v
Further Reading#
Project constitution - Testing requirements
Taskfile.yml - Available test commands
Summary#
Golden rules:
Keep unit tests fast (<1 second)
Test behavior, not performance
Add clear docstrings to marked tests
Prefer small datasets (10-100 rows)
Separate performance tests from functional tests
When in doubt:
If it measures performance/timing → mark as
@pytest.mark.benchmarkIf it’s slow (>1s) and functional → mark as
@pytest.mark.localIf it’s random → mark as
@pytest.mark.localIf it tests correctness → unit test
If it tests system limits/stress → mark as
@pytest.mark.benchmark
Test command cheat sheet:
task test # Fast unit tests (default)
task test:integrations # Functional integration tests (no benchmarks)
task test:benchmark # Performance/timing/stress tests only
task test:all # Everything including local and benchmarks
Happy testing! 🎉