# Testing Guidelines

This document explains how to write and run tests in the Haniwers project.

## Quick Reference

**Three types of tests:**

- ✅ **Unit tests** - Fast (<1s), test one thing, always pass (default)
- ⚠️ **Local tests** - Slow or probabilistic, marked with `@pytest.mark.local`
- 🚀 **Benchmarks** - Performance measurements, marked with `@pytest.mark.benchmark`

## When to Write Each Type

### Unit Tests (Default) ✅

**Write unit tests when:**

- Testing functionality correctness (does it work?)
- Test runs in <1 second
- Test is deterministic (same result every time)
- Test verifies behavior, not performance

**Examples:**

- Does `load_events()` return a list of MockEvent objects?
- Does `readline()` return bytes?
- Does `close()` set `is_open=False`?
- Does field generation produce values in valid ranges?

**Characteristics:**

- Fast execution (<1 second per test)
- No randomness or statistical checks
- No large datasets (prefer 10-100 rows)
- Runs on every code change (CI/CD)

### Local Tests (@pytest.mark.local) ⚠️

**Mark as `local` when:**

- Test takes >1 second to run
- Test uses randomness (might occasionally fail)
- Test requires large datasets (>1000 rows)
- Test measures timing (might fail on slow CI)
- Test requires external resources (memory profiling tools)

**Examples:**

- Loading 10K rows to verify performance
- Statistical distribution tests (might fail 1 in 1000 runs)
- Boundary value tests requiring 2000+ samples
- Stress tests with 1000+ events

**Characteristics:**

- Slow execution (>1 second)
- May use statistical assertions
- May occasionally fail due to randomness
- Skipped by default, run with `pytest -m local`

### Benchmarks (@pytest.mark.benchmark) 🚀

**Use benchmarks when:**

- Specifically measuring performance (timing, speed, throughput)
- Testing large-scale data (1000+ events)
- Stress testing system limits
- Profiling code execution
- Comparing performance of different implementations

**Examples:**

- RandomMocker at 100x speed (test_random_mocker_high_speed)
- Generating 1000 events for stress testing (test_random_mocker_stress_test)
- Measuring playback speed accuracy (test_mocker_speed_adjustment)
- Testing timing variation bounds (test_mocker_jitter)

**Characteristics:**

- Measures performance, not functionality
- May include explicit timing assertions
- Runs separately from functional tests
- Skipped by default with `task test:integrations`
- Grouped in separate benchmark test suite

**Current benchmarks in project:**

- 5 benchmark tests in `tests/v1/integrations/daq/`
- Performance measurement focus
- Marked with `@pytest.mark.benchmark`
- Can be run with `task test:benchmark`

## How to Run Tests

### Quick Commands (Using Taskfile)

```bash
# Run fast unit tests only (default)
task test

# Run all v1 tests (unit + integrations, skips benchmarks)
task test:full

# Run integration tests (skips benchmarks)
task test:integrations

# Run only benchmark tests (performance, timing, stress tests)
task test:benchmark

# Run only mocker unit tests
poetry run pytest tests/v1/unit/daq/mocker/

# Run only local tests
task test:local

# Run all tests including local and benchmark
task test:all

# Show performance profile (slowest 20 tests)
task test:perf:mocker

# Show all registered markers
task test:markers
```

### Direct pytest Commands

```bash
# Run fast unit tests (default - skips local tests)
pytest

# Run all tests including slow ones
pytest -m ""

# Run only local tests
pytest -m local

# Run only benchmark tests
pytest -m benchmark tests/v1/integrations/

# Run integration tests (skips benchmarks)
pytest -m "not benchmark" tests/v1/integrations/

# Run specific test file
pytest tests/v1/unit/daq/mocker/test_mocker.py

# Run with performance profiling
pytest tests/v1/unit/daq/mocker/ --durations=20

# Run with coverage
pytest tests/v1/ --cov=haniwers.v1 --cov-report=term
```

## How to Mark Tests

### Marking a test as local

```python
import pytest

@pytest.mark.local
def test_large_dataset():
    """Test with 10K rows - marked local because it's slow.

    This test takes ~5 seconds to run, so we mark it as local-only.
    It still tests important functionality (loading large files) but
    doesn't need to run on every code change.

    Why marked local:
        - Slow: Takes >1 second to run
        - Resource-intensive: Generates 10K events
    """
    # Generate large dataset
    events = [generate_event() for _ in range(10000)]

    # Test functionality
    assert len(events) == 10000
    assert all(isinstance(e, MockEvent) for e in events)
```

### Marking with multiple markers

```python
@pytest.mark.local
@pytest.mark.slow
@pytest.mark.statistical
def test_randomness_quality():
    """Test randomness distribution - marked local because statistical.

    Why marked local:
        - Slow: Takes >1 second with 1000 samples
        - Statistical: Might fail 1 in 1000 runs due to random variation
    """
    # Generate random samples
    samples = [generate_random_value() for _ in range(1000)]

    # Statistical assertion (might occasionally fail)
    mean = sum(samples) / len(samples)
    assert 45 < mean < 55  # Expect mean ~50 with some tolerance
```

## Best Practices

### 1. Keep unit tests fast

**Bad (slow test):**

```python
def test_processing():
    # Processes 10K rows just to test basic functionality
    data = load_large_file(10000)
    result = process(data)
    assert result is not None
```

**Good (fast test):**

```python
def test_processing():
    # Tests with minimal data (10 rows)
    data = load_sample_data(10)
    result = process(data)
    assert len(result) == 10
    assert all(isinstance(r, ProcessedData) for r in result)
```

**Rationale:** If your test is slow, ask "can I test this with less data?"

### 2. Test behavior, not performance

**Bad (performance test in unit suite):**

```python
def test_load_speed():
    """This test only checks speed, not correctness."""
    start = time.time()
    data = load_events(path, 10000)
    duration = time.time() - start
    assert duration < 5.0  # Only tests speed
```

**Good (functional test):**

```python
def test_load_correctness():
    """This test checks correctness with small dataset."""
    data = load_events(path, 100)
    assert len(data) == 100
    assert all(isinstance(e, MockEvent) for e in data)
    assert data[0].deltaT == 1.0  # Verifies behavior
```

**Rationale:** Unit tests verify **correctness**, not **speed**. Performance tests belong in benchmarks with proper statistical analysis.

### 3. Add clear docstrings

**Bad (no explanation):**

```python
@pytest.mark.local
def test_stress():
    # No explanation of why it's marked local
    ...
```

**Good (clear explanation):**

```python
@pytest.mark.local
def test_stress_with_10k_events():
    """Stress test with 10K events - marked local because it's slow.

    Why marked local:
        - Slow: Takes ~5 seconds to run
        - Resource-intensive: Generates 10K events

    This test validates that the system can handle large workloads
    without crashes or memory leaks.
    """
    ...
```

**Rationale:** Future developers need to understand **why** a test is marked local.

### 4. Prefer small datasets

**Guideline for sample sizes:**

- **Range checks**: 30-50 samples sufficient
- **Basic functionality**: 10-100 samples
- **Statistical tests**: 200-500 samples (mark as local if >1s)
- **Stress tests**: 1000+ samples (always mark as local)

**Statistical rule of thumb:** For simple range checks, 30-50 samples give 95% confidence. Going to 1000 adds negligible confidence at 20x the cost.

## Pytest Markers Reference

### Custom Markers (Haniwers Project)

```python
@pytest.mark.local
# Tests that should only run locally (slow, probabilistic, or resource-intensive)

@pytest.mark.benchmark
# Tests that measure performance, timing, or stress characteristics (timing-sensitive)
# Skipped by default with task test:integrations
# Run with: task test:benchmark or pytest -m benchmark

@pytest.mark.slow
# Informational: Tests that take >1 second (usually also marked local)

@pytest.mark.statistical
# Informational: Tests involving randomness (usually also marked local)
```

### Built-in Markers (pytest)

```python
@pytest.mark.skip(reason="...")
# Skip this test unconditionally

@pytest.mark.skipif(condition, reason="...")
# Skip this test if condition is True

@pytest.mark.parametrize("arg", [1, 2, 3])
# Run this test multiple times with different parameters
```

## Configuration

Pytest marker configuration is in `pyproject.toml`:

```toml
[tool.pytest.ini_options]
markers = [
    "local: tests that should only run locally (slow, probabilistic, or resource-intensive)",
    "slow: tests that take more than 1 second to run (informational)",
    "statistical: tests that involve randomness and may occasionally fail (informational)",
]

# Skip local-only tests by default
addopts = [
    "-m", "not local",
]
```

## Test Organization

Tests follow the source code structure:

```
src/haniwers/v1/daq/mocker.py     # Source module
tests/v1/unit/daq/mocker/          # Test directory
├── test_mocker.py                 # Tests for Mocker class
├── test_random_mocker.py          # Tests for RandomMocker class
└── test_load_events.py            # Tests for load_events() function
```

**Naming conventions:**

- Test files: `test_<module_name>.py`
- Test classes: `Test<ClassName><Aspect>`
- Test functions: `test_<what_it_tests>`

**Example:**

```python
# tests/v1/unit/daq/mocker/test_mocker.py

class TestMockerInitialization:
    """Test Mocker initialization behavior."""

    def test_init_with_csv_path(self):
        """Mocker should initialize with valid CSV path."""
        ...

    def test_init_without_csv_path_raises_error(self):
        """Mocker should raise ValueError if csv_path is None."""
        ...
```

## CI/CD Behavior

**Default CI run:**

```bash
pytest  # Runs only fast unit tests (skips local and benchmark tests)
```

**Result:**

- Fast feedback (~10 seconds for unit tests)
- Reliable (no flaky tests or timing-sensitive tests)
- Runs on every commit/PR

**Integration tests (without benchmarks):**

```bash
task test:integrations  # Skips benchmark tests
# or
pytest tests/v1/integrations/ -m "not benchmark"
```

**Result:**

- Quick functional integration tests (~25 seconds)
- Validates multi-component workflows
- Skips performance-focused tests

**Local development (all tests):**

```bash
# Before committing - run everything
pytest -m ""  # Run all tests including local and benchmark

# Or using Taskfile
task test:all
```

**Optional: Performance analysis**

```bash
# Run benchmarks separately (if investigating performance)
task test:benchmark

# Or show performance profile of all tests
pytest --durations=20
```

## Common Questions

### Q: My test is slow but tests important functionality. What should I do?

**A:** Mark it as local and add a docstring explaining why:

```python
@pytest.mark.local
def test_important_but_slow():
    """Tests critical edge case - marked local because it's slow.

    Why marked local:
        - Slow: Takes 3 seconds due to large dataset
        - Important: Tests rare but critical edge case

    This test ensures correctness for boundary condition X.
    """
    ...
```

### Q: Should I widen timing tolerances or mark as local?

**A:** Both approaches are valid:

1. **Widen tolerances** if testing behavior (e.g., "no double sleep"):
   ```python
   def test_no_double_sleep():
       # We care that it sleeps ONCE, not TWICE
       # Exact timing doesn't matter (0.6-2.0s is fine)
       assert 0.6 < duration < 2.0
   ```

2. **Mark as local** if testing precise timing:
   ```python
   @pytest.mark.local
   def test_precise_timing():
       # This test needs precise timing measurements
       assert 0.95 < duration < 1.05
   ```

### Q: How do I know if a test should be local or benchmark?

**Decision tree:**

1. **Does it measure performance?** (timing, speed, throughput, stress)
   - Mark as `@pytest.mark.benchmark`

2. **Does it take >1 second?**
   - If performance-focused → Mark as `@pytest.mark.benchmark`
   - If functionality-focused → Mark as `@pytest.mark.local`

3. **Does it use randomness?** → Mark as `@pytest.mark.local`

4. **Does it use >1000 rows/events?**
   - If stress testing → Mark as `@pytest.mark.benchmark`
   - If data validation → Mark as `@pytest.mark.local`

5. **Otherwise** → Keep as unit test

**Key distinction:**
- **Benchmark:** "Does the system run fast enough?" (performance)
- **Local:** "Does the system work correctly?" (functionality)
- **Unit:** "Does this function work?" (fast, deterministic)

### Q: Can I run only specific tests with markers?

**A:** Yes, combine markers with file/test selection:

```bash
# Run only local tests in mocker suite
pytest tests/v1/unit/daq/mocker/ -m local

# Run specific local test
pytest tests/v1/unit/daq/mocker/test_random_mocker.py::test_value_distribution

# Run local tests matching pattern
pytest -m local -k "distribution"

# Run only benchmark tests
pytest tests/v1/integrations/ -m benchmark

# Run benchmark test matching pattern
pytest tests/v1/integrations/ -m benchmark -k "stress"

# Run everything EXCEPT benchmark tests
pytest tests/v1/integrations/ -m "not benchmark"

# Run benchmarks with verbose output
pytest tests/v1/integrations/ -m benchmark -v
```

## Further Reading

- [pytest documentation](https://docs.pytest.org/)
- [pytest markers](https://docs.pytest.org/en/stable/how-to/mark.html)
- [Project constitution](/.specify/memory/constitution.md) - Testing requirements
- [Taskfile.yml](/Taskfile.yml) - Available test commands

## Summary

**Golden rules:**

1. Keep unit tests fast (<1 second)
2. Test behavior, not performance
3. Add clear docstrings to marked tests
4. Prefer small datasets (10-100 rows)
5. Separate performance tests from functional tests

**When in doubt:**

- If it measures performance/timing → mark as `@pytest.mark.benchmark`
- If it's slow (>1s) and functional → mark as `@pytest.mark.local`
- If it's random → mark as `@pytest.mark.local`
- If it tests correctness → unit test
- If it tests system limits/stress → mark as `@pytest.mark.benchmark`

**Test command cheat sheet:**

```bash
task test              # Fast unit tests (default)
task test:integrations # Functional integration tests (no benchmarks)
task test:benchmark    # Performance/timing/stress tests only
task test:all          # Everything including local and benchmarks
```

Happy testing! 🎉