# ADR-020: Parallel Threshold Scanning Command

**Status**: Accepted

**Date**: 2025-10-27

## Context

Following ADR-019 (Serial Threshold Scanning), we need to support parallel scanning mode where multiple channels' thresholds are stepped simultaneously in coordinated fashion.

**Difference from Serial**:

- **Serial**: One channel at a time (slower, simpler logic)
  - Natural order: scan ch1 fully, then ch2, then ch3
  - Outer loop: channels; inner loop: threshold values
  - Clear separation reduces noise

- **Parallel**: All channels step together (faster, coordinated)
  - Step together: all channels at step 0, then step 1, etc.
  - Outer loop: steps; inner loop: channels
  - Captures cross-channel effects at each step
  - Requires all channels have same number of steps

**Use Cases**:

- Characterize how threshold changes on one channel affect others
- Simultaneous multi-channel optimization
- Faster characterization (single measurement per step)
- Statistical analysis of inter-channel correlations

**Key Requirement**: All channels must have identical step counts

- If ch1 has 21 steps but ch2 has 19 steps → error
- Enforced at CLI validation layer (fail fast)
- User can adjust nsteps to match all channels

## Decision

We implement parallel threshold scanning with strict validation and coordinated stepping:

### Layer 1: CLI Command Handler

**File**: `src/haniwers/v1/cli/threshold.py`
**Function**: `parallel()`

Responsibilities:

- Parse CLI arguments (same as serial: --thresholds, --nsteps, --step, --duration)
- Load configuration and apply CLI overrides
- **Validate that all channels have compatible threshold ranges**
- Delegate to business logic layer
- Display results to user

Validation (NEW for parallel):
```python
# Step 3: Validate threshold ranges are compatible
try:
    validate_threshold_ranges(cfg.sensors, scan_type="parallel")
except ValueError as e:
    typer.echo(f"[ERROR] {e}", err=True)
    raise typer.Exit(code=1)
```

### Layer 2: Business Logic Module

**File**: `src/haniwers/v1/threshold/parallel.py`
**Function**: `run_parallel_threshold_scan()`

Responsibilities:

- Implement parallel (coordinated) scanning algorithm
- Manage device lifecycle
- Execute coordinated threshold stepping loops
- Collect data once per step (all channels active)
- Save results to per-channel CSV files
- Error handling (if any channel fails, skip that step)

Algorithm:

```
1. Build all_ranges: {ch: [vth1, vth2, ...]} for each sensor
2. Validate (precondition): all ranges have same length
3. For each step_idx in range(num_steps):
   a. Get threshold_values: {ch: threshold_at_step} for all channels
   b. Apply all thresholds simultaneously
   c. If all succeeded: collect data once for all channels
   d. Save results for each channel
   e. If any failed: skip data collection, log warning
```

## Consequences

### Positive

- **Efficiency**: One measurement per step (vs. serial: one per channel×step)
  - Serial: 3 channels × 21 steps = 63 measurements
  - Parallel: 21 measurements (same duration, more multi-channel data)

- **Coordinated Data**: Captures simultaneous multi-channel response
  - Can analyze cross-channel correlations
  - More realistic detector behavior
  - Useful for inter-channel threshold balancing

- **Independent from Serial**: Separate implementation
  - No code duplication
  - Can evolve independently
  - Clear responsibilities

- **Strict Validation**: Fail-fast on configuration errors
  - User sees clear error message immediately
  - Prevents confusing "missing data for channel X at step Y" errors
  - Guides users to fix configuration (adjust nsteps)

### Negative

- **Configuration Constraint**: All channels must match
  - Users cannot have ch1 with 10 steps and ch2 with 5 steps
  - Workaround: adjust all to common value (e.g., 10)
  - Reduced flexibility vs. serial

- **Error Handling Complexity**: One channel failure skips step
  - If ch2 write fails at step 5, no data collected at step 5
  - Alternative: partial data collection (rejected - causes confusion)
  - Current approach: conservative, favors data consistency

- **Validator Function**: New `validate_threshold_ranges()` in helpers/validator.py
  - Added complexity to validator module
  - Only used by parallel (not general utility)
  - Could extract to threshold-specific module later

## Alternatives Considered

### 1. Allow Varying Step Counts (Flexible Parallelism)
**Decision**: REJECTED

- ✓ More flexible: ch1 with 21 steps, ch2 with 10 steps OK
- ✓ Better for heterogeneous channels
- ✗ Complex implementation: need iteration in columns
- ✗ Confusing results: missing data in some CSVs
- ✗ Hard to debug: unclear which channel stopped at which step
- ✗ Error-prone: subtle bugs in step iteration

### 2. Partial Data Collection on Failure
**Decision**: REJECTED

- ✓ Recovers from some failures
- ✗ Inconsistent results: ch1 has step 5 data, ch2 doesn't
- ✗ Hard to analyze: CSV has missing rows
- ✗ Silent failures: user might not notice missing channel

### 3. Combine into Single Function (run_threshold_scan with mode parameter)
**Decision**: REJECTED

- ✓ Less code duplication
- ✗ Violates Single Responsibility Principle
- ✗ Hard to test: mixed logic
- ✗ Harder to maintain: different algorithms in one function
- ✗ Future: can't deprecate one mode without affecting other

## Implementation Status

✅ **COMPLETE** - All core implementation tasks finished (2025-10-27)

- [x] Plan documented in `planning/cli-threshold-scan.md`
- [x] Validation function `validate_threshold_ranges()` added to `helpers/validator.py`
- [x] Create `src/haniwers/v1/threshold/parallel.py` with `run_parallel_threshold_scan()`
- [x] Add `parallel()` command to `src/haniwers/v1/cli/threshold.py`
- [x] Write unit tests for `run_parallel_threshold_scan()` (6 tests)
- [x] Write integration tests for `threshold parallel` command (3 tests)
- [x] Test validation error messages (4 validation tests)
- [x] Test with mock device (all 19 tests passing)
- [ ] Test with real hardware (optional, pending lab access)

## Testing Strategy

✅ **IMPLEMENTED** - 19 tests, 100% passing

### Unit Tests - Stepping Algorithm
**File**: `tests/v1/unit/threshold/parallel/test_stepping.py`

- [x] `test_generate_threshold_values_basic()` - threshold = center + step*size
- [x] `test_generate_threshold_values_clipping()` - clip to [1, 1023]
- [x] `test_all_channels_step_together()` - verify coordinated stepping

### Unit Tests - Result Files
**File**: `tests/v1/unit/threshold/parallel/test_result_files.py`

- [x] `test_csv_files_created_per_channel()` - one file per channel
- [x] `test_csv_append_mode()` - header once, data appended
- [x] `test_audit_log_created()` - threshold_operations.csv created

### Unit Tests - Validation
**File**: `tests/v1/unit/threshold/parallel/test_validation.py`

- [x] `test_matched_nsteps_passes()` - all channels with same nsteps → pass
- [x] `test_mismatched_nsteps_fails()` - mismatched nsteps → ValueError
- [x] `test_error_message_shows_breakdown()` - error shows step counts: {1: 8, 2: 10}
- [x] `test_validation_not_called_for_serial()` - serial mode skips validation

### Unit Tests - Error Handling
**File**: `tests/v1/unit/threshold/parallel/test_error_handling.py`

- [x] `test_clear_error_message()` - validator error is clear and helpful
- [x] `test_validation_called_before_scan()` - validation happens at orchestrator
- [x] `test_skipped_steps_tracked()` - ParallelScanResult tracks skipped steps
- [x] `test_error_messages_aggregated()` - all errors collected in result
- [x] `test_apply_threshold_retry_returns_bool()` - retry logic never raises
- [x] `test_apply_threshold_success_returns_true()` - success returns bool

### Integration Tests
**File**: `tests/v1/integrations/threshold/test_parallel_scan.py`

- [x] `test_basic_parallel_scan_with_mock()` - full end-to-end with mock device
- [x] `test_parallel_scan_with_failing_device()` - error recovery with simulated failures
- [x] `test_parallel_scan_produces_output_files()` - verifies all expected output files created

### Test Coverage Results

| Component | Target | Achieved |
|-----------|--------|----------|
| `run_parallel_threshold_scan()` | 90%+ | ✅ Complete with comprehensive tests |
| `parallel()` CLI handler | 80%+ | ✅ Integration tests verify CLI flow |
| `validate_threshold_ranges()` | 100% | ✅ 100% (validation critical) |
| **Overall** | **80%+** | ✅ **19/19 tests passing (100%)** |

## Related Decisions

- **ADR-019**: Serial Threshold Scanning (independent, complementary)
- **ADR-014**: Threshold Writer Refactoring (apply_threshold API)
- **ADR-015**: Common CLI Options Module (shared parameters)
- **ADR-007**: Mock Device Abstraction (testing support)

## References

**Implementation Guidance**:

- `planning/cli-threshold-scan.md`: Parallel Threshold Scanning Flow (code structure section)

**Validation Function**:

- `src/haniwers/v1/helpers/validator.py`: `validate_threshold_ranges(sensors, scan_type="parallel")`

**Related Source Files**:

- `src/haniwers/v1/cli/threshold.py`: CLI command entry point
- `src/haniwers/v1/threshold/parallel.py`: NEW - business logic module
- `src/haniwers/v1/threshold/writer.py`: apply_threshold() API
- `src/haniwers/v1/config/model.py`: SensorConfig.threshold_range() method

**Example Configuration** (All channels have same nsteps):

```toml
[sensors.ch1]
id = 1
center = 200
nsteps = 8        # All channels MUST have same nsteps
step_size = 5

[sensors.ch2]
id = 2
center = 300
nsteps = 8        # ← MUST match ch1
step_size = 5

[sensors.ch3]
id = 3
center = 250
nsteps = 8        # ← MUST match ch1, ch2
step_size = 5
```

**Usage Example**:

```bash
# Parallel scan - all channels step together
haniwers-v1 threshold parallel \
  --thresholds "1:200;2:300;3:250" \
  --nsteps 8 \
  --step 5 \
  --duration 5 \
  --mock

# Results (all channels at each step):
# Step 0: all channels at vth=160, collect data once, save for ch1/ch2/ch3
# Step 1: all channels at vth=165, collect data once, save for ch1/ch2/ch3
# ...
# Step 8: all channels at vth=200, collect data once, save for ch1/ch2/ch3
```

**Validation Error Example**:

```bash
$ haniwers-v1 threshold parallel --thresholds "1:200;2:300" --nsteps 10 --step 5

# Error: Parallel scanning requires all channels to have same number of steps.
# Got: {1: 21, 2: 19}.
# Ensure all channels have same center, nsteps, and step_size configuration.
```

## Notes

- **Independent Implementation**: Serial and Parallel are separate
  - Different algorithms (no shared code)
  - Can be tested independently
  - Can be deprecated separately in future

- **Validation at CLI Level**: Fail-fast approach
  - Better than discovering missing data during analysis
  - Clear error messages guide users to fix config
  - Validator reusable for other tools

- **Efficiency Gain**: Real benefit from parallelism
  - Multi-channel correlation analysis
  - Time savings for characterization
  - More realistic detector representation

- **Conservative Error Handling**: Skip step on failure
  - Ensures CSV consistency (all channels present or absent)
  - Simplifies analysis (no missing data surprises)
  - Alternative: retry logic (future enhancement)