haniwers.v1.preprocess.reader

File I/O layer for loading raw detector CSV files.

This module handles reading detector data from CSV files using Polars for high performance. It includes validation to ensure required columns are present.

Design (Principle IX - SRP):

  • Single responsibility: Load CSV files and validate schema

  • Used by converter to load raw data before transformation

  • Returns polars DataFrame for fast I/O

References:

  • FR-001: Load multiple CSV files and combine into single dataset

  • ADR-012: Pure functions, easy to unit test

Module Contents

Functions

load_csv_files

Load multiple CSV files and combine into single DataFrame.

_has_valid_headers

Check if DataFrame has valid column headers.

validate_columns

Check that all required columns exist in DataFrame.

API

haniwers.v1.preprocess.reader.load_csv_files(file_paths: List[pathlib.Path]) → polars.DataFrame

Load multiple CSV files and combine into single DataFrame.

Uses Polars for fast CSV parsing and combines all files into a single DataFrame for processing.

Automatically detects whether CSV files have headers or not:

  • Files with headers: Reads normally with column names from first row

  • Files without headers: Applies RAW_COLUMNS as column names

Args: file_paths: List of Path objects to CSV files containing detector data

Returns: Combined polars DataFrame with all rows from input files

Raises: ValueError: If any file is missing required columns

Example:

    >>> from pathlib import Path
    >>> files = [Path("run93_001.csv"), Path("run93_002.csv")]
    >>> df = load_csv_files(files)
    >>> print(df.shape)
    (13856, 8)
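As a rough illustration of the header handling described for load_csv_files, the sketch below reads each file, drops the first row when it equals the expected column names, and concatenates the remaining rows. It deliberately swaps Polars for the standard-library csv module so that it runs without extra dependencies. EXAMPLE_RAW_COLUMNS, read_rows, and load_rows are hypothetical names, and the real RAW_COLUMNS values are not shown on this page. It also simplifies by treating any first row that does not equal the expected names as data.

```python
import csv
from pathlib import Path
from typing import Dict, List, Sequence

# Hypothetical stand-in for the package's RAW_COLUMNS (actual values not shown here).
EXAMPLE_RAW_COLUMNS = ("time", "top", "mid", "btm")


def read_rows(
    path: Path, columns: Sequence[str] = EXAMPLE_RAW_COLUMNS
) -> List[Dict[str, str]]:
    """Read one CSV file and key every data row by the expected column names."""
    with path.open(newline="") as handle:
        # Skip completely empty lines so they do not become empty records.
        rows = [row for row in csv.reader(handle) if row]
    if rows and tuple(rows[0]) == tuple(columns):
        # The first row is a header: keep it out of the data.
        rows = rows[1:]
    # Headerless files fall through and get the expected names applied.
    return [dict(zip(columns, row)) for row in rows]


def load_rows(paths: Sequence[Path]) -> List[Dict[str, str]]:
    """Combine the rows of several CSV files into a single list, in input order."""
    combined: List[Dict[str, str]] = []
    for path in paths:
        combined.extend(read_rows(path))
    return combined
```

With Polars, the same steps would presumably map onto pl.read_csv (using has_header=False and new_columns for headerless files) followed by pl.concat, though the module's actual implementation is not shown here.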

haniwers.v1.preprocess.reader._has_valid_headers(df: polars.DataFrame) → bool

Check if DataFrame has valid column headers.

Checks that the DataFrame's column names, which are taken from the first row of the file, are the expected column names rather than data values.

Args: df: polars DataFrame to check

Returns: True if columns match RAW_COLUMNS, False otherwise
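The check can be pictured as a comparison on column names alone. The sketch below is a minimal version that assumes "match" means an exact, ordered comparison, which the source does not confirm. The function takes a plain list of names (what df.columns would give for a Polars DataFrame), and EXAMPLE_RAW_COLUMNS is a hypothetical stand-in for RAW_COLUMNS.

```python
from typing import Sequence

# Hypothetical stand-in for the package's RAW_COLUMNS (actual values not shown here).
EXAMPLE_RAW_COLUMNS = ("time", "top", "mid", "btm")


def has_valid_headers(
    columns: Sequence[str], expected: Sequence[str] = EXAMPLE_RAW_COLUMNS
) -> bool:
    """Return True when the column names equal the expected names, in order."""
    return list(columns) == list(expected)


# A file with a header row yields the expected names as its columns.
header_ok = has_valid_headers(["time", "top", "mid", "btm"])

# A headerless file read as if it had a header yields data values as "names".
header_bad = has_valid_headers(["2024-06-01 00:00:00", "3", "1", "2"])
```

When the check fails in this way, the caller knows the first row was data and can re-read the file with the expected names applied.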

haniwers.v1.preprocess.reader.validate_columns(df: polars.DataFrame) → None

Check that all required columns exist in DataFrame.

Validates that a polars DataFrame has all the columns expected from the OSECHI detector output before processing.

Args: df: polars DataFrame to validate

Raises: ValueError: If any required columns are missing
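A validation of this kind usually reduces to a set difference between the required and the present column names. The following is a minimal sketch under that assumption, not the module's actual code. check_required_columns and EXAMPLE_REQUIRED_COLUMNS are hypothetical, and the function accepts any iterable of names, such as df.columns from a Polars DataFrame.

```python
from typing import Iterable, Sequence

# Hypothetical required columns (the real list for the OSECHI detector is not shown here).
EXAMPLE_REQUIRED_COLUMNS = ("time", "top", "mid", "btm")


def check_required_columns(
    columns: Iterable[str],
    required: Sequence[str] = EXAMPLE_REQUIRED_COLUMNS,
) -> None:
    """Raise ValueError naming every required column that is absent."""
    present = set(columns)
    # Preserve the order of `required` so the error message is stable.
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")


# Extra columns are fine; only absent required ones are an error.
check_required_columns(["time", "top", "mid", "btm", "extra"])
```

Reporting all missing columns at once, rather than failing on the first, tends to make malformed input files quicker to diagnose.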