haniwers.v1.preprocess.reader
File I/O layer for loading raw detector CSV files.
This module reads detector data from CSV files using Polars for fast parsing, and validates that the required columns are present.
Design (Principle IX - SRP):
- Single responsibility: load CSV files and validate their schema
- Used by the converter to load raw data before transformation
- Returns a polars DataFrame for fast I/O
References:
- FR-001: Load multiple CSV files and combine into single dataset
- ADR-012: Pure functions, easy to unit test
Module Contents
Functions
load_csv_files | Load multiple CSV files and combine them into a single DataFrame.
_has_valid_headers | Check if DataFrame has valid column headers.
validate_columns | Check that all required columns exist in DataFrame.
API
- haniwers.v1.preprocess.reader.load_csv_files(file_paths: List[pathlib.Path]) → polars.DataFrame
Load multiple CSV files and combine them into a single DataFrame.
Uses Polars for fast CSV parsing and combines all files into a single DataFrame for processing.
Automatically detects whether each CSV file has a header row:
- Files with headers: read normally, with column names taken from the first row
- Files without headers: RAW_COLUMNS is applied as the column names
Args: file_paths: List of Path objects to CSV files containing detector data
Returns: Combined polars DataFrame with all rows from input files
Raises: ValueError: If any file is missing required columns
Example:
>>> from pathlib import Path
>>> files = [Path("run93_001.csv"), Path("run93_002.csv")]
>>> df = load_csv_files(files)
>>> print(df.shape)
(13856, 8)
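The header auto-detection and combining steps can be sketched with only the standard library. This is a minimal sketch, not the haniwers implementation: it swaps Polars for Python's `csv` module so the control flow is easy to follow, the helper names are hypothetical, and the names in `RAW_COLUMNS` are placeholders for the real constant. A file is treated as having a header when its first row equals `RAW_COLUMNS`; otherwise every row is data and the placeholder names are applied.

```python
import csv
from pathlib import Path
from typing import Dict, List

# Hypothetical placeholder names; the real RAW_COLUMNS constant is defined elsewhere in haniwers.
RAW_COLUMNS = ["datetime", "top", "mid", "btm", "adc", "tmp", "atm", "hmd"]


def first_row_is_header(path: Path) -> bool:
    """Return True if the file's first row equals RAW_COLUMNS (i.e. it is a header row)."""
    with path.open(newline="") as f:
        first_row = next(csv.reader(f), [])
    return [cell.strip() for cell in first_row] == RAW_COLUMNS


def load_csv_rows(file_paths: List[Path]) -> List[Dict[str, str]]:
    """Read each file in order and concatenate its rows, keyed by RAW_COLUMNS."""
    rows: List[Dict[str, str]] = []
    for path in file_paths:
        has_header = first_row_is_header(path)
        with path.open(newline="") as f:
            reader = csv.reader(f)
            if has_header:
                next(reader)  # skip the header row; its names already equal RAW_COLUMNS
            for record in reader:
                if not record:
                    continue  # ignore blank lines
                if len(record) != len(RAW_COLUMNS):
                    raise ValueError(
                        f"{path}: expected {len(RAW_COLUMNS)} fields, got {len(record)}"
                    )
                rows.append(dict(zip(RAW_COLUMNS, record)))
    return rows
```

In Polars, the two branches would correspond roughly to `pl.read_csv(path)` and `pl.read_csv(path, has_header=False, new_columns=RAW_COLUMNS)`, with `pl.concat(frames)` stacking the per-file results vertically.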
- haniwers.v1.preprocess.reader._has_valid_headers(df: polars.DataFrame) → bool
Check if DataFrame has valid column headers.
Validates that the first row contains expected column names rather than data values.
Args: df: polars DataFrame to check
Returns: True if columns match RAW_COLUMNS, False otherwise
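A minimal sketch of the header check, assuming that "match" means the same names in the same order (the docstring does not say whether order matters). It reads only the `.columns` attribute, so it accepts a polars DataFrame, whose `columns` property is a list of strings, or any stand-in object. The function name is hypothetical and `RAW_COLUMNS` holds placeholder names, not the real constant.

```python
from types import SimpleNamespace
from typing import Any

# Hypothetical placeholder names; the real RAW_COLUMNS constant is defined elsewhere in haniwers.
RAW_COLUMNS = ["datetime", "top", "mid", "btm", "adc", "tmp", "atm", "hmd"]


def has_valid_headers_sketch(df: Any) -> bool:
    """Return True if df.columns equals RAW_COLUMNS exactly, order included."""
    return list(df.columns) == RAW_COLUMNS


# Stand-in for a file that was read with its real header row.
with_header = SimpleNamespace(columns=list(RAW_COLUMNS))
# Stand-in for a headerless file whose first data row was taken as column names.
without_header = SimpleNamespace(
    columns=["2024-01-01 00:00:00", "1", "0", "1", "500", "25.0", "1013.0", "40.0"]
)
print(has_valid_headers_sketch(with_header))     # True
print(has_valid_headers_sketch(without_header))  # False
```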
- haniwers.v1.preprocess.reader.validate_columns(df: polars.DataFrame) → None
Check that all required columns exist in DataFrame.
Validates that a polars DataFrame has all the columns expected from the OSECHI detector output before processing.
Args: df: polars DataFrame to validate
Raises: ValueError: If any required columns are missing
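The required-column check can be sketched as a single scan that reports every missing name at once, which is easier to act on than failing at the first gap. This is an illustrative sketch, not the haniwers code: the function name is hypothetical, `REQUIRED_COLUMNS` is a placeholder for the real list of OSECHI columns, and only the `.columns` attribute is read, so a polars DataFrame or a simple stand-in both work.

```python
from types import SimpleNamespace
from typing import Any, List

# Hypothetical placeholder for the columns the OSECHI output is expected to contain.
REQUIRED_COLUMNS = ["datetime", "top", "mid", "btm", "adc", "tmp", "atm", "hmd"]


def validate_columns_sketch(df: Any) -> None:
    """Raise ValueError listing every required column absent from df.columns."""
    present = set(df.columns)
    missing: List[str] = [name for name in REQUIRED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")


validate_columns_sketch(SimpleNamespace(columns=list(REQUIRED_COLUMNS)))  # passes silently
try:
    validate_columns_sketch(SimpleNamespace(columns=["datetime", "top", "mid", "btm"]))
except ValueError as err:
    print(err)  # Missing required columns: adc, tmp, atm, hmd
```

Extra columns are tolerated here; whether the real function also rejects unexpected columns is not stated in the docstring.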