haniwers.v1.schema#

Column definitions and schema constants for the raw2csv data processing pipeline.

This module defines column names as constants to prevent duplication and keep the reader, transformer, and aggregator modules consistent with one another.

Design Note (Principle VIII - DRY):

  • All column names defined in one place

  • Imported by other modules to maintain consistency

  • Schema changes require updates only here
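The single-source-of-truth idea in the design note can be sketched as follows. This is a minimal illustration, not the package's actual reader: `read_raw_rows` is a hypothetical helper, the header-less CSV layout is an assumption, and the sample row is made up. `RAW_COLUMNS` is copied locally only so the snippet runs on its own; real code would import it from `haniwers.v1.schema`.

```python
import csv
import io

# In the real package this would be:
#   from haniwers.v1.schema import RAW_COLUMNS
# The list is duplicated here only so the sketch runs standalone.
RAW_COLUMNS = ["timestamp", "top", "mid", "btm", "adc", "tmp", "atm", "hmd"]


def read_raw_rows(text: str) -> list[dict[str, str]]:
    """Hypothetical reader: parse header-less CSV text using the shared names.

    Assumption: raw files have no header row, so the column names must be
    supplied by the schema constant.
    """
    reader = csv.DictReader(io.StringIO(text), fieldnames=RAW_COLUMNS)
    return list(reader)


# Made-up sample row, for illustration only.
sample = "2024-01-01T00:00:00,1,0,1,512,25.0,1013.2,40.1\n"
rows = read_raw_rows(sample)
print(rows[0]["adc"])
```

Because the reader never spells out a column name itself, renaming a column in the schema module would propagate to every consumer that imports the constant.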

Module Contents#

Functions#

validate_dataframe_schema

Check if DataFrame has all expected columns.

Data#

API#

haniwers.v1.schema.RAW_COLUMNS#

['timestamp', 'top', 'mid', 'btm', 'adc', 'tmp', 'atm', 'hmd']

haniwers.v1.schema.PROCESSED_COLUMNS#

['datetime', 'top', 'mid', 'btm', 'adc', 'tmp', 'atm', 'hmd', 'hit_top', 'hit_mid', 'hit_btm', 'hit_…

haniwers.v1.schema.RESAMPLED_COLUMNS#

['time', 'events', 'hit_top', 'hit_mid', 'hit_btm', 'hit_type', 'adc', 'tmp', 'atm', 'hmd', 'adc_std…

haniwers.v1.schema.validate_dataframe_schema(df, expected_columns: list) bool#

Check if DataFrame has all expected columns.

This function checks that a DataFrame contains every required column, so that a missing column is caught before the data moves to the next stage of the processing pipeline.

Args:

  • df: pandas or polars DataFrame to validate

  • expected_columns: list of column names to check for

Returns: True if all expected columns are present, False otherwise

Example:

    >>> import pandas as pd
    >>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    >>> validate_dataframe_schema(df, ["a", "b"])
    True
    >>> validate_dataframe_schema(df, ["a", "c"])
    False
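One plausible way to implement a check like this is a set-subset test on the `.columns` attribute, which both pandas and polars DataFrames expose. The sketch below is an assumption about the approach, not the library's actual source: `validate_columns_sketch`, `missing_columns`, and `FakeFrame` are hypothetical names, and `FakeFrame` is only a dependency-free stand-in for a real DataFrame.

```python
from typing import Iterable


def validate_columns_sketch(df, expected_columns: Iterable[str]) -> bool:
    """Return True if every expected column name appears in df.columns.

    Hypothetical re-implementation for illustration; extra columns in df
    are allowed, only missing ones cause a False result.
    """
    return set(expected_columns).issubset(set(df.columns))


def missing_columns(df, expected_columns: Iterable[str]) -> list[str]:
    """Hypothetical companion helper: list the expected columns df lacks."""
    present = set(df.columns)
    return [name for name in expected_columns if name not in present]


class FakeFrame:
    """Stand-in exposing only `.columns`, so the sketch needs no pandas/polars."""

    def __init__(self, columns: Iterable[str]) -> None:
        self.columns = list(columns)


frame = FakeFrame(["a", "b"])
print(validate_columns_sketch(frame, ["a", "b"]))
print(validate_columns_sketch(frame, ["a", "c"]))
print(missing_columns(frame, ["a", "c"]))
```

A boolean result is convenient for guard clauses, while a helper that names the missing columns tends to produce more useful error messages when a pipeline stage fails.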