haniwers.v1.schema
Column definitions and schema constants for the raw2csv data processing pipeline.
This module defines column names as constants to prevent duplication and to keep the reader, transformer, and aggregator modules consistent.
Design Note (Principle VIII: DRY):
- All column names are defined in one place.
- Other modules import them from here to stay consistent.
- Schema changes require updates only in this module.
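The single-source rule can be illustrated with a minimal sketch of a consumer that takes its column names from the shared constant instead of repeating string literals. It assumes a headerless, comma-separated raw file. `parse_raw_rows` is a hypothetical helper, not part of haniwers, and `RAW_COLUMNS` is copied from the documented value of `haniwers.v1.schema.RAW_COLUMNS` so the snippet runs on its own.

```python
import csv
import io
from typing import Iterable

# Copied from the documented value of haniwers.v1.schema.RAW_COLUMNS so this
# sketch is self-contained; real code would import it from the schema module.
RAW_COLUMNS = ["timestamp", "top", "mid", "btm", "adc", "tmp", "atm", "hmd"]


def parse_raw_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Hypothetical reader: label each field of a headerless CSV row.

    Assumes the raw file is comma-separated with no header line. Because the
    column names come from the shared constant, renaming a column in the
    schema module updates every consumer at once.
    """
    reader = csv.DictReader(lines, fieldnames=RAW_COLUMNS)
    return [dict(row) for row in reader]


# Hypothetical input row, for illustration only (values are not real data).
sample = io.StringIO("2024-01-01T00:00:00,1,0,1,512,20.5,1013.2,45.0\n")
rows = parse_raw_rows(sample)
print(rows[0]["adc"])  # 512
```

Passing `fieldnames` tells `csv.DictReader` not to treat the first line as a header, so the constant fully defines the record keys.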
Module Contents
Functions
validate_dataframe_schema | Check if DataFrame has all expected columns.
Data
API
- haniwers.v1.schema.RAW_COLUMNS
['timestamp', 'top', 'mid', 'btm', 'adc', 'tmp', 'atm', 'hmd']
- haniwers.v1.schema.PROCESSED_COLUMNS
['datetime', 'top', 'mid', 'btm', 'adc', 'tmp', 'atm', 'hmd', 'hit_top', 'hit_mid', 'hit_btm', 'hit_…
- haniwers.v1.schema.RESAMPLED_COLUMNS
['time', 'events', 'hit_top', 'hit_mid', 'hit_btm', 'hit_type', 'adc', 'tmp', 'atm', 'hmd', 'adc_std…
- haniwers.v1.schema.validate_dataframe_schema(df, expected_columns: list) -> bool
Check if DataFrame has all expected columns.
This function validates that a DataFrame contains all required columns, which is essential for ensuring data integrity through the processing pipeline.
Args:
    df: pandas or polars DataFrame to validate.
    expected_columns: list of column names to check for.
Returns:
    True if all columns are present, False otherwise.
Example:
    >>> import pandas as pd
    >>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    >>> validate_dataframe_schema(df, ["a", "b"])
    True
    >>> validate_dataframe_schema(df, ["a", "c"])
    False
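The docstring describes the behavior but not the implementation. Below is a minimal sketch of one way such a check could be written; it is not the haniwers source. It assumes only that the DataFrame exposes a `columns` attribute that can be iterated, which holds for both a pandas `DataFrame` (an `Index`) and a polars `DataFrame` (a list of strings). The `missing_columns` and `require_columns` helpers are hypothetical additions that report which names are absent and fail fast between pipeline stages; they are not part of the documented API.

```python
from types import SimpleNamespace
from typing import Any


def missing_columns(df: Any, expected_columns: list) -> list:
    """Hypothetical helper: expected column names absent from df, in order."""
    present = set(df.columns)  # works for a pandas Index or a polars list of names
    return [col for col in expected_columns if col not in present]


def validate_dataframe_schema(df: Any, expected_columns: list) -> bool:
    """Sketch: True if every expected column is present (extra columns are allowed)."""
    return not missing_columns(df, expected_columns)


def require_columns(df: Any, expected_columns: list, stage: str) -> None:
    """Hypothetical stage gate: raise ValueError naming the missing columns."""
    missing = missing_columns(df, expected_columns)
    if missing:
        raise ValueError(f"{stage}: missing columns {missing}")


# Stand-in object with a `columns` attribute, so the sketch runs without pandas or polars.
df = SimpleNamespace(columns=["a", "b"])
print(validate_dataframe_schema(df, ["a", "b"]))  # True
print(validate_dataframe_schema(df, ["a", "c"]))  # False
print(missing_columns(df, ["a", "c"]))  # ['c']
```

Converting the present columns to a set keeps each membership test constant-time, and returning the missing names rather than a bare boolean makes a failed check easier to diagnose. A stage could call, for example, `require_columns(df, PROCESSED_COLUMNS, "transformer")` before doing any work.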