Nan Transformer
- class sdgx.data_processors.transformers.nan.NonValueTransformer
Bases: Transformer
A transformer class designed to handle missing values in a DataFrame. It can either drop rows with missing values or fill them with specified values.
- int_columns
A set of column names that contain integer values.
- Type:
set
- float_columns
A set of column names that contain float values.
- Type:
set
- column_list
A list of all column names in the DataFrame.
- Type:
list
- fill_na_value_int
The value to fill missing integer values with. Default is 0.
- Type:
int
- fill_na_value_float
The value to fill missing float values with. Default is 0.0.
- Type:
float
- fill_na_value_default
The value to fill missing values for non-numeric columns with. Default is 'NAN_VALUE'.
- Type:
str
- drop_na
A flag indicating whether to drop rows with missing values. If True, rows with missing values are dropped. If False, missing values are filled with specified values. Default is False.
- Type:
bool
- _fit(metadata: Metadata | None = None, **kwargs: Dict[str, Any])
Fit the data processor.
Called before convert and reverse_convert.
- Parameters:
metadata (Metadata, optional) – Metadata. Defaults to None.
- static attach_columns(tabular_data: DataFrame, new_columns: DataFrame) → DataFrame
Attach additional columns to an existing DataFrame.
- Parameters:
tabular_data (pd.DataFrame) – The original DataFrame.
new_columns (pd.DataFrame) – The DataFrame containing additional columns to be attached.
- Returns:
The DataFrame with new_columns attached.
- Return type:
result_data (pd.DataFrame)
- Raises:
- ValueError – If tabular_data and new_columns do not have the same number of rows.
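The row-count check and column attachment described for attach_columns can be illustrated with plain pandas. This is a minimal sketch, not the sdgx source: attach_columns_sketch is a hypothetical name, and resetting the index so the two frames line up by position is an assumption of this example.

```python
import pandas as pd


def attach_columns_sketch(tabular_data: pd.DataFrame, new_columns: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the documented behaviour of attach_columns."""
    # The documentation says a ValueError is raised when row counts differ.
    if tabular_data.shape[0] != new_columns.shape[0]:
        raise ValueError("tabular_data and new_columns must have the same number of rows.")
    # Assumption of this sketch: align rows by position, not by index label.
    left = tabular_data.reset_index(drop=True)
    right = new_columns.reset_index(drop=True)
    return pd.concat([left, right], axis=1)


base = pd.DataFrame({"a": [1, 2]})
extra = pd.DataFrame({"b": [3, 4]})
attached = attach_columns_sketch(base, extra)
```

Here attached holds both column a and column b, while a frame with a different number of rows is rejected before any concatenation happens.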
- check_fitted()
Check if the processor is fitted.
- Raises:
SynthesizerProcessorError – If the processor is not fitted.
- column_list: list
A list of all column names in the DataFrame.
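- convert(raw_data: DataFrame) → DataFrame
Handle missing values in the input data. Rows with missing values are dropped if drop_na is True. Otherwise, missing values are filled with the configured fill values.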
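The drop-or-fill behaviour described for this class can be sketched with plain pandas. The code below is an illustration under stated assumptions, not the sdgx implementation: handle_missing is a hypothetical helper, and the way columns are classified (explicit int_columns and float_columns sets passed in by the caller) only mirrors the attributes documented here.

```python
import pandas as pd


def handle_missing(df, int_columns, float_columns, drop_na=False,
                   fill_int=0, fill_float=0.0, fill_default="NAN_VALUE"):
    """Hypothetical stand-in for the documented drop-or-fill behaviour."""
    if drop_na:
        # Drop every row that contains at least one missing value.
        return df.dropna()
    out = df.copy()
    for col in out.columns:
        if col in int_columns:
            out[col] = out[col].fillna(fill_int)
        elif col in float_columns:
            out[col] = out[col].fillna(fill_float)
        else:
            # Non-numeric columns get the default placeholder string.
            out[col] = out[col].fillna(fill_default)
    return out


df = pd.DataFrame({
    "age": [31, None, 45],
    "score": [0.5, 1.5, None],
    "city": ["Oslo", None, "Lima"],
})
filled = handle_missing(df, int_columns={"age"}, float_columns={"score"})
dropped = handle_missing(df, int_columns={"age"}, float_columns={"score"}, drop_na=True)
```

With the documented defaults, the missing age becomes 0, the missing score becomes 0.0, and the missing city becomes 'NAN_VALUE'. With drop_na=True, only the first row survives because it is the only one without a missing value.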
- drop_na: bool
A boolean flag indicating whether to drop rows with missing values or fill them.
If True, rows with missing values are dropped. If False, missing values are filled with fill_na_value_int, fill_na_value_float, or fill_na_value_default, depending on the column type.
The default is False, so rows with missing values are kept and their missing values are filled.
- fill_na_value_default: str
The value to fill missing values for non-numeric columns with. Default is 'NAN_VALUE'.
- fill_na_value_float: float
The value to fill missing float values with. Default is 0.0.
- fill_na_value_int: int
The value to fill missing integer values with. Default is 0.
- fit(metadata: Metadata | None = None, **kwargs: dict[str, Any])
Fit method for the transformer.
- fitted = False
- float_columns: set
A set of column names that contain float values.
- int_columns: set
A set of column names that contain integer values.
- static remove_columns(tabular_data: DataFrame, column_name_to_remove: list) → DataFrame
Remove specified columns from the input tabular data.
- Parameters:
tabular_data (pd.DataFrame) – Processed tabular data.
column_name_to_remove (list) – List of column names to be removed.
- Returns:
Tabular data with the specified columns removed.
- Return type:
result_data (pd.DataFrame)
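Column removal as described for remove_columns can be approximated with DataFrame.drop. This is a minimal sketch rather than the sdgx source: remove_columns_sketch is a hypothetical name, and returning a new frame instead of modifying the input is an assumption of this example.

```python
import pandas as pd


def remove_columns_sketch(tabular_data: pd.DataFrame, column_name_to_remove: list) -> pd.DataFrame:
    """Hypothetical stand-in for the documented behaviour of remove_columns."""
    # drop(columns=...) returns a new DataFrame and leaves the input untouched.
    return tabular_data.drop(columns=column_name_to_remove)


table = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
trimmed = remove_columns_sketch(table, ["b", "c"])
```

In this sketch trimmed keeps only column a, and table still has all three columns.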