Nan Transformer#

class sdgx.data_processors.transformers.nan.NonValueTransformer[source]#

Bases: Transformer

A transformer class designed to handle missing values in a DataFrame. It can either drop rows with missing values or fill them with specified values.

int_columns#

A set of column names that contain integer values.

Type:

set

float_columns#

A set of column names that contain float values.

Type:

set

column_list#

A list of all column names in the DataFrame.

Type:

list

fill_na_value_int#

The value to fill missing integer values with. Default is 0.

Type:

int

fill_na_value_float#

The value to fill missing float values with. Default is 0.0.

Type:

float

fill_na_value_default#

The value to fill missing values for non-numeric columns with. Default is 'NAN_VALUE'.

Type:

str

drop_na#

A flag indicating whether to drop rows with missing values. If True, rows with missing values are dropped. If False, missing values are filled with specified values. Default is False.

Type:

bool
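The fill-or-drop policy described by these attributes can be sketched with plain pandas. This is a minimal illustration, not the sdgx implementation: the function name `handle_missing`, its parameters, and the sample data are hypothetical, and only the default fill values (0, 0.0, 'NAN_VALUE') come from the attribute descriptions above.

```python
import numpy as np
import pandas as pd


def handle_missing(df, int_columns, float_columns, drop_na=False,
                   fill_int=0, fill_float=0.0, fill_default="NAN_VALUE"):
    """Hypothetical stand-in for the documented behaviour (not sdgx code)."""
    if drop_na:
        # Drop every row that contains at least one missing value.
        return df.dropna()
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        if col in int_columns:
            out[col] = out[col].fillna(fill_int)
        elif col in float_columns:
            out[col] = out[col].fillna(fill_float)
        else:
            # Any column not listed as int or float is treated as non-numeric.
            out[col] = out[col].fillna(fill_default)
    return out


raw = pd.DataFrame({
    "age": [31, np.nan, 45],        # integer-valued, stored as float because of NaN
    "score": [0.5, 0.9, np.nan],
    "city": ["Oslo", None, "Lima"],
})

filled = handle_missing(raw, int_columns={"age"}, float_columns={"score"})
dropped = handle_missing(raw, int_columns={"age"}, float_columns={"score"},
                         drop_na=True)
```

In this sketch `filled` has no missing values left, while `dropped` keeps only the single row that was complete to begin with.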

_fit(metadata: Metadata | None = None, **kwargs: Dict[str, Any])#

Fit the data processor.

Called before convert and reverse_convert.

Parameters:

metadata (Metadata, optional) – Metadata of the data to be processed. Defaults to None.

static attach_columns(tabular_data: DataFrame, new_columns: DataFrame) DataFrame#

Attach additional columns to an existing DataFrame.

Parameters:
  • tabular_data (pd.DataFrame) – The original DataFrame.

  • new_columns (pd.DataFrame) – The DataFrame containing additional columns to be attached.

Returns:

The DataFrame with new_columns attached.

Return type:

pd.DataFrame

Raises:

ValueError – If tabular_data and new_columns do not have the same number of rows.
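The documented contract (same row count, otherwise ValueError) can be illustrated with a short pandas sketch. The function name `attach_columns_sketch` and the sample frames are hypothetical, and the use of `pd.concat` is an assumption about how such a helper might be written, not a statement of the library's code.

```python
import pandas as pd


def attach_columns_sketch(tabular_data, new_columns):
    """Hypothetical stand-in: attach new_columns to tabular_data side by side."""
    if tabular_data.shape[0] != new_columns.shape[0]:
        raise ValueError(
            "tabular_data and new_columns must have the same number of rows."
        )
    # Column-wise concatenation; rows are aligned on the index.
    return pd.concat([tabular_data, new_columns], axis=1)


left = pd.DataFrame({"age": [31, 45]})
right = pd.DataFrame({"city": ["Oslo", "Lima"]})
combined = attach_columns_sketch(left, right)

try:
    attach_columns_sketch(left, right.iloc[:1])  # 2 rows vs 1 row
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```

Because `pd.concat` aligns on the index, this sketch assumes both frames share the same index; frames with equal length but different indices would need `reset_index(drop=True)` first.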

check_fitted()#

Check if the processor is fitted.

Raises:

SynthesizerProcessorError – If the processor is not fitted.
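The fit-then-check pattern implied by `fitted`, `fit`, `_fit`, and `check_fitted` might look like the sketch below. It is an assumption-laden illustration: the class `SketchProcessor` is hypothetical, the local `SynthesizerProcessorError` is a stand-in defined here rather than imported from sdgx, and the idea that `fit` sets `fitted` to True after calling `_fit` is an assumption.

```python
class SynthesizerProcessorError(Exception):
    """Local stand-in for the exception named in the documentation."""


class SketchProcessor:
    fitted = False  # class-level default, as in the documented attribute

    def _fit(self, metadata=None, **kwargs):
        # Subclass-specific fitting logic would go here.
        pass

    def fit(self, metadata=None, **kwargs):
        self._fit(metadata, **kwargs)
        self.fitted = True  # assumption: fit marks the processor as fitted

    def check_fitted(self):
        if not self.fitted:
            raise SynthesizerProcessorError("Processor is not fitted.")


processor = SketchProcessor()
try:
    processor.check_fitted()
    raised_before_fit = False
except SynthesizerProcessorError:
    raised_before_fit = True

processor.fit()
processor.check_fitted()  # no exception once fit has been called
```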

column_list: list#

A list of all column names in the DataFrame.

convert(raw_data: DataFrame) DataFrame[source]#

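Handle missing values in the input data: rows with missing values are dropped if drop_na is True; otherwise missing values are filled with the configured fill values.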

drop_na: bool#

A boolean flag indicating whether to drop rows with missing values or fill them with the configured fill value (fill_na_value_int, fill_na_value_float, or fill_na_value_default, depending on the column type).

If True, rows with missing values are dropped. If False, missing values are filled.

The default is False, so rows with missing values are kept and their missing values are filled.

fill_na_value_default: str#

The value to fill missing values for non-numeric columns with. Default is 'NAN_VALUE'.

fill_na_value_float: float#

The value to fill missing float values with. Default is 0.0.

fill_na_value_int: int#

The value to fill missing integer values with. Default is 0.

fit(metadata: Metadata | None = None, **kwargs: dict[str, Any])[source]#

Fit method for the transformer.

fitted = False#

float_columns: set#

A set of column names that contain float values.

int_columns: set#

A set of column names that contain integer values.

static remove_columns(tabular_data: DataFrame, column_name_to_remove: list) DataFrame#

Remove specified columns from the input tabular data.

Parameters:
  • tabular_data (pd.DataFrame) – Processed tabular data.

  • column_name_to_remove (list) – List of column names to be removed.

Returns:

Tabular data with specified columns removed

Return type:

pd.DataFrame
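Column removal of this kind can be sketched with `DataFrame.drop`. The function name `remove_columns_sketch` and the sample table are hypothetical, and whether the library uses `drop` internally is an assumption.

```python
import pandas as pd


def remove_columns_sketch(tabular_data, column_name_to_remove):
    """Hypothetical stand-in: return a copy without the listed columns."""
    # drop returns a new DataFrame; the input is left unchanged.
    return tabular_data.drop(columns=column_name_to_remove)


table = pd.DataFrame({"age": [31, 45], "tmp": [1, 2]})
trimmed = remove_columns_sketch(table, ["tmp"])
```

Note that `drop` raises KeyError when a listed column is absent; passing `errors="ignore"` would relax that in this sketch.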

reverse_convert(processed_data: DataFrame) DataFrame[source]#

Reverse conversion for the transformer.

No action is required for this transformer.