Numeric Transformer

class sdgx.data_processors.transformers.numeric.NumericValueTransformer

Bases: Transformer

A transformer class for numeric data.

This class transforms numeric data by scaling it with the StandardScaler from sklearn.

standard_scale

A flag indicating whether to scale the data using StandardScaler.

Type: bool

int_columns

A set of column names that are of integer type.

Type: Set

float_columns

A set of column names that are of float type.

Type: Set

scalers

A dictionary of scalers for each numeric column.

Type: Dict

_covert_column(column_name: str, column_data: DataFrame): Convert each numeric (int or float) column.

_covert_column_scale(column_name: str, column_data: DataFrame): Convert each numeric (int or float) column using sklearn StandardScaler.

_fit(metadata: Metadata | None = None, **kwargs: Dict[str, Any])

Fit the data processor.

Called before convert and reverse_convert.

Parameters: metadata (Metadata, optional) – Metadata. Defaults to None.

_fit_column(column_name: str, column_data: DataFrame) → ndarray: Fit each numeric (int or float) column.

_fit_column_scale(column_name: str, column_data: DataFrame) → ndarray: Fit each numeric (int or float) column using sklearn StandardScaler.

_reverse_convert_column(column_name: str, column_data: DataFrame): Reverse convert method for each column.

_reverse_convert_column_scale(column_name: str, column_data: DataFrame): Reverse convert method for the input column, using the scale method.
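The fit, convert, and reverse-convert steps listed above amount to a per-column standardization round trip. The sketch below reproduces only the arithmetic that StandardScaler applies (subtract the column mean, divide by the population standard deviation), using the Python standard library. The class name `ColumnScaler` and the dictionary layout are illustrative assumptions and are not taken from the sdgx source.

```python
from statistics import fmean, pstdev


class ColumnScaler:
    """Hypothetical stand-in for one per-column scaler (not sdgx code)."""

    def fit(self, values):
        self.mean = fmean(values)
        # Population standard deviation (ddof=0), matching StandardScaler.
        # Fall back to 1.0 for constant columns to avoid division by zero.
        self.std = pstdev(values) or 1.0
        return self

    def transform(self, values):
        # Forward step: z = (x - mean) / std
        return [(v - self.mean) / self.std for v in values]

    def inverse_transform(self, scaled):
        # Reverse step: x = z * std + mean
        return [s * self.std + self.mean for s in scaled]


# One scaler per numeric column, keyed by column name.
data = {"age": [20, 30, 40], "income": [1000.0, 2000.0, 3000.0]}
scalers = {name: ColumnScaler().fit(col) for name, col in data.items()}

# Convert, then reverse convert, each column with its own scaler.
converted = {name: scalers[name].transform(col) for name, col in data.items()}
restored = {
    name: scalers[name].inverse_transform(col) for name, col in converted.items()
}
```

Keeping one fitted scaler per column is what makes the reverse step possible: each column is restored with the same mean and standard deviation that were used to scale it.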

static attach_columns(tabular_data: DataFrame, new_columns: DataFrame) → DataFrame

Attach additional columns to an existing DataFrame.

Parameters:

tabular_data (pd.DataFrame) – The original DataFrame.
new_columns (pd.DataFrame) – The DataFrame containing additional columns to be attached.

Returns:

The DataFrame with new_columns attached.

Return type:

result_data (pd.DataFrame)

Raises:

ValueError – If tabular_data and new_columns do not have the same number of rows.
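The documented behaviour of attach_columns can be approximated with plain pandas. The following is a minimal sketch under stated assumptions: the helper name `attach_columns_sketch` is hypothetical, and positional row pairing (via reset indexes) is a choice made here, not something the documentation above confirms about the sdgx implementation.

```python
import pandas as pd


def attach_columns_sketch(
    tabular_data: pd.DataFrame, new_columns: pd.DataFrame
) -> pd.DataFrame:
    """Illustrative only: append new_columns to tabular_data column-wise."""
    if len(tabular_data) != len(new_columns):
        raise ValueError(
            "tabular_data and new_columns must have the same number of rows."
        )
    # Reset both indexes so rows are paired by position rather than by label.
    return pd.concat(
        [tabular_data.reset_index(drop=True), new_columns.reset_index(drop=True)],
        axis=1,
    )


base = pd.DataFrame({"age": [20, 30, 40]})
extra = pd.DataFrame({"age_scaled": [-1.0, 0.0, 1.0]})
result = attach_columns_sketch(base, extra)
```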

check_fitted()

Check if the processor is fitted.

Raises: SynthesizerProcessorError – If the processor is not fitted.

convert(raw_data: DataFrame) → DataFrame: Convert method to handle missing values in the input data.

fit(metadata: Metadata | None = None, tabular_data: DataLoader | DataFrame | None = None, **kwargs: dict[str, Any])

The fit method.

Records the data columns of int and float types, which are read from the metadata.

fitted = False

float_columns: Set: A set of column names that are of float type. These columns will be considered for scaling if standard_scale is True.

int_columns: Set: A set of column names that are of integer type. These columns will be considered for scaling if standard_scale is True.

static remove_columns(tabular_data: DataFrame, column_name_to_remove: list) → DataFrame

Remove specified columns from the input tabular data.

Parameters:

tabular_data (pd.DataFrame) – Processed tabular data
column_name_to_remove (list) – List of column names to be removed

Returns:

Tabular data with specified columns removed

Return type:

result_data (pd.DataFrame)
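A comparable effect to remove_columns can be obtained with `DataFrame.drop`. This is a minimal sketch only: the helper name `remove_columns_sketch` is hypothetical, and returning a new DataFrame instead of modifying the input is an assumption about the intended behaviour.

```python
import pandas as pd


def remove_columns_sketch(
    tabular_data: pd.DataFrame, column_name_to_remove: list
) -> pd.DataFrame:
    """Illustrative only: return a copy without the listed columns."""
    # DataFrame.drop returns a new DataFrame; the input is left unchanged.
    return tabular_data.drop(columns=column_name_to_remove)


frame = pd.DataFrame({"age": [20, 30], "age_scaled": [-1.0, 1.0]})
trimmed = remove_columns_sketch(frame, ["age_scaled"])
```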

reverse_convert(processed_data: DataFrame) → DataFrame: Reverse convert method; converts generated data into processed data.

scalers: Dict: A dictionary of scalers for each numeric column. The keys are the column names and the values are the corresponding scalers.

standard_scale: bool = True: A flag indicating whether to scale the data using StandardScaler. If False, the data is not scaled.