Numeric Transformer
- class sdgx.data_processors.transformers.numeric.NumericValueTransformer
Bases: Transformer
A transformer class for numeric data.
This class transforms numeric data by scaling it with the StandardScaler from sklearn.
- standard_scale
A flag indicating whether to scale the data using StandardScaler.
- Type:
bool
- int_columns
A set of column names that are of integer type.
- Type:
Set
- float_columns
A set of column names that are of float type.
- Type:
Set
- scalers
A dictionary of scalers for each numeric column.
- Type:
Dict
- _covert_column(column_name: str, column_data: DataFrame)
Convert every numeric (int and float) column.
- _covert_column_scale(column_name: str, column_data: DataFrame)
Convert every numeric (int and float) column using sklearn StandardScaler.
- _fit(metadata: Metadata | None = None, **kwargs: Dict[str, Any])
Fit the data processor.
Called before convert and reverse_convert.
- Parameters:
metadata (Metadata, optional) – Metadata. Defaults to None.
- _fit_column(column_name: str, column_data: DataFrame) → ndarray
Fit every numeric (int and float) column.
- _fit_column_scale(column_name: str, column_data: DataFrame) → ndarray
Fit every numeric (int and float) column using sklearn StandardScaler.
- _reverse_convert_column(column_name: str, column_data: DataFrame)
Reverse convert method for each column.
- _reverse_convert_column_scale(column_name: str, column_data: DataFrame)
Reverse convert method for the input column, undoing the StandardScaler scaling.
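The fit, convert, and reverse-convert steps for a scaled column can be illustrated with a small standalone sketch. It does not call sdgx or sklearn; it only reproduces the arithmetic StandardScaler performs (subtract the mean, divide by the population standard deviation, and use a scale of 1.0 for constant columns). The helper names `fit_scaler`, `scale`, and `unscale` are hypothetical and are not part of this class.

```python
from statistics import fmean, pstdev


def fit_scaler(values):
    """Return (mean, std) for one column, like fitting one scaler per column."""
    mean = fmean(values)
    std = pstdev(values)  # population standard deviation (ddof=0)
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    return mean, (std if std > 0 else 1.0)


def scale(values, params):
    """Forward transform: z = (x - mean) / std."""
    mean, std = params
    return [(v - mean) / std for v in values]


def unscale(values, params):
    """Inverse transform: x = z * std + mean."""
    mean, std = params
    return [v * std + mean for v in values]


# Hypothetical data: one integer column and one float column.
columns = {"age": [20, 30, 40], "income": [1000.0, 2000.0, 3000.0]}

# One set of scaling parameters per numeric column, keyed by column name.
scalers = {name: fit_scaler(vals) for name, vals in columns.items()}
scaled = {name: scale(vals, scalers[name]) for name, vals in columns.items()}
restored = {name: unscale(vals, scalers[name]) for name, vals in scaled.items()}
```

Keeping one set of parameters per column name is what allows the reverse step to restore each column independently.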
- static attach_columns(tabular_data: DataFrame, new_columns: DataFrame) → DataFrame
Attach additional columns to an existing DataFrame.
- Parameters:
tabular_data (pd.DataFrame) – The original DataFrame.
new_columns (pd.DataFrame) – The DataFrame containing additional columns to be attached.
- Returns:
The DataFrame with new_columns attached.
- Return type:
result_data (pd.DataFrame)
- Raises:
ValueError – If tabular_data and new_columns do not have the same number of rows.
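The documented contract (same row count required, otherwise ValueError) can be sketched with plain pandas. This is an illustration of the behavior described above, not the library's implementation; the function name `attach_columns_sketch` and the positional alignment via `reset_index` are assumptions.

```python
import pandas as pd


def attach_columns_sketch(tabular_data: pd.DataFrame, new_columns: pd.DataFrame) -> pd.DataFrame:
    """Attach new_columns to tabular_data column-wise (hypothetical helper)."""
    if len(tabular_data) != len(new_columns):
        raise ValueError("tabular_data and new_columns must have the same number of rows")
    # Assumption: rows are matched by position, so index labels are reset first.
    return pd.concat(
        [tabular_data.reset_index(drop=True), new_columns.reset_index(drop=True)],
        axis=1,
    )


base = pd.DataFrame({"id": [1, 2, 3]})
extra = pd.DataFrame({"score": [0.1, 0.2, 0.3]})
combined = attach_columns_sketch(base, extra)
```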
- check_fitted()
Check if the processor is fitted.
- Raises:
SynthesizerProcessorError – If the processor is not fitted.
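A guard of this kind is commonly built on a boolean flag that fit sets. The sketch below shows the pattern in isolation. `SketchProcessor` and `NotFittedSketchError` are hypothetical stand-ins for the real processor and SynthesizerProcessorError, and the real method bodies may differ.

```python
class NotFittedSketchError(Exception):
    """Hypothetical stand-in for SynthesizerProcessorError."""


class SketchProcessor:
    fitted = False  # class-level default, flipped per instance by fit()

    def fit(self):
        self.fitted = True

    def check_fitted(self):
        if not self.fitted:
            raise NotFittedSketchError("Processor must be fitted before use.")

    def convert(self, rows):
        # Refuse to transform data until fit() has been called.
        self.check_fitted()
        return rows


processor = SketchProcessor()
try:
    processor.convert([1, 2, 3])
    error_seen = False
except NotFittedSketchError:
    error_seen = True

processor.fit()
converted = processor.convert([1, 2, 3])
```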
- convert(raw_data: DataFrame) → DataFrame
Convert method to handle missing values in the input data.
- fit(metadata: Metadata | None = None, tabular_data: DataLoader | DataFrame | None = None, **kwargs: dict[str, Any])
The fit method.
Records which data columns are of int and float type, as read from the metadata.
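The bookkeeping described here amounts to splitting column names into two sets by type. A minimal sketch, assuming a plain dictionary `column_types` in place of a real Metadata object (the dictionary and its type labels are hypothetical):

```python
# Hypothetical column-to-type mapping standing in for real metadata.
column_types = {"age": "int", "height": "float", "name": "str", "visits": "int"}

# Record integer and float columns separately, as sets of column names.
int_columns = {name for name, kind in column_types.items() if kind == "int"}
float_columns = {name for name, kind in column_types.items() if kind == "float"}

# Only these columns would be candidates for scaling.
numeric_columns = int_columns | float_columns
```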
- fitted = False
- float_columns: Set
A set of column names that are of float type. These columns will be considered for scaling if standard_scale is True.
- int_columns: Set
A set of column names that are of integer type. These columns will be considered for scaling if standard_scale is True.
- static remove_columns(tabular_data: DataFrame, column_name_to_remove: list) → DataFrame
Remove specified columns from the input tabular data.
- Parameters:
tabular_data (pd.DataFrame) – Processed tabular data
column_name_to_remove (list) – List of column names to be removed
- Returns:
Tabular data with specified columns removed
- Return type:
result_data (pd.DataFrame)
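In plain pandas, the described behavior corresponds to dropping columns by name. The sketch below is an illustration under that assumption; `remove_columns_sketch` is a hypothetical name, and the library's handling of missing column names is not specified here.

```python
import pandas as pd


def remove_columns_sketch(tabular_data: pd.DataFrame, column_name_to_remove: list) -> pd.DataFrame:
    """Return a copy of tabular_data without the listed columns (hypothetical helper)."""
    # DataFrame.drop returns a new DataFrame and leaves the input untouched.
    return tabular_data.drop(columns=column_name_to_remove)


processed = pd.DataFrame({"id": [1, 2], "age": [30, 41], "tmp": [0.5, 0.7]})
trimmed = remove_columns_sketch(processed, ["tmp"])
```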
- reverse_convert(processed_data: DataFrame) → DataFrame
Reverse convert method: converts generated data into processed data.
- scalers: Dict
A dictionary of scalers for each numeric column. The keys are the column names and the values are the corresponding scalers.
- standard_scale: bool = True
A flag indicating whether to scale the data using StandardScaler. If False, the data is not scaled.