Discrete Transformer

class sdgx.data_processors.transformers.discrete.DiscreteTransformer

Bases: Transformer

A transformer class for handling discrete values in the input data.

This class uses one-hot encoding to convert discrete values into a format that can be used by machine learning models.

discrete_columns

A list of column names that are of discrete type.

Type:

list

one_hot_warning_cnt

The warning threshold for one-hot encoding. If the number of new columns produced by one-hot encoding exceeds this threshold, a warning is issued.

Type:

int
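The threshold check might look something like the sketch below. warn_if_too_many_columns is a hypothetical helper. The message text, the UserWarning category and any particular threshold value are assumptions. Only the strict "exceeds" comparison comes from the description above.

```python
import warnings


def warn_if_too_many_columns(column_name: str, new_column_count: int, one_hot_warning_cnt: int) -> bool:
    """Hypothetical helper: warn (and return True) when one-hot encoding adds too many columns."""
    # Strictly greater than, matching "exceeds this count".
    if new_column_count > one_hot_warning_cnt:
        warnings.warn(
            f"Column {column_name!r} expands to {new_column_count} one-hot columns "
            f"(threshold: {one_hot_warning_cnt}).",
            UserWarning,
        )
        return True
    return False
```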

one_hot_encoders

A dictionary that stores the OneHotEncoder objects for each discrete column. The keys are the column names, and the values are the corresponding OneHotEncoder objects.

Type:

dict

one_hot_column_names

A dictionary that stores the new column names after one-hot encoding for each discrete column. The keys are the column names, and the values are lists of new column names.

Type:

dict

onehot_encoder_handle_unknown

How the OneHotEncoder handles categories that were not seen during fitting. If set to ‘ignore’, unknown categories are ignored; if set to ‘error’, an error is raised when an unknown category is encountered.

Type:

str
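This sketch calls scikit-learn's OneHotEncoder directly instead of going through DiscreteTransformer, so it shows the raw behavior of the two handle_unknown settings. With 'ignore', a category unseen during fitting is encoded as an all-zero row. With 'error', transform raises ValueError. The data is invented for the example.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"color": ["red", "green"]})
unseen = pd.DataFrame({"color": ["purple"]})  # category absent from the training data

# 'ignore': the unseen category becomes an all-zero one-hot row.
lenient = OneHotEncoder(handle_unknown="ignore").fit(train)
zeros = lenient.transform(unseen).toarray()

# 'error': the same input raises ValueError at transform time.
strict = OneHotEncoder(handle_unknown="error").fit(train)
try:
    strict.transform(unseen)
    raised = False
except ValueError:
    raised = True
```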

fit(metadata: Metadata, tabular_data: DataLoader | pd.DataFrame): Fit the transformer to the input data.

_fit_column(column_name: str, column_data: pd.DataFrame): Fit a single discrete column.

convert(raw_data: pd.DataFrame) -> pd.DataFrame: Convert the input data using one-hot encoding.

reverse_convert(processed_data: pd.DataFrame) -> pd.DataFrame: Reverse the one-hot encoding process to get the original data.

_fit(metadata: Metadata | None = None, **kwargs: Dict[str, Any])

Fit the data processor.

Called before convert and reverse_convert.

Parameters:

metadata (Metadata, optional) – Metadata describing the input data. Defaults to None.

_fit_column(column_name: str, column_data: DataFrame)

Fit a single discrete column.

Parameters:
  • column_name (str) – The name of the column.

  • column_data (pd.DataFrame) – A DataFrame containing the column.

static attach_columns(tabular_data: DataFrame, new_columns: DataFrame) -> DataFrame

Attach additional columns to an existing DataFrame.

Parameters:
  • tabular_data (pd.DataFrame) – The original DataFrame.

  • new_columns (pd.DataFrame) – The DataFrame containing the additional columns to attach.

Returns:

The DataFrame with new_columns attached.

Return type:

pd.DataFrame

Raises:

ValueError – If tabular_data and new_columns do not have the same number of rows.
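attach_columns boils down to a column-wise concatenation behind a row-count check. The sketch below is a hypothetical re-implementation with pandas. It aligns rows by position rather than by index label, and that choice is an assumption about the intended behavior.

```python
import pandas as pd


def attach_columns(tabular_data: pd.DataFrame, new_columns: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: place new_columns to the right of tabular_data, row by row."""
    if len(tabular_data) != len(new_columns):
        raise ValueError(
            f"Row count mismatch: {len(tabular_data)} vs {len(new_columns)}"
        )
    # Reuse the original index so that differing index labels do not produce NaNs.
    new_columns = new_columns.set_axis(tabular_data.index, axis=0)
    return pd.concat([tabular_data, new_columns], axis=1)
```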

check_fitted()

Check if the processor is fitted.

Raises:

SynthesizerProcessorError – If the processor is not fitted.
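The fit-before-use contract behind check_fitted can be sketched as follows. SynthesizerProcessorError is defined locally as a stand-in for the sdgx exception of the same name, and FittedGuard is a hypothetical class. Only the fitted = False default and the raise-if-not-fitted rule come from this page.

```python
class SynthesizerProcessorError(Exception):
    """Local stand-in for the library's exception of the same name."""


class FittedGuard:
    """Hypothetical minimal processor showing the fit-before-use contract."""

    fitted = False  # class-level default, as in the attribute listing

    def fit(self) -> None:
        self.fitted = True

    def check_fitted(self) -> None:
        if not self.fitted:
            raise SynthesizerProcessorError("Processor is not fitted.")

    def convert(self, data):
        # Every conversion first verifies that fit() has been called.
        self.check_fitted()
        return data
```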

convert(raw_data: DataFrame) -> DataFrame

Convert the input data by one-hot encoding its discrete columns.

discrete_columns: list

Records which columns are of discrete type.

fit(metadata: Metadata, tabular_data: DataLoader | DataFrame)

Fit the transformer to the input data.

fitted = False

one_hot_column_names: dict

one_hot_encoders: dict

one_hot_warning_cnt: int

onehot_encoder_handle_unknown: str

static remove_columns(tabular_data: DataFrame, column_name_to_remove: list) -> DataFrame

Remove the specified columns from the input tabular data.

Parameters:
  • tabular_data (pd.DataFrame) – The processed tabular data.

  • column_name_to_remove (list) – The names of the columns to remove.

Returns:

The tabular data with the specified columns removed.

Return type:

pd.DataFrame
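remove_columns maps closely onto pandas' DataFrame.drop. The hypothetical version below returns a new DataFrame and leaves its input unchanged. Note that drop raises KeyError for a column that does not exist, and this page does not say whether sdgx behaves the same way.

```python
import pandas as pd


def remove_columns(tabular_data: pd.DataFrame, column_name_to_remove: list) -> pd.DataFrame:
    """Hypothetical sketch: return a copy of tabular_data without the listed columns."""
    # drop() returns a new DataFrame, so the caller's data is left unchanged.
    return tabular_data.drop(columns=column_name_to_remove)
```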

reverse_convert(processed_data: DataFrame) -> DataFrame

Reverse the one-hot encoding to recover the discrete columns in their original form.

Parameters:

processed_data (pd.DataFrame) – A DataFrame containing one-hot encoded columns.

Returns:

The inverse-transformed data.

Return type:

pd.DataFrame