Discrete Transformer
- class sdgx.data_processors.transformers.discrete.DiscreteTransformer
Bases: Transformer
A transformer class for handling discrete values in the input data.
This class uses one-hot encoding to convert discrete values into a format that can be used by machine learning models.
- discrete_columns
A list of column names that are of discrete type.
- Type:
list
- one_hot_warning_cnt
The warning count for one-hot encoding. If the number of new columns after one-hot encoding exceeds this count, a warning message will be issued.
- Type:
int
- one_hot_encoders
A dictionary that stores the OneHotEncoder objects for each discrete column. The keys are the column names, and the values are the corresponding OneHotEncoder objects.
- Type:
dict
- one_hot_column_names
A dictionary that stores the new column names after one-hot encoding for each discrete column. The keys are the column names, and the values are lists of new column names.
- Type:
dict
- onehot_encoder_handle_unknown
The parameter to handle unknown categories in the OneHotEncoder. If set to ‘ignore’, new categories will be ignored. If set to ‘error’, an error will be raised when new categories are encountered.
- Type:
str
- fit(metadata: Metadata, tabular_data: DataLoader | pd.DataFrame): Fit the transformer to the input data.
- _fit_column(column_name: str, column_data: pd.DataFrame): Fit a single discrete column.
- convert(raw_data: pd.DataFrame) -> pd.DataFrame: Convert the input data using one-hot encoding.
- reverse_convert(processed_data: pd.DataFrame) -> pd.DataFrame: Reverse the one-hot encoding process to get the original data.
- _fit(metadata: Metadata | None = None, **kwargs: Dict[str, Any])
Fit the data processor.
Called before convert and reverse_convert.
- Parameters:
metadata (Metadata, optional) – Metadata. Defaults to None.
- _fit_column(column_name: str, column_data: DataFrame)
Fit a single discrete column. Every discrete column is fitted through this method.
- Parameters:
column_name (str) – The column name.
column_data (pd.DataFrame) – A DataFrame containing that column.
- static attach_columns(tabular_data: DataFrame, new_columns: DataFrame) -> DataFrame
Attach additional columns to an existing DataFrame.
- Parameters:
tabular_data (pd.DataFrame) – The original DataFrame.
new_columns (pd.DataFrame) – The DataFrame containing additional columns to be attached.
- Returns:
The DataFrame with new_columns attached.
- Return type:
pd.DataFrame
- Raises:
ValueError – If tabular_data and new_columns do not have the same number of rows.
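The documented contract of attach_columns (same row count required, otherwise ValueError) can be sketched with plain pandas. The function below is a hypothetical stand-in, not the library's code; aligning rows by position through reset_index is an assumption made for this sketch.

```python
import pandas as pd


def attach_columns_sketch(tabular_data: pd.DataFrame, new_columns: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in that mirrors the documented behaviour."""
    if len(tabular_data) != len(new_columns):
        raise ValueError("tabular_data and new_columns must have the same number of rows")
    # Assumption: rows are matched by position, so both indexes are reset first.
    return pd.concat(
        [tabular_data.reset_index(drop=True), new_columns.reset_index(drop=True)],
        axis=1,
    )


base = pd.DataFrame({"id": [1, 2]})
extra = pd.DataFrame({"color_red": [1.0, 0.0], "color_blue": [0.0, 1.0]})
attached = attach_columns_sketch(base, extra)
```

Checking the row count up front avoids the silent NaN padding that pd.concat would otherwise produce when the two frames differ in length.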
- check_fitted()
Check if the processor is fitted.
- Raises:
SynthesizerProcessorError – If the processor is not fitted.
- convert(raw_data: DataFrame) -> DataFrame
Convert the discrete columns of the input data using one-hot encoding.
- discrete_columns: list
Record which columns are of discrete type.
- fit(metadata: Metadata, tabular_data: DataLoader | DataFrame)
Fit the transformer to the input data.
- fitted = False
- one_hot_column_names: dict
A dictionary that stores the new column names after one-hot encoding for each discrete column. The keys are the column names, and the values are lists of new column names.
- one_hot_encoders: dict
A dictionary that stores the OneHotEncoder objects for each discrete column. The keys are the column names, and the values are the corresponding OneHotEncoder objects.
- one_hot_warning_cnt: int
The warning count for one-hot encoding. If the number of new columns after one-hot encoding exceeds this count, a warning message will be issued.
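The threshold check described for one_hot_warning_cnt might look like the sketch below. The helper name, the threshold value of 3, and the warning text are all hypothetical; the library's actual default and message are not stated in this page.

```python
import warnings

import pandas as pd

# Hypothetical threshold chosen for the example only.
ONE_HOT_WARNING_CNT = 3


def count_and_warn(column: pd.Series, warning_cnt: int = ONE_HOT_WARNING_CNT) -> int:
    """Return the number of distinct values and warn if it exceeds warning_cnt."""
    # Each distinct value becomes one new column under one-hot encoding.
    n_new_columns = int(column.nunique(dropna=False))
    if n_new_columns > warning_cnt:
        warnings.warn(
            f"Column {column.name!r} expands to {n_new_columns} one-hot columns "
            f"(threshold {warning_cnt}).",
            UserWarning,
        )
    return n_new_columns


cities = pd.Series(["Paris", "Rome", "Oslo", "Lima"], name="city")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    n_city_columns = count_and_warn(cities)
```

A warning rather than an error fits the description: high-cardinality columns still encode correctly, they only make the processed table much wider.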
- onehot_encoder_handle_unknown: str
The parameter to handle unknown categories in the OneHotEncoder. If set to ‘ignore’, new categories will be ignored. If set to ‘error’, an error will be raised when new categories are encountered.
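The two settings can be compared directly on scikit-learn's OneHotEncoder, whose handle_unknown parameter accepts both values. This sketch assumes the attribute is passed through to the encoder unchanged; the column name "size" and the data are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"size": ["S", "M", "L"]})
unseen = pd.DataFrame({"size": ["XL"]})  # category absent from the fitted data

# 'ignore': the unseen category is encoded as a row of zeros.
ignoring = OneHotEncoder(handle_unknown="ignore").fit(train)
ignored_row = ignoring.transform(unseen).toarray()

# 'error': transforming an unseen category raises ValueError.
strict = OneHotEncoder(handle_unknown="error").fit(train)
try:
    strict.transform(unseen)
    raised = False
except ValueError:
    raised = True
```

With 'ignore', the all-zero row cannot be mapped back to a category, so scikit-learn's inverse_transform returns None for it.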
- static remove_columns(tabular_data: DataFrame, column_name_to_remove: list) -> DataFrame
Remove specified columns from the input tabular data.
- Parameters:
tabular_data (pd.DataFrame) – Processed tabular data.
column_name_to_remove (list) – List of column names to be removed.
- Returns:
Tabular data with specified columns removed.
- Return type:
pd.DataFrame
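The behaviour documented for remove_columns corresponds to a plain pandas column drop. The function below is a hypothetical stand-in under that assumption, not the library's code; note that DataFrame.drop raises KeyError for a name that is not present, which may differ from the library's handling.

```python
import pandas as pd


def remove_columns_sketch(tabular_data: pd.DataFrame, column_name_to_remove: list) -> pd.DataFrame:
    """Hypothetical stand-in that mirrors the documented behaviour."""
    # DataFrame.drop returns a new frame and leaves the input untouched.
    return tabular_data.drop(columns=column_name_to_remove)


processed = pd.DataFrame(
    {"id": [1, 2], "color": ["red", "blue"], "color_red": [1.0, 0.0]}
)
trimmed = remove_columns_sketch(processed, ["color"])
```

A typical use after one-hot encoding is to drop the original discrete column once its encoded columns have been attached.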