DataProcessor

class sdgx.data_processors.base.DataProcessor

Bases: object

Base class for data processors.

_fit(metadata: Metadata | None = None, **kwargs: Dict[str, Any])

Fit the data processor.

Called before convert and reverse_convert.

Parameters:

metadata (Metadata, optional) – Metadata. Defaults to None.

static attach_columns(tabular_data: DataFrame, new_columns: DataFrame) → DataFrame

Attach additional columns to an existing DataFrame.

Parameters:
  • tabular_data (pd.DataFrame) – The original DataFrame.

  • new_columns (pd.DataFrame) – The DataFrame containing additional columns to be attached.

Returns:

The DataFrame with new_columns attached.

Return type:

pd.DataFrame

Raises:

ValueError – If tabular_data and new_columns do not have the same number of rows.
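The documented contract (same row count required, columns appended side by side) can be illustrated with a minimal sketch. The function attach_columns_sketch below is a hypothetical stand-in written with plain pandas, not the sdgx source; resetting the index so rows align by position is an assumption of this sketch.

```python
import pandas as pd


def attach_columns_sketch(tabular_data: pd.DataFrame, new_columns: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the documented behavior, not the sdgx implementation."""
    # Mirror the documented ValueError when the row counts differ.
    if len(tabular_data) != len(new_columns):
        raise ValueError("tabular_data and new_columns must have the same number of rows.")
    # Assumption: align rows by position, so both indexes are reset before concatenating.
    return pd.concat(
        [tabular_data.reset_index(drop=True), new_columns.reset_index(drop=True)],
        axis=1,
    )


base = pd.DataFrame({"age": [31, 45], "city": ["Oslo", "Lima"]})
extra = pd.DataFrame({"income": [52000, 61000]})
combined = attach_columns_sketch(base, extra)
print(list(combined.columns))
```

With the real class, the equivalent call would presumably be DataProcessor.attach_columns(base, extra), since the method is static.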

check_fitted()

Check if the processor is fitted.

Raises:

SynthesizerProcessorError – If the processor is not fitted.
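How the fitted flag and check_fitted might interact can be sketched as follows. This is an illustration under stated assumptions, not the library's code: ProcessorSketch and NotFittedError are hypothetical stand-ins for DataProcessor and SynthesizerProcessorError, and the idea that fit calls _fit and then sets fitted to True is assumed rather than documented here.

```python
class NotFittedError(Exception):
    """Hypothetical stand-in for SynthesizerProcessorError."""


class ProcessorSketch:
    """Hypothetical stand-in for DataProcessor, reduced to the fitting logic."""

    fitted = False  # class-level default, mirroring the documented attribute

    def fit(self, metadata=None, **kwargs):
        # Assumption: fit delegates to _fit, then records that fitting happened.
        self._fit(metadata, **kwargs)
        self.fitted = True

    def _fit(self, metadata=None, **kwargs):
        # Subclasses would override this hook with their own fitting logic.
        pass

    def check_fitted(self):
        if not self.fitted:
            raise NotFittedError("Processor is not fitted.")


processor = ProcessorSketch()
try:
    processor.check_fitted()
except NotFittedError as err:
    print(f"before fit: {err}")

processor.fit()
processor.check_fitted()  # no exception once fit has run
```

Setting fitted on the instance leaves the class-level default untouched, so each new processor starts out unfitted.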

convert(raw_data: DataFrame) → DataFrame

Convert raw data into processed data.

Parameters:

raw_data (pd.DataFrame) – Raw data

Returns:

Processed data

Return type:

pd.DataFrame

fit(metadata: Metadata | None = None, **kwargs: Dict[str, Any])

fitted = False

static remove_columns(tabular_data: DataFrame, column_name_to_remove: list) → DataFrame

Remove specified columns from the input tabular data.

Parameters:
  • tabular_data (pd.DataFrame) – Processed tabular data

  • column_name_to_remove (list) – List of column names to be removed

Returns:

Tabular data with specified columns removed

Return type:

pd.DataFrame
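A rough equivalent of this operation in plain pandas is shown below. The function remove_columns_sketch is a hypothetical stand-in, not the sdgx source; in particular, silently skipping names that are absent (errors="ignore") is an assumption, since the documentation above does not say how missing columns are handled.

```python
import pandas as pd


def remove_columns_sketch(tabular_data: pd.DataFrame, column_name_to_remove: list) -> pd.DataFrame:
    """Hypothetical stand-in for the documented behavior, not the sdgx implementation."""
    # DataFrame.drop returns a new frame, so the input is left unmodified.
    # Assumption: names that are not present are skipped instead of raising.
    return tabular_data.drop(columns=column_name_to_remove, errors="ignore")


table = pd.DataFrame({"age": [31, 45], "city": ["Oslo", "Lima"], "zip": ["0150", "15001"]})
trimmed = remove_columns_sketch(table, ["city", "zip"])
print(list(trimmed.columns))
```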

reverse_convert(processed_data: DataFrame) → DataFrame

Convert processed data into raw data.

Parameters:

processed_data (pd.DataFrame) – Processed data

Returns:

Raw data

Return type:

pd.DataFrame
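The convert / reverse_convert pair is meant to round-trip data, which a small sketch can make concrete. ConstantColumnSketch below is a hypothetical processor that does not subclass the real DataProcessor; its constructor argument, its attributes, and the way it ignores metadata in fit are all assumptions made for illustration.

```python
import pandas as pd


class ConstantColumnSketch:
    """Hypothetical processor: drops constant columns on convert, restores them on reverse_convert."""

    fitted = False

    def __init__(self, constant_columns):
        self.constant_columns = list(constant_columns)  # hypothetical configuration
        self.constant_values = {}

    def fit(self, metadata=None, **kwargs):
        # The real fit receives a Metadata object; this sketch ignores it.
        self.fitted = True

    def convert(self, raw_data: pd.DataFrame) -> pd.DataFrame:
        # Remember each constant column's value, then drop the column.
        for name in self.constant_columns:
            self.constant_values[name] = raw_data[name].iloc[0]
        return raw_data.drop(columns=self.constant_columns)

    def reverse_convert(self, processed_data: pd.DataFrame) -> pd.DataFrame:
        # Work on a copy and add the dropped columns back with their recorded values.
        restored = processed_data.copy()
        for name, value in self.constant_values.items():
            restored[name] = value
        return restored


raw = pd.DataFrame({"age": [31, 45], "country": ["NO", "NO"]})
processor = ConstantColumnSketch(["country"])
processor.fit()
processed = processor.convert(raw)
restored = processor.reverse_convert(processed)
print(list(processed.columns), list(restored.columns))
```

In this sketch the restored columns are appended at the end, so the original column order is only preserved when the constant columns were already last.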