Datetime Formatter#

class sdgx.data_processors.formatters.datetime.DatetimeFormatter[source]#

Bases: Formatter

A class for formatting datetime columns in a pandas DataFrame.

DatetimeFormatter is designed to handle the conversion of datetime columns to timestamp format and vice versa. It uses metadata to identify datetime columns and their corresponding datetime formats.

datetime_columns#

List of column names that are of datetime type.

Type:

list

datetime_formats#

Dictionary with column names as keys and datetime formats as values.

Type:

dict

dead_columns#

List of column names that are no longer needed or to be removed.

Type:

list

fitted#

Indicates whether the formatter has been fitted.

Type:

bool

fit(metadata: Metadata | None = None, **kwargs: dict[str, Any]): Fits the formatter by recording the datetime columns and their formats.

convert(raw_data: pd.DataFrame) -> pd.DataFrame: Converts datetime columns in raw_data to timestamp format.

reverse_convert(processed_data: pd.DataFrame) -> pd.DataFrame: Converts timestamp columns in processed_data back to datetime format.

_fit(metadata: Metadata | None = None, **kwargs: Dict[str, Any])#

Fit the data processor.

Called before convert and reverse_convert.

Parameters:

metadata (Metadata, optional) – Metadata. Defaults to None.

static attach_columns(tabular_data: DataFrame, new_columns: DataFrame) DataFrame#

Attach additional columns to an existing DataFrame.

Parameters:
  • tabular_data (pd.DataFrame) – The original DataFrame.

  • new_columns (pd.DataFrame) – The DataFrame containing additional columns to be attached.

Returns:

The DataFrame with new_columns attached.

Return type:

  • result_data (pd.DataFrame)

Raises:

ValueError – If tabular_data and new_columns do not have the same number of rows.
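The documented behavior (attach columns, reject mismatched row counts) can be illustrated with plain pandas. This is a minimal sketch under assumptions, not the library's implementation: the function name `attach_columns_sketch` is hypothetical, and pairing rows by position via `reset_index` is a design choice made here for the example.

```python
import pandas as pd


def attach_columns_sketch(tabular_data: pd.DataFrame, new_columns: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: append new_columns to the right of tabular_data."""
    if len(tabular_data) != len(new_columns):
        raise ValueError("tabular_data and new_columns must have the same number of rows.")
    # Reset both indexes so rows are paired by position, not by index label.
    return pd.concat(
        [tabular_data.reset_index(drop=True), new_columns.reset_index(drop=True)],
        axis=1,
    )


left = pd.DataFrame({"id": [1, 2]})
right = pd.DataFrame({"created_at": ["2024-01-01", "2024-01-02"]})
combined = attach_columns_sketch(left, right)
```

Checking the row counts before concatenating avoids the silent NaN padding that `pd.concat` would otherwise produce for frames of different lengths.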

check_fitted()#

Check if the processor is fitted.

Raises:

SynthesizerProcessorError – If the processor is not fitted.
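The fitted check follows a common guard pattern: a flag starts as False, fit sets it, and later calls verify it first. The sketch below only illustrates that pattern. `ProcessorNotFittedError` and `FittedGuardSketch` are hypothetical names standing in for the real exception and class, whose internals are not shown in this page.

```python
class ProcessorNotFittedError(Exception):
    """Hypothetical stand-in for SynthesizerProcessorError."""


class FittedGuardSketch:
    fitted = False  # class-level default, overridden per instance by fit()

    def fit(self) -> None:
        # A real processor would record metadata here before setting the flag.
        self.fitted = True

    def check_fitted(self) -> None:
        if not self.fitted:
            raise ProcessorNotFittedError("Call fit() before convert().")

    def convert(self, rows: list) -> list:
        # Guard first, then do the actual work (a no-op in this sketch).
        self.check_fitted()
        return rows


processor = FittedGuardSketch()
processor.fit()
rows = processor.convert([1, 2, 3])
```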

convert(raw_data: DataFrame) DataFrame[source]#

Converts datetime samples into timestamps.

Parameters:

raw_data (pd.DataFrame) – Unprocessed table data

static convert_datetime_columns(datetime_column_list, datetime_formats, processed_data)[source]#

Convert datetime columns in processed_data from string to timestamp (int)

Parameters:
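  • datetime_column_list (list) – List of columns that are of datetime type

  • datetime_formats (dict) – Dictionary with column names as keys and datetime formats as values

  • processed_data (pd.DataFrame) – Processed table data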

Returns:

Processed table data with datetime columns converted to timestamp

Return type:

  • result_data (pd.DataFrame)
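A string-to-integer-timestamp conversion of this kind can be sketched with pandas alone. The helper name `to_timestamp_columns` is hypothetical, and two assumptions are made that the source does not confirm: timestamps are whole seconds since 1970-01-01, and the parsed values are treated as timezone-naive.

```python
import pandas as pd


def to_timestamp_columns(columns: list, formats: dict, data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: parse string columns and store them as integer seconds."""
    result = data.copy()  # leave the caller's DataFrame untouched
    epoch = pd.Timestamp("1970-01-01")
    for column in columns:
        # Parse with the column's recorded format; None lets pandas infer it.
        parsed = pd.to_datetime(result[column], format=formats.get(column) or None)
        # Floor-dividing the offset from the epoch yields whole seconds as integers.
        result[column] = (parsed - epoch) // pd.Timedelta(seconds=1)
    return result


frame = pd.DataFrame({"signup": ["2024-01-01", "2024-01-02"], "age": [30, 41]})
converted = to_timestamp_columns(["signup"], {"signup": "%Y-%m-%d"}, frame)
```

Columns not named in the list, such as `age` here, pass through unchanged.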

static convert_timestamp_to_datetime(timestamp_column_list, format_dict, processed_data)[source]#

Convert timestamp columns to datetime format in a DataFrame.

Parameters:
  • timestamp_column_list (list) – List of column names in the DataFrame which are of timestamp type.

  • format_dict (dict) – Dictionary with column names as keys and datetime formats as values.

  • processed_data (pd.DataFrame) – DataFrame containing the processed data.

Returns:

DataFrame with timestamp columns converted to datetime format.

Return type:

  • result_data (pd.DataFrame)

Todo

If a timestamp value is < 0, the result will be No Datetime; this should be fixed.
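One possible way to format timestamps, including negative ones, is to add a timedelta to the epoch instead of relying on platform-dependent conversion routines. This is a sketch of that idea, not the fix adopted by the library. The helper `timestamp_to_text` is hypothetical and assumes timestamps are whole seconds interpreted as naive UTC.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def timestamp_to_text(timestamp: int, fmt: str) -> str:
    """Hypothetical helper: format integer seconds since the epoch (naive UTC)."""
    # Adding a timedelta to the epoch also works for negative values,
    # i.e. dates before 1970-01-01.
    return (EPOCH + timedelta(seconds=timestamp)).strftime(fmt)


after_epoch = timestamp_to_text(1704067200, "%Y-%m-%d")
before_epoch = timestamp_to_text(-86400, "%Y-%m-%d")
```

The same function could be applied per column, using each column's entry in the format dictionary.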

datetime_columns: list#

List to store the columns that are of datetime type.

datetime_formats: Dict#

Dictionary to store the datetime formats for each column, with default value as an empty string.

dead_columns: list#

List to store columns that are no longer needed or to be removed.

fit(metadata: Metadata | None = None, **kwargs: dict[str, Any])[source]#

Fit method for the datetime formatter. It records the datetime columns and their formats.

If a column has no format, the default format is used for output (this may cause some problems).

The formatter uses metadata to record which columns are of datetime type, so that timestamps can be converted back to datetime during post-processing.
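The fallback to a default format can be sketched as a small lookup. Everything named here is an assumption for illustration: `record_datetime_formats` is a hypothetical helper, plain lists and dicts stand in for the Metadata object, and the value of `DEFAULT_FORMAT` is not taken from the library.

```python
DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed fallback, not taken from the library


def record_datetime_formats(datetime_columns: list, known_formats: dict) -> dict:
    """Hypothetical helper: map every datetime column to a usable format string."""
    formats = {}
    for column in datetime_columns:
        # A missing or empty format falls back to the default.
        formats[column] = known_formats.get(column) or DEFAULT_FORMAT
    return formats


recorded = record_datetime_formats(
    ["signup", "last_login"],
    {"signup": "%Y-%m-%d", "last_login": ""},
)
```

A column that falls back to the default may be written out in a different layout than it was read in, which is one way the problems mentioned above could arise.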

fitted = False#

static remove_columns(tabular_data: DataFrame, column_name_to_remove: list) DataFrame#

Remove specified columns from the input tabular data.

Parameters:
  • tabular_data (pd.DataFrame) – Processed tabular data

  • column_name_to_remove (list) – List of column names to be removed

Returns:

Tabular data with specified columns removed

Return type:

  • result_data (pd.DataFrame)
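Column removal of this kind maps directly onto `DataFrame.drop`. The sketch below uses a hypothetical function name, `remove_columns_sketch`, and assumes the input should not be modified in place, which the source does not state.

```python
import pandas as pd


def remove_columns_sketch(tabular_data: pd.DataFrame, column_name_to_remove: list) -> pd.DataFrame:
    """Hypothetical stand-in: return a copy of the data without the named columns."""
    # drop() returns a new DataFrame and raises KeyError for unknown column names.
    return tabular_data.drop(columns=column_name_to_remove)


table = pd.DataFrame({"id": [1, 2], "tmp": [0, 0], "name": ["a", "b"]})
trimmed = remove_columns_sketch(table, ["tmp"])
```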

reverse_convert(processed_data: DataFrame) DataFrame[source]#

Reverse-convert method for the datetime formatter.

Converts timestamp columns in processed_data back to datetime format, using the formats recorded during fit.