Datetime Formatter
- class sdgx.data_processors.formatters.datetime.DatetimeFormatter
Bases: Formatter
A class for formatting datetime columns in a pandas DataFrame.
DatetimeFormatter is designed to handle the conversion of datetime columns to timestamp format and vice versa. It uses metadata to identify datetime columns and their corresponding datetime formats.
- datetime_columns
List of column names that are of datetime type.
- Type:
list
- datetime_formats
Dictionary with column names as keys and datetime formats as values.
- Type:
dict
- dead_columns
List of column names that are no longer needed or to be removed.
- Type:
list
- fitted
Indicates whether the formatter has been fitted.
- Type:
bool
- fit(metadata: Metadata | None = None, **kwargs: dict[str, Any])
Fits the formatter by recording the datetime columns and their formats.
- convert(raw_data: pd.DataFrame) -> pd.DataFrame
Converts datetime columns in raw_data to timestamp format.
- reverse_convert(processed_data: pd.DataFrame) -> pd.DataFrame
Converts timestamp columns in processed_data back to datetime format.
- _fit(metadata: Metadata | None = None, **kwargs: Dict[str, Any])
Fit the data processor.
Called before convert and reverse_convert.
- Parameters:
metadata (Metadata, optional) – Metadata. Defaults to None.
- static attach_columns(tabular_data: DataFrame, new_columns: DataFrame) -> DataFrame
Attach additional columns to an existing DataFrame.
- Parameters:
tabular_data (pd.DataFrame) – The original DataFrame.
new_columns (pd.DataFrame) – The DataFrame containing additional columns to be attached.
- Returns:
The DataFrame with new_columns attached.
- Return type:
result_data (pd.DataFrame)
- Raises:
ValueError – If tabular_data and new_columns do not have the same number of rows.
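The row-count check described above can be illustrated with a small standalone sketch. The function name attach_columns_sketch is hypothetical, and the positional alignment via reset_index is an assumption made for the example; this is not the sdgx implementation.

```python
import pandas as pd


def attach_columns_sketch(tabular_data: pd.DataFrame, new_columns: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for attach_columns; not the sdgx implementation."""
    if len(tabular_data) != len(new_columns):
        raise ValueError("tabular_data and new_columns must have the same number of rows")
    # Assumption: rows are paired by position, so index labels are reset first.
    return pd.concat(
        [tabular_data.reset_index(drop=True), new_columns.reset_index(drop=True)],
        axis=1,
    )


base = pd.DataFrame({"id": [1, 2, 3]})
extra = pd.DataFrame({"signup_date": ["2023-01-01", "2023-01-02", "2023-01-03"]})
combined = attach_columns_sketch(base, extra)
```

Passing frames with different row counts, for example extra.iloc[:2], raises ValueError in this sketch.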
- check_fitted()
Check if the processor is fitted.
- Raises:
SynthesizerProcessorError – If the processor is not fitted.
- convert(raw_data: DataFrame) -> DataFrame
Convert datetime values in raw_data to timestamps.
- Parameters:
raw_data (pd.DataFrame) – Unprocessed table data.
- static convert_datetime_columns(datetime_column_list, datetime_formats, processed_data)
Convert datetime columns in processed_data from string to timestamp (int).
- Parameters:
datetime_column_list (list) – List of columns that are of datetime type.
datetime_formats (dict) – Dictionary with column names as keys and datetime formats as values.
processed_data (pd.DataFrame) – Processed table data.
- Returns:
Processed table data with datetime columns converted to timestamp
- Return type:
result_data (pd.DataFrame)
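As a rough illustration of a string-to-timestamp conversion of this kind, the sketch below parses each listed column with its format and replaces it with integer seconds since 1970-01-01. The function name to_timestamps_sketch is hypothetical, and the choice of seconds as the unit and of timezone-naive parsing are assumptions; the library may differ on both.

```python
import pandas as pd


def to_timestamps_sketch(column_list, formats, data):
    """Illustrative only: parse string columns and return integer Unix seconds."""
    result = data.copy()  # leave the caller's DataFrame untouched
    for col in column_list:
        parsed = pd.to_datetime(result[col], format=formats[col])
        # Whole seconds elapsed since 1970-01-01 (assumed unit, timezone-naive).
        result[col] = (parsed - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
    return result


df = pd.DataFrame({"day": ["1970-01-02", "2000-01-01"], "n": [1, 2]})
converted = to_timestamps_sketch(["day"], {"day": "%Y-%m-%d"}, df)
# converted["day"] now holds 86400 and 946684800
```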
- static convert_timestamp_to_datetime(timestamp_column_list, format_dict, processed_data)
Convert timestamp columns to datetime format in a DataFrame.
- Parameters:
timestamp_column_list (list) – List of column names in the DataFrame that are of timestamp type.
format_dict (dict) – Dictionary with column names as keys and datetime formats as values.
processed_data (pd.DataFrame) – DataFrame containing the processed data.
- Returns:
DataFrame with timestamp columns converted to datetime format.
- Return type:
result_data (pd.DataFrame)
Todo
If a value is less than 0, the result will be "No Datetime"; this still needs to be fixed.
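For comparison, here is a minimal sketch of the reverse direction using pandas alone. It assumes timestamps are integer seconds since 1970-01-01, and the name to_datetime_strings_sketch is hypothetical. Under these assumptions pandas maps negative values to dates before 1970; this shows one possible behaviour and is not a description of how the library handles the case noted in the Todo.

```python
import pandas as pd


def to_datetime_strings_sketch(column_list, format_dict, data):
    """Illustrative only: turn integer Unix seconds back into formatted strings."""
    result = data.copy()
    for col in column_list:
        parsed = pd.to_datetime(result[col], unit="s")  # assumed unit: seconds
        result[col] = parsed.dt.strftime(format_dict[col])
    return result


stamps = pd.DataFrame({"day": [86400, 946684800, -86400]})
restored = to_datetime_strings_sketch(["day"], {"day": "%Y-%m-%d"}, stamps)
# restored["day"] holds "1970-01-02", "2000-01-01", "1969-12-31"
```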
- datetime_columns: list
List to store the columns that are of datetime type.
- datetime_formats: Dict
Dictionary to store the datetime formats for each column, with default value as an empty string.
- dead_columns: list
List to store columns that are no longer needed or to be removed.
- fit(metadata: Metadata | None = None, **kwargs: dict[str, Any])
Fit method for the datetime formatter. It records the datetime columns and their formats.
If a column has no recorded format, the default format is used for output (this may cause some problems).
The formatter uses metadata to record which columns are of datetime type, so that timestamps can be converted back to datetime during post-processing.
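The fallback to a default format can be pictured with a short sketch. Both the helper name record_formats_sketch and the value of DEFAULT_FORMAT are hypothetical; the actual default used by the library is not stated here. An empty string is treated as "no format recorded".

```python
DEFAULT_FORMAT = "%Y-%m-%d"  # hypothetical default, not taken from sdgx


def record_formats_sketch(datetime_columns, known_formats):
    """Illustrative only: map each datetime column to its format or the default."""
    # A missing key or an empty string both fall back to DEFAULT_FORMAT.
    return {col: known_formats.get(col) or DEFAULT_FORMAT for col in datetime_columns}


formats = record_formats_sketch(
    ["created", "updated"],
    {"created": "%d/%m/%Y", "updated": ""},
)
# formats == {"created": "%d/%m/%Y", "updated": "%Y-%m-%d"}
```

If the fallback does not match how the original strings were written, the restored output will differ in layout from the input, which is one way the problems mentioned above can arise.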
- fitted = False
- static remove_columns(tabular_data: DataFrame, column_name_to_remove: list) -> DataFrame
Remove specified columns from the input tabular data.
- Parameters:
tabular_data (pd.DataFrame) – Processed tabular data.
column_name_to_remove (list) – List of column names to be removed.
- Returns:
Tabular data with specified columns removed
- Return type:
result_data (pd.DataFrame)
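A column-removal helper of this shape can be sketched with pandas' DataFrame.drop. The function name remove_columns_sketch is hypothetical, and returning a new DataFrame instead of modifying the input is an assumption made for the example.

```python
import pandas as pd


def remove_columns_sketch(tabular_data: pd.DataFrame, column_name_to_remove: list) -> pd.DataFrame:
    """Illustrative only: return a copy of the data without the listed columns."""
    # DataFrame.drop returns a new frame by default and raises KeyError
    # if any listed column is absent.
    return tabular_data.drop(columns=column_name_to_remove)


table = pd.DataFrame({"id": [1, 2], "day": [86400, 172800], "tmp": ["a", "b"]})
trimmed = remove_columns_sketch(table, ["tmp"])
```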