Datetime
- class sdgx.data_models.inspectors.datetime.DatetimeInspector(user_formats: list[str] | None = None, *args, **kwargs)
Bases: Inspector
- PRESET_FORMAT_STRINGS = ['%Y-%m-%d', '%d %b %Y', '%b-%Y', '%Y/%m/%d']
- _format_match_rate = 0.9
When checking for a specific datetime format, missing and incorrect values inevitably cause problems. To handle this, the inspector does not rely on the .any() method and instead uses a match rate, which makes it more robust.
- _inspect_level: int = 20
The inspect_level of DatetimeInspector is higher than that of DiscreteInspector.
Date or datetime values that are difficult to recognize are often also recognized as discrete types, which causes the column to be marked more than once.
- classmethod can_convert_to_datetime(input_col: Series)
Checks whether a DataFrame column can be converted to datetime.
- Parameters:
input_col (pd.Series) – A column of a dataframe.
- detect_datetime_format(series: Series)
Detects the datetime format of a pandas Series.
This method iterates over a list of user-defined and preset datetime formats, and attempts to parse each date in the series using each format. If all dates in the series can be successfully parsed with a format, that format is returned. If no format can parse all dates, an empty string is returned.
- Parameters:
series (pd.Series) – The pandas Series to detect the datetime format of.
- Returns:
The datetime format that can parse all dates in the series, or None if no such format is found.
- Return type:
str
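The following is a minimal sketch of match-rate-based format detection, not the library's implementation. It uses only the standard library so it can run without sdgx or pandas; the names `parses_with` and `sketch_detect_format` are hypothetical, and returning None when nothing matches is a choice made for this sketch. The format list and the 0.9 threshold are taken from PRESET_FORMAT_STRINGS and _format_match_rate above.

```python
from datetime import datetime

# Copied from PRESET_FORMAT_STRINGS in the class description above.
PRESET_FORMAT_STRINGS = ["%Y-%m-%d", "%d %b %Y", "%b-%Y", "%Y/%m/%d"]


def parses_with(value, fmt):
    """Hypothetical helper: True if str(value) parses under the strptime format."""
    try:
        datetime.strptime(str(value), fmt)
        return True
    except ValueError:
        # Missing or malformed values (None, "n/a", ...) simply count as misses.
        return False


def sketch_detect_format(values, user_formats=None, match_rate=0.9):
    """Hypothetical helper: first format whose share of parsed values reaches match_rate.

    User formats are tried before the presets. Returns None if no format
    qualifies (an assumption of this sketch).
    """
    values = list(values)
    if not values:
        return None
    for fmt in list(user_formats or []) + PRESET_FORMAT_STRINGS:
        hits = sum(parses_with(v, fmt) for v in values)
        if hits / len(values) >= match_rate:
            return fmt
    return None


# 19 valid dates and 1 bad value: 95% match, so the format is still detected.
column = ["2023-01-%02d" % day for day in range(1, 20)] + ["n/a"]
detected = sketch_detect_format(column)
```

Because the threshold is a rate rather than an all-or-nothing check, a few missing or malformed entries do not prevent a column from being recognized. Any iterable of values works here, including a pandas Series.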
- fit(raw_data: DataFrame, *args, **kwargs)
Fit the inspector.
Gets the list of datetime columns from the raw data.
- Parameters:
raw_data (pd.DataFrame) – Raw data
- property inspect_level
Inspect level is a concept introduced in version 0.1.6. A single column in a table may be marked by several inspectors at the same time. For example, an email column may be recognized as email, as an id column, and also as a discrete column, which causes confusion in subsequent processing. The inspect_level is used to determine the specific type of such a column.
Different inspectors are given different preset levels: more specific inspectors usually get higher levels, and general inspectors (like discrete) get lower levels.
The value of inspect_level is limited to 1-100. For the base class and the bool, discrete, and numeric types, inspect_level is set to 10. For the datetime and id types, it is set to 20.
The gaps between the current inspect_level values make it easier for developers to insert a custom inspector in between.
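As an illustration of how levels can settle a column marked by several inspectors, here is a minimal sketch. It is not sdgx code: the function `resolve_column_types`, the `(type_name, inspect_level)` tuples, and the column names are all hypothetical. Only the level values 10 and 20 come from the description above; the custom level 15 is an invented example.

```python
def resolve_column_types(marks):
    """Hypothetical helper: pick, per column, the type with the highest inspect level.

    `marks` maps a column name to a list of (type_name, inspect_level) pairs,
    one pair per inspector that claimed the column.
    """
    resolved = {}
    for column, candidates in marks.items():
        # max() keeps the first candidate on ties, so order breaks ties here.
        type_name, _level = max(candidates, key=lambda pair: pair[1])
        resolved[column] = type_name
    return resolved


# Illustrative marks: levels 10 and 20 follow the text; 15 is a made-up custom level.
marks = {
    "signup_date": [("discrete", 10), ("datetime", 20)],
    "age": [("numeric", 10)],
    "region_code": [("discrete", 10), ("custom_region", 15)],
}
resolved = resolve_column_types(marks)
```

In this sketch the datetime mark (20) wins over the discrete mark (10), and a custom inspector placed at 15 wins over discrete without outranking datetime or id.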
- pii = False
Indicates whether a column contains private or sensitive information (PII).
- ready: bool = False
Indicates whether the inspector has completed its inference.
When completed, ready == True.