Numeric
- class sdgx.data_models.inspectors.numeric.NumericInspector(*args, **kwargs)
Bases: Inspector
A class for inspecting numeric data.
This class is a subclass of Inspector and is designed to provide methods for inspecting and analyzing numeric data. It includes methods for detecting int or float data type.
Since August 2024, after determining the type, the inspector also checks whether each numeric column is predominantly positive or negative, which improves the quality of synthetic data in subsequent processing.
- _inspect_level: int = 10
Private variable used to store property inspect_level’s value.
- _is_int_column(col_series: Series) → bool
Determine if a column contains predominantly integer values.
This method checks if the proportion of integer values in the given column exceeds a predefined threshold.
- Parameters:
col_series (pd.Series) – The column series to be inspected.
- Returns:
True if the column is predominantly integer, False otherwise.
- Return type:
bool
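The proportion check described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the function name is_int_column, the INT_THRESHOLD value of 0.95, and the choice to ignore missing values are all assumptions, since the text only mentions "a predefined threshold".

```python
import math

# Assumed value: the text only says "a predefined threshold".
INT_THRESHOLD = 0.95


def is_int_column(values, threshold=INT_THRESHOLD):
    """Return True if the share of integer-valued entries exceeds threshold."""
    # Assumption: missing values (None or NaN) are ignored rather than counted.
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if not present:
        return False
    # A value such as 3.0 counts as integer-valued; 3.5 does not.
    int_like = sum(1 for v in present if float(v).is_integer())
    return int_like / len(present) > threshold


print(is_int_column([1, 2, 3.0, 4]))       # True
print(is_int_column([1.5, 2.25, 3.0, 4]))  # False
```

The sketch takes any iterable of numbers, so a pd.Series could be passed in as well.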
- _is_negative_column(col_series: Series) → bool
Determine if a column contains predominantly negative values.
This method checks if the proportion of negative values in the given column exceeds a predefined threshold.
- Parameters:
col_series (pd.Series) – The column series to be inspected.
- Returns:
True if the column is predominantly negative, False otherwise.
- Return type:
bool
- _is_positive_column(col_series: Series) → bool
Determine if a column contains predominantly positive values.
This method checks if the proportion of positive values in the given column exceeds a predefined threshold.
- Parameters:
col_series (pd.Series) – The column series to be inspected.
- Returns:
True if the column is predominantly positive, False otherwise.
- Return type:
bool
- _is_positive_or_negative_column(col_series: Series, threshold: float, comparison_func) → bool
Determine if a column contains predominantly positive or negative values.
This method checks if the proportion of values that satisfy a given comparison function exceeds a predefined threshold.
- Parameters:
col_series (pd.Series) – The column series to be inspected.
threshold (float) – The proportion threshold for considering the column as positive or negative.
comparison_func (function) – A function that takes a numeric value and returns a boolean.
- Returns:
True if the column satisfies the condition, False otherwise.
- Return type:
bool
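A minimal sketch of how one shared helper can serve both the positive and the negative check by swapping the comparison function. The names exceeds_share, is_positive_column and is_negative_column are hypothetical. The 0.95 defaults mirror the documented pos_threshold and negative_threshold. Treating zero as neither positive nor negative is an assumption.

```python
def exceeds_share(values, threshold, comparison_func):
    """Return True if the share of values satisfying comparison_func exceeds threshold."""
    values = list(values)
    if not values:
        return False
    matching = sum(1 for v in values if comparison_func(v))
    return matching / len(values) > threshold


def is_positive_column(values, pos_threshold=0.95):
    # Assumption: zero is not counted as positive.
    return exceeds_share(values, pos_threshold, lambda x: x > 0)


def is_negative_column(values, negative_threshold=0.95):
    # Assumption: zero is not counted as negative.
    return exceeds_share(values, negative_threshold, lambda x: x < 0)


readings = [3.2] * 97 + [-1.0] * 3   # 97% of the values are positive
print(is_positive_column(readings))  # True
print(is_negative_column(readings))  # False
```

Passing the comparison as a function keeps the counting logic in one place, so the two public checks differ only in their threshold and predicate.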
- fit(raw_data: DataFrame, *args, **kwargs)
Fit the inspector.
Identifies the numeric columns in the raw data and records them in int_columns, float_columns, positive_columns and negative_columns.
- Parameters:
raw_data (pd.DataFrame) – Raw data
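To illustrate what fitting could produce, the sketch below sorts columns into the four sets this class exposes. It is a dependency-free approximation and not the library's code: a plain dict of lists stands in for the DataFrame, classify_numeric_columns and share are hypothetical names, and the integer threshold of 0.95 is an assumption.

```python
def share(values, predicate):
    """Fraction of values that satisfy predicate (0.0 for an empty sequence)."""
    values = list(values)
    if not values:
        return 0.0
    return sum(1 for v in values if predicate(v)) / len(values)


def classify_numeric_columns(table, int_threshold=0.95,
                             pos_threshold=0.95, negative_threshold=0.95):
    """Sort numeric columns of a {name: list} table into four sets."""
    result = {"int_columns": set(), "float_columns": set(),
              "positive_columns": set(), "negative_columns": set()}
    for name, values in table.items():
        # Assumption: a column is numeric only if every value is an int or float.
        numeric = values and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if not numeric:
            continue
        if share(values, lambda v: float(v).is_integer()) > int_threshold:
            result["int_columns"].add(name)
        else:
            result["float_columns"].add(name)
        if share(values, lambda v: v > 0) > pos_threshold:
            result["positive_columns"].add(name)
        elif share(values, lambda v: v < 0) > negative_threshold:
            result["negative_columns"].add(name)
    return result


table = {
    "age": [31, 45, 27],
    "balance": [-10.5, -3.2, -7.75],
    "city": ["Oslo", "Lima", "Pune"],
}
result = classify_numeric_columns(table)
print(result["int_columns"])       # {'age'}
print(result["negative_columns"])  # {'balance'}
```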
- float_columns: set = {}
A set of column names that contain float values.
- property inspect_level
The inspect level is a concept introduced in version 0.1.6. A single column in a table may be marked by several inspectors at the same time. For example, an email column may be recognized as email, as an id column, and as a discrete column by different inspectors, which causes confusion in subsequent processing. The inspect_level is used to determine the specific type of such a column.
We preset different levels for different inspectors: more specific inspectors usually get higher levels, and general inspectors (like discrete) get lower ones.
The value of inspect_level is limited to 1-100. In the base class and for the bool, discrete and numeric types, inspect_level is set to 10. For the datetime and id types, it is set to 20.
The gaps between the current inspect_level values make it easier for developers to insert a custom inspector in the middle.
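One way to picture how inspect_level could settle a conflict is to let the highest level win. This is an assumption drawn from the statement that more specific inspectors get higher levels; it is not confirmed as the library's resolution logic. The resolve_column_type function and the postal_code inspector are hypothetical.

```python
# Levels stated in the text: discrete and numeric use 10, datetime and id use 20.
INSPECT_LEVELS = {"discrete": 10, "numeric": 10, "datetime": 20, "id": 20}


def resolve_column_type(candidates, levels=INSPECT_LEVELS):
    """Pick the candidate type whose inspector has the highest inspect_level."""
    if not candidates:
        return None
    # On a tie, max() keeps the first candidate in the list.
    return max(candidates, key=lambda name: levels[name])


# A hypothetical custom inspector slots in between the preset levels.
custom_levels = {**INSPECT_LEVELS, "postal_code": 15}

print(resolve_column_type(["discrete", "id"]))                          # id
print(resolve_column_type(["discrete", "postal_code"], custom_levels))  # postal_code
```

Because the presets sit at 10 and 20, a custom inspector at 15 outranks the general inspectors without overriding datetime or id.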
- int_columns: set = {}
A set of column names that contain integer values.
- negative_columns: set = {}
A set of column names whose values are predominantly negative, that is, the proportion of negative values exceeds negative_threshold.
- negative_threshold: float = 0.95
The threshold proportion of negative values in a column to consider it as a negative column.
- pii = False
Indicates whether a column contains private or sensitive information (PII).
- pos_threshold: float = 0.95
The threshold proportion of positive values in a column to consider it as a positive column.
- positive_columns: set = {}
A set of column names whose values are predominantly positive, that is, the proportion of positive values exceeds pos_threshold.
- ready: bool = False
Indicates whether the inspector has completed its inference.
When completed, ready == True.