CsvConnector
- class sdgx.data_connectors.csv_connector.CsvConnector(path, sep=',', header='infer', **read_csv_kwargs)
Bases: DataConnector
Wraps a CSV file into a DataConnector.
- Parameters:
path (str) – Path to the CSV file.
sep (str, optional) – Field separator. Defaults to ‘,’.
header (str, optional) – Header row handling, as in pd.read_csv. Defaults to ‘infer’.
read_csv_kwargs (dict, optional) – Additional keyword arguments for pd.read_csv; see https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
Example
```python
from sdgx.data_connectors.csv_connector import CsvConnector

connector = CsvConnector(
    path="data.csv",
)
df = connector.read()
```
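The sep, header, and **read_csv_kwargs arguments correspond to pd.read_csv parameters. The following is a minimal sketch, assuming CsvConnector forwards them to pandas unchanged; read_csv_like_connector is a hypothetical helper, not part of sdgx, and an in-memory buffer stands in for a file path.

```python
import io

import pandas as pd


def read_csv_like_connector(path_or_buffer, sep=",", header="infer", **read_csv_kwargs):
    """Hypothetical helper: forward connector-style arguments to pd.read_csv.

    Assumes CsvConnector passes sep, header and any extra keyword
    arguments straight through to pandas.
    """
    return pd.read_csv(path_or_buffer, sep=sep, header=header, **read_csv_kwargs)


# Semicolon-separated data; usecols and dtype travel through **read_csv_kwargs.
data = io.StringIO("id;name;score\n1;a;0.5\n2;b;0.75\n")
df = read_csv_like_connector(data, sep=";", usecols=["id", "score"], dtype={"id": "int64"})
print(df.columns.tolist())  # ['id', 'score']
```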
- _columns() → list[str]
Subclasses should implement this to read the columns when there is an efficient way to peek at them.
See columns() for more details.
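For CSV files, one cheap way to peek at the columns is to ask pandas for zero data rows, so only the header is parsed. This sketch shows the technique a _columns() override could use; whether CsvConnector does exactly this is an assumption, and peek_columns is a hypothetical name.

```python
import io

import pandas as pd


def peek_columns(path_or_buffer, sep=",", **read_csv_kwargs):
    """Return column names without materialising any data rows.

    nrows=0 asks pandas for zero data rows, so only the header is parsed.
    """
    header_only = pd.read_csv(path_or_buffer, sep=sep, nrows=0, **read_csv_kwargs)
    return header_only.columns.tolist()


buffer = io.StringIO("age,income,city\n31,52000,Oslo\n45,61000,Bergen\n")
print(peek_columns(buffer))  # ['age', 'income', 'city']
```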
- _iter(offset: int = 0, chunksize: int = 1000) → Generator[DataFrame, None, None]
Subclasses should implement this to read data in chunks.
See iter() for more details.
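A chunked reader with an offset can be sketched with pandas' own chunksize and skiprows options. This assumes a header on the first line and no quoted fields spanning several lines; iter_csv_chunks is a hypothetical helper, not sdgx's _iter().

```python
import io
from typing import Generator

import pandas as pd


def iter_csv_chunks(path_or_buffer, offset: int = 0, chunksize: int = 1000,
                    **read_csv_kwargs) -> Generator[pd.DataFrame, None, None]:
    """Yield DataFrames of at most `chunksize` rows, skipping the first `offset` data rows.

    skiprows=range(1, offset + 1) keeps line 0 (the header) and drops the
    next `offset` lines.
    """
    reader = pd.read_csv(
        path_or_buffer,
        skiprows=range(1, offset + 1),
        chunksize=chunksize,
        **read_csv_kwargs,
    )
    with reader:  # closes the underlying parser when iteration ends
        for chunk in reader:
            yield chunk


# Ten data rows (0..9); skip three, then read in chunks of four.
numbers = io.StringIO("x\n" + "\n".join(str(i) for i in range(10)) + "\n")
sizes = [len(chunk) for chunk in iter_csv_chunks(numbers, offset=3, chunksize=4)]
print(sizes)  # [4, 3]
```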
- _read(offset: int = 0, limit: int | None = None) → DataFrame | None
Subclasses must implement this to read data.
See read() for more details.
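The offset/limit contract can be sketched with skiprows and nrows in pandas. read_csv_window is a hypothetical helper that assumes one record per line with the header on line 0; it always returns a DataFrame and does not model the None case that the _read() signature allows.

```python
import io
from typing import Optional

import pandas as pd


def read_csv_window(path_or_buffer, offset: int = 0, limit: Optional[int] = None,
                    **read_csv_kwargs) -> pd.DataFrame:
    """Read `limit` data rows starting after the first `offset` rows.

    limit=None reads everything that remains; limit=0 returns an empty
    frame that still carries the header's column names.
    """
    return pd.read_csv(
        path_or_buffer,
        skiprows=range(1, offset + 1),  # keep the header on line 0
        nrows=limit,                    # None means "no limit" in pandas
        **read_csv_kwargs,
    )


csv_text = "x,y\n0,a\n1,b\n2,c\n3,d\n"
print(read_csv_window(io.StringIO(csv_text), offset=1, limit=2)["x"].tolist())  # [1, 2]
print(read_csv_window(io.StringIO(csv_text), limit=0).columns.tolist())         # ['x', 'y']
```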
- columns() → list[str]
Interface for peeking at the column names.
- finalize()
Finalize the data connector.
- property identity
Identity of the data source: the SHA-256 hash of the file.
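Hashing a file for an identity like this is usually done in fixed-size blocks so that large CSV files are never loaded whole. This is a standard-library sketch; the exact string sdgx returns (hex digest, any prefix) is not documented on this page, so file_sha256 is a hypothetical stand-in.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, block_size: int = 1 << 16) -> str:
    """Hex SHA-256 digest of a file's bytes, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b\n1,2\n")
    tmp_path = tmp.name

try:
    digest_hex = file_sha256(tmp_path)
    # Block-wise hashing matches hashing the whole content at once.
    matches = digest_hex == hashlib.sha256(b"a,b\n1,2\n").hexdigest()
    print(matches)  # True
finally:
    os.remove(tmp_path)
```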
- iter(offset: int = 0, chunksize: int = 0) → Generator[DataFrame, None, None]
Interface for reading data in chunks.
- Parameters:
offset (int, optional) – Row offset at which to start reading. Defaults to 0.
chunksize (int, optional) – Number of rows per chunk. Defaults to 0.
- Returns:
Generator yielding the DataFrame chunks that were read.
- Return type:
Generator[pd.DataFrame, None, None]
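Because iter() yields chunks, aggregates can be computed while only one chunk is in memory. In this sketch pd.read_csv(..., chunksize=2) stands in for a connector's iter(), and streaming_column_mean is a hypothetical helper. An explicit chunksize is passed because this page does not say how iter() interprets its default of 0.

```python
import io
from typing import Iterable

import pandas as pd


def streaming_column_mean(chunks: Iterable[pd.DataFrame], column: str) -> float:
    """Mean of `column` across chunks without holding them all in memory."""
    total = 0.0
    count = 0
    for chunk in chunks:
        values = chunk[column].dropna()  # ignore missing values
        total += float(values.sum())
        count += int(values.count())
    return total / count if count else float("nan")


# pd.read_csv(..., chunksize=...) stands in for connector.iter(chunksize=...).
data = io.StringIO("v\n1\n2\n3\n4\n5\n")
mean_v = streaming_column_mean(pd.read_csv(data, chunksize=2), "v")
print(mean_v)  # 3.0
```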
- keys() → list[str]
Same as columns().
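The pairing of columns() with an optional _columns() suggests a template-method design: the public method takes the fast path when a subclass provides one. The sketch below assumes a fallback that reads zero rows through _read(); SketchConnector and InMemoryConnector are hypothetical classes, not sdgx's DataConnector.

```python
from typing import List, Optional

import pandas as pd


class SketchConnector:
    """Hypothetical base class illustrating the public/private split.

    Not sdgx's DataConnector: the fallback in columns() is an assumption.
    """

    def _columns(self) -> List[str]:
        # Subclasses override this when columns can be peeked cheaply.
        raise NotImplementedError

    def _read(self, offset: int = 0, limit: Optional[int] = None) -> Optional[pd.DataFrame]:
        raise NotImplementedError

    def columns(self) -> List[str]:
        try:
            return self._columns()
        except NotImplementedError:
            # Fallback: read zero rows and take the header.
            frame = self._read(offset=0, limit=0)
            return [] if frame is None else list(frame.columns)

    def keys(self) -> List[str]:
        # Alias, mirroring "Same as columns()".
        return self.columns()


class InMemoryConnector(SketchConnector):
    """Hypothetical subclass that only implements _read()."""

    def __init__(self, frame: pd.DataFrame):
        self._frame = frame

    def _read(self, offset: int = 0, limit: Optional[int] = None) -> Optional[pd.DataFrame]:
        end = None if limit is None else offset + limit
        return self._frame.iloc[offset:end]


conn = InMemoryConnector(pd.DataFrame({"a": [1, 2], "b": [3, 4]}))
print(conn.keys())  # ['a', 'b']
```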
- read(offset: int = 0, limit: int | None = None) → DataFrame | None
Interface for reading data.
- Parameters:
offset (int, optional) – Row offset at which to start reading. Defaults to 0.
limit (int, optional) – Maximum number of rows to read. Defaults to None. None reads all data; 0 reads no rows (header only).
- Returns:
The DataFrame that was read.
- Return type:
pd.DataFrame | None