CsvConnector

class sdgx.data_connectors.csv_connector.CsvConnector(path, sep=',', header='infer', **read_csv_kwargs)

Bases: DataConnector

Wraps a CSV file in a DataConnector.

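Parameters:
  • path – Path to the CSV file.

  • sep (str, optional) – Field separator. Defaults to ','.

  • header (optional) – Header setting. Defaults to 'infer'.

  • read_csv_kwargs – Additional keyword arguments for reading the CSV file.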

Example

from sdgx.data_connectors.csv_connector import CsvConnector
connector = CsvConnector(
    path="data.csv",
)
df = connector.read()

_columns() -> list[str]

Subclasses should implement this to read the columns if there is an efficient way to peek at them.

See columns for more details.

_iter(offset: int = 0, chunksize: int = 1000) -> Generator[DataFrame, None, None]

Subclasses should implement this to read data in chunks.

See iter for more details.

_read(offset: int = 0, limit: int | None = None) -> DataFrame | None

Subclasses must implement this to read data.

See read for more details.

columns() -> list[str]

Interface for peeking at columns.

finalize()

Finalize the data connector.

property identity

The identity of the data source is the SHA-256 hash of the file.
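
As an illustration of what a file-level SHA-256 identity looks like, here is a minimal standard-library sketch. It is not the sdgx implementation: the helper name `file_sha256` and the block size are assumptions made for this example, and the library may compute or encode the digest differently.

```python
import hashlib
import os
import tempfile


def file_sha256(path: str, block_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size blocks.

    Hypothetical helper for illustration only; not part of sdgx.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Reading block by block keeps memory use flat for large CSV files.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# Write a tiny CSV file, hash it, then clean up.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b\n1,2\n")
    demo_path = tmp.name

demo_digest = file_sha256(demo_path)
os.remove(demo_path)
```

Because the digest depends only on the file's bytes, two connectors pointing at byte-identical files would share an identity under this scheme, and any edit to the file would change it.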

iter(offset: int = 0, chunksize: int = 0) -> Generator[DataFrame, None, None]

Interface for reading data in chunks.

Parameters:
  • offset (int, optional) – Offset for reading. Defaults to 0.

  • chunksize (int, optional) – Chunksize for reading. Defaults to 0.

Returns:

Generator/iterator over the DataFrames read

Return type:

Generator[pd.DataFrame, None, None]
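
The offset and chunksize parameters can be pictured with a small standard-library sketch. It assumes that offset counts data rows after the header and that each chunk holds at most chunksize rows; it yields lists of dicts rather than DataFrames, and `iter_chunks` is a hypothetical name, not an sdgx function. The meaning of the documented default `chunksize=0` is not specified above, so the sketch simply rejects non-positive values.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_chunks(
    handle: TextIO, offset: int = 0, chunksize: int = 1000
) -> Iterator[List[Dict[str, str]]]:
    """Yield successive chunks of CSV rows, skipping the first `offset` rows.

    Hypothetical helper for illustration only; not part of sdgx.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be positive in this sketch")
    reader = csv.DictReader(handle)
    # Skip `offset` data rows (the header is consumed by DictReader itself).
    rows = islice(reader, offset, None)
    while True:
        chunk = list(islice(rows, chunksize))
        if not chunk:
            return
        yield chunk


demo = io.StringIO("id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n")
chunks = list(iter_chunks(demo, offset=1, chunksize=2))
chunk_ids = [[row["id"] for row in chunk] for chunk in chunks]
```

Consuming the generator lazily, one chunk at a time, is what keeps memory bounded when the file is larger than RAM.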

keys() -> list[str]

Same as columns.

read(offset: int = 0, limit: int | None = None) -> DataFrame | None

Interface for reading data.

Parameters:
  • offset (int, optional) – Offset for reading. Defaults to 0.

  • limit (int, optional) – Limit for reading. Defaults to None. None reads all data; 0 reads no data (only the header).

Returns:

The DataFrame that was read

Return type:

pd.DataFrame
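
The limit semantics described above (None for everything, 0 for header only) can be sketched with the standard library. This is an illustration under assumptions, not the sdgx code: `read_rows` is a hypothetical helper, it returns a header list and a list of rows instead of a DataFrame, and it assumes offset counts data rows after the header.

```python
import csv
import io
from itertools import islice
from typing import List, Optional, Tuple

CSV_TEXT = "id,name\n1,a\n2,b\n3,c\n"


def read_rows(
    text: str, offset: int = 0, limit: Optional[int] = None
) -> Tuple[List[str], List[List[str]]]:
    """Return (header, rows), honoring offset and limit.

    Hypothetical helper for illustration only; not part of sdgx.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    # limit=None means no upper bound; limit=0 yields an empty row list.
    stop = None if limit is None else offset + limit
    rows = list(islice(reader, offset, stop))
    return header, rows


all_rows = read_rows(CSV_TEXT)                   # every data row
header_only = read_rows(CSV_TEXT, limit=0)       # header, no data rows
window = read_rows(CSV_TEXT, offset=1, limit=2)  # rows 2 and 3
```

The header-only case is useful for inspecting the schema of a file without loading any of its data.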