DataConnector#

class sdgx.data_connectors.base.DataConnector[source]#

Bases: object

DataConnector wraps a data source into a pd.DataFrame.

For each kind of data source, implement a specific subclass.

_columns() list[str][source]#

Subclasses should implement this to read column names when there is an efficient way of peeking at them.

See columns for more details.

_iter(offset: int = 0, chunksize: int = 0) Generator[DataFrame, None, None][source]#

Subclasses should implement this to read data in chunks.

See iter for more details.

_read(offset: int = 0, limit: int | None = None) DataFrame | None[source]#

Subclasses must implement this to read data.

See read for more details.
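The three hooks above can be sketched for a source that is already in memory. This is a minimal illustration, not the sdgx implementation: the class name InMemoryConnector and its constructor are hypothetical, the base class is left out so the snippet runs without sdgx installed, and the handling of a non-positive chunksize is an assumption that the documentation does not confirm.

```python
from typing import Generator, List, Optional

import pandas as pd


class InMemoryConnector:
    """Hypothetical connector over an in-memory DataFrame.

    In real use this would derive from sdgx.data_connectors.base.DataConnector;
    the base class is omitted so the sketch has no sdgx dependency.
    """

    def __init__(self, frame: pd.DataFrame, name: str = "in_memory"):
        self._frame = frame
        self.identity = name  # e.g. a table name or a hash of the content

    def _columns(self) -> List[str]:
        # Cheap here because the column names are already held in memory.
        return list(self._frame.columns)

    def _read(self, offset: int = 0, limit: Optional[int] = None) -> Optional[pd.DataFrame]:
        # limit=None reads to the end; limit=0 yields an empty frame with headers.
        stop = None if limit is None else offset + limit
        return self._frame.iloc[offset:stop]

    def _iter(self, offset: int = 0, chunksize: int = 0) -> Generator[pd.DataFrame, None, None]:
        # Assumption: a non-positive chunksize yields the remainder as one chunk.
        if chunksize <= 0:
            yield self._frame.iloc[offset:]
            return
        for start in range(offset, len(self._frame), chunksize):
            yield self._frame.iloc[start:start + chunksize]


connector = InMemoryConnector(
    pd.DataFrame({"id": [1, 2, 3, 4, 5], "score": [0.1, 0.2, 0.3, 0.4, 0.5]})
)
```

A file-backed or database-backed subclass would keep the same three method signatures and replace the slicing with its own row-range reads.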

columns() list[str][source]#

Interface for peeking at columns.

finalize()[source]#

Finalize the data connector.

identity = None#

Identity of the data source, e.g. table name or hash of content.

iter(offset: int = 0, chunksize: int = 0) Generator[DataFrame, None, None][source]#

Interface for reading data in chunks.

Parameters:
  • offset (int, optional) – Offset for reading. Defaults to 0.

  • chunksize (int, optional) – Chunksize for reading. Defaults to 0.

Returns:

Generator/iterator over the DataFrames read.

Return type:

Generator[pd.DataFrame, None, None]
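From the caller's side, chunked reading makes it possible to aggregate over a source without holding all rows at once. The sketch below shows that consumption pattern. The helper iter_chunks is a hypothetical stand-in for a call such as connector.iter(offset, chunksize), and it requires a positive chunksize because the meaning of the documented default of 0 is not stated here.

```python
from typing import Generator

import pandas as pd


def iter_chunks(
    frame: pd.DataFrame, offset: int = 0, chunksize: int = 2
) -> Generator[pd.DataFrame, None, None]:
    """Hypothetical stand-in for connector.iter(offset, chunksize)."""
    if chunksize <= 0:
        raise ValueError("this sketch requires a positive chunksize")
    for start in range(offset, len(frame), chunksize):
        yield frame.iloc[start:start + chunksize]


frame = pd.DataFrame({"amount": [10, 20, 30, 40, 50]})

# Accumulate a running total one chunk at a time, skipping the first row.
total = 0
rows_seen = 0
for chunk in iter_chunks(frame, offset=1, chunksize=2):
    total += int(chunk["amount"].sum())
    rows_seen += len(chunk)
```

The last chunk may be shorter than chunksize, so consumers should rely on len(chunk) rather than on the requested size.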

keys() list[str][source]#

Same as columns.

read(offset: int = 0, limit: int | None = None) DataFrame | None[source]#

Interface for reading data.

Parameters:
  • offset (int, optional) – Offset for reading. Defaults to 0.

  • limit (int, optional) – Limit for reading. Defaults to None. None reads all data; 0 reads no data (header only).

Returns:

The DataFrame that was read.

Return type:

pd.DataFrame | None
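The offset and limit semantics described above (None for all rows, 0 for the header only) can be illustrated with plain pandas slicing. This is a sketch under the assumption that offset and limit count rows; the function read_rows is hypothetical and is not part of sdgx.

```python
from typing import Optional

import pandas as pd


def read_rows(frame: pd.DataFrame, offset: int = 0, limit: Optional[int] = None) -> pd.DataFrame:
    """Hypothetical helper mimicking the documented read(offset, limit) semantics."""
    # limit=None means "read to the end"; otherwise stop after `limit` rows.
    stop = None if limit is None else offset + limit
    return frame.iloc[offset:stop]


frame = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

everything = read_rows(frame)                 # all rows
header_only = read_rows(frame, limit=0)       # no rows, column names kept
window = read_rows(frame, offset=1, limit=1)  # one row starting at row index 1
```

Reading with limit=0 is a cheap way to inspect column names when a connector has no dedicated way of peeking at them.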