Command Line Interface#

The Command Line Interface (CLI) is designed to simplify the use of SDG and to let other programs use SDG more conveniently.

There are two main commands in the CLI:

  • fit: For fitting, fine-tuning, or retraining the model (and similar training workflows). The final model is saved to a specified path.

  • sample: Loads an existing model and samples synthetic data.

Because SDG supports a plug-in system, users can list all available components with the list-{component} commands.

Note

If you want to use SDG as a library, please refer to Use Synthetic Data Generator as a library.

If you want to extend SDG with your own components, please refer to Developer guides for extension.

CLI for synthetic single-table data#

sdgx#

sdgx [OPTIONS] COMMAND [ARGS]...

fit#

Fit the synthesizer, or load an existing synthesizer for fine-tuning, retraining, or continued training.

sdgx fit [OPTIONS]

Options

--torchrun <torchrun>#

Use torchrun to run the CLI.

--torchrun_kwargs <torchrun_kwargs>#

[Json String] torchrun kwargs.

--save_dir <save_dir>#

Required The directory to save the synthesizer

--model <model>#

Required The name of the model.

--model_path <model_path>#

The path of the model to load

--model_kwargs <model_kwargs>#

[Json String] The kwargs of the model for initialization

--load_dir <load_dir>#

The directory to load the synthesizer from. If it is specified, model_path is ignored.

--metadata_path <metadata_path>#

The path of the metadata to load

--data_connector <data_connector>#

The name of the data connector to use

--data_connector_kwargs <data_connector_kwargs>#

[Json String] The kwargs of the data connector to use

--raw_data_loaders_kwargs <raw_data_loaders_kwargs>#

[Json String] The kwargs of the raw data loader to use

--processed_data_loaders_kwargs <processed_data_loaders_kwargs>#

[Json String] The kwargs of the processed data loader to use

--data_processors <data_processors>#

[Comma separated list] The name of the data processors to use, e.g. ‘processor_x,processor_y’

--data_processors_kwargs <data_processors_kwargs>#

[Json String] The kwargs of the data processors to use

--inspector_max_chunk <inspector_max_chunk>#

The max chunk of the inspector to load

--metadata_include_inspectors <metadata_include_inspectors>#

[Comma separated list] The name of the inspectors to include, e.g. ‘inspector_x,inspector_y’

--metadata_exclude_inspectors <metadata_exclude_inspectors>#

[Comma separated list] The name of the inspectors to exclude, e.g. ‘inspector_x,inspector_y’

--inspector_init_kwargs <inspector_init_kwargs>#

[Json String] The kwargs of the inspector to use

--model_fit_kwargs <model_fit_kwargs>#

[Json String] The kwargs of the model fit method

--dry_run <dry_run>#

Dry run. Only initialize the synthesizer, without fitting or saving.

--json_output <json_output>#

Exit with json output.

--log_to_file <log_to_file>#

Log to file.
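
The fit options above mix plain strings, [Json String] values, and [Comma separated list] values, which are easy to misquote when typed by hand. The following is a minimal sketch of assembling a fit invocation from Python values. The helper build_fit_command and every value passed to it (model name, connector name, kwargs contents, paths) are hypothetical placeholders, not names confirmed by SDG; use the list-{component} commands to find real component names.

```python
import json
import shlex


def build_fit_command(save_dir, model, data_connector=None,
                      data_connector_kwargs=None, data_processors=None,
                      model_kwargs=None):
    """Assemble an argv list for `sdgx fit` (hypothetical helper)."""
    # --save_dir and --model are the two options marked Required.
    argv = ["sdgx", "fit", "--save_dir", save_dir, "--model", model]
    if data_connector is not None:
        argv += ["--data_connector", data_connector]
    if data_connector_kwargs is not None:
        # [Json String] options take a single JSON-encoded argument.
        argv += ["--data_connector_kwargs", json.dumps(data_connector_kwargs)]
    if data_processors:
        # [Comma separated list] options take names joined by commas.
        argv += ["--data_processors", ",".join(data_processors)]
    if model_kwargs is not None:
        argv += ["--model_kwargs", json.dumps(model_kwargs)]
    return argv


# Every value below is a placeholder chosen for illustration only.
fit_argv = build_fit_command(
    save_dir="./my_synthesizer",
    model="my_model",
    data_connector="my_connector",
    data_connector_kwargs={"path": "data.csv"},
    data_processors=["processor_x", "processor_y"],
    model_kwargs={"epochs": 10},
)

# shlex.join quotes the JSON arguments so the line can be pasted into a POSIX shell.
print(shlex.join(fit_argv))
```

Passing the list to subprocess.run(fit_argv) avoids shell quoting altogether, which is usually the safer choice when another program drives the CLI.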

list-cachers#

sdgx list-cachers [OPTIONS]

Options

--json_output <json_output>#

Exit with json output.

--log_to_file <log_to_file>#

Log to file.

list-data-connectors#

sdgx list-data-connectors [OPTIONS]

Options

--json_output <json_output>#

Exit with json output.

--log_to_file <log_to_file>#

Log to file.

list-data-exporters#

sdgx list-data-exporters [OPTIONS]

Options

--json_output <json_output>#

Exit with json output.

--log_to_file <log_to_file>#

Log to file.

list-data-processors#

sdgx list-data-processors [OPTIONS]

Options

--json_output <json_output>#

Exit with json output.

--log_to_file <log_to_file>#

Log to file.

list-models#

sdgx list-models [OPTIONS]

Options

--json_output <json_output>#

Exit with json output.

--log_to_file <log_to_file>#

Log to file.
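
The five list commands above share the same two options, so a calling program can generate them uniformly. This is a small sketch under stated assumptions: build_list_command is a hypothetical helper, and passing the literal value "true" to --json_output is an assumption based on the `<json_output>` placeholder in the synopsis; the format of the resulting output is not documented here, so it is not parsed.

```python
import shlex

# The list commands documented in this section.
LIST_COMMANDS = [
    "list-cachers",
    "list-data-connectors",
    "list-data-exporters",
    "list-data-processors",
    "list-models",
]


def build_list_command(component_command, json_output=False):
    """Assemble an argv list for one `sdgx list-*` command (hypothetical helper)."""
    if component_command not in LIST_COMMANDS:
        raise ValueError(f"unknown list command: {component_command}")
    argv = ["sdgx", component_command]
    if json_output:
        # Assumption: the option accepts a boolean-like value such as "true".
        argv += ["--json_output", "true"]
    return argv


list_argvs = [build_list_command(name, json_output=True) for name in LIST_COMMANDS]
for argv in list_argvs:
    print(shlex.join(argv))
```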

sample#

Load a synthesizer and sample.

load_dir should contain the model and the metadata. See the Synthesizer’s load method for more details.

sdgx sample [OPTIONS]

Options

--torchrun <torchrun>#

Use torchrun to run the CLI.

--torchrun_kwargs <torchrun_kwargs>#

[Json String] torchrun kwargs.

--load_dir <load_dir>#

Required The directory to load the synthesizer.

--model <model>#

Required The name of the model.

--raw_data_loaders_kwargs <raw_data_loaders_kwargs>#

[Json String] The kwargs of the raw data loaders.

--processed_data_loaders_kwargs <processed_data_loaders_kwargs>#

[Json String] The kwargs of the processed data loaders.

--data_processors <data_processors>#

[Comma separated list] The name of the data processors, e.g. ‘data_processor_1,data_processor_2’.

--data_processors_kwargs <data_processors_kwargs>#

[Json String] The kwargs of the data processors.

--count <count>#

The number of samples to generate.

--chunksize <chunksize>#

The size of each chunk. If count is very large, setting chunksize is recommended.

--model_sample_args <model_sample_args>#

[Json String] The kwargs of the model.sample.

--data_exporter <data_exporter>#

Required The name of the data exporter.

--data_exporter_kwargs <data_exporter_kwargs>#

[Json String] The kwargs of the data exporter.

--export_dst <export_dst>#

The destination of the exported data.

--dry_run <dry_run>#

Dry run. Only initialize the synthesizer without sampling.

--json_output <json_output>#

Exit with json output.

--log_to_file <log_to_file>#

Log to file.
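
As with fit, a sample invocation can be assembled programmatically. The sketch below covers the three options marked Required plus count, chunksize, and the exporter settings. The helper build_sample_command and all values (model name, exporter name, destination path, kwargs contents) are hypothetical placeholders rather than names confirmed by SDG.

```python
import json
import shlex


def build_sample_command(load_dir, model, data_exporter, export_dst=None,
                         count=None, chunksize=None, data_exporter_kwargs=None):
    """Assemble an argv list for `sdgx sample` (hypothetical helper)."""
    # --load_dir, --model and --data_exporter are the options marked Required.
    argv = ["sdgx", "sample", "--load_dir", load_dir,
            "--model", model, "--data_exporter", data_exporter]
    if export_dst is not None:
        argv += ["--export_dst", export_dst]
    if count is not None:
        argv += ["--count", str(count)]
    if chunksize is not None:
        # Recommended by the option description when count is very large.
        argv += ["--chunksize", str(chunksize)]
    if data_exporter_kwargs is not None:
        # [Json String] options take a single JSON-encoded argument.
        argv += ["--data_exporter_kwargs", json.dumps(data_exporter_kwargs)]
    return argv


# Every value below is a placeholder chosen for illustration only.
sample_argv = build_sample_command(
    load_dir="./my_synthesizer",
    model="my_model",
    data_exporter="my_exporter",
    export_dst="./synthetic.csv",
    count=1_000_000,
    chunksize=10_000,
)

print(shlex.join(sample_argv))
```

Here load_dir points at the directory that an earlier fit run wrote through --save_dir, so the two commands can be chained by a driver script.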