SDG: Synthetic Data Generator
====================================================================


The Synthetic Data Generator (SDG) is a specialized framework designed to generate high-quality structured tabular data. It incorporates a wide range of single-table and multi-table data synthesis algorithms, as well as LLM-based synthetic data generation models.

Synthetic data is generated by machines from real data, metadata, and algorithms. It does not contain any sensitive information, yet it retains the essential characteristics of the original data. Because there is no direct one-to-one correspondence between synthetic records and real records, synthetic data is exempt from privacy regulations such as GDPR and ADPPA, which eliminates the risk of privacy breaches in practical applications. High-quality synthetic data can be used safely in many scenarios, including data sharing, model training and debugging, and system development and testing.

Our code, issues, and pull requests are all hosted on `github `_. Feel free to contact us if you have any questions.

Installation
====================================================================

You can install our Python package with pip:

.. code-block:: bash

    pip install sdgx

Alternatively, use the pre-built Docker image to quickly try out the latest features:

.. code-block:: bash

    docker pull idsteam/sdgx:latest

To use a GPU for synthesis, you may need to refer to `Torch's GPU installation guide `_.

Quick demo
====================================================================

.. code-block:: python

    """
    Example for CTGAN
    """

    from sdgx.data_connectors.csv_connector import CsvConnector
    from sdgx.models.ml.single_table.ctgan import CTGANSynthesizerModel
    from sdgx.synthesizer import Synthesizer
    from sdgx.utils import download_demo_data

    # This will download demo data to ./dataset
    dataset_csv = download_demo_data()

    # Create a data connector for the CSV file
    data_connector = CsvConnector(path=dataset_csv)

    # Initialize the synthesizer with a CTGAN model
    synthesizer = Synthesizer(
        model=CTGANSynthesizerModel(epochs=1),  # A single epoch keeps the demo fast
        data_connector=data_connector,
    )

    # Fit the model
    synthesizer.fit()

    # Sample 1000 synthetic rows
    sampled_data = synthesizer.sample(1000)
    print(sampled_data)

We provide user guides with many examples for researchers, scientists, and developers. Learn more if you are interested!

.. toctree::
    :maxdepth: 3
    :titlesonly:

    User guides

More details
====================================================================

.. toctree::
    :maxdepth: 3
    :titlesonly:

    Design
    Developer guides
    API Reference

Indices and tables
====================================================================

* :ref:`genindex`
* :ref:`search`