SDVG (Synthetic Data Values Generator) is a tool for generating synthetic data. It supports various run modes, data types for generation, and output formats.
Run modes:
- CLI - generate data, create configs, and validate them via the console;
- HTTP server - accepts generation requests through an HTTP API.
Data types:
- strings (English, Russian);
- integers and floating-point numbers;
- dates with timestamps;
- UUID.
String subtypes:
- random strings;
- texts;
- first names;
- last names;
- phone numbers;
- patterns.
Each data type can be generated with the following options:
- a specified percentage or number of unique values per column;
- ordered generation (sequence);
- foreign key reference;
- idempotent generation using a seed number;
- value generation from ranges with percentage-based distribution.
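To make two of the options above concrete, here is a minimal Python sketch of seed-based idempotent generation combined with percentage-based range distribution. It illustrates the idea only and is not SDVG's implementation or configuration format; the names `pick_range`, `generate_column`, and `RANGES`, as well as the 80/20 split, are assumptions made up for this example.

```python
import random


def pick_range(rng, ranges):
    """Pick one (low, high) range according to its percentage share."""
    weights = [share for share, _, _ in ranges]
    _, low, high = rng.choices(ranges, weights=weights, k=1)[0]
    return low, high


def generate_column(seed, rows, ranges):
    """Generate `rows` integers; the same seed always yields the same list."""
    rng = random.Random(seed)  # private generator, independent of global state
    values = []
    for _ in range(rows):
        low, high = pick_range(rng, ranges)
        values.append(rng.randint(low, high))
    return values


# Hypothetical distribution: 80% of values in 1..10, 20% in 100..200.
RANGES = [(80, 1, 10), (20, 100, 200)]

first = generate_column(seed=42, rows=1000, ranges=RANGES)
second = generate_column(seed=42, rows=1000, ranges=RANGES)
assert first == second  # idempotent: identical output for an identical seed
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the result reproducible even if other code draws random numbers in between.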
Output formats:
- devnull;
- CSV files;
- Parquet files;
- HTTP API;
- Tarantool Column Store HTTP API.
Here's an example of a data model that generates 10,000 user rows and writes them to a CSV file:
    output:
      type: csv
    models:
      user:
        rows_count: 10000
        columns:
          - name: id
            type: uuid
          - name: name
            type: string
            type_params:
              logical_type: first_name

Save this as simple_model.yml, then run:

    ./sdvg generate simple_model.yml

This will create a CSV file with fake user data like id and name:

    id,name
    c8a53cfd-1089-4154-9627-560fbbea2fef,Sutherlan
    b5c024f8-3f6f-43d3-b021-0bb2305cc680,Hilton
    5adf8218-7b53-41bb-873d-c5768ca6afa2,Craggy
    ...

To launch the generator in interactive mode:

    ./sdvg

To view available commands and arguments:

    ./sdvg -h
    ./sdvg --help
    ./sdvg generate -h

More information can be found in the user guide.
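As a quick sanity check on output like the CSV example above, a short Python sketch such as the following could confirm that every row has a valid UUID and a non-empty name. It is not part of SDVG; the helper name `check_users_csv` is hypothetical, and the sample text simply repeats the three rows shown earlier rather than reading a real output file, whose location depends on your configuration.

```python
import csv
import io
import uuid


def check_users_csv(text):
    """Return the number of data rows; raise ValueError on a malformed row."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != ["id", "name"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    count = 0
    for row in reader:
        uuid.UUID(row["id"] or "")  # raises ValueError if not a valid UUID
        if not row["name"]:
            raise ValueError(f"empty name in row {count + 1}")
        count += 1
    return count


# The three sample rows from the example output above.
SAMPLE = """id,name
c8a53cfd-1089-4154-9627-560fbbea2fef,Sutherlan
b5c024f8-3f6f-43d3-b021-0bb2305cc680,Hilton
5adf8218-7b53-41bb-873d-c5768ca6afa2,Craggy
"""

print(check_users_csv(SAMPLE))  # prints 3
```

To check a real file, pass its contents instead of `SAMPLE`, for example by reading the file with `open(path, newline="")` and calling `.read()`.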