insta-integration - Integration Testing

Automated integration tests for any application/job.

Spin up any external services
Generate production-like data
Run data validations to ensure application/job works as expected

Problems it can help with:

Unreliable test environments
Dependencies on other teams
Difficulty simulating complex data flows

Usage

CLI

1. Install via npm install -g insta-integration
2. Create YAML file insta-integration.yaml to define your integration tests
   1. Examples can be found here.
   2. Use JSON schema to help guide you on available options
3. Run insta-integration

GitHub Action

1. Create YAML file .github/workflows/integration-test.yaml

       name: Integration Test
       on:
         push:
           branches:
             - "*"
       jobs:
         integration-test:
           name: Integration Test
           runs-on: ubuntu-latest
           steps:
             - name: Run integration tests
               uses: data-catering/insta-integration@v3
               with:
                 data_caterer_version: 0.17.3

2. Create YAML file insta-integration.yaml to define your integration tests
   1. Examples can be found here.
   2. Use JSON schema to help guide you on available options
3. Push your code and the GitHub Action will run

Services

The following services are available to run alongside your application/job.

Service Type	Service	Supported
Change Data Capture	debezium	✅
Database	cassandra	✅
Database	cockroachdb	✅
Database	elasticsearch	✅
Database	mariadb	✅
Database	mongodb	✅
Database	mssql	✅
Database	mysql	✅
Database	neo4j	✅
Database	postgres	✅
Database	spanner	✅
Database	sqlite	✅
Database	opensearch	❌
Data Catalog	marquez	✅
Data Catalog	unitycatalog	✅
Data Catalog	amundsen	❌
Data Catalog	datahub	❌
Data Catalog	openmetadata	❌
Distributed Coordination	zookeeper	✅
Distributed Data Processing	flink	✅
HTTP	httpbin	✅
Identity Management	keycloak	✅
Job Orchestrator	airflow	✅
Job Orchestrator	dagster	✅
Job Orchestrator	mage-ai	✅
Job Orchestrator	prefect	✅
Messaging	activemq	✅
Messaging	kafka	✅
Messaging	rabbitmq	✅
Messaging	solace	✅
Notebook	jupyter	✅
Object Storage	minio	✅
Query Engine	duckdb	✅
Query Engine	flight-sql	✅
Query Engine	presto	✅
Query Engine	trino	✅
Real-time OLAP	clickhouse	✅
Real-time OLAP	doris	✅
Real-time OLAP	druid	✅
Real-time OLAP	pinot	✅
Test Data Management	data-caterer	✅
Workflow	temporal	✅

Generation and Validation

insta-integration uses data-caterer behind the scenes for data generation and validation. Check the data-caterer documentation to discover which generation and validation options are available.

Data Sources

The following data sources are available to generate/validate data.

Data Source Type	Data Source	Supported
Cloud Storage	AWS S3	✅
Cloud Storage	Azure Blob Storage	✅
Cloud Storage	GCP Cloud Storage	✅
Database	BigQuery	✅
Database	Cassandra	✅
Database	MySQL	✅
Database	Postgres	✅
Database	Elasticsearch	❌
Database	MongoDB	❌
Database	Opensearch	❌
File	CSV	✅
File	Delta Lake	✅
File	JSON	✅
File	Iceberg	✅
File	ORC	✅
File	Parquet	✅
File	Hudi	❌
HTTP	REST API	✅
Messaging	Kafka	✅
Messaging	RabbitMQ	✅
Messaging	Solace	✅
Messaging	ActiveMQ	❌
Messaging	Pulsar	❌
Metadata	Data Contract CLI	✅
Metadata	Great Expectations	✅
Metadata	Marquez	✅
Metadata	OpenAPI/Swagger	✅
Metadata	OpenMetadata	✅
Metadata	Open Data Contract Standard (ODCS)	✅
Metadata	Amundsen	❌
Metadata	Datahub	❌
Metadata	Solace Event Portal	❌

Examples

Simple Example

    services: []
    run:
      - command: ./my-app/run-app.sh
        test:
          generation:
            parquet:
              - options:
                  path: /tmp/parquet/accounts
                fields:
                  - name: account_id
          validation:
            parquet:
              - options:
                  path: /tmp/parquet/accounts
                validations:
                  - expr: ISNOTNULL(account_id)
                  - aggType: count
                    aggExpr: count == 1000
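To make the two validation styles in this example concrete, the Python sketch below mimics what they assert, using a small in-memory data set. It is not how data-caterer evaluates validations; `rows`, `check_not_null`, and `check_count` are hypothetical names used only for illustration.

```python
# Illustrative only: mimics the checks expressed by the validations above.
# None of these helpers are part of insta-integration or data-caterer.


def check_not_null(rows, field):
    """Row-level check, analogous to `expr: ISNOTNULL(<field>)`."""
    return all(row.get(field) is not None for row in rows)


def check_count(rows, expected):
    """Aggregate check, analogous to `aggType: count` with `aggExpr: count == <expected>`."""
    return len(rows) == expected


# Stand-in for generated records (the example validates a count of 1000).
rows = [{"account_id": f"id-{i}"} for i in range(1000)]

not_null_ok = check_not_null(rows, "account_id")
count_ok = check_count(rows, 1000)
print(not_null_ok, count_ok)
```

The row-level check must hold for every record, while the aggregate check is evaluated once over the whole data set.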

Full Example

    services:
      - name: postgres #define external services
        data: my-data/sql #initial service setup (i.e. schema/tables, topics, queues)
    run:
      - command: ./my-app/run-postgres-extract-app.sh #how to run your application/job
        env: #environment variables for your application/job
          POSTGRES_URL: jdbc:postgresql://postgres:5432/docker
        test:
          env: #environment variables for data generation/validation
            POSTGRES_URL: jdbc:postgresql://postgres:5432/docker
          mount: #volume mount for data validation
            - ${PWD}/example/my-app/shared/generated:/opt/app/shared/generated
          relationship: #generate data with same values used across different data sources
            postgres_balance.account_number: #ensure account_number in balance table exists when transaction created
              - postgres_transaction.account_number
          generation: #define data sources for data generation
            postgres:
              - name: postgres_transaction #give it a name to use in relationship definition
                options: #configuration on specific data source
                  dbtable: account.transactions
                count: #how many records to generate (1,000 by default)
                  perField: #generate 5 records per account_number
                    fieldNames: [account_number]
                    count: 5
                fields: #fields of the data source
                  - name: account_number #default data type is string
                  - name: create_time
                    type: timestamp
                  - name: transaction_id
                  - name: amount
                    type: double
              - name: postgres_balance
                options:
                  dbtable: account.balances
                fields:
                  - name: account_number
                    options: #additional metadata for data generation
                      isUnique: true
                      regex: ACC[0-9]{10}
                  - name: create_time
                    type: timestamp
                  - name: account_status
                    options:
                      oneOf: [open, closed]
                  - name: balance
                    type: double
          validation:
            csv: #define data source for data validations
              - options:
                  path: /opt/app/shared/generated/balances.csv
                  header: true
                validations: #list of validation to run, can be basic SQL, aggregations, upstream data source or column name validations
                  - expr: ISNOTNULL(account_number)
                  - aggType: count
                    aggExpr: count == 1000
              - options:
                  path: /opt/app/shared/generated/transactions.csv
                  header: true
                validations:
                  - expr: ISNOTNULL(account_number)
                  - aggType: count
                    aggExpr: count == 5000
                  - groupByCols: [account_number]
                    aggType: count
                    aggExpr: count == 5
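The expected counts in this example follow from simple arithmetic: 1,000 unique account numbers with 5 transactions each gives 5,000 transactions. The Python sketch below reproduces that reasoning with stand-in data. It is only an approximation of the idea and not data-caterer's implementation; `make_balances` and `make_transactions` are hypothetical helpers.

```python
import random
import re
from collections import Counter

# Same pattern as the `regex` option on account_number.
ACCOUNT_PATTERN = re.compile(r"ACC[0-9]{10}")


def make_balances(n, seed=0):
    """Create n records with unique account numbers (like `isUnique: true`)."""
    rng = random.Random(seed)
    numbers = set()
    while len(numbers) < n:
        digits = "".join(rng.choice("0123456789") for _ in range(10))
        numbers.add("ACC" + digits)
    return [
        {"account_number": number, "account_status": rng.choice(["open", "closed"])}
        for number in sorted(numbers)
    ]


def make_transactions(balances, per_account=5):
    """Reuse each balance's account_number, like the relationship and perField count."""
    return [
        {"account_number": b["account_number"], "transaction_id": f"txn-{i}-{j}"}
        for i, b in enumerate(balances)
        for j in range(per_account)
    ]


balances = make_balances(1000)
transactions = make_transactions(balances)
per_account = Counter(t["account_number"] for t in transactions)

print(len(balances))                               # balances: count == 1000
print(len(transactions))                           # transactions: count == 5000
print(all(c == 5 for c in per_account.values()))   # grouped by account_number: count == 5
```

Because every transaction borrows its account_number from a balance record, the grouped count of 5 and the total of 5,000 are two views of the same constraint.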

GitHub Action Options

Input

Optional configurations to alter the files and folders used by the GitHub Action can be found below.

Name	Description	Default
configuration_file	File path to configuration file	`insta-integration.yaml`
insta_infra_folder	Folder path to insta-infra (this repository)	`${HOME}/.insta-integration/insta-infra`
base_folder	Folder path to use for execution files	`${HOME}/.insta-integration`
data_caterer_version	Version of data-caterer Docker image	`0.17.3`

To use these configurations, alter your .github/workflows/integration-test.yaml.

    name: Integration Test
    on:
      push:
        branches:
          - "*"
    jobs:
      integration-test:
        name: Integration Test
        runs-on: ubuntu-latest
        steps:
          - name: Run integration tests
            uses: data-catering/insta-integration@v1
            with:
              configuration_file: my/custom/folder/insta-integration.yaml
              insta_infra_folder: insta-infra/folder
              base_folder: execution/folder
              data_caterer_version: 0.17.3
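As a rough mental model of how the optional inputs combine with their defaults, the Python sketch below merges user-supplied values over the defaults from the table and expands `${HOME}`. How the action resolves inputs internally is not shown here, so treat this as an assumption; `DEFAULT_INPUTS` and `resolve_inputs` are hypothetical names.

```python
import os

# Defaults copied from the inputs table above.
DEFAULT_INPUTS = {
    "configuration_file": "insta-integration.yaml",
    "insta_infra_folder": "${HOME}/.insta-integration/insta-infra",
    "base_folder": "${HOME}/.insta-integration",
    "data_caterer_version": "0.17.3",
}


def resolve_inputs(overrides=None, env=None):
    """Merge overrides over the defaults, then expand ${HOME} (assumed behaviour)."""
    env = os.environ if env is None else env
    merged = {**DEFAULT_INPUTS, **(overrides or {})}
    home = env.get("HOME", "")
    return {name: value.replace("${HOME}", home) for name, value in merged.items()}


resolved = resolve_inputs(
    overrides={"base_folder": "execution/folder"},
    env={"HOME": "/home/runner"},  # example value only
)
for name, value in resolved.items():
    print(f"{name}: {value}")
```

Any input left out of the `with` block keeps its default, so only the values that differ need to be listed.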

Output

If you want to use the output of the GitHub Action, the following attributes are available:

Name	Description
num_records_generated	Total number of records generated.
num_success_validations	Total number of successful validations.
num_failed_validations	Total number of failed validations.
num_validations	Total number of validations.
validation_success_rate	Success rate of validations (i.e. 0.75 = 75% success rate).
full_result	All result details as JSON (data generation and validation).

For example, you can print out the results like below:

    - name: Run integration tests
      id: test-action
      uses: data-catering/insta-integration@v6
    - name: Print Output
      id: output
      run: |
        echo "Records generated: ${{ steps.test-action.outputs.num_records_generated }}"
        echo "Successful validations: ${{ steps.test-action.outputs.num_success_validations }}"
        echo "Failed validations: ${{ steps.test-action.outputs.num_failed_validations }}"
        echo "Number of validations: ${{ steps.test-action.outputs.num_validations }}"
        echo "Validation success rate: ${{ steps.test-action.outputs.validation_success_rate }}"
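Beyond printing, the outputs can drive a pass/fail decision. The Python sketch below shows one way to relate the counts to the success rate and apply a threshold. It assumes the rate is successful validations divided by total validations, consistent with the 0.75 = 75% description; `success_rate` and `passes` are hypothetical helpers, and treating zero validations as a rate of 0.0 is an arbitrary choice.

```python
def success_rate(num_success_validations, num_validations):
    """Fraction of validations that succeeded (assumed definition)."""
    if num_validations == 0:
        return 0.0  # assumption: nothing validated counts as no success
    return num_success_validations / num_validations


def passes(num_success_validations, num_validations, min_rate=1.0):
    """True when the success rate meets the chosen threshold."""
    return success_rate(num_success_validations, num_validations) >= min_rate


# 3 successful validations out of 4 gives 0.75, i.e. a 75% success rate.
rate = success_rate(3, 4)
print(rate)
print(passes(3, 4))                 # strict: every validation must succeed
print(passes(3, 4, min_rate=0.75))  # relaxed threshold
```

A strict threshold of 1.0 is equivalent to requiring num_failed_validations to be zero.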

JSON Schema for insta-integration.yaml

A JSON Schema has been created to help guide users on what is possible in the insta-integration.yaml. The links below show how you can import the schema in your favourite IDE:

Validate JSON Schema

The commands in this section use the ajv command-line tool.

Validate the JSON Schema:

ajv compile --spec=draft2019 -s schema/insta-integration-config-latest.json

Validate YAML file

You can run npm run validate-yaml and it will validate all the YAML files under the examples directory.

Otherwise, to validate a YAML file at a different path, run ajv directly:

ajv validate --spec=draft2019 -s schema/insta-integration-config-latest.json -d example/postgres-to-csv.yaml

Example Flows

Examples can be found here.
