Transition to business glossary on Dataplex Universal Catalog

This document provides instructions for migrating in a single step from the preview version of business glossary, which supported Data Catalog metadata, to the generally available version of business glossary, which supports Dataplex Universal Catalog metadata.

Before you begin

  1. Install the gcloud CLI and Python. Then authenticate your user account and the Application Default Credentials (ADC) that the Python libraries use. Run the following commands and follow the browser-based prompts:

    gcloud init
    gcloud auth login
    gcloud auth application-default login
  2. Enable the following APIs:

  3. Create one or more Cloud Storage buckets in any of your projects. The migration uses these buckets as a temporary location for the import files, and providing more buckets makes the import faster. Grant the Storage Admin IAM role (roles/storage.admin) to the service account that runs the migration:

    service-MIGRATION_PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com

    Replace MIGRATION_PROJECT_NUMBER with the project number (not the project ID) of the project from which you are migrating the glossaries.

  4. Set up the repository:

    1. Clone the repository:

      git clone https://github.com/GoogleCloudPlatform/dataplex-labs.git
      cd dataplex-labs/dataplex-quickstart-labs/00-resources/scripts/python/business-glossary-import
    2. Install the required packages and change to the migration directory:

      pip3 install -r requirements.txt
      cd migration
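Before you run the script, you might want to check these prerequisites programmatically. The following Python sketch is a hypothetical helper, not part of the dataplex-labs repository. It reports whether `gcloud` is on your PATH and whether an ADC credentials file exists at the default Linux and macOS location, and it prints one `gcloud storage buckets add-iam-policy-binding` command per bucket to grant the Storage Admin role. It assumes that a bucket-level grant is sufficient for the migration; `SERVICE_ACCOUNT_EMAIL` is a placeholder for the service account from step 3.

```python
import os
import shutil
from pathlib import Path


def adc_credentials_path(env=None, home=None):
    """Return where ADC credentials are expected (simplified Linux/macOS layout)."""
    env = os.environ if env is None else env
    # A key file named by GOOGLE_APPLICATION_CREDENTIALS takes precedence.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    home = Path.home() if home is None else Path(home)
    # Default file written by `gcloud auth application-default login`.
    return home / ".config" / "gcloud" / "application_default_credentials.json"


def storage_admin_grant_commands(buckets, service_account):
    """Build one gcloud command per bucket that grants roles/storage.admin."""
    return [
        [
            "gcloud", "storage", "buckets", "add-iam-policy-binding",
            f"gs://{bucket}",
            f"--member=serviceAccount:{service_account}",
            "--role=roles/storage.admin",
        ]
        for bucket in buckets
    ]


if __name__ == "__main__":
    print("gcloud on PATH:", shutil.which("gcloud") is not None)
    print("ADC file present:", adc_credentials_path().exists())
    # SERVICE_ACCOUNT_EMAIL is a placeholder; substitute the account from step 3.
    for cmd in storage_admin_grant_commands(["BUCKET1", "BUCKET2"], "SERVICE_ACCOUNT_EMAIL"):
        print(" ".join(cmd))
```

Review the printed commands before running them. On Windows, gcloud stores ADC credentials under the %APPDATA%\gcloud directory instead, which this sketch does not handle.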

Required roles

Run the migration script

From the migration directory, run the following command:

python3 run.py --project=MIGRATION_PROJECT_ID --user-project=USER_PROJECT_ID --buckets=BUCKET1,BUCKET2

Replace the following:

  • MIGRATION_PROJECT_ID: the ID of the project from which you are migrating the glossaries.
  • USER_PROJECT_ID: the project ID of the project to be migrated.
  • BUCKET1 and BUCKET2: the names of the Cloud Storage buckets to use for the import.
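If you drive the migration from another Python program, you can build the same command as an argument list and run it with `subprocess`. This is a minimal sketch: `build_migration_command` and `run_migration` are hypothetical names, and the code assumes that the current working directory is the `migration` directory containing `run.py`. It uses `sys.executable` in place of `python3` so that the script runs under the same interpreter as the caller.

```python
import subprocess
import sys


def build_migration_command(project, user_project, buckets):
    """Return the run.py invocation as an argument list."""
    if not buckets:
        raise ValueError("provide at least one Cloud Storage bucket")
    return [
        sys.executable, "run.py",
        f"--project={project}",
        f"--user-project={user_project}",
        # The script takes all bucket names as one comma-separated value.
        f"--buckets={','.join(buckets)}",
    ]


def run_migration(project, user_project, buckets):
    """Run the migration from the current directory and return its exit code."""
    cmd = build_migration_command(project, user_project, buckets)
    return subprocess.run(cmd, check=False).returncode


# Example with the placeholders from the command above:
# run_migration("MIGRATION_PROJECT_ID", "USER_PROJECT_ID", ["BUCKET1", "BUCKET2"])
```

Passing a list instead of a single string avoids shell quoting issues with project IDs or bucket names.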

Scope glossaries in migration

To migrate only specific glossaries, pass their URLs to the --glossaries flag:

python3 run.py --project=MIGRATION_PROJECT_ID --user-project=USER_PROJECT_ID --buckets=BUCKET1,BUCKET2 --glossaries="GLOSSARY_URL1","GLOSSARY_URL2"

Replace GLOSSARY_URL1 and GLOSSARY_URL2 with the URLs of the glossaries that you want to migrate.
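The shell removes the quotes in `--glossaries="GLOSSARY_URL1","GLOSSARY_URL2"`, so `run.py` receives a single comma-separated value. The following hypothetical helper builds that argument from a Python list, trims whitespace, and drops blank or duplicate entries while keeping their order. Rejecting URLs that contain a comma is an assumption based on the comma-separated format, not documented behavior of the script.

```python
import shlex


def glossaries_flag(urls):
    """Return the --glossaries argument for a list of glossary URLs."""
    cleaned = []
    for url in urls:
        url = url.strip()
        # Commas separate entries, so a URL containing one would be split apart.
        if "," in url:
            raise ValueError(f"glossary URL must not contain a comma: {url!r}")
        if url and url not in cleaned:  # skip blanks and duplicates, keep order
            cleaned.append(url)
    if not cleaned:
        raise ValueError("provide at least one glossary URL")
    return "--glossaries=" + ",".join(cleaned)


def scoped_command(project, user_project, buckets, urls):
    """Build the scoped run.py command as a copy-pasteable shell string."""
    args = [
        "python3", "run.py",
        f"--project={project}",
        f"--user-project={user_project}",
        f"--buckets={','.join(buckets)}",
        glossaries_flag(urls),
    ]
    # shlex.join quotes any argument the shell would otherwise split or expand.
    return shlex.join(args)
```

For example, `scoped_command("MIGRATION_PROJECT_ID", "USER_PROJECT_ID", ["BUCKET1"], [url1, url2])` produces a command in the same form as the one shown above, with any URL characters that are special to the shell quoted.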

Resume migration for import job failures

If files remain after the migration finishes, some import jobs failed. To resume the migration, run the script again with the --resume-import flag:

python3 run.py --project=MIGRATION_PROJECT_ID --user-project=USER_PROJECT_ID --buckets=BUCKET1,BUCKET2 --resume-import
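This document doesn't say where the leftover files appear, so the following sketch takes a caller-supplied `import_failed` function that you would implement to check for them. The hypothetical `resume_migration` function reruns the script with `--resume-import` until that check passes or a retry limit is reached, and returns the number of resume attempts. The `runner` parameter defaults to `subprocess.run` and exists only so that the loop can be tested without calling the real script.

```python
import subprocess
import sys


def resume_args(project, user_project, buckets):
    """Arguments for rerunning run.py with --resume-import."""
    return [
        sys.executable, "run.py",
        f"--project={project}",
        f"--user-project={user_project}",
        f"--buckets={','.join(buckets)}",
        "--resume-import",
    ]


def resume_migration(project, user_project, buckets, import_failed,
                     max_attempts=3, runner=subprocess.run):
    """Rerun the import while import_failed() reports leftover files.

    import_failed is a hypothetical, caller-supplied check; this sketch does
    not know where the leftover files are stored.
    """
    attempts = 0
    while attempts < max_attempts and import_failed():
        runner(resume_args(project, user_project, buckets), check=False)
        attempts += 1
    return attempts
```

Capping the number of attempts keeps a persistent failure from looping forever; if the limit is reached, inspect the remaining files before trying again.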