0

Given a Parquet file, how can I create the corresponding table in my Redshift database? The Parquet file is Snappy-compressed.

3
  • Amazon Redshift Can Now COPY from Parquet and ORC File Formats Commented Apr 15, 2021 at 1:12
  • Is your real problem that you don't know what columns are stored in the file? Commented Apr 15, 2021 at 12:30
  • @Parsifal Yes, I don't want to guess at the column types, but I can't COPY the data unless I create the table first. Commented Apr 15, 2021 at 14:52

1 Answer

3

If you're dealing with multiple files, especially over a long term, then I think the best solution is to upload them to an S3 bucket and run a Glue crawler.

The crawler populates the Glue Data Catalog with the schema it infers from the files. You can then use that catalog to configure external tables for Redshift Spectrum, and create your on-cluster tables using CREATE TABLE AS SELECT.
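As a rough sketch of the Spectrum route, the two statements involved could be generated like this. The Glue database name, IAM role ARN, schema names, and table name are all placeholders that I made up for illustration; substitute whatever your crawler and cluster actually use.

```python
def spectrum_ctas_statements(glue_database, iam_role_arn, table_name,
                             external_schema="spectrum",
                             target_schema="public"):
    """Build the SQL to expose a Glue-catalogued table through Redshift
    Spectrum and then copy it into an on-cluster table.

    All arguments are hypothetical example values supplied by the caller.
    """
    # Maps the Glue database onto an external schema in Redshift.
    create_schema = (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {external_schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )
    # CTAS takes column names and types from the external table,
    # so there is no need to write the column list by hand.
    ctas = (
        f"CREATE TABLE {target_schema}.{table_name} AS\n"
        f"SELECT * FROM {external_schema}.{table_name};"
    )
    return [create_schema, ctas]


if __name__ == "__main__":
    for stmt in spectrum_ctas_statements(
        "my_glue_db",                                    # placeholder
        "arn:aws:iam::123456789012:role/ExampleRole",    # placeholder
        "events",                                        # placeholder
    ):
        print(stmt, end="\n\n")
```

Run the printed statements in your SQL client of choice. The IAM role needs access to both the Glue catalog and the S3 bucket.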

If this is just a one-off task, then I've used parquet-tools in the past to inspect the file's schema. The version that I've used is a Java library, but I see that there's also a version on PyPI.
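If you'd rather script the one-off case, here is a minimal sketch that reads the schema from the Parquet footer and prints a CREATE TABLE statement. It uses pyarrow rather than parquet-tools, which is my substitution, and the type mapping is an assumption that only covers common types: check it against your data, especially the VARCHAR width and the fallback. The file and table names are placeholders. Snappy compression shouldn't matter here, since only the file metadata is read.

```python
# Assumed mapping from pyarrow type names to Redshift column types.
_SIMPLE_TYPES = {
    "bool": "BOOLEAN",
    "int8": "SMALLINT",
    "int16": "SMALLINT",
    "int32": "INTEGER",
    "int64": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
    "string": "VARCHAR(65535)",
    "large_string": "VARCHAR(65535)",
}


def redshift_type(arrow_type):
    """Translate a pyarrow type name (as a string) to a Redshift type."""
    if arrow_type in _SIMPLE_TYPES:
        return _SIMPLE_TYPES[arrow_type]
    if arrow_type.startswith("timestamp"):
        # e.g. "timestamp[us]" or "timestamp[us, tz=UTC]"
        return "TIMESTAMPTZ" if "tz=" in arrow_type else "TIMESTAMP"
    if arrow_type.startswith("date"):
        return "DATE"
    if arrow_type.startswith("decimal"):
        # "decimal128(10, 2)" becomes "DECIMAL(10, 2)"
        return "DECIMAL" + arrow_type[arrow_type.index("("):]
    # Fallback for anything unmapped (nested types, binary, ...).
    return "VARCHAR(65535)"


def create_table_ddl(table, columns):
    """Build a CREATE TABLE statement from (name, arrow_type) pairs."""
    cols = ",\n".join(
        f'    "{name}" {redshift_type(arrow_type)}'
        for name, arrow_type in columns
    )
    return f"CREATE TABLE {table} (\n{cols}\n);"


def parquet_columns(path):
    """Read (name, type) pairs from a Parquet file; requires pyarrow."""
    import pyarrow.parquet as pq  # third-party: pip install pyarrow
    return [(field.name, str(field.type)) for field in pq.read_schema(path)]


if __name__ == "__main__":
    # "data.parquet" and "public.my_table" are placeholder names.
    print(create_table_ddl("public.my_table", parquet_columns("data.parquet")))
```

Once the table exists, the COPY mentioned in the comments should work with FORMAT AS PARQUET.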
