Given a Parquet file, how can I create the corresponding table in my Redshift database? Note that the Parquet file is Snappy-compressed.
- Amazon Redshift Can Now COPY from Parquet and ORC File Formats – John Rotenstein, Apr 15, 2021
- Is your real problem that you don't know what columns are stored in the file? – Parsifal, Apr 15, 2021
- @Parsifal Yes, I don't want to guess at the column types, but I can't COPY the data unless I create the table first. – BugCatcherJoe, Apr 15, 2021
1 Answer
If you're dealing with multiple files, especially over the long term, then I think the best solution is to upload them to an S3 bucket and run a Glue crawler over it.
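As a minimal sketch of that step using boto3, something like the following would register the crawler and kick it off. The bucket path, IAM role ARN, and all names here are placeholders, not anything from your environment:

```python
import boto3

glue = boto3.client("glue")

# Point the crawler at the S3 prefix holding the Parquet files; the
# built-in Parquet classifier handles Snappy compression transparently.
glue.create_crawler(
    Name="parquet-crawler",  # hypothetical crawler name
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # your IAM role
    DatabaseName="parquet_catalog",  # Glue catalog database to populate
    Targets={"S3Targets": [{"Path": "s3://my-bucket/parquet-data/"}]},
)

# Run it; the inferred table (column names and types) lands in the catalog.
glue.start_crawler(Name="parquet-crawler")
```

Once the crawler finishes, you can inspect the inferred schema in the Glue console rather than guessing at column types yourself.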
In addition to populating the Glue data catalog, you can use this information to configure external tables for Redshift Spectrum, and then create your on-cluster tables with CREATE TABLE AS SELECT.
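Here's a sketch of that Spectrum route using the redshift_connector driver; all identifiers (the external schema, catalog database, table names, and IAM role) are assumptions you'd replace with your own:

```python
import redshift_connector  # pip install redshift-connector

conn = redshift_connector.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",  # placeholder
    database="dev",
    user="awsuser",
    password="...",
)
cur = conn.cursor()

# Expose the Glue catalog database as an external schema in Redshift.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_schema
    FROM DATA CATALOG
    DATABASE 'parquet_catalog'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole'
""")

# Materialize an on-cluster copy; the column names and types carry over
# from the external table's definition, so no hand-written DDL is needed.
cur.execute("""
    CREATE TABLE my_local_table AS
    SELECT * FROM spectrum_schema.my_crawled_table
""")
conn.commit()
```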
If this is just a one-off task, then I've used parquet-tools in the past. The version that I've used is a Java library, but I see that there's also a version on PyPI.
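For a one-off, a pure-Python alternative to parquet-tools is to read the schema with pyarrow, which gives you the column names and types you need to hand-write the CREATE TABLE statement; the file path below is a placeholder:

```python
import pyarrow.parquet as pq

# Reads only the footer metadata, so this is fast even for large
# (Snappy-compressed) Parquet files.
schema = pq.read_schema("my_file.parquet")
for field in schema:
    print(field.name, field.type)
```

Mapping the printed Arrow types to Redshift types (e.g. int64 to BIGINT, string to VARCHAR) is then a straightforward manual step.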