
I would like to export data from an Amazon Redshift table into an external table stored in Amazon S3. Every hour, I want to export rows from the Redshift source into the external table target.

What kind of options exist in AWS to achieve this?

I know that there is the UNLOAD command that allows me to export data to S3, but I think it would not work for storing the data in an external table (which is also partitioned). Or is Amazon EMR the only way to get this working?

2 Answers


Amazon Redshift Spectrum external tables are read-only. You cannot update them from Redshift (eg via INSERT commands).

Therefore, you would need a method to create the files directly in S3.

UNLOAD can certainly do this, but it cannot save the data in a partition structure.
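For reference, a plain UNLOAD looks roughly like the following (the table, bucket, and IAM role here are placeholders). It writes flat file parts under a single prefix rather than Hive-style partition directories:

    UNLOAD ('SELECT * FROM public.events WHERE event_ts >= DATEADD(hour, -1, GETDATE())')
    TO 's3://my-bucket/exports/events/part_'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    DELIMITER '|'
    GZIP
    ALLOWOVERWRITE;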

Amazon EMR would, indeed, be a good option. These days it is charged per-second, so it would only need to run long enough to export the data. You could use your preferred tool (eg Hive or Spark) to export the data from Redshift, then write it into a partitioned external table.

For example, see: Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning | AWS Big Data Blog

spark-redshift
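As a rough sketch of that route (the table names, columns, and S3 locations are made up), the Hive side of an hourly EMR job could look like the following, assuming the hourly rows have first been unloaded from Redshift to a staging prefix in S3:

    -- Staging table over the files unloaded from Redshift (placeholder schema)
    CREATE EXTERNAL TABLE IF NOT EXISTS staging_events (
      user_id    BIGINT,
      event_type STRING,
      event_ts   TIMESTAMP
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    LOCATION 's3://my-bucket/staging/events/';

    -- Partitioned target table that Redshift Spectrum (or Athena) can read
    CREATE EXTERNAL TABLE IF NOT EXISTS events_partitioned (
      user_id    BIGINT,
      event_type STRING,
      event_ts   TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION 's3://my-bucket/warehouse/events/';

    -- Append the new hour of data, creating partitions dynamically
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    INSERT INTO TABLE events_partitioned PARTITION (event_date)
    SELECT user_id, event_type, event_ts,
           date_format(event_ts, 'yyyy-MM-dd') AS event_date
    FROM staging_events;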

Another option might be AWS Glue. I'm not too familiar with it, but it can output into partitions, so this might be an even easier method to accomplish your goal!

See: Managing Partitions for ETL Output in AWS Glue - AWS Glue


1 Comment

Thanks for the hints. I guess I will go for the AWS Glue option. It might be possible to get some lines of Spark code running with it.

It's now possible to INSERT into an external table, since June 2020 I think: https://aws.amazon.com/about-aws/whats-new/2020/06/amazon-redshift-now-supports-writing-to-external-tables-in-amazon-s3/

And here's the documentation: https://docs.aws.amazon.com/redshift/latest/dg/r_INSERT_external_table.html

Basically, there are two ways (sketched below):

  • INSERT INTO external_schema.table_name { select_statement }, or
  • CREATE EXTERNAL TABLE AS { SELECT }
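As a rough, untested sketch (the schema, table, and column names are placeholders), the two forms look like this:

    -- 1) Append query results into an existing external table
    INSERT INTO my_stg.events_hourly
    SELECT user_id, event_type, event_ts
    FROM public.events
    WHERE event_ts >= DATEADD(hour, -1, GETDATE());

    -- 2) Create a new external table directly from a query
    CREATE EXTERNAL TABLE my_stg.events_snapshot
    STORED AS PARQUET
    LOCATION 's3://my-bucket/warehouse/events_snapshot/'
    AS SELECT user_id, event_type, event_ts
    FROM public.events;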

Typically, you specify the Glue database name in your Redshift external schema (e.g. my_stg), so any external table you create inside that external schema already knows the Glue catalog database name. That's good news, since the OP's question is from 2018 👍
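For example (the schema, database, and role ARN are placeholders), mapping an external schema to a Glue database looks like this:

    CREATE EXTERNAL SCHEMA IF NOT EXISTS my_stg
    FROM DATA CATALOG
    DATABASE 'my_glue_db'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftSpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;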

