1

When attempting to extract data from a BigQuery table requiring a partition filter, the extract job fails.

Here is a simple example creating a table and running an extract job.

package com.example; import com.google.cloud.bigquery.*; public class BigQueryExtractTest { private static final String PROJECT_ID = "my-project-id"; private static final String DATASET_ID = "test_dataset"; private static final String GCS_LOCATION = "gs://my-bucket/path/to/files/part-*"; public static void main(String[] args) throws Exception { // create BigQuery client BigQuery bigQuery = BigQueryOptions.newBuilder().setProjectId(PROJECT_ID).build().getService(); // create dataset and table that requires partition filter bigQuery.create(DatasetInfo.of(DATASET_ID)); bigQuery.query(QueryJobConfiguration.of( String.format("CREATE TABLE %s.table1 (\n", DATASET_ID) + "stringColumn STRING,\n" + "timeColumn TIMESTAMP\n" + ") PARTITION BY DATE(timeColumn)\n" + "OPTIONS(\n" + "require_partition_filter=true\n" + ")")); // extract table Job job = bigQuery.getTable(TableId.of(DATASET_ID, "table1")) .extract("NEWLINE_DELIMITED_JSON", GCS_LOCATION) .waitFor(); // throw exception on error if (job != null && job.getStatus().getError() != null) { throw new Exception(job.getStatus().getError().toString()); } } } 

The code snippet above produces the following error

Exception in thread "main" java.lang.Exception: BigQueryError{reason=invalidQuery, location=query, message=Cannot query over table 'my-project-id.test_dataset.table1' without a filter that can be used for partition elimination} at com.example.BigQueryExtractTest.main(BigQueryExtractTest.java:34) 

The google-cloud-bigquery maven dependency used for this example is shown below.

<dependency> <groupId>com.google.cloud</groupId> <artifactId>google-cloud-bigquery</artifactId> <version>1.23.0</version> </dependency> 

The example also threw the exception using the dependency version 0.34.0-beta

How can a partition filter be specified when running an extract job?

5
  • Sounds like a bug. Can you submit a bug report with this information (or link to this post?) As a temporary workaround, you can maybe toggle the setting off, extract, and then turn the setting back on. Commented Mar 27, 2018 at 21:35
  • Thanks @ElliottBrossard . Just submitted https://issuetracker.google.com/issues/76453364. How can the require partition filter setting be toggled on/off? Commented Mar 27, 2018 at 21:41
  • This worked for me: bq update --time_partitioning_type=DAY --norequire_partition_filter <table name>. I think you need to specify --time_partitioning_field instead if you're using column-based partitioning though. Commented Mar 27, 2018 at 21:45
  • I added this information as an answer below to make it more obvious in case someone else hits this same problem. Commented Mar 27, 2018 at 22:00
  • yep that worked @ElliottBrossard. thanks! Commented Mar 27, 2018 at 22:07

1 Answer 1

3

This is a bug, and there is now a bug report tracking the problem. To work around the limitation, you can update the table to allow queries without a partition filter using the bq command-line tool, perform the export, and then update the table to require it again. For example, with an ingestion-time partitioned table:

bq update --time_partitioning_type=DAY --norequire_partition_filter \ <dataset_name>.<table name> ... (perform the export) bq update --time_partitioning_type=DAY --require_partition_filter \ <dataset_name>.<table name> ... (resume querying the table) 
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.