
I’m using GridDB Cloud with the official Python client (griddb_python) to insert large batches of rows into a TimeSeries container.

I’m calling put_multiple_rows() for efficiency, but occasionally I get a GSException when one or more rows in the batch violate a schema constraint (e.g., NULL in a NOT NULL column or a duplicate primary key).

According to the docs, put_multiple_rows() aborts the entire batch on error. However, I want to:

  • Insert all valid rows in the batch that precede the first error row in order.

  • Skip the invalid rows without re-inserting already committed rows.

  • Retry failed rows individually in a separate error-handling routine.

My current pseudo-code:

import griddb_python as griddb

conInfo = griddb.ContainerInfo(
    name="weather_data",
    column_info_list=[
        ("ts", griddb.Type.TIMESTAMP),
        ("temperature", griddb.Type.FLOAT)
    ],
    type=griddb.ContainerType.TIME_SERIES,
    row_key=True
)

# container obtained from the store via put_container(conInfo); connection code omitted
rows = [...]  # large batch of tuples

try:
    container.put_multiple_rows(rows)
except griddb.GSException as e:
    # Need a safe, efficient retry mechanism here
    pass

Question:

What’s the correct and efficient way to detect the failing row index in a put_multiple_rows() call in GridDB Cloud’s Python client so I can commit the valid prefix, skip the invalid rows, and retry them individually, without re-inserting already stored rows or breaking TimeSeries ordering?

I’m looking for a concrete explanation of:

  • How I can get the index of the row that caused the batch failure.

  • What is the recommended minimal-overhead pattern to achieve this in Python while using GridDB Cloud.

What I have tried:

  • Checked the GridDB Python API reference but could not find any documented way to retrieve the specific row index causing a put_multiple_rows() failure.

  • Attempted to catch GSException and parse the exception message for clues about the failing row, but the error message only provides a generic description without row-level detail:

except griddb.GSException as e:
    for i in range(e.get_error_stack_size()):
        print(f"[{i}] {e.get_error_code(i)}: {e.get_message(i)}")

This lists the error code and message, but no batch index.

  • Tried splitting the batch manually and inserting row-by-row to identify the failure, but this negates the performance benefits of put_multiple_rows().

  • Experimented with smaller chunk sizes (e.g., 100 rows per batch) to isolate errors faster, but it’s still inefficient for large datasets in GridDB Cloud (rough sketch below).
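
For reference, the chunked variant I experimented with looked roughly like this; CHUNK_SIZE and handle_failed_chunk are my own scaffolding, not part of the GridDB API:

CHUNK_SIZE = 100  # smaller batches narrow down failures faster, but add round trips

for start in range(0, len(rows), CHUNK_SIZE):
    chunk = rows[start:start + CHUNK_SIZE]
    try:
        container.put_multiple_rows(chunk)
    except griddb.GSException:
        # Still no failing-row index, so the bad row has to be isolated
        # inside this chunk by some other means.
        handle_failed_chunk(chunk)  # hypothetical per-chunk error handler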

2 Answers


As far as I know, the GridDB Python client does not currently provide a direct way to retrieve the index of the row that caused a put_multiple_rows() failure. The approach I can suggest is chunked (bisecting) retries.

The only workaround is to split your data into progressively smaller batches until you isolate the problematic row.

Pseudocode approach:

def safe_batch_insert(container, rows):
    try:
        container.put_multiple_rows(rows)
    except griddb.GSException:
        if len(rows) == 1:
            log_error(rows[0])  # single offending row isolated
            return
        mid = len(rows) // 2
        safe_batch_insert(container, rows[:mid])
        safe_batch_insert(container, rows[mid:])

This minimizes re-inserts while still finding the error row efficiently.
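
A minimal sketch of how this could be wired in, assuming the container and rows objects from your question; log_error is just a placeholder for whatever per-row handling you want, not a GridDB API:

import logging

def log_error(row):
    # Placeholder referenced by safe_batch_insert: record the row that
    # could not be stored so it can be retried or inspected later.
    logging.warning("Row failed schema validation, skipping: %r", row)

try:
    container.put_multiple_rows(rows)  # fast path: single bulk call
except griddb.GSException:
    # Slow path: bisect the same batch. Re-putting rows that were already
    # stored should be harmless, since a put on an existing row key overwrites it.
    safe_batch_insert(container, rows)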


A second option is to use put_row() in a fallback pass after a failed bulk insert (see the sketch after the list below):

  • Insert large batches with put_multiple_rows().

  • If a failure occurs, retry only the failed batch with individual put_row() calls, skipping rows that already exist (by catching duplicate key errors).

  • The overhead is minimal if failures are rare.
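
A rough sketch of that fallback pass, using the same method names as the question (put_multiple_rows / put_row); insert_with_fallback and the returned failed list are just illustrative scaffolding:

def insert_with_fallback(container, rows):
    # Normal fast path: one bulk call per batch.
    try:
        container.put_multiple_rows(rows)
        return []
    except griddb.GSException:
        pass

    # Fallback pass: only runs for the batch that failed.
    failed = []
    for row in rows:
        try:
            container.put_row(row)
        except griddb.GSException:
            # Collect the bad rows for a separate error-handling routine
            # instead of aborting the whole batch.
            failed.append(row)
    return failed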

  • This is what I have tried, and it seems good in cases where failures are rare. Commented Aug 15 at 9:05
