Question 1

Issue: Flink application throws Thread 'jobmanager-io-thread-25' produced an uncaught exception. java.lang.OutOfMemoryError: Direct buffer memory and terminates after running for 2-3 days. No matter ...

Question 2

I am observing different write behaviors when executing queries in an EMR Notebook (correct behavior) vs. when using spark-submit to submit a Spark application to an EMR cluster (incorrect behavior). When I ...

Question 3

I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and I can ...

Question 4

I am connecting to an EMR cluster through SageMaker Unified Studio (JupyterLab). My EMR cluster is configured with Delta Lake support, and I have the following Spark properties set on the cluster: ...

Question 5

I have one Iceberg table in the Glue Catalog. I am unable to run a select * because one of the metadata files is missing. I am trying to point to the latest metadata file. How can I do that? I am using EMR 7.7 with ...

Question 6

I'm trying to connect to an existing EMR cluster from SageMaker Unified Studio to run SQL queries via JupyterLab. SageMaker requires that the EMR cluster be runtime role-enabled to integrate with ...

Question 7

I am using EMR 6.15 and Hudi 0.14. I submitted the following Hudi job, which should create a database and a table in AWS Glue. The IAM role assigned to EMR Serverless has all necessary permissions for S3 and ...

Question 8

I have successfully implemented the IBM S3 Shuffle Plugin v0.9.6 (https://github.com/IBM/spark-s3-shuffle) on EMR on EKS (Spark 3.5.0) and the shuffle operations are working correctly with S3 storage. ...

Question 9

I am writing data to S3, and the table format is Iceberg in the Glue Catalog. I see the /data and /metadata folders getting created. However, when I write data, it creates folders named like 001/002. ...

Question 10

I want to install external Python packages on EMR with an EC2 setup, but currently, apart from bootstrap actions, nothing else seems to be working. The problem with this setup is that if I want to ...

Question 11

Having trouble getting dynamic allocation to properly terminate idle executors when using FSx Lustre for shuffle persistence on EMR 7.8 (Spark 3.5.4) on EKS. Trying this strategy out to battle cost ...

Question 12

I am exploring writing data into a Glue table (Iceberg table format). I have been using the saveAsTable method, mentioned as option 1. However, is there any difference between the two methods? Iceberg stores ...

Question 13

I have a PySpark script that reads data from S3 in a different AWS account using AssumedRoleCredentialProvider. It works on EMR Serverless 6.9, but when I upgrade to EMR Serverless 7.5 it fails ...

Question 14

I have an EMR cluster configured with the following SecurityConfiguration: "AuthenticationConfiguration": { "IdentityCenterConfiguration": { "EnableIdentityCenter":...

Question 15

Given the below JSON: { "environments": [ {"env": "dev", "description": "dev environment"}, {"env": "dev01", "...

Flink Job Manager Direct Buffer Memory gets exhausted when checkpointing enabled

Unexpected Write Behavior when using MERGE INTO/INSERT INTO Iceberg Spark Queries

EMR Spark Job Fails to Connect to MSK with IAM Auth - Timeout Waiting for Node Assignment Error

Sagemaker Unified Studio overriding delta lake configuration to iceberg on EMR

How do you expire snapshot from Iceberg Glue Table

Unable to connect to EMR cluster from SageMaker Unified Studio using runtime role – credentials are null

Unable to register database/table in aws glue when hudi job is submitted from emrserverless

Spark Dynamic Resource Allocation Configuration while using IBM S3 Shuffle Plugin on EMR on EKS

Why Iceberg load is creating many folders in s3?

Installing external python packages on EMR on EC2

EMR on EKS: Dynamic Allocation + FSx Lustre -- Executors with shuffle data won't terminate despite idle timeout

Data write into Iceberg Glue Table (saveAsTable vs option("path", s3_output_path))

Can not read from S3 with AssumedRoleCredentialProvider after upgrade from EMR serverless 6.9 to 7.5

Unable to access Livy after enabling IAM Identity Center (SSO) on my EMR cluster

How to extract a string which contains a digit followed by a letter?
