4,988 questions
-3 votes
1 answer
145 views
Flink Job Manager Direct Buffer Memory gets exhausted when checkpointing enabled
Issue: Flink application throws Thread 'jobmanager-io-thread-25' produced an uncaught exception. java.lang.OutOfMemoryError: Direct buffer memory and terminates after running for 2-3 days. No matter ...
0 votes
0 answers
76 views
Unexpected Write Behavior when using MERGE INTO/INSERT INTO Iceberg Spark Queries
I am observing different write behaviors when executing queries on EMR Notebook (correct behavior) vs when using spark-submit to submit a spark application to EMR Cluster (incorrect behavior). When I ...
0 votes
0 answers
70 views
EMR Spark Job Fails to Connect to MSK with IAM Auth - Timeout Waiting for Node Assignment Error
I am running an Apache Spark job on Amazon EMR that needs to connect to an Amazon MSK cluster configured with IAM authentication. The EMR cluster has an IAM role with full MSK permissions, and I can ...
1 vote
0 answers
67 views
SageMaker Unified Studio overriding Delta Lake configuration to Iceberg on EMR
I am connecting to an EMR cluster through SageMaker Unified Studio (JupyterLab). My EMR cluster is configured with Delta Lake support, and I have the following Spark properties set on the cluster: ...
0 votes
0 answers
61 views
How do you expire snapshots from an Iceberg Glue table?
I have an Iceberg table in the Glue Catalog. I am unable to run a select * because one of the metadata files is missing. I am trying to point to the latest metadata file. How can I do that? I am using EMR 7.7 with ...
2 votes
0 answers
166 views
Unable to connect to EMR cluster from SageMaker Unified Studio using runtime role – credentials are null
I'm trying to connect to an existing EMR cluster from SageMaker Unified Studio to run SQL queries via JupyterLab. SageMaker requires that the EMR cluster be runtime role-enabled to integrate with ...
0 votes
1 answer
59 views
Unable to register database/table in AWS Glue when Hudi job is submitted from EMR Serverless
I am using EMR 6.15 and Hudi 0.14. I submitted the following Hudi job, which should create a database and a table in AWS Glue. The IAM role assigned to EMR Serverless has all necessary permissions for S3 and ...
1 vote
0 answers
56 views
Spark Dynamic Resource Allocation Configuration while using IBM S3 Shuffle Plugin on EMR on EKS
I have successfully implemented the IBM S3 Shuffle Plugin v0.9.6 (https://github.com/IBM/spark-s3-shuffle) on EMR on EKS (Spark 3.5.0) and the shuffle operations are working correctly with S3 storage. ...
0 votes
1 answer
141 views
Why is Iceberg load creating many folders in S3?
I am writing data into S3, and the table format is Iceberg in the Glue Catalog. I see the /data and /metadata folders being created. However, when I write data, it creates 001/002-style folders. ...
0 votes
0 answers
40 views
Installing external python packages on EMR on EC2
I want to install external Python packages on EMR with an EC2 setup, but currently, apart from bootstrap actions, nothing else seems to be working. The problem with this setup is that if I want to ...
3 votes
1 answer
104 views
EMR on EKS: Dynamic Allocation + FSx Lustre -- Executors with shuffle data won't terminate despite idle timeout
Having trouble getting dynamic allocation to properly terminate idle executors when using FSx Lustre for shuffle persistence on EMR 7.8 (Spark 3.5.4) on EKS. Trying this strategy out to battle cost ...
0 votes
0 answers
41 views
Data write into Iceberg Glue Table (saveAsTable vs option("path", s3_output_path))
I am exploring writing data into a Glue table (Iceberg table format). I have been using the saveAsTable method, mentioned as option 1. However, is there any difference between the two methods? Iceberg stores ...
0 votes
1 answer
104 views
Cannot read from S3 with AssumedRoleCredentialProvider after upgrade from EMR Serverless 6.9 to 7.5
I have a PySpark script that reads data from S3 in a different AWS account using AssumedRoleCredentialProvider. It works on EMR Serverless 6.9, but when I upgrade to EMR Serverless 7.5 it fails ...
0 votes
0 answers
33 views
Unable to access Livy after enabling IAM Identity Center (SSO) on my EMR cluster
I have an EMR cluster configured with the following SecurityConfiguration: "AuthenticationConfiguration": { "IdentityCenterConfiguration": { "EnableIdentityCenter":...
0 votes
0 answers
59 views
How to extract a string which contains a digit followed by a letter?
Given the below JSON: { "environments": [ {"env": "dev", "description": "dev environment"}, {"env": "dev01", "...