0 votes
0 answers
24 views

I have MongoDB collections forms and submissions where forms define dynamic UI components (textfield, checkbox, radio, selectboxes, columns, tables, datagrids) and submissions contain the user data in ...
1 vote
1 answer
9k views

I want to call a REST-based microservice URL using the GET/POST methods and display the API response in Databricks using PySpark. Currently I am able to achieve both using plain Python. Here is my Python script ...
2 votes
4 answers
1k views

Is there any way to set the number of decimal places in the describe() function? I want the result to show only 2 decimals.
1 vote
1 answer
11k views

Sorry for asking the same type of question. I have seen many posts on SO about stage failures, but none of them resolved my issue, so I'm posting it again. I'm running on Databricks Runtime 7.3 LTS. I ...
23 votes
1 answer
59k views

When we execute a DAG run using the Airflow UI, the "Graph View" displays detailed information about each job run. JobID is a unique identifier for each job run, typically formatted as ...
-1 votes
2 answers
46 views

On an Azure VM, I have installed standalone Spark 4.0. On the same VM I have Python 3.11 with Jupyter deployed. In my notebook I submitted the following program: from pyspark.sql import SparkSession ...
1 vote
1 answer
3k views

I would like to implement a pandas UDF in PySpark that returns a matrix of float numbers. I have tried, but the error below appeared: RuntimeError: ('Exception thrown when converting pandas.Series (...
0 votes
1 answer
50 views

I have a Spark job that runs daily to load data from S3. The data consists of thousands of gzip files. However, in some cases there are one or two corrupted files in S3, and that causes the whole ...
2 votes
0 answers
27 views

I have the following setup: a Kubernetes cluster with Spark Connect 4.0.1 and an MLflow tracking server 3.5.0. The MLflow tracking server should serve all artifacts and is configured this way: --backend-store-...
2 votes
1 answer
3k views

I recently switched EMR to the label 7.0.0. Part of my workload does some updates to big Iceberg tables using PySpark. I moved all my S3 paths to the s3 scheme instead of s3a, as suggested here. ...
199 votes
12 answers
485k views

I work on a dataframe with two columns, mvv and count.

+---+-----+
|mvv|count|
+---+-----+
|  1|    5|
|  2|    9|
|  3|    3|
|  4|    1|

I would like to obtain two lists containing the mvv values and ...
4 votes
1 answer
29k views

Is there any way of reading files located on my local machine other than navigating to 'Data' > 'Add Data' on Databricks? In my past experience using Databricks with S3 buckets, I was able to ...
1 vote
3 answers
3k views

Currently my Spark console prints like this, which is not very readable: I want it to print each StructField item on a new line, so that it's easier to read. What should I do? Thanks. Update: I'm ...
0 votes
0 answers
25 views

I am writing a SharePoint list to Delta file format and I get this error: list index out of range. I have included all the required columns to be fetched from SharePoint and checked the data types when writing ...
0 votes
2 answers
17k views

I am trying to create a PySpark dataframe using the following code: #!/usr/bin/env python # coding: utf-8 import pyspark from pyspark.sql.session import SparkSession import pyspark.sql.functions as f ...
