0 votes
0 answers
24 views

I have MongoDB collections forms and submissions where forms define dynamic UI components (textfield, checkbox, radio, selectboxes, columns, tables, datagrids) and submissions contain the user data in ...
1 vote
1 answer
9k views

I want to call a REST-based microservice URL using the GET/POST methods and display the API response in Databricks using PySpark. Currently I am able to achieve both using plain Python. Here is my Python script ...
2 votes
4 answers
1k views

Is there any way to set the number of decimal places in the describe() function? I want the result to show only 2 decimals.
1 vote
1 answer
11k views

Sorry for asking the same type of question. I have seen many posts on SO about stage failures, but none of them resolved my issue, so I'm posting it again. I'm running on Databricks Runtime 7.3 LTS. I ...
23 votes
1 answer
59k views

When we execute a DAG run using the Airflow UI, the "Graph View" displays detailed information about each job run. JobID is a unique identifier for each job run, typically formatted as ...
-1 votes
2 answers
46 views

On an Azure VM, I have installed standalone Spark 4.0. On the same VM I have Python 3.11 with Jupyter deployed. In my notebook I submitted the following program: from pyspark.sql import SparkSession ...
1 vote
1 answer
3k views

I would like to implement a pandas UDF in PySpark that returns a matrix of float numbers. I have tried, but the error below appeared: RuntimeError: ('Exception thrown when converting pandas.Series (...
0 votes
1 answer
50 views

I have a Spark job that runs daily to load data from S3. The data consists of thousands of gzip files. However, in some cases there are one or two corrupted files in S3, and that causes the whole ...
2 votes
0 answers
27 views

I have the following setup: a Kubernetes cluster with Spark Connect 4.0.1 and an MLflow tracking server 3.5.0. The MLflow tracking server should serve all artifacts and is configured this way: --backend-store-...
2 votes
1 answer
3k views

I recently switched EMR to the label 7.0.0. Part of my workload does some updates to big Iceberg tables using PySpark. I moved all my S3 paths to the s3 scheme instead of s3a, as suggested here. ...
199 votes
12 answers
485k views

I work on a dataframe with two columns, mvv and count.

+---+-----+
|mvv|count|
+---+-----+
|  1|    5|
|  2|    9|
|  3|    3|
|  4|    1|

I would like to obtain two lists containing the mvv values and ...
4 votes
1 answer
29k views

Is there any way of reading files located on my local machine other than navigating to 'Data' > 'Add Data' on Databricks? In my past experience using Databricks with S3 buckets, I was able to ...
1 vote
3 answers
3k views

Currently my Spark console prints like this, which is not very readable: I want it to print each StructField item on a new line, so that it's easier to read. What should I do? Thanks. Update: I'm ...
0 votes
0 answers
25 views

I am writing a SharePoint list to Delta file format and I get this error: list index out of range. I have included all the required columns to be fetched from SharePoint and checked the data types when writing ...
0 votes
2 answers
17k views

I am trying to create a PySpark dataframe using the following code: #!/usr/bin/env python # coding: utf-8 import pyspark from pyspark.sql.session import SparkSession import pyspark.sql.functions as f ...
