
I have a dataframe (input_dataframe) which looks like the one below:

    id  test_column
    1   0.25
    2   1.1
    3   12
    4   test
    5   1.3334
    6   .11

I want to add a column result that holds 1 if test_column contains a decimal value and 0 for any other value. The data type of test_column is string. Below is the expected output:

    id  test_column  result
    1   0.25         1
    2   1.1          1
    3   12           0
    4   test         0
    5   1.3334       1
    6   .11          1

Can we achieve this using PySpark code?


1 Answer


You can parse the decimal token with decimal.Decimal().
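For instance, in plain Python (a minimal illustration of the parsing behavior the UDF below relies on):

    import decimal

    decimal.Decimal("0.25")   # Decimal('0.25') -- has a fractional part
    decimal.Decimal("12")     # Decimal('12')   -- a whole number
    decimal.Decimal("test")   # raises decimal.InvalidOperation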

Here we wrap that logic in a UDF and then apply it with df.withColumn:

    import decimal

    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    def is_valid_decimal(s):
        try:
            d = decimal.Decimal(s)
            # 0 for whole numbers, 1 for values with a fractional part.
            # Decimal._isinteger() is a private method and is missing from
            # the C implementation of decimal, so compare with int() instead.
            return 0 if d == int(d) else 1
        except decimal.InvalidOperation:
            # non-numeric strings such as "test"
            return 0

    # wrap the function in a UDF so it can be applied to a column
    is_valid_decimal_udf = udf(is_valid_decimal, IntegerType())

    df = df.withColumn("result", is_valid_decimal_udf("test_column"))
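As a quick sanity check against the sample data (a sketch; it assumes an existing SparkSession named spark):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "0.25"), (2, "1.1"), (3, "12"), (4, "test"), (5, "1.3334"), (6, ".11")],
        ["id", "test_column"],
    )
    df.withColumn("result", is_valid_decimal_udf("test_column")).show()
    # +---+-----------+------+
    # | id|test_column|result|
    # +---+-----------+------+
    # |  1|       0.25|     1|
    # |  2|        1.1|     1|
    # |  3|         12|     0|
    # |  4|       test|     0|
    # |  5|     1.3334|     1|
    # |  6|        .11|     1|
    # +---+-----------+------+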

5 Comments

This solution does not work for the value decimal.Decimal("12"); it should return 0 in such cases.
Ohh, I did not check that! I have updated the answer now. @rajatsaxena
It perfectly matches the scenario above. I just wanted to check whether we can have a solution for the value 12.0; in that case it should be considered a decimal, but the current solution does not recognize it as one (see the sketch after these comments).
@mrsrinivas this does not seem to work for me; I am getting AttributeError: 'decimal.Decimal' object has no attribute '_isinteger'. Which versions of PySpark and Python are you using? I am on the latest Spark 2.2 and Python 3.6.3.
Python version is 2.7.2.
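To address the 12.0 case raised above, one possible adjustment (a sketch, not the accepted answer's approach) is to flag any string that parses as a Decimal and also contains a decimal point:

    import decimal

    def is_valid_decimal(s):
        try:
            decimal.Decimal(s)           # still reject non-numeric strings
            return 1 if "." in s else 0  # "12.0" and ".11" -> 1, "12" -> 0
        except decimal.InvalidOperation:
            return 0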
