
I have a dataframe (input_dataframe) which looks like the one below:

    id  test_column
    1   0.25
    2   1.1
    3   12
    4   test
    5   1.3334
    6   .11

I want to add a column result that holds 1 if test_column contains a decimal value and 0 for any other value. The data type of test_column is string. Below is the expected output:

    id  test_column  result
    1   0.25         1
    2   1.1          1
    3   12           0
    4   test         0
    5   1.3334       1
    6   .11          1

Can we achieve this using PySpark code?


1 Answer


You can parse the decimal token with decimal.Decimal().
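For instance, in plain Python (a minimal illustration of the parsing behavior the UDF below relies on):

    import decimal

    decimal.Decimal("0.25")   # Decimal('0.25') -- has a fractional part
    decimal.Decimal("12")     # Decimal('12')   -- a whole number
    decimal.Decimal("test")   # raises decimal.InvalidOperation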

Here we wrap that logic in a UDF and then apply it with df.withColumn:

    import decimal

    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType

    def is_valid_decimal(s):
        try:
            d = decimal.Decimal(s)
            # 0 for whole numbers, 1 for values with a fractional part.
            # Decimal._isinteger() is a private method and is missing from
            # the C implementation of decimal, so compare with int() instead.
            return 0 if d == int(d) else 1
        except decimal.InvalidOperation:
            # non-numeric strings such as "test"
            return 0

    # wrap the function in a UDF so it can be applied to a column
    is_valid_decimal_udf = udf(is_valid_decimal, IntegerType())

    df = df.withColumn("result", is_valid_decimal_udf("test_column"))
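As a quick sanity check against the sample data (a sketch; it assumes an existing SparkSession named spark):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame(
        [(1, "0.25"), (2, "1.1"), (3, "12"), (4, "test"), (5, "1.3334"), (6, ".11")],
        ["id", "test_column"],
    )
    df.withColumn("result", is_valid_decimal_udf("test_column")).show()
    # +---+-----------+------+
    # | id|test_column|result|
    # +---+-----------+------+
    # |  1|       0.25|     1|
    # |  2|        1.1|     1|
    # |  3|         12|     0|
    # |  4|       test|     0|
    # |  5|     1.3334|     1|
    # |  6|        .11|     1|
    # +---+-----------+------+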

5 Comments

This solution does not work for the value decimal.Decimal("12"); it should return 0 in such cases.
Ohh, I did not check that! I have updated the answer now. @rajatsaxena
It perfectly matches the scenario above. I just wanted to check whether we can have a solution for the value 12.0; in that case it should be considered a decimal, but the current solution does not recognize it as one (see the sketch after these comments).
@mrsrinivas this does not seem to work for me; I am getting AttributeError: 'decimal.Decimal' object has no attribute '_isinteger'. Which versions of PySpark and Python are you using? I am on the latest Spark 2.2 and Python 3.6.3.
Python version is 2.7.2.
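To address the 12.0 case raised above, one possible adjustment (a sketch, not the accepted answer's approach) is to flag any string that parses as a Decimal and also contains a decimal point:

    import decimal

    def is_valid_decimal(s):
        try:
            decimal.Decimal(s)           # still reject non-numeric strings
            return 1 if "." in s else 0  # "12.0" and ".11" -> 1, "12" -> 0
        except decimal.InvalidOperation:
            return 0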
