
My company supports Jupyter notebooks running on Spark that can talk to data in S3. The details aren't important beyond the fact that I'm trying to run a SQL-like command to bring in data. It looks something like this:

df = spark.sql("""CREATE TEMPORARY VIEW view AS ( SELECT thing1, thing2 FROM table1);""").toPandas() 

However, in doing so I get the following syntax error:

Py4JJavaError: An error occurred while calling o42.sql. : org.apache.spark.sql.catalyst.parser.ParseException: extraneous input ';' expecting {<EOF>, 'ORDER', 'LIMIT', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 5, pos 16) == SQL == CREATE TEMPORARY VIEW my_view AS ( SELECT thing1, thing2 FROM table1); ----------------^^^ 

It's clear to me what it's complaining about, but I don't get WHY it doesn't like that syntax; it seems pretty standard. I have a subsequent SQL block that I want to execute, which will do a left join between that view and another table.

UPDATE: One comment suggested removing the ;. However, when I do that, I can't run any subsequent SELECT statements. Here is an updated example for that case:

df = spark.sql("""CREATE TEMPORARY VIEW view AS ( SELECT thing1, thing2 FROM table1) SELECT view.thing1, view.thing2, table2.thing3 FROM view LEFT JOIN table3 ON table3.thing2 = view.thing2 """).toPandas() 

But this yields

Py4JJavaError: An error occurred while calling o42.sql. : org.apache.spark.sql.catalyst.parser.ParseException: mismatched input 'SELECT' expecting {<EOF>, 'ORDER', 'LIMIT', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 7, pos 4) == SQL == CREATE TEMPORARY VIEW view AS ( SELECT thing1, thing2 FROM table1) SELECT ----^^^ view.thing1, view.thing2, table2.thing3 FROM view LEFT JOIN table3 ON table3.thing2 = view.thing2 
  • Remove the semicolon; it is not required in Spark SQL. Commented Jun 9, 2020 at 22:03
  • @Srinivas I've tried that as well, but that still doesn't allow for subsequent statements. Perhaps that can't be done in one SQL statement in Spark SQL? Commented Jun 9, 2020 at 22:10

1 Answer


With the trailing ;, the statement fails with the exception below:

spark.sql("""CREATE TEMPORARY VIEW view AS (SELECT thing1, thing2 FROM table1);""")

Exception

u"\nextraneous input ';' expecting (line 1, pos 52)\n\n== SQL ==\nCREATE TEMPORARY VIEW view AS (SELECT thing1, thing2 FROM table1); \n----------------------------------------------------^^^\n"

Once the ; is removed, the query works:

spark.sql("""create temporary view view as (select * from vtable)""") # without ;

Note: It's always good to split a SQL query into multiple parts; otherwise it's difficult to debug.

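spark.sql() parses a single statement per call, so the CREATE VIEW and the SELECT cannot be combined in one string. Change the query below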

df = spark.sql("""CREATE TEMPORARY VIEW view AS ( SELECT thing1, thing2 FROM table1) SELECT view.thing1, view.thing2, table2.thing3 FROM view LEFT JOIN table3 ON table3.thing2 = view.thing2 """).toPandas() 

to

spark.sql("""CREATE TEMPORARY VIEW view AS (SELECT thing1, thing2 FROM table1)""")
df = spark.sql("""SELECT view.thing1, view.thing2, table2.thing3 FROM view LEFT JOIN table3 ON table3.thing2 = view.thing2""").toPandas()
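If you would rather keep the whole script in one string, a small helper can do the splitting for you. The following is only a sketch: `split_statements` and `run_script` are hypothetical names, not part of PySpark, and the splitter only understands single-quoted string literals (it does not handle backslash-escaped quotes or semicolons inside SQL comments). It assumes `execute` is any callable that takes one statement, such as `spark.sql`.

```python
def split_statements(script):
    """Split a SQL script on semicolons that sit outside single-quoted strings."""
    statements, current, in_quote = [], [], False
    for ch in script:
        if ch == "'":
            # Toggle on every quote; a doubled '' toggles twice and stays balanced.
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    # Drop empty pieces, e.g. the one left after a trailing semicolon.
    return [s for s in statements if s]


def run_script(execute, script):
    """Run each statement with `execute` (e.g. spark.sql); return the last result."""
    result = None
    for statement in split_statements(script):
        result = execute(statement)
    return result
```

With a live session this could be called as `df = run_script(spark.sql, script).toPandas()`, where `script` holds the CREATE TEMPORARY VIEW statement, a ;, and then the SELECT. Each statement still goes through its own spark.sql call, so a parse error points at one statement rather than the whole script.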