My company supports Jupyter notebooks running on Spark that can talk to data in S3. The details aren't important, other than that I'm trying to run a SQL-like command to bring in data. It looks something like this:
    df = spark.sql("""
        CREATE TEMPORARY VIEW view AS (
            SELECT thing1, thing2
            FROM table1);
    """).toPandas()

However, in doing so I get the following syntax error:
    Py4JJavaError: An error occurred while calling o42.sql.
    : org.apache.spark.sql.catalyst.parser.ParseException:
    extraneous input ';' expecting {<EOF>, 'ORDER', 'LIMIT', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 5, pos 16)

    == SQL ==
    CREATE TEMPORARY VIEW my_view AS (
    SELECT thing1, thing2
        FROM table1);
    ----------------^^^

It's clear to me what it's complaining about, but I don't get WHY it doesn't like that syntax; it seems pretty standard. I have a subsequent SQL block that I want to execute, which will do a left join between that view and another table.
UPDATE: One comment suggested removing the semicolon. However, when I do that, I can't run any subsequent SELECT statements. Here is an updated example for that case:
    df = spark.sql("""
        CREATE TEMPORARY VIEW view AS (
            SELECT thing1, thing2
            FROM table1)
        SELECT
            view.thing1,
            view.thing2,
            table2.thing3
        FROM view
        LEFT JOIN table3 ON table3.thing2 = view.thing2
    """).toPandas()

But this yields:
    Py4JJavaError: An error occurred while calling o42.sql.
    : org.apache.spark.sql.catalyst.parser.ParseException:
    mismatched input 'SELECT' expecting {<EOF>, 'ORDER', 'LIMIT', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 7, pos 4)

    == SQL ==
    CREATE TEMPORARY VIEW view AS (
    SELECT thing1, thing2
    FROM table1)
        SELECT
    ----^^^
    view.thing1,
    view.thing2,
    table2.thing3
    FROM view
    LEFT JOIN table3 ON table3.thing2 = view.thing2
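
Both errors say the parser expected <EOF> right after the CREATE TEMPORARY VIEW statement, which would be consistent with spark.sql() parsing exactly one statement per call. Under that assumption, a minimal sketch of a workaround is to split the script and send each statement on its own. The helper names split_statements and run_script are hypothetical (not part of Spark), and the split on ';' is deliberately naive: it would break on a semicolon inside a string literal or comment.

```python
def split_statements(script):
    """Split a SQL script on ';' and drop empty pieces.

    Naive on purpose: a ';' inside a string literal or comment
    would be treated as a statement separator too.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


def run_script(run_sql, script):
    """Call run_sql once per statement and return the last result.

    run_sql is any callable that takes a single SQL string,
    for example the bound method spark.sql.
    """
    result = None
    for statement in split_statements(script):
        result = run_sql(statement)
    return result
```

In the notebook this might be used as follows, assuming spark is the existing SparkSession; the view is named my_view here only to avoid reusing a keyword as an identifier:

    df = run_script(spark.sql, """
        CREATE TEMPORARY VIEW my_view AS (
            SELECT thing1, thing2
            FROM table1);
        SELECT * FROM my_view;
    """).toPandas()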