340

I am using spark-csv to load data into a DataFrame. I want to do a simple query and display the content:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("my.csv")
df.registerTempTable("tasks")
val results = sqlContext.sql("select col from tasks")
results.show()

The values in col appear truncated:

scala> results.show();
+--------------------+
|                 col|
+--------------------+
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:15:...|
|2015-11-06 07:15:...|
|2015-11-16 07:15:...|
|2015-11-16 07:21:...|
|2015-11-16 07:21:...|
|2015-11-16 07:21:...|
+--------------------+

How do I show the full content of the column?

18 Answers 18

582

results.show(20, false) will not truncate. Note that Python requires False rather than false, which is what Scala, Java, and the Spark shell use.

Check the source

20 is the default number of rows displayed when show() is called without any arguments.
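The "..." suffix in the question's output comes from show() cutting each cell to a default width of 20 characters. As a rough illustration, here is a plain-Python sketch of that rule. It does not use Spark: truncate_cell is a hypothetical helper written for this example, and the full timestamp is an assumed value, since the question only shows the truncated form.

```python
def truncate_cell(value: str, width: int = 20) -> str:
    """Approximate how show() shortens one cell; width <= 0 means no truncation."""
    if width <= 0 or len(value) <= width:
        return value
    if width < 4:
        # Too narrow for an ellipsis: keep only the first `width` characters.
        return value[:width]
    # Keep width - 3 characters and append "..." so the result is `width` long.
    return value[: width - 3] + "..."


# Assumed full value; the question only shows the truncated form.
full = "2015-11-16 07:15:00.0"
print(truncate_cell(full))     # default width of 20, as in show()
print(truncate_cell(full, 0))  # width 0, as in show(20, false): left untouched
```

Passing false (or False in Python) as the second argument corresponds to the width-0 case here, which is why the full value is printed.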

9 Comments

Not OP but this is indeed the right answer : Minor correction, boolean should be False, not false.
It would be "False" in python, but "false" in scala/java
it's false (not False) in spark-shell
the equivalent for writing to stream in console mode is dataFrame.writeStream.outputMode("append").format("console").option("truncate", "false").start()
what is so special about 20? Why 20?
71

If you use results.show(false), the results will not be truncated.

5 Comments

I imagine that the comment on TomTom101's answer about false applies here, too.
@Narendra Parmar the syntax should be results.show(20, False). The one you mentioned will give an error.
@Jai Prakash, I gave this answer for Scala, and you are talking about Python.
@NarendraParmar sorry you are correct. In scala both the options are valid. results.show(false) and results.show(20, false)
@JaiPrakash -- in ASA, "false" has to have a capital f: "False" is ok, but "false" gives an error.
43

The code below shows all rows without truncating any column:

df.show(df.count(), False) 

3 Comments

Same question I asked the prior answerer: does this cause df to be collected twice?
@javadba yes, I think count() will go through df once, and show() will collect df twice.
As an alternative, you could pass a very large number as the first parameter instead of df.count(), which avoids the need to persist. For example, if df has 1000 rows, df.show(1000000, false) will work. I tried the following and it worked: scala> println(df.count) res2: Long = 987 scala> df.show(990)
23

The other solutions are good. If these are your goals:

  1. No truncation of columns,
  2. No loss of rows,
  3. Fast and
  4. Efficient

These two lines are useful ...

df.persist()
df.show(df.count.toInt, false) // Scala; in Python: df.show(df.count(), False)

Persisting (or caching) keeps the intermediate DataFrame in the executors, so the two actions, count and show, do not each recompute it. See more about persist and cache.

13

In Pyspark we can use:

  • df.show(truncate=False) this will display the full content of the columns without truncation.

  • df.show(5,truncate=False) this will display the full content of the first five rows.

12

Use results.show(20, False) in Python or results.show(20, false) in Java/Scala.

10

The following answer applies to a Spark Structured Streaming application.

By setting the "truncate" option to false, you can tell the output sink to display the full column.

val query = out.writeStream
  .outputMode(OutputMode.Update())
  .format("console")
  .option("truncate", false)
  .trigger(Trigger.ProcessingTime("5 seconds"))
  .start()

7

In PySpark, remember:

  • to display data from a DataFrame, use the show(truncate=False) method.
  • to display data from a streaming DataFrame (Structured Streaming), use writeStream.format("console").option("truncate", False).start().

Hope this helps someone.

4

Within Databricks you can visualize the DataFrame in a tabular format with the command:

display(results) 

1 Comment

How can display() show only, for example, the first 5 rows?
4

In C#, Option("truncate", false) prevents truncation of data in the output.

StreamingQuery query = spark
    .Sql("SELECT * FROM Messages")
    .WriteStream()
    .OutputMode("append")
    .Format("console")
    .Option("truncate", false)
    .Start();

4

Try

df.show(20,False) 

Note that if you do not specify the number of rows, show() displays 20 rows by default.

3

Try this command:

df.show(df.count()) 

3 Comments

In Scala, df.show(<some number>) will work, but df.show(df.count()) will not: df.count returns a Long, which df.show() does not accept because it expects an Int.
For example, df.show(2000) will retrieve 2000 rows.
does this cause df to be collected twice?
3

results.show(false) will show you the full column content.

The show method displays 20 rows by default; passing a number before false shows more rows.

3

results.show(20,false) did the trick for me in Scala.

3

Tried this in pyspark

df.show(truncate=0) 

1

PYSPARK

In the code below, df is the DataFrame. The first argument, df.count(), shows every row without hardcoding a number; the second, False, disables column truncation.

df.show(df.count(),False) 

SCALA

In the code below, df is the DataFrame. The first argument, df.count().toInt, shows every row without hardcoding a number; the second, false, disables column truncation.

df.show(df.count().toInt,false) 

1

PYSPARK

df.show(df.count(), truncate=0)

The first parameter shows all records. The second parameter expands the columns.

Note: I observed a behaviour difference between truncate=False and truncate=0: 0 actually expands the column data, while False doesn't.
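For comparison, here is a plain-Python sketch of how the truncate argument of PySpark's show() appears to be turned into a cell width, based on the documented behaviour (True means 20 characters, a number means that many characters). effective_width is a hypothetical helper written for this example, not a PySpark function, and the mapping is an assumption about recent PySpark versions.

```python
def effective_width(truncate=True) -> int:
    """Approximate the cell width that show() would use for a given `truncate`."""
    if isinstance(truncate, bool) and truncate:
        # truncate=True: fall back to the default width of 20 characters.
        return 20
    # Anything else is converted to an int; a width of 0 means "do not truncate".
    return int(truncate)


for arg in (True, False, 0, 50):
    print(repr(arg), "->", effective_width(arg))
```

Under that assumption, truncate=False and truncate=0 both resolve to a width of 0, so they should behave the same; if you see a difference, it may be specific to your Spark version.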

0

Try this in scala:

df.show(df.count.toInt, false) 

The show method accepts an Int and a Boolean, but df.count returns a Long, so a type cast is required.
