How do I get the number of rows of a pandas dataframe df?
20 Answers
For a dataframe df, one can use any of the following:
- len(df.index)
- df.shape[0]
- df[df.columns[0]].count() (== number of non-NaN values in first column)
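As a quick check, the sketch below applies the three expressions to a small made-up frame. The column names and values are illustrative assumptions, not taken from the question. The first two always agree, while the third only counts non-NaN entries in the first column.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: the first column contains one NaN.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4, 5, 6]})

rows_from_index = len(df.index)            # length of the row index
rows_from_shape = df.shape[0]              # first entry of the (rows, cols) tuple
non_nan_first = df[df.columns[0]].count()  # non-NaN values in the first column only

print(rows_from_index, rows_from_shape, non_nan_first)  # 3 3 2
```

If the first column can contain missing values, the third expression undercounts the rows, so it is only a row count by coincidence.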
Code to reproduce the plot:
    import numpy as np
    import pandas as pd
    import perfplot

    perfplot.save(
        "out.png",
        setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)),
        n_range=[2**k for k in range(25)],
        kernels=[
            lambda df: len(df.index),
            lambda df: df.shape[0],
            lambda df: df[df.columns[0]].count(),
        ],
        labels=["len(df.index)", "df.shape[0]", "df[df.columns[0]].count()"],
        xlabel="Number of rows",
    )

19 Comments
- shape in interactive work, instead of len(df): Trying out different filtering, I often need to know how many items remain. With shape I can see that just by adding .shape after my filtering. With len() the editing of the command-line becomes much more cumbersome, going back and forth.
- df.empty is the best option.
- df.shape[0] faster than len(df) or len(df.columns)? Since 1 µs (microsecond) = 1000 ns (nanoseconds), 1.17 µs = 1170 ns, which means it's roughly 3 times slower than 381 ns.

Suppose df is your dataframe then:
    count_row = df.shape[0]  # Gives number of rows
    count_col = df.shape[1]  # Gives number of columns

Or, more succinctly,

    r, c = df.shape

7 Comments
- len(df.index) will work
- shape is not a function, it is an attribute; you can discover that by comparing df.shape with df.shape() on your df.

Use len(df) :-).
__len__() is documented with "Returns length of index".
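A minimal sketch of what that docstring implies, using a made-up frame whose index labels are deliberately non-contiguous: len(df) reports the number of index entries, not the largest label.

```python
import pandas as pd

# Hypothetical frame with a non-default, non-contiguous index.
df = pd.DataFrame({"x": [10, 20, 30]}, index=[7, 42, 99])

n_rows = len(df)         # delegates to DataFrame.__len__
n_index = len(df.index)  # length of the index itself

print(n_rows, n_index)  # 3 3
```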
Timing info, set up the same way as in root's answer:
    In [7]: timeit len(df.index)
    1000000 loops, best of 3: 248 ns per loop

    In [8]: timeit len(df)
    1000000 loops, best of 3: 573 ns per loop

Due to one additional function call, it is of course correct to say that it is a bit slower than calling len(df.index) directly. But this should not matter in most cases. I find len(df) to be quite readable.
How do I get the row count of a Pandas DataFrame?
This table summarises the different situations in which you'd want to count something in a DataFrame (or Series, for completeness), along with the recommended method(s).
Footnotes
- DataFrame.count returns counts for each column as a Series since the non-null count varies by column.
- DataFrameGroupBy.size returns a Series, since all columns in the same group share the same row-count.
- DataFrameGroupBy.count returns a DataFrame, since the non-null count could differ across columns in the same group. To get the group-wise non-null count for a specific column, use df.groupby(...)['x'].count() where "x" is the column to count.
Minimal Code Examples
Below, I show examples of each of the methods described in the table above. First, the setup -
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'A': list('aabbc'),
        'B': ['x', 'x', np.nan, 'x', np.nan]})
    s = df['B'].copy()

    df

       A    B
    0  a    x
    1  a    x
    2  b  NaN
    3  b    x
    4  c  NaN

    s

    0      x
    1      x
    2    NaN
    3      x
    4    NaN
    Name: B, dtype: object

Row Count of a DataFrame: len(df), df.shape[0], or len(df.index)
    len(df)        # 5
    df.shape[0]    # 5
    len(df.index)  # 5

It seems silly to compare the performance of constant time operations, especially when the difference is on the level of "seriously, don't worry about it". But this seems to be a trend with other answers, so I'm doing the same for completeness.
Of the three methods above, len(df.index) (as mentioned in other answers) is the fastest.
Note
- All the methods above are constant time operations as they are simple attribute lookups.
- df.shape (similar to ndarray.shape) is an attribute that returns a tuple of (# Rows, # Cols). For example, df.shape returns (5, 2) for the example here.
Column Count of a DataFrame: df.shape[1], len(df.columns)
    df.shape[1]      # 2
    len(df.columns)  # 2

Analogous to len(df.index), len(df.columns) is the faster of the two methods (but takes more characters to type).
Row Count of a Series: len(s), s.size, len(s.index)
    len(s)        # 5
    s.size        # 5
    len(s.index)  # 5

s.size and len(s.index) are about the same in terms of speed. But I recommend len(s).
Note
size is an attribute, and it returns the number of elements (= count of rows for any Series). DataFrames also define a size attribute which returns the same result as df.shape[0] * df.shape[1].
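The note above can be checked with the same df and s used in this answer's setup. This is only a small sketch; the variable names series_size and frame_size are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same data as the setup in this answer.
df = pd.DataFrame({"A": list("aabbc"), "B": ["x", "x", np.nan, "x", np.nan]})
s = df["B"].copy()

series_size = s.size   # number of elements in the Series, NaNs included
frame_size = df.size   # rows * columns for a DataFrame

print(series_size)                                  # 5
print(frame_size, df.shape[0] * df.shape[1])        # 10 10
```

Because DataFrame.size multiplies rows by columns, it is not a row count unless the frame has exactly one column.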
Non-Null Row Count: DataFrame.count and Series.count
The methods described here only count non-null values (meaning NaNs are ignored).
Calling DataFrame.count will return non-NaN counts for each column:
    df.count()

    A    5
    B    3
    dtype: int64

For Series, use Series.count to similar effect:

    s.count()  # 3

Group-wise Row Count: GroupBy.size
For DataFrames, use DataFrameGroupBy.size to count the number of rows per group.
    df.groupby('A').size()

    A
    a    2
    b    2
    c    1
    dtype: int64

Similarly, for Series, you'll use SeriesGroupBy.size.

    s.groupby(df.A).size()

    A
    a    2
    b    2
    c    1
    Name: B, dtype: int64

In both cases, a Series is returned. This makes sense for DataFrames as well, since all columns in the same group share the same row-count.
Group-wise Non-Null Row Count: GroupBy.count
Similar to above, but use GroupBy.count, not GroupBy.size. Note that size always returns a Series, while count returns a Series if called on a specific column, or else a DataFrame.
The following methods return the same thing:
    df.groupby('A')['B'].size()
    df.groupby('A').size()

    A
    a    2
    b    2
    c    1
    Name: B, dtype: int64

Meanwhile, for count, we have

    df.groupby('A').count()

       B
    A
    a  2
    b  1
    c  0

...called on the entire GroupBy object, vs.,

    df.groupby('A')['B'].count()

    A
    a    2
    b    1
    c    0
    Name: B, dtype: int64

Called on a specific column.
3 Comments
- s.shape[0] work for row count in a series.
- df.index.size as an alternative to len(df.index) to your table.

TL;DR use len(df)
len() returns the number of items (the length) of a list object (it also works for dictionary, string, tuple or range objects). So, to get the row count of a DataFrame, simply use len(df). For more about the len function, see the official page.
Alternatively, you can access the row labels and the column labels with df.index and df.columns, respectively. Since len() works on any sequence, len(df.index) gives the number of rows and len(df.columns) gives the number of columns.
Or, you can use df.shape, which returns the number of rows and columns together as a tuple whose items you can access by index: df.shape[0] is the number of rows and df.shape[1] is the number of columns.
3 Comments
- len works well for getting row counts. Here is the script onecompiler.com/python/3xc9nuvrx
- df.shape isn't faster than len as it just has to get the shape attribute and not call the function __len__

Apart from the previous answers, you can use df.axes to get the tuple with row and column indexes and then use the len() function:
    total_rows = len(df.axes[0])
    total_cols = len(df.axes[1])
For a dataframe df:
When you're still writing your code:
- len(df)
- df.shape[0]
Fastest once your code is done:
len(df.index)
At normal data sizes each option will finish in under a second. So the "fastest" option is actually whichever one lets you work the fastest, which can be len(df) or df.shape[0] if you already have a subsetted df and want to just add .shape[0] briefly in an interactive session.
In final optimized code, the fastest runtime is len(df.index).
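To make the interactive-session point concrete, here is a small sketch with made-up data (the column names and the filter threshold are assumptions for illustration). Appending .shape[0] to an existing filter expression gives the same number as wrapping the whole expression in len().

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"city": ["Oslo", "Lima", "Oslo", "Pune"], "sales": [5, 12, 9, 3]})

# Appending .shape[0] to an existing filter expression...
remaining_via_shape = df[df["sales"] > 4].shape[0]

# ...versus going back to wrap the whole expression in len().
remaining_via_len = len(df[df["sales"] > 4])

print(remaining_via_shape, remaining_via_len)  # 3 3
```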
df[df.columns[0]].count() was omitted in the above discussion because no commenter has identified a case where it is useful. It is much slower than the other options, since it has to scan the column instead of looking up a length, and it is long to type. It provides the number of non-NaN values in the first column.
Code to reproduce the plot:
pip install pandas perfplot
    import numpy as np
    import pandas as pd
    import perfplot

    perfplot.save(
        "out.png",
        setup=lambda n: pd.DataFrame(np.arange(n * 3).reshape(n, 3)),
        n_range=[2**k for k in range(25)],
        kernels=[
            lambda df: len(df.index),
            lambda df: len(df),
            lambda df: df.shape[0],
            lambda df: df[df.columns[0]].count(),
        ],
        # One label per kernel, in the same order as the kernels above.
        labels=["len(df.index)", "len(df)", "df.shape[0]", "df[df.columns[0]].count()"],
        xlabel="Number of rows",
    )

1 Comment
- len(df) nor any purpose for df[df.columns[0]].count().

...building on Jan-Philip Gehrcke's answer.
The reason why len(df) or len(df.index) is faster than df.shape[0]:
Look at the code. df.shape is a @property that runs a DataFrame method calling len twice.
    df.shape??

    Type:        property
    String form: <property object at 0x1127b33c0>
    Source:
    # df.shape.fget
    @property
    def shape(self):
        """
        Return a tuple representing the dimensionality of the DataFrame.
        """
        return len(self.index), len(self.columns)

And beneath the hood of len(df)

    df.__len__??

    Signature: df.__len__()
    Source:
    def __len__(self):
        """Returns length of info axis, but here we use the index """
        return len(self.index)
    File:      ~/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py
    Type:      instancemethod

len(df.index) will be slightly faster than len(df) since it has one less function call, but either is still faster than df.shape[0], which calls len twice.
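To check the ordering on your own machine, a rough sketch using the standard-library timeit module follows. The frame size, repeat counts, and the timings dictionary are arbitrary choices made for this illustration, and the absolute numbers (and possibly the ranking) will vary with the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary test frame: 1000 rows, 3 columns.
df = pd.DataFrame(np.arange(3000).reshape(1000, 3))

candidates = {
    "len(df.index)": lambda: len(df.index),
    "len(df)": lambda: len(df),
    "df.shape[0]": lambda: df.shape[0],
}

calls = 10_000
timings = {}
for label, func in candidates.items():
    # Best of 5 repeats, reported in nanoseconds per call.
    # Each measurement includes the overhead of calling the lambda.
    best = min(timeit.repeat(func, number=calls, repeat=5))
    timings[label] = best / calls * 1e9
    print(f"{label:>14}: {timings[label]:.0f} ns per call")
```

The lambda wrapper adds the same overhead to every candidate, so the differences between the rows are more meaningful than the absolute values.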
2 Comments
- len() you would execute len??

I come to Pandas from an R background, and I see that Pandas is more complicated when it comes to selecting rows or columns.
I had to wrestle with it for a while, and then I found some ways to deal with it:
Getting the number of columns:
    len(df.columns)
    # Here:
    #   df is your DataFrame
    #   df.columns returns an Index holding the column labels of df
    #   len() then gets its length

Getting the number of rows:

    len(df.index)  # It's similar.

1 Comment
- df.shape. It returns the number of rows and columns respectively.

You can do this also:
Let’s say df is your dataframe. Then df.shape gives you the shape of your dataframe, i.e. (row, col).
Thus, use the assignments below to get the required values:

    row = df.shape[0]
    col = df.shape[1]

1 Comment
- row, col = df.shape instead if you need to get both at the same time (it's shorter and you do not have to care about indexes).

In case you want to get the row count in the middle of a chained operation, you can use:
    df.pipe(len)

Example:

    row_count = (
        pd.DataFrame(np.random.rand(3, 4))
        .reset_index()
        .pipe(len)
    )

This can be useful if you don't want to put a long statement inside a len() function.
You could use __len__() instead but __len__() looks a bit weird.
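For comparison, the three spellings can be put side by side. This is a sketch on a small random frame; the variable names are introduced here for illustration, and all three give the same count.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 4))

count_pipe = df.reset_index().pipe(len)    # stays inside the method chain
count_dunder = df.reset_index().__len__()  # also chainable, but reads oddly
count_wrapped = len(df.reset_index())      # wraps the whole chain

print(count_pipe, count_dunder, count_wrapped)  # 3 3 3
```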
1 Comment
- count = len(df.reset_index()) than count = df.reset_index().pipe(len). The former is just an attribute lookup without the function call.

Either of these can do it (df is the name of the DataFrame):
Method 1: Using the len function:
len(df) will give the number of rows in a DataFrame named df.
Method 2: Using the count function:
df[col].count() will count the non-null values in a given column col.
df.count() will give the number of non-null values for each column. These counts equal the number of rows only for columns with no missing values.
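The difference between the two methods shows up as soon as a column has missing values. The sketch below uses a made-up frame (column names and values are assumptions for illustration) in which column "b" has two missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" has two missing values (NaN and None).
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [np.nan, "x", None, "y"]})

total_rows = len(df)          # counts every row
per_column = df.count()       # Series of non-null counts, one entry per column
b_non_null = df["b"].count()  # non-null count for a single column

print(total_rows)                        # 4
print(per_column["a"], per_column["b"])  # 4 2
print(b_non_null)                        # 2
```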
len(df) is the simplest, and also the fastest for large dataframe df.
    import time

    import numpy as np
    import pandas as pd

    a = np.zeros(3*10**9)
    b = np.zeros(3*10**9)
    a[100:300] = 2
    b[100:210] = 1
    df = pd.DataFrame({'a': pd.arrays.SparseArray(a, fill_value=0),
                       'b': pd.arrays.SparseArray(b, fill_value=0)})

    start = time.time()
    row_count = len(df.index)
    end = time.time()
    print("len(df.index) takes: " + str(end-start) + " seconds")

    start = time.time()
    row_count = df.shape[0]
    end = time.time()
    print("df.shape[0] takes: " + str(end-start) + " seconds")

    start = time.time()
    row_count = len(df)
    end = time.time()
    print("len(df) takes: " + str(end-start) + " seconds")

Output:

    len(df.index) takes: 0.00010704994201660156 seconds
    df.shape[0] takes: 0.00010991096496582031 seconds
    len(df) takes: 7.677078247070312e-05 seconds
An alternative way to find the number of rows in a dataframe, which I think is the most readable variant, is pandas.Index.size.
Do note that, as I commented on the accepted answer,
Suspected pandas.Index.size would actually be faster than len(df.index) but timeit on my computer tells me otherwise (~150 ns slower per loop).
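In use, that looks like the following minimal sketch on a made-up frame, where the Index.size attribute and len() on the index return the same number.

```python
import pandas as pd

# Hypothetical four-row frame.
df = pd.DataFrame({"x": [1, 2, 3, 4]})

rows_via_size = df.index.size  # attribute on the Index object
rows_via_len = len(df.index)   # built-in len() on the same Index

print(rows_via_size, rows_via_len)  # 4 4
```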
len(df.index) would work the fastest of all the ways listed
Suppose the dataset is "data", the DataFrame is named data_fr, and the number of rows in data_fr is stored in nu_rows.

    import pandas as pd

    # Load the data frame. The extension could differ (csv, xlsx, etc.).
    data_fr = pd.read_csv('data.csv')

    # Print the number of rows
    nu_rows = data_fr.shape[0]
    print(nu_rows)


- df.count() will only return the count of non-NA/NaN rows for each column. You should use df.shape[0] instead, which will always correctly tell you the number of rows.