1

Is it possible to start the index from n in a pandas dataframe?

I have some datasets saved as CSV files and would like to add an index column whose row numbers continue from where the last row number ended in the previous file.

For example, for the first file I'm using the following code, which works fine: the output CSV file has rows numbered from 1 to 1048574, as expected:

yellow_jan['index'] = range(1, len(yellow_jan) + 1)

I would like to do the same for the yellow_feb file, but starting the row index at 1048575, and so on.

Appreciate any help!

1
  • df.index = np.arange(start_from, start_from + length) Commented Nov 27, 2017 at 16:34

3 Answers

2
df["new_index"] = range(10, 20)
df = df.set_index("new_index")
df

1

If your plan is to concatenate the dataframes, you can just use

import pandas as pd
import numpy as np

df1 = pd.DataFrame({"a": np.arange(10)})
df2 = pd.DataFrame({"a": np.arange(10, 20)})
df = pd.concat([df1, df2], ignore_index=True)

otherwise

df2.index += len(df1)
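As a rough sketch of the second option, the snippet below continues one frame's index from where the previous one ended. The frames `jan` and `feb` and the `fare` column are made-up stand-ins for the real files, and the first index starts at 1 to match the question.

```python
import pandas as pd

# Hypothetical stand-ins for the two monthly files.
jan = pd.DataFrame({"fare": [5.0, 7.5, 9.0]})
feb = pd.DataFrame({"fare": [4.0, 6.0]})

# Number the first frame from 1, as in the question.
jan.index = pd.RangeIndex(start=1, stop=len(jan) + 1)

# Continue the second frame right after the first frame's last label.
start = jan.index[-1] + 1
feb.index = pd.RangeIndex(start=start, stop=start + len(feb))
```

Reading the starting point from `jan.index[-1]` rather than hard-coding it means the offset stays right if the first file's row count changes.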


0

You can just reset the index at the end, or define a local variable and use it in the `arange` function, updating the variable with the number of rows in each file you read.
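One way the running-counter idea might look is sketched below. The helper name `add_running_index` and the small in-memory frames are assumptions for illustration; in practice each frame would come from `pd.read_csv`.

```python
import numpy as np
import pandas as pd

def add_running_index(frames, start=1):
    """Add an 'index' column to each frame, continuing the count across frames.

    Hypothetical helper, not part of pandas.
    """
    next_row = start
    numbered = []
    for frame in frames:
        frame = frame.copy()  # leave the caller's frame untouched
        frame["index"] = np.arange(next_row, next_row + len(frame))
        next_row += len(frame)  # the next file starts where this one ended
        numbered.append(frame)
    return numbered

# Made-up data standing in for the monthly CSV files.
yellow_jan = pd.DataFrame({"fare": [5.0, 7.5, 9.0]})
yellow_feb = pd.DataFrame({"fare": [4.0, 6.0]})
jan_numbered, feb_numbered = add_running_index([yellow_jan, yellow_feb])
```

Because the counter is updated from `len(frame)`, no row count has to be hard-coded for any file.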

1 Comment

Thank you so much, Roo! Yes, in my case just resetting the index achieved my goal. Here is the piece of code that worked fine for me:

df = df.reset_index()
df['index'] = df.index + 1048575
