I have a large CSV file that contains repeated rows. I want to delete all of these repeated rows, which contain the word "Names".

    1 Names Dates Picture
    2 Alex 6-12 4364.jpg
    3 Names Dates Picture
    4 Jade 8-11 7435.jpg
    5 Names Dates Picture
    6 Dread 1-5 8635.jpg

The CSV file looks like this. I want to delete every row that repeats "Names", "Dates", "Picture".

I have tried different methods I found online, but I can't find a solution.

I'm using pandas to import the CSV file:

    df = pd.read_csv('file2022.csv')

The "Names" row seems to be the column header, but it is repeated in the content. How is your file generated? Commented Apr 17, 2022 at 13:42

2 Answers

You can use drop_duplicates here:

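    import pandas as pd

    # Read without a header so the repeated "Names Dates Picture" rows are ordinary data rows.
    # r'\s+' splits on runs of whitespace; the pattern ' *' can match the empty string,
    # which makes the python engine split between every character on Python 3.7+.
    df = pd.read_csv('test2.csv', sep=r'\s+', engine='python', header=None, index_col=0)
    # keep=False drops every row that has a duplicate, including its first occurrence
    df.drop_duplicates(keep=False, inplace=True)
    df.reset_index(inplace=True, drop=True)
    print(df)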

Output:

           1     2         3
    0   Alex  6-12  4364.jpg
    1   Jade  8-11  7435.jpg
    2  Dread   1-5  8635.jpg
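To make the effect of `keep=False` easier to see, here is a minimal sketch that runs on an in-memory copy of the sample from the question. The `raw` string and the whitespace-separated layout are assumptions based on how the sample is displayed; the real file may be delimited differently.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: the sample rows from the question,
# assumed to be whitespace-separated with a leading row number.
raw = """1 Names Dates Picture
2 Alex 6-12 4364.jpg
3 Names Dates Picture
4 Jade 8-11 7435.jpg
5 Names Dates Picture
6 Dread 1-5 8635.jpg
"""

# header=None keeps the repeated header rows as data; column 0 becomes the index,
# so it is ignored when rows are compared for duplication.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, index_col=0)

# keep=False removes every member of a duplicated group.
cleaned = df.drop_duplicates(keep=False).reset_index(drop=True)

# The default (keep="first") would leave one "Names Dates Picture" row behind.
kept_first = df.drop_duplicates().reset_index(drop=True)

print(cleaned)
print(len(kept_first))
```

Note that this approach also removes genuine data rows that happen to be exact duplicates of each other, so it fits best when only the header rows repeat.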

df = df[df["Names"] != "Names"] 

This should drop the rows where the "Names" column holds the literal value "Names".
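As a rough, self-contained illustration of this filter, the sketch below assumes the file is comma-delimited and that its first line is the real header, so `pd.read_csv` names the columns "Names", "Dates", and "Picture". The `raw` string is a hypothetical stand-in for the file.

```python
import io

import pandas as pd

# Hypothetical comma-delimited version of the sample data; the first line
# is taken as the header, and later copies of it show up as data rows.
raw = """Names,Dates,Picture
Alex,6-12,4364.jpg
Names,Dates,Picture
Jade,8-11,7435.jpg
Names,Dates,Picture
Dread,1-5,8635.jpg
"""

df = pd.read_csv(io.StringIO(raw))

# Keep only rows whose "Names" value is not the header text itself,
# then renumber the index so it runs 0..n-1 again.
df = df[df["Names"] != "Names"].reset_index(drop=True)

print(df)
```

Unlike `drop_duplicates`, this only targets the repeated header rows, so data rows that legitimately repeat are left in place.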
