I'm doing a small project with csv and pandas. I need to merge two CSV files into one and delete the duplicates, but the final output has extra commas appended to the last column and I don't know why.

I have two CSV lists like this:

       DESCRIPTION  EXTRAS  ADDRESS  AVAILABLE
    1  House        WiFi    CP 432   1
    2  Farm         NONE    CP 345   1
    3  House        Wifi    CP 315   1

       DESCRIPTION  EXTRAS  ADDRESS  AVAILABLE
    1  House        WiFi    CP 437   0
    2  House        Wifi    CP 315   0

When I merge the two, the result is the following (the number of "," characters seems completely random):

    ID DESCRIPTION EXTRAS ADDRESS AVAILABLE,,,,,
    1 House WiFi CP 432 1,,,,,,
    2 Farm NONE CP 345 1,,,,
    3 House Wifi CP 315 1,,,,,,
    1 House WiFi CP 437 0,,,,,

This is my code:

    with open("C:\\files\\20171412123920-1\\20171412123920-1Total.csv", "rt", encoding="utf-8") as f2:
        reader = csvCSV.reader(f)
        for row in reader:
            merged.append(row)

    with open("C:\\files\\20171412123920-1\\20171412123920-1.csv", "rt", encoding="utf-8") as f:
        readerTotal = csvCSV.reader(f2)
        for row in readerTotal:
            merged.append(row)

    with open("C:\\Users\\Desktop\\Test\\Python\\20171412123920-1Comparacion.csv", "wb") as csvfile:
        spamwriter = csv.writer(csvfile,dialect='excel', encoding='utf-8')
        spamwriter.writerow(["ID","DESCRIPTION","EXTRAS","ADDRESS","AVAILABLE"])
        for row in merged:
            spamwriter.writerow(row)

    df=pd.read_csv("C:\\Users\\Desktop\\Test\\Python\\20171412123920-1Comparacion.csv", error_bad_lines=False)
    df.to_string(index=False)
    df.drop_duplicates(['DESCRIPTION'], keep='first', inplace = True)
    df = df.reset_index(drop=True)
    df.set_index('ID', inplace = True)
    df.to_csv("C:\\Users\\Desktop\\Test\\Python\\201714121239201Comparacion.csv")
  • Huh... why are you opening the files with "rt"? Commented Dec 15, 2017 at 9:48
  • I'm really new to this kind of thing and I took it from a tutorial. The "rt" means "read in default text mode". Commented Dec 15, 2017 at 9:52
  • First of all, always use pd.read_csv when loading CSVs into a dataframe. I think this problem is happening because of the manner in which you're reading those CSVs. Commented Dec 15, 2017 at 9:54

1 Answer

First, load both CSV files into pandas DataFrames and concatenate them. Then drop the duplicate rows from the combined DataFrame.

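    import pandas as pd

    df1 = pd.read_csv('first.csv')
    df2 = pd.read_csv('second.csv')

    # Stack the two frames; ignore_index avoids repeated index labels
    result = pd.concat([df1, df2], ignore_index=True)

    # drop_duplicates returns a new DataFrame, so the result must be assigned
    result = result.drop_duplicates()
    print(result)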
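
To make the approach concrete, here is a minimal sketch that runs end to end on the sample rows from the question. Several things in it are assumptions rather than facts about the asker's files: that the inputs are comma-delimited, that they carry an ID column, and that two rows count as duplicates when their ADDRESS matches. The in-memory `first_csv` and `second_csv` buffers are hypothetical stand-ins for the real file paths.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the two input files (assumed comma-delimited,
# with an ID column), built from the sample rows in the question.
first_csv = io.StringIO(
    "ID,DESCRIPTION,EXTRAS,ADDRESS,AVAILABLE\n"
    "1,House,WiFi,CP 432,1\n"
    "2,Farm,NONE,CP 345,1\n"
    "3,House,Wifi,CP 315,1\n"
)
second_csv = io.StringIO(
    "ID,DESCRIPTION,EXTRAS,ADDRESS,AVAILABLE\n"
    "1,House,WiFi,CP 437,0\n"
    "2,House,Wifi,CP 315,0\n"
)

df1 = pd.read_csv(first_csv)
df2 = pd.read_csv(second_csv)

# Stack the rows of both frames under a fresh 0..n-1 index.
merged = pd.concat([df1, df2], ignore_index=True)

# Assumption: a repeated ADDRESS marks a duplicate; keep the first occurrence.
deduped = merged.drop_duplicates(subset=["ADDRESS"], keep="first")

# index=False keeps the DataFrame index out of the output file.
out = io.StringIO()
deduped.to_csv(out, index=False)
csv_text = out.getvalue()
print(csv_text)
```

With real files, the two `io.StringIO` buffers would be replaced by the file paths, and the `subset` argument adjusted to whichever columns define a duplicate. Because pandas parses and writes the files itself, the rows never pass through a hand-written `csv.writer` loop.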