I'm trying to compare two sets of coordinates in two dataframes using nested `for` loops. Where the distance is within a predefined value, I want to overwrite the coordinates in `qinsy_file_2`. If they are not within that distance, I want to drop the row.
So far, the script seems to get through one iteration, but fails with a `KeyError` on the second iteration while calculating the distance.
Is there anything I'm obviously doing wrong here? I've searched extensively for existing questions but have come up empty so far. (I'm going slightly mad; this has stumped me all week.)
```python
## Pull values from GUI
qinsy_file=pd.read_csv(values["-QINSYInput-"],sep=',')
segy_file=pd.read_csv(values["-SEGYInput-"],sep='\t')
#print(segy_file)
in_file=str(values["-QINSYInput-"])

## Make the outfile name by replacing file suffix
out_file=in_file.replace(".csv","_SEGY_NAV.csv").replace(".txt","_SEGY_NAV.txt")

## Correlation zone = 30cm
buffer=0.2

## Get required headers
segy_vlookup=segy_file[['CDP_X','CDP_Y']]
qinsy_file_2=qinsy_file[['Date','Time','Sparker CoG Easting','Sparker CoG Northing',
                         'Streamer CoG Easting','Streamer CoG Northing','CMP Easting',
                         'CMP Northing','Fix Number','CMP DTM Depth']]

## Loop through Qinsy file
for index_qinsy,row_qinsy in qinsy_file_2.iterrows():
    ## Loop through SEGY navigation
    for index_segy,row_segy in segy_vlookup.iterrows():
        ## Calculate distance between points
        distance = (((segy_vlookup["CDP_X"][index_segy] - qinsy_file_2["CMP Easting"][index_qinsy])**2) + ((segy_vlookup["CDP_Y"][index_segy] - qinsy_file_2["CMP Northing"][index_qinsy])**2))**0.5
        print(distance)
        ## If distance between points is less than or equal to the correlation value, replace the CMP X and Y values in the QINSY file
        if distance <= buffer:
            qinsy_file_2["CMP Easting"][index_qinsy]=segy_vlookup["CDP_X"][index_segy]
            qinsy_file_2["CMP Northing"][index_qinsy]=segy_vlookup["CDP_Y"][index_segy]
            print(qinsy_file_2)
            #qinsy_file_2["CMP Easting"]=segy_vlookup["CDP_X"]
            #qinsy_file_2["CMP Northing"]=segy_vlookup["CDP_Y"]
        else:
            ## Need to delete the row at this point
            qinsy_file_2.drop(index_qinsy,inplace=True)

## Export the "filtered" dataframe to csv, turning off index
qinsy_file_2.to_csv(out_file,sep=',',index=False,header=True)
```

When it works, it should export a stripped-down version of `qinsy_file_2` containing only the rows whose coordinates have a match in `segy_vlookup` (I appreciate the latter is poorly named; I changed my methodology).
Here is the terminal output I keep receiving:
```
71.10718458835196   # this is distance
Traceback (most recent call last):
  File "C:\Users\tholgate\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexes\range.py", line 414, in get_loc
    return self._range.index(new_key)
ValueError: 0 is not in range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "p:\Xtra\Public\TH\Python Code\SEIS_NAV Comparison.py", line 52, in <module>
    distance = (((segy_vlookup["CDP_X"][index_segy] - qinsy_file_2["CMP Easting"][index_qinsy])**2) + ((segy_vlookup["CDP_Y"][index_segy] - qinsy_file_2["CMP Northing"][index_qinsy])**2))**0.5
  File "C:\Users\tholgate\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py", line 1040, in __getitem__
    return self._get_value(key)
  File "C:\Users\tholgate\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\series.py", line 1156, in _get_value
    loc = self.index.get_loc(label)
  File "C:\Users\tholgate\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\core\indexes\range.py", line 416, in get_loc
    raise KeyError(key) from err
KeyError: 0
```
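The traceback can be reproduced in isolation. The first distance printed (71.1) is larger than `buffer`, so the `else` branch drops row 0 from `qinsy_file_2`. The inner loop does not stop there, though: on its next pass it repeats the label lookup `qinsy_file_2["CMP Easting"][0]` for a row that no longer exists, which raises `KeyError: 0`. A minimal sketch, assuming a hypothetical three-row DataFrame `df` in place of `qinsy_file_2`:

```python
import pandas as pd

# Hypothetical toy stand-in for qinsy_file_2, with the default RangeIndex 0, 1, 2
df = pd.DataFrame({"CMP Easting": [10.0, 20.0, 30.0]})
index_qinsy = 0

# First inner-loop pass: the label lookup works, the distance is too big,
# so the else branch drops the current row
first_value = df["CMP Easting"][index_qinsy]
df.drop(index_qinsy, inplace=True)

# Second inner-loop pass: index_qinsy is still 0, but that label is gone
error_key = None
try:
    df["CMP Easting"][index_qinsy]
except KeyError as err:
    error_key = err.args[0]

print(first_value, error_key)
```

The failure happens on the second pass of the inner (SEGY) loop, not the second QINSY row.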
You want to add a `break` to the `True` case of your inner `for`, and then, where you have `else`, unindent it one level so it becomes the "no break" clause of the inner loop. Finally, altering something while iterating over it is tricky; you will likely want to rework things to avoid that, as it often leads to unexpected results.
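A sketch of both suggestions together, assuming the column names from the question; the function name `match_to_segy` and the toy data are hypothetical. The inner loop `break`s on the first SEGY point within `buffer`, and the `for ... else` clause runs only when no such point was found. Instead of dropping rows from the frame being iterated, it writes matches into a copy and collects the labels to drop, then drops them once after the loops finish:

```python
import pandas as pd


def match_to_segy(qinsy: pd.DataFrame, segy: pd.DataFrame, buffer: float) -> pd.DataFrame:
    """Keep QINSY rows that have a SEGY point within `buffer`, snapping CMP X/Y to it."""
    result = qinsy.copy()  # write into a copy; never mutate the frame being iterated
    to_drop = []           # labels of rows with no SEGY match, dropped after the loops

    for index_qinsy, row_qinsy in qinsy.iterrows():
        for _, row_segy in segy.iterrows():
            distance = ((row_segy["CDP_X"] - row_qinsy["CMP Easting"]) ** 2
                        + (row_segy["CDP_Y"] - row_qinsy["CMP Northing"]) ** 2) ** 0.5
            if distance <= buffer:
                # .loc[row, column] writes into `result` directly; chained
                # df[column][row] = ... may only modify a temporary copy
                result.loc[index_qinsy, "CMP Easting"] = row_segy["CDP_X"]
                result.loc[index_qinsy, "CMP Northing"] = row_segy["CDP_Y"]
                break  # first match is enough; stop scanning SEGY points
        else:
            # "no break" clause: runs only if no SEGY point was within buffer
            to_drop.append(index_qinsy)

    return result.drop(index=to_drop)


# Hypothetical toy data: rows 0 and 2 are close to a SEGY point, row 1 is not
qinsy = pd.DataFrame({
    "Fix Number": [1, 2, 3],
    "CMP Easting": [100.05, 200.0, 300.1],
    "CMP Northing": [50.05, 60.0, 70.0],
})
segy = pd.DataFrame({"CDP_X": [100.0, 300.0], "CDP_Y": [50.0, 70.0]})

out = match_to_segy(qinsy, segy, buffer=0.2)
print(out)
```

With the real data this would be called as `match_to_segy(qinsy_file_2, segy_vlookup, buffer).to_csv(out_file, sep=',', index=False, header=True)`. The nested `iterrows` loops still compare every QINSY row with every SEGY point, so they will be slow on large files; a spatial index such as `scipy.spatial.cKDTree` is the usual next step, but that is beyond this sketch.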