I have a dataframe called df1:
    Long_ID  IndexBegin  IndexEnd
0  10000001           0         3
1  10000002           3         6
2  10000003           6        10

I have a second dataframe called df2, which can be up to 1 million rows long:
   Short_ID
0         1
1         2
2         3
3        10
4        20
5        30
6       100
7       101
8       102
9       103

I want to link Long_ID to Short_ID using the last two columns of df1: each Long_ID should be assigned to the rows of df2 whose index falls in the half-open range [IndexBegin, IndexEnd). For example, if (IndexBegin, IndexEnd) is (0, 3), then that Long_ID is inserted into df2 at indexes 0 through 2 (that is, up to IndexEnd - 1).
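If the ranges are sorted, contiguous, start at 0 and together cover every row of df2 exactly once (as in the sample data), the rule above reduces to repeating each Long_ID `IndexEnd - IndexBegin` times. The sketch below assumes exactly that; the names `begins`, `ends` and `counts` are illustrative, and the check raises an error rather than silently misaligning when the assumption does not hold.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame(
    [(10000001, 0, 3), (10000002, 3, 6), (10000003, 6, 10)],
    columns=["Long_ID", "IndexBegin", "IndexEnd"],
)
df2 = pd.DataFrame({"Short_ID": [1, 2, 3, 10, 20, 30, 100, 101, 102, 103]})

begins = df1["IndexBegin"].to_numpy()
ends = df1["IndexEnd"].to_numpy()

# Assumption: ranges start at 0, each one begins where the previous one ends,
# and the last one ends at len(df2), so positional repetition lines up.
contiguous = (
    begins[0] == 0
    and (begins[1:] == ends[:-1]).all()
    and ends[-1] == len(df2)
)
if not contiguous:
    raise ValueError("ranges must be contiguous and cover all of df2")

# Each Long_ID covers IndexEnd - IndexBegin consecutive rows of df2.
counts = ends - begins
df3 = df2.assign(Long_ID=np.repeat(df1["Long_ID"].to_numpy(), counts))
print(df3)
```

This is a single vectorized call, so it scales well to a million rows, but it only works under the contiguity assumption; gaps or overlaps between ranges need one of the lookup-based approaches instead.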
Ultimately, the final dataframe, df3, should look like this:
   Short_ID   Long_ID
0         1  10000001
1         2  10000001
2         3  10000001
3        10  10000002
4        20  10000002
5        30  10000002
6       100  10000003
7       101  10000003
8       102  10000003
9       103  10000003

First, I tried storing the index of df2 as keys and Short_ID as values in a dictionary, then iterating row by row, but that was too slow. This led me to learn about vectorization.
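A vectorized replacement for the row-by-row dictionary lookup is to turn the df1 ranges into a pandas `IntervalIndex` and ask it, in one call, which interval contains each df2 index. This is a sketch assuming the ranges do not overlap (`get_indexer` on an `IntervalIndex` requires that); rows that fall in no range are left as `<NA>`. The variable names are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame(
    [(10000001, 0, 3), (10000002, 3, 6), (10000003, 6, 10)],
    columns=["Long_ID", "IndexBegin", "IndexEnd"],
)
df2 = pd.DataFrame({"Short_ID": [1, 2, 3, 10, 20, 30, 100, 101, 102, 103]})

# One half-open interval [IndexBegin, IndexEnd) per row of df1.
intervals = pd.IntervalIndex.from_arrays(
    df1["IndexBegin"], df1["IndexEnd"], closed="left"
)

# Position of the containing interval for every df2 row label, -1 if none.
pos = intervals.get_indexer(df2.index)

# Start with all-missing nullable integers, then fill the matched rows.
# Masking matters: long_ids[-1] would otherwise silently pick the last ID.
long_ids = df1["Long_ID"].to_numpy()
matched = pos >= 0
result = pd.Series(pd.NA, index=df2.index, dtype="Int64")
result[matched] = long_ids[pos[matched]]

df3 = df2.assign(Long_ID=result)
print(df3)
```

Unlike simple repetition, this tolerates gaps between ranges and does not depend on df1 being ordered.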
Then, I tried using where(), but I got "ValueError: Can only compare identically-labeled Series objects."
df2 = df2.reset_index()
df2['Long_ID'] = df1['Long_ID'][(df2['index'] < df1['IndexEnd']) & (df2['index'] >= df1['IndexBegin'])]

I am relatively new to programming, and I would appreciate it if anyone could suggest a better approach to this problem. The code below reproduces the sample data:
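The ValueError comes from pandas aligning Series by label before comparing them: `df2['index']` has labels 0 to 9 while `df1['IndexEnd']` has labels 0 to 2, so a comparison operator refuses to pair them up. One way to keep the idea of that attempt is to compare plain NumPy arrays with broadcasting, which tests every df2 row against every df1 range at once. This is a sketch assuming each df2 row falls in exactly one range; note that it builds a `len(df2) x len(df1)` boolean matrix, so with 1 million rows and many ranges the memory cost may be too high.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame(
    [(10000001, 0, 3), (10000002, 3, 6), (10000003, 6, 10)],
    columns=["Long_ID", "IndexBegin", "IndexEnd"],
)
df2 = pd.DataFrame({"Short_ID": [1, 2, 3, 10, 20, 30, 100, 101, 102, 103]})

# Plain NumPy arrays carry no labels, so no alignment (and no ValueError).
idx = df2.index.to_numpy()[:, None]              # shape (len(df2), 1)
begin = df1["IndexBegin"].to_numpy()[None, :]    # shape (1, len(df1))
end = df1["IndexEnd"].to_numpy()[None, :]        # shape (1, len(df1))

# mask[i, j] is True when df2 row i lies in range j: begin <= i < end.
mask = (idx >= begin) & (idx < end)

# Assumption: exactly one True per row, so argmax finds that column.
if not (mask.sum(axis=1) == 1).all():
    raise ValueError("each df2 row must fall in exactly one range")

df3 = df2.assign(Long_ID=df1["Long_ID"].to_numpy()[mask.argmax(axis=1)])
print(df3)
```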
import pandas as pd

df1_data = [(10000001, 0, 3), (10000002, 3, 6), (10000003, 6, 10)]
df1 = pd.DataFrame(df1_data, columns=['Long_ID', 'IndexBegin', 'IndexEnd'])

df2_data = [1, 2, 3, 10, 20, 30, 100, 101, 102, 103]
df2 = pd.DataFrame(df2_data, columns=['Short_ID'])
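Building on that reproduction code, a memory-light option that stays entirely in pandas is `pd.merge_asof`, which matches each df2 row to the last df1 row whose IndexBegin is less than or equal to the row's index. This is a sketch, not a definitive solution: it assumes the ranges do not overlap, and a follow-up check against IndexEnd blanks out any rows that land in a gap between ranges.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame(
    [(10000001, 0, 3), (10000002, 3, 6), (10000003, 6, 10)],
    columns=["Long_ID", "IndexBegin", "IndexEnd"],
)
df2 = pd.DataFrame({"Short_ID": [1, 2, 3, 10, 20, 30, 100, 101, 102, 103]})

# Expose df2's row labels as an 'index' column to join on.
left = df2.reset_index()

# For each left row, take the last df1 row with IndexBegin <= index
# (direction="backward" is the default). Both join keys must be sorted.
merged = pd.merge_asof(
    left,
    df1.sort_values("IndexBegin"),
    left_on="index",
    right_on="IndexBegin",
)

# A backward match can still overshoot the range end when there are gaps,
# so keep Long_ID only where index < IndexEnd (nullable Int64 holds <NA>).
merged["Long_ID"] = (
    merged["Long_ID"].astype("Int64").where(merged["index"] < merged["IndexEnd"])
)

# Restore df2's original index and keep only the requested columns.
df3 = merged.set_index("index")[["Short_ID", "Long_ID"]].rename_axis(None)
print(df3)
```

`merge_asof` works on sorted keys rather than a pairwise mask, so its memory use grows with the number of rows, not with rows times ranges.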