
I have two DataFrames: df1 contains countries with the number of infections over time (2000+ rows), and df2 contains countries with population numbers (200 rows).

I have been trying to get the population number from df2 to df1 in order to transform the infections to infection density (?) over time.

In my mind I have to iterate over the rows of df1 and check the Country column of each row against df2. If the result is True, I can copy the population from df2 to df1. I have tried multiple approaches (just one is shown below) but am at a loss right now :(... could someone give me a push in the right direction?

for index, row in df2.iterrows():
    df_test = df1['Country'].str.contains(row[0])

Edit: update with df1, df2 and preferred outcome:

df1

   ObservationDate  Country/Region  Confirmed
0        -2.118978       Hong Kong        0.0
1        -2.118978           Japan        2.0
2        -2.118978           Macau        1.0
3        -2.118978  Mainland China      547.0
4        -2.118978     South Korea        1.0

df2

                  0             1
0             China  1.401580e+09
1             India  1.359321e+09
2  United States[c]  3.293798e+08
3         Indonesia  2.669119e+08
4            Brazil  2.111999e+08

df_preferred

   ObservationDate  Country/Region  Confirmed    Population
0        -2.118978       Hong Kong        0.0
1        -2.118978           Japan        2.0
2        -2.118978           Macau        1.0
3        -2.118978  Mainland China      547.0  1.401580e+09
4        -2.118978     South Korea        1.0
  • You have not given enough detail for me to write any code, but this looks like a use case for merge. Commented Mar 9, 2020 at 9:05
  • Can you update your question with the two DataFrames and the result you are expecting to get? Commented Mar 9, 2020 at 9:06

2 Answers


Assume that both of your DataFrames are as follows:

  Country        Date  Infection
0   Aaaaa  2020-03-02         10
1   Aaaaa  2020-03-04         20
2   Bbbbb  2020-03-02         15
3   Bbbbb  2020-03-04         20
4   Ccccc  2020-03-02         12
5   Ccccc  2020-03-04         40

  Country  Population
0   Aaaaa    10000000
1   Bbbbb    35200000
2   Ccccc    48700000

Then, to merge them and save the result in another DataFrame you can run:

df3 = df1.merge(df2, on='Country') 

getting:

  Country        Date  Infection  Population
0   Aaaaa  2020-03-02         10    10000000
1   Aaaaa  2020-03-04         20    10000000
2   Bbbbb  2020-03-02         15    35200000
3   Bbbbb  2020-03-04         20    35200000
4   Ccccc  2020-03-02         12    48700000
5   Ccccc  2020-03-04         40    48700000

And to compute the infection rate you can execute:

df3['InfectionRate'] = df3.Infection / df3.Population 
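Applied to the frames shown in the question, the same idea might look like the sketch below. It rests on several assumptions that are not in the original post: that df2's columns are labelled 0 and 1, that footnote markers such as "[c]" should be stripped, and that "Mainland China" should be matched to "China" through a hand-written `aliases` dictionary (a hypothetical name). A left merge is used so that df1 rows without a match are kept with an empty Population, as in df_preferred.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's df1 and df2
df1 = pd.DataFrame({
    "ObservationDate": [-2.118978] * 5,
    "Country/Region": ["Hong Kong", "Japan", "Macau", "Mainland China", "South Korea"],
    "Confirmed": [0.0, 2.0, 1.0, 547.0, 1.0],
})
df2 = pd.DataFrame({
    0: ["China", "India", "United States[c]", "Indonesia", "Brazil"],
    1: [1.401580e+09, 1.359321e+09, 3.293798e+08, 2.669119e+08, 2.111999e+08],
})

# Give df2 descriptive column names and strip footnote markers such as "[c]"
pop = df2.rename(columns={0: "Country", 1: "Population"})
pop["Country"] = pop["Country"].str.replace(r"\[.*?\]", "", regex=True).str.strip()

# Assumed alias table; extend it for every name that differs between the frames
aliases = {"Mainland China": "China"}
df1["Country"] = df1["Country/Region"].replace(aliases)

# how="left" keeps every df1 row; unmatched countries get NaN in Population
merged = df1.merge(pop, on="Country", how="left", indicator=True)

# Country names in df1 that found no match in df2
unmatched = sorted(merged.loc[merged["_merge"] == "left_only", "Country"].unique())

merged["InfectionRate"] = merged["Confirmed"] / merged["Population"]
```

Inspecting `unmatched` shows which names still need an entry in `aliases` or are simply absent from df2.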

1 Comment

Amazing, this tackles most of my problems. Some countries are not merged, but this is because their names differ between the DataFrames. Thank you!

I think this will do the job:

import pandas as pd

data1 = {'Country': ['Germany', 'USA', 'Canada', 'UK'], 'Inf': [2, 5, 6, 8]}
data2 = {'Country': ['Germany', 'USA', 'Canada', 'UK'], 'popul': [80, 300, 30, 70]}

# Creating the dataframes
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Setting the index from the column Country
df2 = df2.set_index('Country')
df1 = df1.set_index('Country')

# Concatenating the dataframes along axis 1 without sorting
pd.concat([df1, df2], axis=1, sort=False)
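The example above has exactly one row per country in each frame. In the question, each country appears on many rows of df1, so a per-row lookup may be a better fit. A minimal sketch of that variant, using `Series.map` and made-up data (the `lookup` and `density` names are illustrative, not from the original post):

```python
import pandas as pd

# Hypothetical data: df1 has several rows per country, df2 has one row per country
df1 = pd.DataFrame({
    "Country": ["Germany", "Germany", "USA", "Canada"],
    "Inf": [2, 4, 5, 6],
})
df2 = pd.DataFrame({
    "Country": ["Germany", "USA", "Canada", "UK"],
    "popul": [80, 300, 30, 70],
})

# Build a Country -> population lookup Series, then map it onto df1 row by row
lookup = df2.set_index("Country")["popul"]
df1["popul"] = df1["Country"].map(lookup)

# Infections per unit of population
df1["density"] = df1["Inf"] / df1["popul"]
```

Countries missing from df2 would end up with NaN in `popul`, which makes mismatched names easy to spot.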
