I have code that quickly grabs the Twitter followers of 12 different users. After the data is appended to a Pandas DataFrame, it is compared to the same file pulled the day before.
This lets me see which followers each account has gained, lost, or had return.
The code works fine. However, the for loop that compares changes between days is slow. Any ideas on how to speed up the for loop section?
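To make the three categories concrete, here is a toy sketch with plain Python sets. The IDs and variable names are made up for illustration and are not my real data. It only mirrors the set logic of the transformation functions for a single account:

```python
# Toy follower IDs for one account (made-up values, not real data).
old_ids = {1, 2, 3, 4}    # every follower ID in the historical data
old_ended = {4}           # historical IDs that already have an End_Date
new_ids = {2, 3, 4, 5}    # follower IDs pulled today

gained = new_ids - old_ids        # in today's pull, never seen before
lost = old_ids - new_ids          # in the history, missing today
returned = old_ended & new_ids    # previously ended, present again today

print(gained, lost, returned)
```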
TRANSFORMATION FUNCTIONS
    # Transformation Functions

    # Gained Users
    def gained_users(account, new, old):
        gained_followers = []
        old_user_ids = old.Follower_ID[old.Handles == account].unique().tolist()
        new_user_ids = new.Follower_ID[new.Handles == account].unique().tolist()
        gained = list(set(new_user_ids) - set(old_user_ids))
        gained_followers.extend(map(str, gained))
        return gained_followers

    # Lost Users
    def lost_users(account, new, old):
        old_user_ids = old.Follower_ID[old.Handles == account].unique().tolist()
        new_user_ids = new.Follower_ID[new.Handles == account].unique().tolist()
        lost = list(set(old_user_ids) - set(new_user_ids))
        return lost

    # Returned Users
    def returned_users(account, new, old):
        returned_followers = []
        new_user_ids = new.Follower_ID[new.Handles == account].unique().tolist()
        returned_user_ids = old.Follower_ID[(old.Handles == account) & (old.End_Date.notnull() == True)].unique().tolist()
        returned = list(set(returned_user_ids).intersection(new_user_ids))
        returned_followers.extend(map(str, returned))
        return returned_followers

FOR LOOP SECTION
    # Add Returned Users
    for username in lookup_users:
        returned_ids = returned_users(username, new_followers_df, historical_followers_df)
        if returned_ids:
            # Loop over each returned ID (mirrors the lost-users loop below)
            for ids in returned_ids:
                historical_followers_df.loc[(historical_followers_df["Handles"] == username) & (historical_followers_df["Follower_ID"] == ids), "Start_Date"] = today
                historical_followers_df.loc[(historical_followers_df["Handles"] == username) & (historical_followers_df["Follower_ID"] == ids), "Returned_After_Days"] = pd.to_datetime(historical_followers_df.Start_Date) - pd.to_datetime(historical_followers_df.End_Date)
                historical_followers_df.loc[(historical_followers_df["Handles"] == username) & (historical_followers_df["Follower_ID"] == ids), "End_Date"] = np.NaN

    # Add Lost Users
    for username in lookup_users:
        lost_ids = lost_users(username, new_followers_df, historical_followers_df)
        if lost_ids:
            for ids in lost_ids:
                historical_followers_df.loc[(historical_followers_df["Handles"] == username) & (historical_followers_df["Follower_ID"] == ids), "End_Date"] = today

    # Add Gained Users
    for username in lookup_users:
        new_ids = gained_users(username, new_followers_df, historical_followers_df)
        if new_ids:
            gained_users_list = pd.DataFrame({
                "Handles": username,
                "Follower_ID": new_ids,
                "Start_Date": today})
            historical_followers_df = gained_users_list.append(historical_followers_df, ignore_index=True)

EDIT:
Hi @Graipher, thanks for your help. The reasoning makes sense and the structure is much neater! If you don't mind, could you please answer these three questions:
1. Could you explain the line old_account = old.Handles == account? I have never seen that construct before.
2. If I run the code as it is, it fails on the line old_followers = user_followers & (historical_followers_df["Follower_ID"] == ids). Is this because ids in the filter is not defined?
If so, is it correct to say that:
a. I have to amend the categorize_users function above, adding old = map(str, old_user_ids) and return old, gained, lost, returned.
b. In the for loop where I call the function, I add a new variable ids, so that it looks like: ids, new_ids, lost_ids, returned_ids = categorize_users(username, new_followers_df, historical_followers_df)
3. Finally, I think there's a problem with new_ids, as it raises TypeError: object of type 'map' has no len()
a. Is this a problem in the code? I can't really figure that part out.
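In case it helps narrow down question 3, the same error can be reproduced without pandas. This is only a minimal sketch, assuming Python 3, where map returns a lazy iterator rather than a list. The IDs are made up and the variable names are mine:

```python
ids = [101, 102]  # made-up follower IDs

as_map = map(str, ids)  # Python 3: a lazy map object, not a list
try:
    len(as_map)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# Wrapping the map in list() materializes it, so len() works again.
as_list = list(map(str, ids))
print(len(as_list), as_list)
```

If that is the cause, wrapping the map call in list() before it is returned or measured might be enough, but I am not sure that is the intended fix.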