This is not the first time I have wondered about this: if I have a categorical variable (cities, in this case) that I later want to convert to dummy variables, should I delete the rows whose value occurs fewer than N times?
For example, the value New York occurs 400+ times, but there are cities that appear only once or twice.
What should I do with values that have appeared only once or twice?
    print(df[cities].value_counts())

Output:

    city1     424
    city2     107
    city3      35
    city4      33
    city5      28
    city6      24
    city7      15
    city8       7
    city9       4
    city10      3
    city11      2
    city12      1
    city13      1
    city14      1
    city15      1
    city16      1
    city17      1
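
To make the question concrete, here is a minimal sketch of what "deleting values that occur fewer than N times" might look like before creating the dummies, next to one possible alternative (lumping the rare cities into a single category instead of deleting them). The toy data, the column name "cities", the threshold N = 2 and the label "other" are all assumptions made for illustration; they are not taken from my real dataset.

```python
import pandas as pd

# Hypothetical toy data with the same kind of skew as the counts above.
df = pd.DataFrame(
    {"cities": ["city1"] * 5 + ["city2"] * 3 + ["city3"] + ["city4"]}
)

N = 2  # hypothetical threshold: minimum occurrences for a city to be kept

counts = df["cities"].value_counts()
rare = counts[counts < N].index  # cities that occur fewer than N times

# Option A: delete the rows with rare cities, then build the dummies.
dropped = df[~df["cities"].isin(rare)]
dummies_dropped = pd.get_dummies(dropped["cities"])

# Option B (assumed alternative): keep every row, but replace rare
# cities with a single "other" label before building the dummies.
lumped = df["cities"].where(~df["cities"].isin(rare), "other")
dummies_lumped = pd.get_dummies(lumped)

print(dummies_dropped.columns.tolist())
print(dummies_lumped.columns.tolist())
```

Option A loses the rows entirely, while option B keeps them at the cost of one extra dummy column; which of the two (if either) is appropriate is exactly what I am unsure about.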