I need to find out how many rows of a dataframe, counting from the first row, are needed for their sum to reach (just over) 50% of the total sum of values in that column.
Here's an example:
    import pandas as pd
    import numpy as np

    df = pd.DataFrame(np.random.rand(10, 1), columns=list("A"))

    0  0.681991
    1  0.304026
    2  0.552589
    3  0.716845
    4  0.559483
    5  0.761653
    6  0.551218
    7  0.267064
    8  0.290547
    9  0.182846

therefore

    sum_of_A = df["A"].sum()

    4.868260213425804
and with this example I need to find, starting from row 0, how many rows I need to get a sum of at least 2.43413 (approximating 50% of sum_of_A).
Of course I could iterate through the rows and sum and break when I get over 50%, but is there a more concise/Pythonic/efficient way of doing this?
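For reference, the loop-based version I have in mind looks roughly like the sketch below. It hard-codes the values printed above so that the result is reproducible, and the helper name `rows_to_reach_half` is made up for illustration only.

```python
import pandas as pd

# Values copied from the example output above (instead of np.random.rand)
# so that the result is reproducible.
df = pd.DataFrame({"A": [0.681991, 0.304026, 0.552589, 0.716845, 0.559483,
                         0.761653, 0.551218, 0.267064, 0.290547, 0.182846]})


def rows_to_reach_half(values):
    """Count the leading rows needed for the running sum to reach 50% of the total.

    Hypothetical helper; assumes non-negative values, as in the example.
    """
    target = values.sum() / 2
    running = 0.0
    for count, value in enumerate(values, start=1):
        running += value
        if running >= target:
            return count
    # Fallback if the target is never reached (e.g. with negative values).
    return len(values)


rows_needed = rows_to_reach_half(df["A"])
print(rows_needed)  # 5 for the values above
```

With these values the running sum after four rows is about 2.2555 and after five rows about 2.8149, so the answer I expect is 5.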