I have a list of people with fields unique_id, sex, born_at (birthday) and I’m trying to group by sex and age bins, and count the rows in each segment.
Can’t figure out why I keep getting NaN or 0 as the output for each segment.
Here’s the latest approach I've taken...
Data sample:
|-----------|-----|------------|
| unique_id | sex | born_at    |
|-----------|-----|------------|
| 1         | M   | 1963-08-04 |
| 2         | F   | 1972-03-22 |
| 3         | M   | 1982-02-10 |
| 4         | M   | 1989-05-02 |
| 5         | F   | 1974-01-09 |
|-----------|-----|------------|

Code:

    df['num_people'] = 1
    breakpoints = [18, 25, 35, 45, 55, 65]
    df[['sex', 'born_at', 'num_people']].groupby(['sex', pd.cut(df.born_at.dt.year, bins=breakpoints)]).agg('count')

I’ve tried summing as the agg type, removing NaNs from the data series, and pivot_table using the same pd.cut function, but no luck. Guessing there’s also probably a better way to do this that doesn’t involve creating a column of 1s.
Desired output would be something like this... 
The extra born_at column isn't necessary in the output and I'd also like the age bins to be 18 to 24, 25 to 34, etc. instead of 18 to 25, 25 to 35, etc. but I'm not sure how to specify that either.
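
One possible explanation, with a minimal sketch: the code above passes birth years (1963, 1972, ...) to pd.cut, while the breakpoints describe ages (18 to 65), so no value falls inside any bin and every group comes back as NaN or 0. The sketch below converts birthdays to ages first. It assumes ages are measured as of a fixed reference date; `as_of`, `age`, `labels`, and `age_group` are names chosen here for illustration and are not part of the original code.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "unique_id": [1, 2, 3, 4, 5],
    "sex": ["M", "F", "M", "M", "F"],
    "born_at": pd.to_datetime(
        ["1963-08-04", "1972-03-22", "1982-02-10", "1989-05-02", "1974-01-09"]
    ),
})

breakpoints = [18, 25, 35, 45, 55, 65]

# Binning birth YEARS against age breakpoints: every value lies outside 18..65,
# so every row is assigned NaN.
year_bins = pd.cut(df["born_at"].dt.year, bins=breakpoints)

# Assumption: ages are computed as of this (hypothetical) reference date.
as_of = pd.Timestamp("2020-01-01")

# Whole-year age: subtract one if the birthday has not yet occurred in the reference year.
month, day = df["born_at"].dt.month, df["born_at"].dt.day
had_birthday = (month < as_of.month) | ((month == as_of.month) & (day <= as_of.day))
age = as_of.year - df["born_at"].dt.year - (~had_birthday).astype(int)

# right=False makes the bins left-closed: [18, 25), [25, 35), ...
# so the labels can read "18 to 24", "25 to 34", and so on.
labels = [f"{lo} to {hi - 1}" for lo, hi in zip(breakpoints[:-1], breakpoints[1:])]
age_group = pd.cut(age, bins=breakpoints, right=False, labels=labels).rename("age_group")

# size() counts rows per segment, so no helper column of 1s is needed.
counts = df.groupby(["sex", age_group], observed=False).size()
print(counts)
```

With `right=False`, someone aged exactly 25 lands in "25 to 34" rather than "18 to 25". Passing `observed=False` is meant to keep empty sex/age segments in the result with a count of 0; drop it or set it to `True` if only non-empty segments are wanted.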