Extracting number of rows using df.query() inside for-loops?

Question

I'm working on a data frame that has several branch_ids for each state, and I would like to extract the number of rows for each ID in each state. For this I'm using a for loop like so:

```python
for branch in prim_data.loc[prim_data.state == 'AZ'].branch_id.unique():
    print("{0} :: {1} samples".format(
        branch, len(prim_data.query("branch_id == branch and state == 'AZ'"))))
```

But running this code raises an error with a long traceback:

UndefinedVariableError: name 'branch' is not defined

Is there a better way to achieve this? For reference, the data frame looks like this:

```
segment  branch_id  state
1        1          AZ
1        3          AZ
2        7          AZ
```

There are a number of states but let's focus on just one state for the moment.
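Here is a minimal, self-contained reproduction. It assumes the frame holds only the three rows from the table above, runs the same query, and captures the exception. pandas' UndefinedVariableError is a subclass of NameError, so catching NameError is enough here.

```python
import pandas as pd

# The three rows shown in the question's table
prim_data = pd.DataFrame({
    "segment": [1, 1, 2],
    "branch_id": [1, 3, 7],
    "state": ["AZ", "AZ", "AZ"],
})

caught = None
for branch in prim_data.loc[prim_data.state == "AZ"].branch_id.unique():
    try:
        # Inside the expression, "branch" is looked up as a column name,
        # not as the loop variable, so the lookup fails.
        prim_data.query("branch_id == branch and state == 'AZ'")
    except NameError as err:  # UndefinedVariableError subclasses NameError
        caught = err
        break

print(type(caught).__name__, ":", caught)
```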

wong.lok.yin · Accepted Answer · 2020-01-21 09:37:46Z

UndefinedVariableError: name 'branch' is not defined

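Inside a query() expression, bare names are looked up as column (or index) names, so branch is treated as a column that does not exist. One way around this is to use an f-string, so that the value itself is written into the expression: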

```python
for branch in prim_data.loc[prim_data.state == 'AZ'].branch_id.unique():
    # The f-string inserts the current value of branch into the expression text
    print(len(prim_data.query(f"(branch_id == {branch}) & (state == 'AZ')")))
```
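query() also has a built-in way to reference Python variables: prefix the name with @. The sketch below shows this; it is not part of the original answer. An f-string splices the value's text into the expression, so a string ID (for example a hypothetical 'B01') would also need quotes inside the expression. @branch avoids that because it passes the value itself. The sample frame uses the question's rows plus two hypothetical extra rows so that the counts differ.

```python
import pandas as pd

# Rows from the question's table plus two hypothetical extra rows
# (a repeated AZ branch and a CA branch) added for illustration
prim_data = pd.DataFrame({
    "segment": [1, 1, 2, 2, 1],
    "branch_id": [1, 3, 7, 3, 5],
    "state": ["AZ", "AZ", "AZ", "AZ", "CA"],
})

counts = {}
for branch in prim_data.loc[prim_data.state == "AZ"].branch_id.unique():
    # "@branch" tells query() to read the local Python variable "branch"
    # instead of looking for a column with that name.
    n = len(prim_data.query("branch_id == @branch and state == 'AZ'"))
    counts[int(branch)] = n
    print("{0} :: {1} samples".format(branch, n))
```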

Is there a better way to achieve this? Yes, you can use groupby to do it:

```python
prim_data.groupby(['state', 'branch_id']).count()
```
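This is a sketch of the groupby route, assuming the same column names. It uses size(), which returns the number of rows in each (state, branch_id) group as a Series. count() instead returns, for every remaining column, the number of non-null values. That matches the row count only when those columns have no missing values. Selecting "AZ" from the first index level then gives the per-branch counts for one state. The two extra rows are hypothetical.

```python
import pandas as pd

# Rows from the question's table plus two hypothetical extra rows
prim_data = pd.DataFrame({
    "segment": [1, 1, 2, 2, 1],
    "branch_id": [1, 3, 7, 3, 5],
    "state": ["AZ", "AZ", "AZ", "AZ", "CA"],
})

# Number of rows per (state, branch_id) pair, as a Series with a MultiIndex
sizes = prim_data.groupby(["state", "branch_id"]).size()

# Keep only Arizona; the result is indexed by branch_id alone
az_sizes = sizes.loc["AZ"]

for branch, n in az_sizes.items():
    print("{0} :: {1} samples".format(branch, n))
```

A single grouped pass like this avoids re-scanning the whole frame once per branch, as the loop-based versions do.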
