
I'm working with a data frame that has several branch_ids for each state, and I would like to get the number of rows for each branch_id within each state. For this I'm using a for loop like so:

    for branch in prim_data.loc[prim_data.state == 'AZ'].branch_id.unique():
        print("{0} :: {1} samples".format(branch, len(prim_data.query("branch_id == branch and state == 'AZ'"))))

But running this code raises an error with a long traceback, ending in:

    UndefinedVariableError: name 'branch' is not defined

Is there a better way to achieve this? For reference, the data frame looks like this:

    segment  branch_id  state
    1        1          AZ
    1        3          AZ
    2        7          AZ

There are several states, but let's focus on just one for the moment.

Answer


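The UndefinedVariableError occurs because query() evaluates its string as a separate expression: a bare name such as branch is looked up among the data frame's columns, not among your Python variables. To refer to a local variable inside the query string, prefix it with @, e.g. prim_data.query("branch_id == @branch and state == 'AZ'"). Alternatively, you can build the query string with an f-string: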

    for branch in prim_data.loc[prim_data.state == 'AZ'].branch_id.unique():
        print(len(prim_data.query(f"(branch_id == {branch}) & (state == 'AZ')")))
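A minimal sketch of the original loop fixed with the @ prefix instead of an f-string, reproducing the "{0} :: {1} samples" output from the question. The three-row frame below is just the sample shown in the question; the list name lines is an illustrative choice, not part of the original code.

```python
import pandas as pd

# Small frame mirroring the sample rows shown in the question.
prim_data = pd.DataFrame(
    {"segment": [1, 1, 2], "branch_id": [1, 3, 7], "state": ["AZ", "AZ", "AZ"]}
)

lines = []
for branch in prim_data.loc[prim_data.state == "AZ"].branch_id.unique():
    # "@branch" tells query() to read the local Python variable `branch`
    # instead of looking for a column with that name.
    n = len(prim_data.query("branch_id == @branch and state == 'AZ'"))
    lines.append("{0} :: {1} samples".format(branch, n))

print("\n".join(lines))
```

Unlike the f-string version, @ passes the value itself rather than its text, so it also works for string IDs without extra quoting.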

As for a better way: yes, you can use groupby to count the rows for every state and branch_id combination in one call, without a loop:

    prim_data.groupby(['state', 'branch_id']).count()
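A sketch of a closely related variant, assuming you want one row count per (state, branch_id) pair: groupby(...).size() returns a single Series of row counts, whereas count() reports non-null counts separately for each remaining column. The data below extends the question's sample with one hypothetical extra AZ row and one hypothetical CA row so the grouping has something to aggregate.

```python
import pandas as pd

# Question's sample rows plus two hypothetical rows (one AZ, one CA).
prim_data = pd.DataFrame(
    {
        "segment": [1, 1, 2, 1, 3],
        "branch_id": [1, 3, 7, 1, 2],
        "state": ["AZ", "AZ", "AZ", "AZ", "CA"],
    }
)

# size() counts rows per (state, branch_id) pair, including rows with NaNs;
# count() would instead give non-null counts for every other column.
counts = prim_data.groupby(["state", "branch_id"]).size()
print(counts)

# Restrict to one state by selecting on the first index level.
az_counts = counts.loc["AZ"]
print(az_counts)

# Equivalent for a single state: filter first, then count branch_id values.
az_alt = prim_data.loc[prim_data["state"] == "AZ", "branch_id"].value_counts().sort_index()
```

For a single state, the value_counts form avoids building the full MultiIndex; for all states at once, the groupby form is the more direct choice.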