
I have a DataFrame that is the concatenation of two identically structured DataFrames: the first is Orders and the second is Cancels. Orders has more than 20,000 rows, and a small number of Cancels rows share an OrderNo and ItemCode with an order. I made the canceled quantities negative so that, when grouping the df by both OrderNo and ItemCode, I can sum the quantity fields with agg and get the actual quantity shipped, net of cancellations.

Below is my dataframe:

       OrderNo  OrderDate  LineNo  ClientNo  ItemCode  QtyOrdered  QtyShipped
0       528758   1/3/2017       1   1358538    111931          70          70
1       528791   1/3/2017      10   1254798    110441         300         300
2       528791   1/3/2017       1   1254798   1029071          10          10
3       528791   1/3/2017       2   1254798   1033341          10          10
4       528791   1/3/2017       8   1254798   1040726          15          15
...        ...        ...     ...       ...       ...         ...         ...
28344   537667   2/6/2017      12  43823870  10137992           0          -2
28345   537771   2/7/2017       5   1276705   1041106           0          -4
28346   539524  2/13/2017       6   1254798   1038323           0         -10
28347   542362  2/23/2017      11   1254612   1041108           0          -2
28348   542835  2/23/2017      13   1255235  10137993           0          -5

28349 rows × 7 columns

After running:

ActualOrders = PreActualOrders.groupby(['OrderNo','ItemCode']).agg({'QtyOrdered': 'sum', 'QtyShipped': 'sum'}).reset_index() 

I get my desired result, but I lose all the other columns in the DataFrame.

Result sample below:

       OrderNo  ItemCode  QtyOrdered  QtyShipped
28255   543734   1038324           1           1
28256   543734  10137992           1           1
28257   543734  10137993           1           1
28258   543735   1041106           1           1
28259   543735   1041108           1           1
28260   543735  10135359           1           1

What do I need to add in order to keep all the columns of the original df?

All values in those other columns match within each group, since every cancel row corresponds to an order row.
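That assumption can be checked before relying on it. The following is a minimal sketch on a small made-up frame standing in for PreActualOrders; the sample values and the names `pre`, `distinct`, and `inconsistent` are illustrative, not from the real data. It counts the distinct values per group in each carried-along column and flags any group with more than one.

```python
import pandas as pd

# Toy stand-in for PreActualOrders: two order rows plus one cancel row (made-up values).
pre = pd.DataFrame({
    "OrderNo":    [528758, 528791, 528791],
    "OrderDate":  ["1/3/2017", "1/3/2017", "1/3/2017"],
    "LineNo":     [1, 10, 10],
    "ClientNo":   [1358538, 1254798, 1254798],
    "ItemCode":   [111931, 110441, 110441],
    "QtyOrdered": [70, 300, 0],
    "QtyShipped": [70, 300, -20],
})

keys = ["OrderNo", "ItemCode"]
other_cols = ["OrderDate", "LineNo", "ClientNo"]

# Number of distinct values per (OrderNo, ItemCode) group in each non-summed column.
distinct = pre.groupby(keys)[other_cols].nunique()

# Groups where an order row and its cancel row disagree in any of those columns.
inconsistent = distinct[(distinct > 1).any(axis=1)]
print(inconsistent)
```

An empty `inconsistent` frame means that taking any single value per group for those columns loses nothing.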

Thank you,

MTH

2 Answers

I was able to get the desired result by including the other columns in the agg function with 'first', while 'QtyOrdered' and 'QtyShipped' are aggregated with 'sum'.

ActualOrders = PreActualOrders.groupby(['OrderNo', 'ItemCode']).agg({
    'OrderDate': 'first',
    'LineNo': 'first',
    'ClientNo': 'first',
    'QtyOrdered': 'sum',
    'QtyShipped': 'sum',
}).reset_index()

This yields my desired result:

       OrderNo  ItemCode  OrderDate  LineNo  ClientNo  QtyOrdered  QtyShipped
28255   543734   1038324  2/27/2017       3   1254787           1           1
28256   543734  10137992  2/27/2017       1   1254787           1           1
28257   543734  10137993  2/27/2017       2   1254787           1           1
28258   543735   1041106  2/27/2017       4   1816460           1           1
28259   543735   1041108  2/27/2017       3   1816460           1           1
28260   543735  10135359  2/27/2017       2   1816460           1           1
28261   543735  10137993  2/27/2017       1   1816460           1           1

The output example doesn't show any difference between Qty ordered and shipped because the number of matching cancels is very small. The rows which have a corresponding cancel are correctly adjusted.
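If the frame has many pass-through columns, the aggregation map can be built instead of typed out. This is a sketch, assuming every column other than the group keys and the two quantity columns should take 'first'. The toy data and the names `pre`, `sum_cols`, and `agg_map` are illustrative only.

```python
import pandas as pd

# Toy stand-in for PreActualOrders: two order rows plus one cancel row (made-up values).
pre = pd.DataFrame({
    "OrderNo":    [528758, 528791, 528791],
    "OrderDate":  ["1/3/2017", "1/3/2017", "1/3/2017"],
    "LineNo":     [1, 10, 10],
    "ClientNo":   [1358538, 1254798, 1254798],
    "ItemCode":   [111931, 110441, 110441],
    "QtyOrdered": [70, 300, 0],
    "QtyShipped": [70, 300, -20],
})

keys = ["OrderNo", "ItemCode"]
sum_cols = ["QtyOrdered", "QtyShipped"]

# 'first' for every pass-through column, 'sum' for the quantity columns.
agg_map = {col: "first" for col in pre.columns if col not in keys + sum_cols}
agg_map.update({col: "sum" for col in sum_cols})

actual = pre.groupby(keys).agg(agg_map).reset_index()
print(actual)
```

Note that 'first' returns the first non-null value in each group, so this is only safe when the pass-through columns really do agree between an order and its cancels.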


If I understood you correctly, you could maybe try another approach without groupby. Something similar to this:

import pandas as pd

orders = [["123", "1", 10], ["1234", "2", 100], ["12345", "3", 15]]
cancels = [["123", "1", 10]]

df_orders = pd.DataFrame(orders, columns=["OrderNo", "ItemCode", "Amount"])
df_cancels = pd.DataFrame(cancels, columns=["OrderNo", "ItemCode", "Amount"])

# Left merge keeps every order; orders without a cancel get NaN in Amount_cancels.
merged = df_orders.merge(df_cancels, how="left", on=["OrderNo", "ItemCode"], suffixes=("_orders", "_cancels"))
merged["Amount_cancels"] = merged["Amount_cancels"].fillna(0)
print("Before subtracting cancels")
print(merged)

# Net amount per order line after removing the cancelled quantity.
merged["Amount_orders"] = merged["Amount_orders"] - merged["Amount_cancels"]
print("After subtracting cancels")
print(merged)
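One caveat with the merge approach: if an order line can have more than one cancel row, a left merge repeats the order row once per match. The sketch below sums the cancels per key first, so each order row appears exactly once. It reuses the toy column names from the example above. The positive cancel amounts are an assumption (the question stores them as negatives), and the `Cancelled` and `NetAmount` columns are hypothetical names.

```python
import pandas as pd

orders = pd.DataFrame(
    [["123", "1", 10], ["1234", "2", 100], ["12345", "3", 15]],
    columns=["OrderNo", "ItemCode", "Amount"],
)
# Two partial cancels against the same order line (made-up values).
cancels = pd.DataFrame(
    [["123", "1", 4], ["123", "1", 3]],
    columns=["OrderNo", "ItemCode", "Amount"],
)

keys = ["OrderNo", "ItemCode"]

# Collapse cancels to one row per key so the merge cannot duplicate order rows.
cancel_totals = (
    cancels.groupby(keys, as_index=False)["Amount"].sum()
    .rename(columns={"Amount": "Cancelled"})
)

net = orders.merge(cancel_totals, how="left", on=keys)
net["Cancelled"] = net["Cancelled"].fillna(0)
net["NetAmount"] = net["Amount"] - net["Cancelled"]
print(net)
```

Because the merge starts from the orders frame, all of its other columns are carried through without needing an aggregation function for each one.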
