I have a DataFrame that is the concatenation of two identically structured DataFrames: the first is Orders and the second is Cancels. Orders has more than 20,000 rows, and a small number of Cancels rows have a corresponding OrderNo and ItemCode. I have made the canceled quantities negative so that, when I group the DataFrame by both OrderNo and ItemCode, summing the quantity fields with agg gives me the actual quantity shipped, net of canceled orders.
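To make the setup concrete, here is a minimal sketch of the idea on made-up rows. The frames `orders` and `cancels`, the variable names, and all values are hypothetical and only stand in for the real data; the column names follow the post.

```python
import pandas as pd

# Hypothetical toy frames; values are invented for illustration only.
orders = pd.DataFrame({
    "OrderNo":    [1001, 1001, 1002],
    "ItemCode":   [500, 501, 500],
    "QtyOrdered": [10, 5, 8],
    "QtyShipped": [10, 5, 8],
})
cancels = pd.DataFrame({
    "OrderNo":    [1001],
    "ItemCode":   [501],
    "QtyOrdered": [0],
    "QtyShipped": [2],  # canceled quantity, still positive at this point
})

# Flip the sign of the canceled quantities so they subtract when summed.
cancels["QtyShipped"] = -cancels["QtyShipped"]

# Stack both frames, then sum the quantities per (OrderNo, ItemCode).
combined = pd.concat([orders, cancels], ignore_index=True)
net = (
    combined.groupby(["OrderNo", "ItemCode"], as_index=False)[["QtyOrdered", "QtyShipped"]]
            .sum()
)
print(net)
```

In this toy case the pair (1001, 501) ends up with a net shipped quantity of 5 - 2 = 3, while the other pairs are unchanged.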
Below is my dataframe:
           OrderNo  OrderDate  LineNo  ClientNo  ItemCode  QtyOrdered  QtyShipped
    0       528758   1/3/2017       1   1358538    111931          70          70
    1       528791   1/3/2017      10   1254798    110441         300         300
    2       528791   1/3/2017       1   1254798   1029071          10          10
    3       528791   1/3/2017       2   1254798   1033341          10          10
    4       528791   1/3/2017       8   1254798   1040726          15          15
    ...        ...        ...     ...       ...       ...         ...         ...
    28344   537667   2/6/2017      12  43823870  10137992           0          -2
    28345   537771   2/7/2017       5   1276705   1041106           0          -4
    28346   539524  2/13/2017       6   1254798   1038323           0         -10
    28347   542362  2/23/2017      11   1254612   1041108           0          -2
    28348   542835  2/23/2017      13   1255235  10137993           0          -5

    28349 rows × 7 columns

After running:
    ActualOrders = PreActualOrders.groupby(['OrderNo','ItemCode']).agg({'QtyOrdered': 'sum', 'QtyShipped': 'sum'}).reset_index()

I get my desired result, but I lose all the other columns in the DataFrame.
Result sample below:
           OrderNo  ItemCode  QtyOrdered  QtyShipped
    28255   543734   1038324           1           1
    28256   543734  10137992           1           1
    28257   543734  10137993           1           1
    28258   543735   1041106           1           1
    28259   543735   1041108           1           1
    28260   543735  10135359           1           1

What do I need to add in order to keep all the columns from the original DataFrame?
The values in those other columns are identical within each group, since each cancel row corresponds to an order row.
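Given that the other columns carry the same value within each (OrderNo, ItemCode) group, one approach that may work is to aggregate them with 'first' while summing the quantities. This is a minimal sketch on made-up rows, not a tested solution for the real data: the frame `toy` and the helper names `key_cols`, `sum_cols`, and `agg_spec` are hypothetical, and the values are invented.

```python
import pandas as pd

# Hypothetical toy rows standing in for PreActualOrders (values invented).
toy = pd.DataFrame({
    "OrderNo":    [528758, 528791, 528791, 528791],
    "OrderDate":  ["1/3/2017", "1/3/2017", "1/3/2017", "1/3/2017"],
    "LineNo":     [1, 10, 1, 10],
    "ClientNo":   [1358538, 1254798, 1254798, 1254798],
    "ItemCode":   [111931, 110441, 1029071, 110441],
    "QtyOrdered": [70, 300, 10, 0],
    "QtyShipped": [70, 300, 10, -25],  # last row plays the role of a cancel
})

key_cols = ["OrderNo", "ItemCode"]
sum_cols = ["QtyOrdered", "QtyShipped"]

# Sum the quantity columns; take the first value of every other column,
# assuming those values are identical within each (OrderNo, ItemCode) group.
agg_spec = {col: "first" for col in toy.columns if col not in key_cols + sum_cols}
agg_spec.update({col: "sum" for col in sum_cols})

actual = (
    toy.groupby(key_cols, as_index=False)
       .agg(agg_spec)
       [list(toy.columns)]  # restore the original column order
)
print(actual)
```

If the other columns could differ within a group, 'first' would silently keep only one of the values, so it is worth checking that assumption before relying on this.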
Thank you,
MTH