In short, the reason is that SumCommand's (temporary) definition does not automatically get distributed to the parallel kernels, because SumCommand now lives in the package`Private` context, not in Global` (after correcting a small mistake, see below). This means that SumCommand won't get evaluated to Sum on the subkernels. Instead, the unevaluated SumCommand[...] expression is returned as-is to the main kernel, where SumCommand does have a definition. There it evaluates to Sum, which in turn evaluates to the desired result. But all of the actual summation happens on the main kernel, so the parallelization gains nothing.
Aside: Begin["Private`"] should be Begin["`Private`"] so that private symbols go into package`Private` and not into Private`.
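For reference, here is a minimal sketch of the kind of package being discussed, with the Begin call corrected. The function name f, its parallel argument, and the summand 1/(i + j)^2 are hypothetical stand-ins for the code in the question; only the package` context and the TableCommand/SumCommand pattern come from the text above. With parallel set to True, ParallelTable holds its arguments, so the subkernels receive unevaluated package`Private`SumCommand[...] expressions that they have no definition for.

```mathematica
BeginPackage["package`"]

f::usage = "f[n, parallel] is a hypothetical stand-in for the function in the question.";

Begin["`Private`"]  (* leading backtick: private symbols go into package`Private` *)

(* Block gives TableCommand and SumCommand temporary values on the main kernel only *)
f[n_, parallel_ : False] :=
  Block[{TableCommand, SumCommand},
    If[parallel,
      TableCommand = ParallelTable; SumCommand = Sum,
      TableCommand = Table; SumCommand = Sum
    ];
    (* ParallelTable is HoldAll, so the subkernels receive the unevaluated
       package`Private`SumCommand[...] and send it back to the main kernel *)
    TableCommand[SumCommand[1/(i + j)^2, {j, 1, n}], {i, 1, n}]
  ]

End[]
EndPackage[]
```

With this sketch, f[200, True] should return the same result as f[200, False], but without any speedup, for the reason described above.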
When you put the function into a package, the context of all the auxiliary symbols (such as SumCommand and TableCommand) changes. This prevents the automatic distribution mechanism from kicking in: package symbols, which reside in contexts other than Global`, do not get distributed by default. This is deliberate. Distributing package symbol definitions to the subkernels would break packages that must also do some initialization (such as loading LibraryFunctions) in addition to issuing definitions. Instead, packages should be properly loaded on each subkernel, which can be automated with ParallelNeeds.
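A minimal sketch of loading a package on the main kernel and on all subkernels, assuming the package has been saved as a file (package.m is a hypothetical name here) that Needs can find on $Path on every kernel:

```mathematica
(* Assumption: package.m declares BeginPackage["package`"] and is on $Path *)
If[$KernelCount == 0, LaunchKernels[]];  (* start subkernels if none are running *)
Needs["package`"]                        (* load the package on the main kernel *)
ParallelNeeds["package`"]                (* run Needs["package`"] on the parallel kernels *)
```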
Unfortunately, I do not fully understand the distribution rules for contexts; you can read more on the DistributedContexts documentation page and the pages linked from it.
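One way to see these rules in action is a pair of helper functions that report which kernel evaluated them. $KernelID is 0 on the main kernel and positive on subkernels, so a list of zeros means the work was silently done on the main kernel. This is a sketch: the names whereGlobal and wherePrivate are made up for illustration, and the comments describe the expected behavior when subkernels are available.

```mathematica
(* Contexts whose symbols get their definitions sent automatically;
   by default this is $Context, typically "Global`" *)
$DistributedContexts

(* Hypothetical helpers, one in Global` and one in a package private context *)
whereGlobal[] := $KernelID
package`Private`wherePrivate[] := $KernelID

(* Global` definition is distributed: expect subkernel IDs (positive integers) *)
ParallelTable[whereGlobal[], {i, 4}]

(* package`Private` definition is not distributed: the subkernels return
   wherePrivate[] unevaluated and the main kernel evaluates it, so expect zeros *)
ParallelTable[package`Private`wherePrivate[], {i, 4}]
```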
The theory that the problem is that SumCommand doesn't get distributed can be verified by adding DistributeDefinitions[SumCommand] right after TableCommand = ParallelTable; SumCommand = Sum;. This makes it run fast again (but it is not a good workaround; see below).
One problem with the way Block is used here is that Block has no effect across kernels; it only works on the main kernel. Thus if we simply insert DistributeDefinitions[SumCommand] inside the Block body, the definition will get distributed to the subkernels, but it won't get cleared on the subkernels when the Block finishes. Instead, it will persist after the function returns. You can verify this with ParallelEvaluate[package`Private`SumCommand].
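The following sketch carries out both checks described above. It defines a hypothetical package`f from inside the package's private context, so that SumCommand resolves to package`Private`SumCommand. The summand and function name are assumptions, and the DistributeDefinitions call is for diagnosis only.

```mathematica
(* Diagnostic only: names below resolve to package`Private` symbols *)
Begin["package`Private`"]

package`f[n_, parallel_ : False] :=
  Block[{TableCommand, SumCommand},
    If[parallel,
      TableCommand = ParallelTable; SumCommand = Sum;
      DistributeDefinitions[SumCommand],  (* pushes the Block-local value Sum to the subkernels *)
      TableCommand = Table; SumCommand = Sum
    ];
    TableCommand[SumCommand[1/(i + j)^2, {j, 1, n}], {i, 1, n}]
  ]

End[]

If[$KernelCount == 0, LaunchKernels[]];  (* subkernels must exist before distributing *)
package`f[100, True];

ValueQ[package`Private`SumCommand]             (* main kernel: Block restored it, so False *)
ParallelEvaluate[package`Private`SumCommand]   (* subkernels: each still returns Sum *)
```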
Instead, I suggest never sending the symbol SumCommand to the subkernels in the first place; just send Sum. One way to achieve this is to use With instead of Block: With substitutes the value directly for SumCommand everywhere in its body before the body is evaluated, so ParallelTable only ever sees Sum.
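A minimal sketch of the With-based version, using the same kind of hypothetical f and summand as stand-ins for the question's code. Because With injects its values into the unevaluated body, the expression that actually gets evaluated is ParallelTable[Sum[...], ...] (or Table[Sum[...], ...]), and no package`Private` symbol needs a definition on the subkernels.

```mathematica
BeginPackage["package`"]

f::usage = "f[n, parallel] is a hypothetical stand-in for the function in the question.";

Begin["`Private`"]

(* With replaces TableCommand and SumCommand in the body before evaluation,
   so only System` symbols (ParallelTable/Table and Sum) remain *)
f[n_, parallel_ : False] :=
  With[{TableCommand = If[parallel, ParallelTable, Table], SumCommand = Sum},
    TableCommand[SumCommand[1/(i + j)^2, {j, 1, n}], {i, 1, n}]
  ]

End[]
EndPackage[]
```

Choosing ParallelTable or Table in the With initializer is a design choice of this sketch; it removes the need for a Block-local TableCommand as well.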
(To avoid the red colouring you might consider using a different name for the With variable.)
This version is robust and runs fast.