As pointed out by @szabolcs, part of the problem is associated with the distribution over the kernels, but I don't think that is the only problem.
We can generalize the foo function so that it can optionally run in parallel, while its serial version achieves the same performance as the unpackaged code.
The code is much slower in the packaged case because it is different and a lot more complex. If you create a function with the same instructions as the unpackaged code, you should get equivalent timings and the same results:
    BeginPackage["package`"];
    foo::usage = "foo[x, test] is a function to calculate stuff";
    Begin["`Private`"];
    (* test -> True builds the outer table with ParallelTable, otherwise with Table *)
    foo[x_Real, test_:False] := If[TrueQ[test],
      ParallelTable[Sum[BesselJ[0, 10^-9 k]/(n + x^k), {k, 0, 10000}], {n, 0, 12}],
      Table[Sum[BesselJ[0, 10^-9 k]/(n + x^k), {k, 0, 10000}], {n, 0, 12}]
    ]
    End[];
    EndPackage[];

    ClearSystemCache[]
    res1 = AbsoluteTiming[Table[Sum[BesselJ[0, 10^-9 k]/(n + 1.6^k), {k, 0, 10000}], {n, 0, 12}]];
    ClearSystemCache[]
    res3a = AbsoluteTiming[foo[1.6, True]];
    ClearSystemCache[]
    res3b = AbsoluteTiming[foo[1.6, False]];
    ClearSystemCache[]
    res3c = AbsoluteTiming[foo[1.6]];

    (* timings: unpackaged, parallel, serial, default (serial) *)
    {res1[[1]], res3a[[1]], res3b[[1]], res3c[[1]]}
    (* {4.352175524, 1.92163, 4.2368838698, 4.38055} *)

    (* all four results agree *)
    res1[[2]] == res3a[[2]] == res3b[[2]] == res3c[[2]]
    (* True *)
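Each timing in the benchmark repeats the same ClearSystemCache[] / AbsoluteTiming pair. Below is a minimal sketch of folding that pair into one helper. timeFresh is a hypothetical name introduced here for illustration; it is not part of the original code.

    (* Hypothetical helper, not from the original answer: clear the cache, then time expr.
       HoldFirst keeps expr unevaluated until after ClearSystemCache[] has run. *)
    SetAttributes[timeFresh, HoldFirst];
    timeFresh[expr_] := (ClearSystemCache[]; AbsoluteTiming[expr])

    (* cheap self-contained check: returns {seconds, result} *)
    timeFresh[N[Sum[1/k^2, {k, 1, 1000}]]]

With it, a line such as res3a = AbsoluteTiming[foo[1.6, True]] together with the ClearSystemCache[] before it would become res3a = timeFresh[foo[1.6, True]]. The HoldFirst attribute is the important design choice. Without it, foo[1.6, True] would be evaluated before the cache is cleared and before timing starts, so AbsoluteTiming would only measure returning an already computed list.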