Here is a self-answer.
To be honest, I almost deleted my post right after posting it, because I naively thought unpacking might be unavoidable. Luckily, I kept it, and the problem turned out to be more interesting than I expected. Many thanks to @Carl Woll, @Sjoerd Smit and @Henrik Schumacher for their great answers; I learned a lot.
Below, I summarize the answers and add some benchmarks.
First, here are two related posts on flattening or joining plain lists (not associations):
https://mathematica.stackexchange.com/a/75592/4742
https://mathematica.stackexchange.com/a/184578/4742
These two posts suggest that we need to distinguish three cases:
- completely packed arrays
- completely unpacked arrays
- sublist packed arrays (packed sublists inside an outer list that is unpacked at the first level)
Briefly,
- for completely packed arrays, Flatten is much faster than Apply[Join], because Apply unpacks the first level;
- for completely unpacked arrays, Apply[Join] is faster, but the timings are much closer than in the packed case;
- for sublist packed arrays, Apply[Join] is much faster than Flatten.
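One way to check the unpacking claim yourself is the kernel's "Packing" message group. The sketch below is a minimal check, assuming an arbitrary test matrix of 1000 copies of Range[10]; the names viaFlatten, viaApply and viaCatenate are just illustrative. With On["Packing"], the kernel prints a message whenever it unpacks an array, so if the explanation above is right, the Join @@ line should report an unpacking while the Flatten line stays silent.

```mathematica
(* Arbitrary test data: a completely packed 1000 x 10 integer matrix *)
packed = ConstantArray[Range[10], 1000];

On["Packing"];  (* report every unpacking the kernel performs from here on *)
viaFlatten = Flatten[packed, 1];
viaApply = Join @@ packed;
viaCatenate = Catenate[packed];
Off["Packing"];

(* Sanity check: all three give the same flat list of 10000 integers *)
{viaFlatten === viaApply === viaCatenate, Length[viaFlatten]}
```

The same pattern works for any of the three layouts, which makes it easy to see which operation pays for an unpacking in each case.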
Before benchmarking, we need routines to generate the three kinds of arrays:
ClearAll[genCompletelyPacked];
(* every sublist is Range[sublistLen]; the whole array is packed *)
genCompletelyPacked[sublistLen_, totalLen_] := ConstantArray[Range[sublistLen], totalLen];
ClearAll[genCompletelyUnpacked];
(* same data, unpacked at every level *)
genCompletelyUnpacked[sublistLen_, totalLen_] := Developer`FromPackedArray@genCompletelyPacked[sublistLen, totalLen];
ClearAll[genSublistPacked];
(* outer list unpacked, each sublist packed *)
genSublistPacked[sublistLen_, totalLen_] := Developer`ToPackedArray /@ genCompletelyUnpacked[sublistLen, totalLen];
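Before benchmarking, it may be worth confirming that the generators really produce the three intended layouts. The sketch below builds tiny instances with the same recipes as genCompletelyPacked, genCompletelyUnpacked and genSublistPacked (the sizes 10 and 5 are arbitrary) and uses a hypothetical helper, packingState, that asks Developer`PackedArrayQ about the outer level and about every sublist.

```mathematica
(* Same recipes as the three generators, at a tiny arbitrary size *)
packed = ConstantArray[Range[10], 5];
unpacked = Developer`FromPackedArray[packed];
sublistPacked = Developer`ToPackedArray /@ unpacked;

(* Hypothetical helper: {is the whole array packed?, is every sublist packed?} *)
ClearAll[packingState];
packingState[x_] := {Developer`PackedArrayQ[x], AllTrue[x, Developer`PackedArrayQ]};

packingState /@ <|"packed" -> packed, "unpacked" -> unpacked, "sublistPacked" -> sublistPacked|>
```

If the generators behave as intended, the three entries should read {True, True}, {False, False} and {False, True}.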
and define the functions below (I added Catenate, suggested by Mr.Wizard in another post):
f = Flatten[#, 1] &;  (* flatten one level *)
g = Join @@ # &;  (* Apply[Join] *)
h = Catenate[#] &;
Needs["GeneralUtilities`"];  (* provides BenchmarkPlot *)
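Since the benchmarks only compare speed, a quick check that f, g and h return the same result on every layout, and whether each result comes back packed, may help interpret the plots. This is a small sketch; the sizes and the names base, layouts and results are arbitrary choices for illustration, and I make no claim here about which results are packed: run it and see.

```mathematica
f = Flatten[#, 1] &;
g = Join @@ # &;
h = Catenate[#] &;

(* Tiny instances of the three layouts; the sizes 10 and 5 are arbitrary *)
base = ConstantArray[Range[10], 5];
layouts = <|
   "packed" -> base,
   "unpacked" -> Developer`FromPackedArray[base],
   "sublistPacked" -> Developer`ToPackedArray /@ Developer`FromPackedArray[base]|>;

(* For each layout: {do f, g and h agree?, is each of their results packed?} *)
results = Map[
   Function[x, {f[x] === g[x] === h[x], Developer`PackedArrayQ /@ {f[x], g[x], h[x]}}],
   layouts]
```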
For each case below, I give two benchmarks: one with short sublists and one with long sublists.
completely packed array
BenchmarkPlot[{f, g, h}, genCompletelyPacked[10, #] &, PowerRange[10, 10^3, 2], "IncludeFits" -> True]

BenchmarkPlot[{f, g, h}, genCompletelyPacked[1000, #] &, PowerRange[10, 10^3, 2], "IncludeFits" -> True]

The jumps may be due to a memory issue, as suggested by Michael E2.
completely unpacked array
BenchmarkPlot[{f, g, h}, genCompletelyUnpacked[10, #] &, PowerRange[10, 10^3, 2], "IncludeFits" -> True]

BenchmarkPlot[{f, g, h}, genCompletelyUnpacked[1000, #] &, PowerRange[10, 10^3, 2], "IncludeFits" -> True]

sublist packed array
BenchmarkPlot[{f, g, h}, genSublistPacked[10, #] &, PowerRange[10, 10^3, 2], "IncludeFits" -> True]

BenchmarkPlot[{f, g, h}, genSublistPacked[1000, #] &, PowerRange[10, 10^3, 2], "IncludeFits" -> True]

Now we return to the Association case.
As Carl Woll pointed out, Merge simply collects the values into a list, which gives the sublist packed case: the first level is unpacked (in this sense, unpacking at the first level is unavoidable in Merge). So we would expect Apply[Join] to be much faster; however, we will see that things turn out a little differently.
Sjoerd Smit suggested the quite novel GeneralUtilities`AssociationTranspose, which should be even better because it doesn't unpack at all, so we might hope it will be much faster. However, we will see that this is not always the case.
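Because GeneralUtilities`AssociationTranspose is undocumented, it may help to look at what it returns and whether anything gets unpacked along the way. The sketch below builds a small input inline (4 associations with packed rows of length 10, arbitrary sizes) and runs Sjoerd Smit's variant next to Merge[#, Flatten] under On["Packing"]. I have not verified the exact output of AssociationTranspose, so treat the final SameQ as a check to run rather than a known result; viaMerge and viaTranspose are illustrative names.

```mathematica
Needs["GeneralUtilities`"];

(* Arbitrary small input: 4 associations whose values are packed rows of length 10 *)
data = Table[AssociationThread[{1, 2, 3} -> RandomReal[1., {3, 10}]], 4];

On["Packing"];  (* any unpacking performed below is reported as a message *)
viaMerge = Merge[data, Flatten];
viaTranspose = Flatten /@ GeneralUtilities`AssociationTranspose[data];
Off["Packing"];

(* Expected to be True if AssociationTranspose groups values per key as described *)
viaMerge === viaTranspose
```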
First, we define a routine that generates a list of associations whose values are packed lists:
ClearAll[genAssoc]; genAssoc[len_, n_] := Table[AssociationThread[{1, 2, 3} -> RandomReal[1., {3, len}]], n];
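To see Carl Woll's point concretely, the sketch below merges a few generated associations with Identity, so the combining function just returns the list that Merge collects, and then inspects how that list is packed. genAssoc is repeated so the block runs on its own; the sizes (len = 10, n = 4) and the names data and collected are arbitrary. If the explanation above holds, the collected value should not be packed at the outer level, while each of its elements should be.

```mathematica
ClearAll[genAssoc];
genAssoc[len_, n_] := Table[AssociationThread[{1, 2, 3} -> RandomReal[1., {3, len}]], n];

data = genAssoc[10, 4];
collected = Merge[data, Identity];  (* values are the raw lists Merge hands to its combiner *)

(* {outer level packed?, every piece packed?, dimensions of the value for key 1} *)
{Developer`PackedArrayQ[collected[1]],
 AllTrue[collected[1], Developer`PackedArrayQ],
 Dimensions[collected[1]]}
```

This is also why Flatten, Apply[Join] and Catenate inside Merge behave like the sublist packed case from the first half of this answer.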
and several functions:
f1 = Merge[#, Flatten] &;
f2 = Merge[#, Apply[Join]] &;
f3 = Merge[#, Catenate] &;
f4 = Flatten /@ GeneralUtilities`AssociationTranspose[#] &;
BenchmarkPlot[{f1, f2, f3, f4}, genAssoc[10, #] &, PowerRange[10, 10^4, 2], "IncludeFits" -> True]

BenchmarkPlot[{f1, f2, f3, f4}, genAssoc[1000, #] &, PowerRange[10, 10^4, 2], "IncludeFits" -> True]

You can clearly see the peculiarities I pointed out earlier. For example:
- AssociationTranspose is only good for short sublists;
- for short sublists, Flatten becomes comparable to Apply[Join] when the number of associations becomes large.
I have no explanation for this and am waiting for someone else to explain it.