CPUs run machine code. A C compiler generates that machine code for you from C source, whereas in assembly language you make all the important decisions yourself.
The most important factor is that Bubble Sort is a bad algorithm you should basically never use, especially when you care about performance (Why bubble sort is not efficient?). The only exception is when you care about tiny code size, as in code golf (Sort an Integer List).
Other than that, if you know exactly what you're doing in asm, your hand-written code will never be slower than the C version, and you can control micro-optimization choices like the ones that led to GCC making a much slower bubble sort when trying to auto-vectorize. See Bubble sort slower with -O3 than -O2 with GCC for the details of that case.
But if you don't know asm and CPU architecture in that level of detail, you're unlikely to improve on compiler output. There is no "generally" that's true across all CPU (micro-)architectures and all programmer skill-levels.
Related: Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? - another example of the kinds of beginner mistakes that can make your asm even slower than a debug build of a C program.
If you cared about making a sort function run fast, the first thing you'd change would be the algorithm, to Insertion Sort or something. Or a SIMD sorting network, especially if you're writing in non-portable assembly language in the first place.
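For reference, here is a minimal sketch of Insertion Sort in portable C, the kind of algorithm change suggested above. The function name `insertion_sort` and the choice of `int` elements are just assumptions for illustration.

```c
#include <stddef.h>

/* Sort a[0..n-1] in ascending order with Insertion Sort.
   Each element is held in `key` while larger elements to its left
   are shifted up by one slot, then `key` is stored once. */
void insertion_sort(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];   /* shift the larger element right */
            j--;
        }
        a[j] = key;            /* single store of the inserted element */
    }
}
```

It's still O(n^2) in the worst case, so for large inputs you'd want something better again, but it does far fewer stores than swapping adjacent elements.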
So this question only makes sense if there's something forcing you to use a bad algorithm like Bubble Sort. Or if you aren't aiming for performance in the first place and are making arbitrary choices of algorithm. Or if you don't know that Bubble Sort is slow, in which case we should definitely make pessimistic assumptions about programmer skill at assembly language. (Assembly language performance is even more sensitive than most languages to programmer skill at optimizing.)
But if someone forced you to spend time optimizing Bubble Sort, with asm you could probably optimize the case where an element bubbles a long way, maybe avoiding a store/reload as part of a loop-carried dependency chain. But you could probably do the same thing in C with a tmp variable, so IDK whether to count it.
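A rough sketch of that tmp-variable idea in C, under my own assumptions about what it would look like: the element currently being bubbled lives in a local (`carried`, a name I made up), so each inner iteration only loads the next element instead of reloading a value that was just stored. Whether a given compiler already does this for the naive version, and whether it's actually faster on your CPU, is something you'd have to check in the generated asm and with a benchmark.

```c
#include <stddef.h>

/* Bubble Sort variant that keeps the element being bubbled in a local
   variable. After each pass, the largest element of a[0..end-1] has
   been stored at a[end-1]. */
void bubble_sort_tmp(int *a, size_t n)
{
    if (n < 2)
        return;
    for (size_t end = n; end > 1; end--) {
        int carried = a[0];            /* max of the elements seen so far */
        for (size_t j = 1; j < end; j++) {
            int next = a[j];           /* the only load per iteration */
            if (next < carried) {
                a[j - 1] = next;       /* smaller element settles here */
            } else {
                a[j - 1] = carried;    /* carried stops bubbling here */
                carried = next;        /* next becomes the bubbling element */
            }
        }
        a[end - 1] = carried;          /* largest element lands at the end */
    }
}
```

This has no early-out for an already-sorted array, to keep the sketch short. The point is only that the carried value never round-trips through memory inside a pass.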
You'd only think of doing that in C if you were aware of the asm / CPU-architecture reasons for it, but often you can get a compiler to generate the efficient asm you want by changing the C. That's usually best because it's still portable C.