05AB1E, 28 bytes
žMSkþß©di®_i1ì}.γžMÃ}¦J…shmì

Try it online or verify all test cases.
Explanation:
žMS                           # Push a list of vowels: ["a","e","i","o","u"]
   k                          # Pop and get the first indices of those vowels in the
                              # (implicit) string (or -1 if the vowel isn't present in the
                              # input-string)
    þ                         # Remove all those -1s
     ß                        # Push the minimum remaining index, or "" if none are left
      ©                       # Store this in variable `®` (without popping)
       di                     # Pop, and if it's a (non-negative) integer:
         ®_i  }               #  If `®` is equal to 0 (thus the input starts with a vowel):
            1ì                #   Prepend a "1" in front of the (implicit) input-string
               .γ             #  Group the (potentially modified) input-string by:
                 žMÃ          #   Keeping only vowels (non-vowels become an empty string)
                    }¦        #  After the group-by: remove the first item
                      J       #  Join the other parts back to a string
                       …shmì  #  And prepend "shm"
                              #  (after which it is output implicitly as result)
                              # (implicit else)
                              #  (implicitly output the implicit input-string instead)
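For readers who don't know 05AB1E, the steps above can be sketched in Python roughly as follows. This is only an illustrative port, not part of the golfed answer: the function name `shm` and the constant `VOWELS` are made up for this sketch, and it assumes lowercase input, since the vowel list is lowercase.

```python
from itertools import groupby

VOWELS = "aeiou"  # hypothetical name; mirrors the lowercase vowel list pushed by žM


def shm(word: str) -> str:
    # First index of every vowel that occurs in the word
    # (vowels that are absent are skipped instead of yielding -1).
    indices = [word.index(v) for v in VOWELS if v in word]
    if not indices:
        # No vowels at all: return the input unchanged.
        return word
    if min(indices) == 0:
        # The word starts with a vowel: prepend a dummy "1" so that
        # there is always a leading non-vowel group to drop.
        word = "1" + word
    # Group consecutive characters by key: the vowel itself, or "" otherwise,
    # so a run of non-vowels forms a single group.
    groups = ["".join(g) for _, g in groupby(word, key=lambda c: c if c in VOWELS else "")]
    # Drop the first group, join the rest, and prepend "shm".
    return "shm" + "".join(groups[1:])


print(shm("string"))
print(shm("apple"))
```

The dummy "1" means the vowel-initial case needs no separate branch after the grouping: dropping the first group always removes exactly the part in front of the first vowel.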