I need to generate a UUID in Python.

The application I want to interface with uses 22-character UUIDs. These seem to be produced by generating a 32-character (hex) UUID and base64-encoding it.

When I try to do that in Python, my base64-encoded UUID always has 24 characters.

For the moment I just cut off the encoded UUID after 22 characters. I am not sure this is valid, though; chances are I'll get duplicates this way. Am I making a mistake in generating my UUID?

My code:

    import uuid
    import base64

    my_uuid = new_occ_id = base64.urlsafe_b64encode(uuid.uuid4().bytes)
    my_uuid_22 = new_occ_id = my_uuid[0:22]
    print(my_uuid)
    print(my_uuid_22)

...yields (!!Corrected output!!):

    b'1K-HAjUjEemzDxwbtRxjwA=='
    b'1K-HAjUjEemzDxwbtRxjwA'

Sample UUID from the application:

051MZjd97kcBgtZiEH}IvW 
  • Please watch your code format. You need to put some effort in a question to get answers. Commented Feb 20, 2019 at 11:15
  • A } is not valid in base64, so the program must use something other than base64. Commented Feb 20, 2019 at 11:21
  • I suggest using shortuuid, which uses base57 instead of base64. Base57 natively produces 11 characters for each 64 bits (so 22 characters for a 128-bit UUID), without padding, and is more URL-friendly. Commented Apr 1, 2022 at 16:30

1 Answer

Are you sure that is the output from that code? It does not seem to be: the first string has 22 bytes (not 24) and the second has no terminating quote. Anyway...

I do not know how the 22-character compressed UUID is implemented, but based on how base64 encoding works, if that is indeed what is used, I would say that yes, you can simply drop the last two characters.

When the input is 16 bytes, the last two base64 characters will always be the two padding characters ==. Base64 encodes 3 bytes at a time (15 bytes in yields 20 characters out), which leaves one lone input byte; that byte becomes 2 characters plus 2 padding characters. Some decoders require the padding to be present and others ignore it, since it can be recalculated and does not contain any actual data.
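The arithmetic above can be checked with the standard library. This is a minimal sketch, not the target application's actual scheme: the helper names `encode_uuid_22` and `decode_uuid_22` are made up for illustration, and it assumes plain URL-safe base64, which the sample ID suggests the application may not be using.

```python
import base64
import uuid


def encode_uuid_22(u: uuid.UUID) -> str:
    """Encode a UUID as 22 URL-safe base64 characters (padding removed)."""
    encoded = base64.urlsafe_b64encode(u.bytes)  # 16 bytes -> 24 chars, ending in b"=="
    return encoded.rstrip(b"=").decode("ascii")  # "=" only ever appears as padding


def decode_uuid_22(s: str) -> uuid.UUID:
    """Restore the two padding characters and decode back to a UUID."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(s + "=="))


original = uuid.uuid4()
short = encode_uuid_22(original)
print(short, len(short))
print(decode_uuid_22(short) == original)
```

Because the padding is re-added before decoding, no information is lost, so truncating to 22 characters cannot introduce duplicates.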

However, as someone said, the "}" is not used in the most common base64 character set, nor in the URL-safe one. Usually only the last two characters of the alphabet differ between variants: +/ in the standard alphabet, or -_ in the URL-safe one. Your implementation seems to use } and one other character. My wild guess is { or |, but unless you can find a specification for your target application, you will have to look through existing compressed UUIDs to see whether you find another character outside A-Za-z0-9.
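Looking through existing IDs for characters outside A-Za-z0-9 could be sketched as follows. The function name `extra_alphabet_chars` is hypothetical, and the only input used here is the single sample ID from the question.

```python
import string

ALPHANUMERIC = set(string.ascii_letters + string.digits)


def extra_alphabet_chars(ids):
    """Return the sorted characters in `ids` that are not A-Z, a-z, or 0-9."""
    seen = set()
    for value in ids:
        seen.update(set(value) - ALPHANUMERIC)
    return sorted(seen)


# Only one sample appears in the question; more IDs from the application
# would be needed to find the second non-alphanumeric character.
samples = ["051MZjd97kcBgtZiEH}IvW"]
print(extra_alphabet_chars(samples))  # ['}']
```

Once both extra characters are known, they could be mapped onto a standard base64 alphabet before decoding, but which character stands for which value would still have to be confirmed against the application.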


1 Comment

Many thanks! 1) I made a mistake in copy/pasting my output and have corrected it. 2) I didn't realize that the leading b and the quotes are not part of the actual value but merely enclose the printed value (an artifact of print, so to say). Now everything is clear: ignoring the b'...' wrapper, I actually get 24 characters with the trailing "==" and 22 characters without, so I can safely cut off the "==".
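The `b'...'` wrapper appears because `urlsafe_b64encode` returns a `bytes` object and `print` shows its repr. A short sketch of converting it to a plain string, using only the standard library:

```python
import base64
import uuid

encoded = base64.urlsafe_b64encode(uuid.uuid4().bytes)  # a bytes object

print(type(encoded).__name__, len(encoded))  # bytes 24
text = encoded.decode("ascii")               # plain str, printed without b'...'
print(len(text), text.endswith("=="))        # 24 True
print(len(text[:22]))                        # 22
```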
