21

I'm using Azure Table storage, and the data going into the RowKey contains slashes. According to this MSDN page, the following characters are disallowed in both the PartitionKey and RowKey:

  • The forward slash (/) character

  • The backslash (\) character

  • The number sign (#) character

  • The question mark (?) character

  • Control characters from U+0000 to U+001F, including:

  • The horizontal tab (\t) character

  • The linefeed (\n) character

  • The carriage return (\r) character

  • Control characters from U+007F to U+009F

I've seen some people use URL encoding to get around this. Unfortunately, a few glitches can arise from this, such as being able to insert but unable to delete certain entities. I've also seen some people use Base64 encoding, but its output can also contain disallowed characters.

How can I encode my RowKey efficiently without running into disallowed characters or rolling my own encoding?

3
  • "but unable to delete certain entities" why would that be? Commented Jan 19, 2014 at 20:41
  • @usr It's a bug. Not sure why, but I've seen multiple reports of it. Commented Jan 19, 2014 at 22:36
  • See Also: Azure Table Storage RowKey restricted Character Patterns? Commented Apr 14, 2021 at 2:30

6 Answers

20

Updated 18-Aug-2020 for (new?) issue with '+' character in Azure Search. See comments from @mladenb below for background. Of note, the documentation page referenced does not exclude the '+' character.

When a URL is Base64 encoded, the only character that is invalid in an Azure Table Storage key column is the forward slash ('/'). To address this, simply replace the forward slash character with another character that is both (1) valid in an Azure Table Storage key column and (2) not a Base64 character. The most common example I have found (which is cited in other answers) is to replace the forward slash ('/') with the underscore ('_'). The sample code also replaces '+' with '-' because of the Azure Search issue noted in the update above.

    private static String EncodeUrlInKey(String url)
    {
        var keyBytes = System.Text.Encoding.UTF8.GetBytes(url);
        var base64 = System.Convert.ToBase64String(keyBytes);
        return base64.Replace('/','_').Replace('+','-');
    }

When decoding, simply undo the character replacements (first!) and then Base64 decode the resulting string. That's all there is to it.

    private static String DecodeUrlInKey(String encodedKey)
    {
        var base64 = encodedKey.Replace('-','+').Replace('_', '/');
        byte[] bytes = System.Convert.FromBase64String(base64);
        return System.Text.Encoding.UTF8.GetString(bytes);
    }

Some people have suggested that other Base64 characters also need encoding. According to the Azure Table Storage docs this is not the case.
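As a quick check of the approach above, here is a minimal round-trip sketch. It assumes the same two substitutions as the helpers in this answer; the class name KeyEncodingDemo and the sample inputs are illustrative only and are not part of the original answer.

```csharp
using System;
using System.Text;

public static class KeyEncodingDemo
{
    // Base64-encode the UTF-8 bytes, then swap '/' -> '_' and '+' -> '-'.
    public static string Encode(string value)
    {
        string base64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(value));
        return base64.Replace('/', '_').Replace('+', '-');
    }

    // Undo the substitutions first, then Base64-decode.
    public static string Decode(string key)
    {
        string base64 = key.Replace('-', '+').Replace('_', '/');
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }

    public static void Main()
    {
        Console.WriteLine(Encode("???"));  // Pz8_  (plain Base64 would be Pz8/)
        Console.WriteLine(Encode(">>>"));  // Pj4-  (plain Base64 would be Pj4+)

        // Round trip of a URL containing '/', '?' and '#'.
        string url = "http://example.com/a?b#c";
        Console.WriteLine(Decode(Encode(url)) == url);  // True
    }
}
```

Because '_' and '-' never occur in standard Base64 output, the substitution is reversible without any extra escaping.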


9 Comments

Since this has a lot of views, I just wanted to add that it is a lot simpler to convert to and from base58 instead.
Valid point if you're already using a library with base58 routines. The OP wished to avoid "rolling his own" and this answer makes no assumptions. If you do go down this road, it's probably a good idea to document which base58 encoding is being used, as there is more than one.
Instead of going all the way to base64 while encoding, I think I might prefer to dump your UTF8 keyBytes variable out as a HEX String. That way, I don't have to mess with the forward slash you mention.
@ShawnEary - Sure, if you don't mind the longer keys. BitConverter is one way to do so. You can find other options in the answers here: stackoverflow.com/questions/623104/byte-to-hex-string
@MladenB - Thank you for the follow up. Sample code updated.
13

I ran into the same need.

I wasn't satisfied with Base64 encoding because it turns a human-readable string into an unrecognizable string, and will inflate the size of strings regardless of whether they follow the rules (a loss when the great majority of characters are not illegal characters that need to be escaped).

Here's a coder/decoder using '!' as an escape character in much the same way one would traditionally use the backslash character.

    using System.Globalization;
    using System.Text;

    public static class TableKeyEncoding
    {
        // https://msdn.microsoft.com/library/azure/dd179338.aspx
        //
        // The following characters are not allowed in values for the PartitionKey and RowKey properties:
        // The forward slash(/) character
        // The backslash(\) character
        // The number sign(#) character
        // The question mark (?) character
        // Control characters from U+0000 to U+001F, including:
        // The horizontal tab(\t) character
        // The linefeed(\n) character
        // The carriage return (\r) character
        // Control characters from U+007F to U+009F
        public static string Encode(string unsafeForUseAsAKey)
        {
            StringBuilder safe = new StringBuilder();
            foreach (char c in unsafeForUseAsAKey)
            {
                switch (c)
                {
                    case '/': safe.Append("!f"); break;
                    case '\\': safe.Append("!b"); break;
                    case '#': safe.Append("!p"); break;
                    case '?': safe.Append("!q"); break;
                    case '\t': safe.Append("!t"); break;
                    case '\n': safe.Append("!n"); break;
                    case '\r': safe.Append("!r"); break;
                    case '!': safe.Append("!!"); break;
                    default:
                        if (c <= 0x1f || (c >= 0x7f && c <= 0x9f))
                        {
                            // Remaining control characters become !x followed by two hex digits
                            int charCode = c;
                            safe.Append("!x" + charCode.ToString("x2"));
                        }
                        else
                        {
                            safe.Append(c);
                        }
                        break;
                }
            }
            return safe.ToString();
        }

        public static string Decode(string key)
        {
            StringBuilder decoded = new StringBuilder();
            int i = 0;
            while (i < key.Length)
            {
                char c = key[i++];
                if (c != '!' || i == key.Length)
                {
                    // There's no escape character ('!'), or the escape should be ignored because it's the end of the array
                    decoded.Append(c);
                }
                else
                {
                    char escapeCode = key[i++];
                    switch (escapeCode)
                    {
                        case 'f': decoded.Append('/'); break;
                        case 'b': decoded.Append('\\'); break;
                        case 'p': decoded.Append('#'); break;
                        case 'q': decoded.Append('?'); break;
                        case 't': decoded.Append('\t'); break;
                        case 'n': decoded.Append("\n"); break;
                        case 'r': decoded.Append("\r"); break;
                        case '!': decoded.Append('!'); break;
                        case 'x':
                            if (i + 2 <= key.Length)
                            {
                                string charCodeString = key.Substring(i, 2);
                                int charCode;
                                if (int.TryParse(charCodeString, NumberStyles.HexNumber, NumberFormatInfo.InvariantInfo, out charCode))
                                {
                                    decoded.Append((char)charCode);
                                }
                                i += 2;
                            }
                            break;
                        default: decoded.Append('!'); break;
                    }
                }
            }
            return decoded.ToString();
        }
    }

Since extreme caution is warranted when writing your own encoder, I have written some unit tests for it as well.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Xunit;

    namespace xUnit_Tests
    {
        public class TableKeyEncodingTests
        {
            const char Unicode0X1A = (char) 0x1a;

            // Helper: checks both directions for one unencoded/encoded pair
            private static void RoundTripTest(string unencoded, string encoded)
            {
                Assert.Equal(encoded, TableKeyEncoding.Encode(unencoded));
                Assert.Equal(unencoded, TableKeyEncoding.Decode(encoded));
            }

            [Fact]
            public void RoundTrips()
            {
                RoundTripTest("!\n", "!!!n");
                RoundTripTest("left" + Unicode0X1A + "right", "left!x1aright");
            }

            // The following characters are not allowed in values for the PartitionKey and RowKey properties:
            // The forward slash(/) character
            // The backslash(\) character
            // The number sign(#) character
            // The question mark (?) character
            // Control characters from U+0000 to U+001F, including:
            // The horizontal tab(\t) character
            // The linefeed(\n) character
            // The carriage return (\r) character
            // Control characters from U+007F to U+009F
            [Fact]
            public void EncodesAllForbiddenCharacters()
            {
                List<char> forbiddenCharacters = "\\/#?\t\n\r".ToCharArray().ToList();
                forbiddenCharacters.AddRange(Enumerable.Range(0x00, 1+(0x1f-0x00)).Select(i => (char)i));
                forbiddenCharacters.AddRange(Enumerable.Range(0x7f, 1+(0x9f-0x7f)).Select(i => (char)i));
                string allForbiddenCharacters = String.Join("", forbiddenCharacters);
                string allForbiddenCharactersEncoded = TableKeyEncoding.Encode(allForbiddenCharacters);

                // Make sure decoding is same as encoding
                Assert.Equal(allForbiddenCharacters, TableKeyEncoding.Decode(allForbiddenCharactersEncoded));

                // Ensure encoding does not contain any forbidden characters
                Assert.Equal(0, allForbiddenCharacters.Count(c => allForbiddenCharactersEncoded.Contains(c)));
            }
        }
    }

5 Comments

Yeah I think not losing the readability of keys is really important. Surprised more people don't do this or URLencode and then fix up anything that doesn't work with that instead of Base64-encode and fix up which seems to be the general approaches here.
Something to be careful of here is that the Microsoft Azure Storage Explorer as of v0.8.3 isn't able to query for objects via PartitionKey eq if you have a ! character in the PartitionKey. Under the hood, things appear to be working fine but the equality operator doesn't work properly for some reason. I reported this as a bug, but just keep it in mind if you use this solution and that specific character.
I suspect it's because natural keys only make sense in a subset of cases, and readability usually/often does not benefit surrogate keys.
The code for '\t', '\n' , '\r' should be unnecessary, because these characters are handled by the range checks anyway. OK, it increases readability compared to the encoding of the ranges.
In the Decode method, if the hexadecimal value is invalid (e.g. when the encoded string contains !xhh), all four characters are ignored. We should have the same behavior when escapeCode is unknown (e.g. !h) instead of appending the ! character to the decoded string.
2

How about the URL encode/decode functions? They take care of the '/', '?' and '#' characters.

    string url = "http://www.google.com/search?q=Example";
    string key = HttpUtility.UrlEncode(url);
    string urlBack = HttpUtility.UrlDecode(key);


1

See these links: RFC 4648, page 7 (https://www.rfc-editor.org/rfc/rfc4648#page-7) and "Code for decoding/encoding a modified base64 URL" (see also the second answer: https://stackoverflow.com/a/1789179/1094268).

I had the problem myself. These are the functions I use for this now. I use the trick from the second answer I mentioned, and I also swap out the + and / characters, which may still appear and are incompatible with Azure keys.

    private static String EncodeSafeBase64(String toEncode)
    {
        if (toEncode == null)
            throw new ArgumentNullException("toEncode");
        String base64String = Convert.ToBase64String(Encoding.UTF8.GetBytes(toEncode));
        StringBuilder safe = new StringBuilder();
        foreach (Char c in base64String)
        {
            switch (c)
            {
                case '+': safe.Append('-'); break;
                case '/': safe.Append('_'); break;
                default: safe.Append(c); break;
            }
        }
        return safe.ToString();
    }

    private static String DecodeSafeBase64(String toDecode)
    {
        if (toDecode == null)
            throw new ArgumentNullException("toDecode");
        StringBuilder deSafe = new StringBuilder();
        foreach (Char c in toDecode)
        {
            switch (c)
            {
                case '-': deSafe.Append('+'); break;
                case '_': deSafe.Append('/'); break;
                default: deSafe.Append(c); break;
            }
        }
        return Encoding.UTF8.GetString(Convert.FromBase64String(deSafe.ToString()));
    }

2 Comments

According to the Azure docs the '+' character is not invalid in an Azure Table key field.
@JasonWeber It's been a while since I originally made this, but I'm pretty sure I remember reading that it is (or was) an undocumented exception.
1

If it is just the slashes, you can simply replace them on writing to the table with another character, say, '|' and re-replace them on reading.
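A minimal sketch of this substitution follows, assuming the placeholder character never occurs in the original values (otherwise the round trip is ambiguous). The class name SlashSwap, the method names, and the sample key are hypothetical.

```csharp
using System;

public static class SlashSwap
{
    // Assumed placeholder: it must never occur in the original values.
    private const char Placeholder = '|';

    // Replace '/' before writing to the table.
    public static string ToKey(string value)
    {
        if (value.IndexOf(Placeholder) >= 0)
            throw new ArgumentException("Value already contains the placeholder character.", nameof(value));
        return value.Replace('/', Placeholder);
    }

    // Restore '/' after reading from the table.
    public static string FromKey(string key)
    {
        return key.Replace(Placeholder, '/');
    }

    public static void Main()
    {
        string key = ToKey("orders/2014/01");
        Console.WriteLine(key);           // orders|2014|01
        Console.WriteLine(FromKey(key));  // orders/2014/01
    }
}
```

The guard in ToKey is a design choice for the sketch: failing loudly on a value that already contains '|' avoids silently producing a key that decodes to a different string.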


1

What I have seen is that although a lot of non-alphanumeric characters are technically allowed, they don't really work very well in partition and row keys.

I looked at the answers already given here and elsewhere and wrote this: https://github.com/JohanNorberg/AlphaNumeric

Two alpha-numeric encoders.

If you need to escape a string that is mostly alphanumeric you can use this:

AlphaNumeric.English.Encode(str); 

If you need to escape a string that is mostly not alphanumeric you can use this:

AlphaNumeric.Data.EncodeString(str); 

Encoding data:

    var base64 = Convert.ToBase64String(bytes);
    var alphaNumericEncodedString = base64
        .Replace("0", "01")
        .Replace("+", "02")
        .Replace("/", "03")
        .Replace("=", "04");
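The substitution above is reversible because every '0' in the output starts a two-character escape. The following is a sketch of a matching decoder, written for illustration; it is an assumption and not taken from the linked library, and the class name AlphaNumericSketch is hypothetical.

```csharp
using System;
using System.Text;

public static class AlphaNumericSketch
{
    // Same substitutions as shown above: '0' acts as the escape character.
    public static string Encode(byte[] bytes)
    {
        return Convert.ToBase64String(bytes)
            .Replace("0", "01")
            .Replace("+", "02")
            .Replace("/", "03")
            .Replace("=", "04");
    }

    // Assumed inverse: read '0' plus the next digit as one escaped character.
    public static byte[] Decode(string encoded)
    {
        var base64 = new StringBuilder(encoded.Length);
        for (int i = 0; i < encoded.Length; i++)
        {
            char c = encoded[i];
            if (c != '0')
            {
                base64.Append(c);
                continue;
            }
            if (i + 1 >= encoded.Length)
                throw new FormatException("Dangling escape at end of input.");
            switch (encoded[++i])
            {
                case '1': base64.Append('0'); break;
                case '2': base64.Append('+'); break;
                case '3': base64.Append('/'); break;
                case '4': base64.Append('='); break;
                default: throw new FormatException("Unknown escape sequence.");
            }
        }
        return Convert.FromBase64String(base64.ToString());
    }

    public static void Main()
    {
        Console.WriteLine(Encode(new byte[] { 0x3F, 0x3F, 0x3F }));  // Pz803  (Base64 is Pz8/)
        Console.WriteLine(Encode(new byte[] { 0x00 }));              // AA0404 (Base64 is AA==)

        byte[] roundTrip = Decode(Encode(new byte[] { 0x3F, 0x3F, 0x3F }));
        Console.WriteLine(BitConverter.ToString(roundTrip));         // 3F-3F-3F
    }
}
```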

But if you want to use, for example, an email address as a RowKey, you would only want to escape the '@' and '.'. This code will do that:

    char[] validChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ3456789".ToCharArray();
    char[] allChars = rawString.ToCharArray();
    StringBuilder builder = new StringBuilder(rawString.Length * 2);
    for(int i = 0; i < allChars.Length; i++)
    {
        int c = allChars[i];
        if((c >= 51 && c <= 57) || (c >= 65 && c <= 90) || (c >= 97 && c <= 122))
        {
            builder.Append(allChars[i]);
        }
        else
        {
            int index = builder.Length;
            int count = 0;
            do
            {
                builder.Append(validChars[c % 59]);
                c /= 59;
                count++;
            } while (c > 0);

            if (count == 1) builder.Insert(index, '0');
            else if (count == 2) builder.Insert(index, '1');
            else if (count == 3) builder.Insert(index, '2');
            else throw new Exception("Base59 has invalid count, method must be wrong Count is: " + count);
        }
    }
    return builder.ToString();

