4

I'm working with a string of bytes (which can be anywhere between 10 KB and 3 MB) and I need to filter out approximately 16 distinct byte values (replacing them with other bytes).

At the moment I have a function a bit like this..

BYTE_REPLACE = {
    52: 7,   # first number is the byte I want to replace
    53: 12,  # while the second number is the byte I want to replace it WITH
}

def filter(st):
    for b in BYTE_REPLACE:
        st = st.replace(chr(b), chr(BYTE_REPLACE[b]))
    return st

(Byte list paraphrased for the sake of this question)

Using map resulted in an execution time of ~0.33 s, while this results in a 10x faster time of ~0.03 s (both performed on a huge string, larger than 1.5 MB compressed).
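The map-based version isn't shown above; it was presumably something along these lines (a hypothetical reconstruction, reusing the BYTE_REPLACE dict from the snippet above):

def filter_map(st):
    # Hypothetical reconstruction of the map-based variant that was timed;
    # the original map code isn't included in the question.
    return ''.join(map(lambda c: chr(BYTE_REPLACE.get(ord(c), ord(c))), st))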

While any further performance gains would probably be negligible, is there a better way of doing this?

(I am aware that it would be much more efficient to store the filtered string. This isn't an option, though: I'm fooling with a Minecraft Classic server's level format and have to filter out bytes that certain clients don't support.)

  • How are you reading in the string? Is it from the file system, from a URL, or is it already all in memory? That will probably have a big influence on the optimal method. Commented Sep 17, 2013 at 3:20
  • It's all available in memory (and passed straight to the function in every case). There are a few cases where a single byte will be passed to this function; this is negligible enough that I'm not bothered by it. Commented Sep 17, 2013 at 3:20
  • How many pairs are in BYTE_REPLACE? Just 2? Commented Sep 17, 2013 at 3:22
  • 16 usually. With the full list, and a loadtest level that is fairly large (512×512×256 bytes uncompressed), it takes 0.03 s to do the full replacement (with str.replace). Commented Sep 17, 2013 at 3:24
  • string.maketrans and string.translate may help here. Commented Sep 17, 2013 at 3:25

3 Answers

7

Use str.translate:

Python 3.x

def subs(st):
    return st.translate(BYTE_REPLACE)

Example usage:

>>> subs('4567')
'\x07\x0c67'

Python 2.x

Use string.maketrans to build the table for str.translate (Python 2):

import string

k, v = zip(*BYTE_REPLACE.iteritems())
k, v = ''.join(map(chr, k)), ''.join(map(chr, v))
tbl = string.maketrans(k, v)

def subs(st):
    return st.translate(tbl)
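If the level data is held as a bytes object on Python 3 rather than a str, the same idea applies through bytes.maketrans and bytes.translate; a minimal sketch, assuming the BYTE_REPLACE dict from the question:

# Python 3 bytes variant (sketch); assumes the BYTE_REPLACE dict from the question.
frm = bytes(BYTE_REPLACE.keys())    # byte values to replace, e.g. b'45'
to = bytes(BYTE_REPLACE.values())   # their replacements, e.g. b'\x07\x0c'
tbl = bytes.maketrans(frm, to)      # 256-byte translation table

def subs(st):
    return st.translate(tbl)

# >>> subs(b'4567')
# b'\x07\x0c67'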

1 Comment

Using str.translate was approximately twice as fast as the previous way I was doing it.
4

Look up the translate() method on strings. That allows you to do any number of 1-byte transformations in a single pass over the string. Use the string.maketrans() function to build the translation table. If you usually have 16 pairs, this should run about 16 times faster than doing 1-byte replacements 16 times.
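A minimal sketch of that approach on Python 2, assuming the BYTE_REPLACE dict from the question:

import string

# Build the 256-entry translation table once, then apply all
# 16 substitutions in a single pass with str.translate.
src = ''.join(chr(b) for b in BYTE_REPLACE)
dst = ''.join(chr(BYTE_REPLACE[b]) for b in BYTE_REPLACE)
table = string.maketrans(src, dst)

def filter(st):
    return st.translate(table)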


0

In your current design, str.replace() is being called on the string n times, once for each pair. While it's most likely an efficient algorithm, on a 3 MB string it might slow down.

If the string is already contained in memory by the time this function is called, I'd wager that the most efficient way would be:

BYTE_REPLACE = {
    52: 7,   # first number is the byte I want to replace
    53: 12,  # while the second number is the byte I want to replace it WITH
}

def filter(st):
    st = list(st)               # convert the string to a list so we can edit in place
    for i, s in enumerate(st):  # iterate through the list
        if ord(s) in BYTE_REPLACE:
            st[i] = chr(BYTE_REPLACE[ord(s)])
    return "".join(st)          # convert back to a string

There is a large cost to creating a new list at the start, and another to convert back to a string, but since Python strings are immutable, your current design builds a whole new string for each replacement.

This is all based on conjecture, and could be wrong. You'd want to test it with your actual data.
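One rough way to run that test (a sketch; level_data and the filter_* names are placeholders for the actual level string and the implementations discussed in this thread):

import timeit

# level_data, filter_replace, filter_translate and filter_list are placeholders
# for the real level string and the three approaches discussed above.
candidates = [('replace loop', filter_replace),
              ('translate', filter_translate),
              ('list edit', filter_list)]

for name, fn in candidates:
    t = timeit.timeit(lambda: fn(level_data), number=10)
    print('%-13s %.4f s per call' % (name, t / 10))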

