How are integer operations implemented in Python?

Question

This might be a very broad question. I wanted to create a way to represent strings that would support O(1) append, O(1) append to the left, and O(1) comparison while maintaining O(N) slicing and O(1) indexing. My idea is that I would store unicode characters as their unicode number, and use mathematical operations to add and delete characters from either end. I called it NumString:

class Numstring: def __init__(self, init_str=""): self.num = 0 self.length = 0 for char in init_str: self.append(char) def toString(self, curr=None): if curr is None: curr = self.num retlst = [] while curr: char = chr(curr % 10000) retlst.append(char) curr = curr // 10000 return "".join(retlst) def appendleft(self, char): assert len(char) == 1 self.num *= 10000 self.num += ord(char) self.length += 1 def append(self, char): self.num += ord(char)*(10000**self.length) self.length += 1 def pop(self): # self.num = self.num % (10000**self.length-1) self.num = self.num % 10000**(self.length-1) self.length -= 1 def popleft(self): self.num = self.num // 10000 self.length -= 1 def compare(self, numstring2): return self.num == numstring2.num def size(self): return self.length def get(self, start, end=None): if start >= self.length: raise IndexError("index out of bounds") if end and end > self.length + 1: raise IndexError("index out of bounds") if end is not None: if start == end: return "" curr = self.num curr = curr // (10000 ** start) curr = curr % 10000**(end) return self.toString(curr) else: curr = self.num curr = curr // (10000 ** start) char = chr(curr % 10000) return char

Here is the profiling code:

import string import time import random from NumString import Numstring import numpy as np import matplotlib.pyplot as plt if __name__ == '__main__': numstring_times = [] regstring_times = [] numstring = Numstring("abc") regstring = "abc" for i in range(0, 10000): char_to_add = random.choice(string.ascii_letters) start_time = time.time() numstring.append(char_to_add) end_time = time.time() numstring_times.append(end_time-start_time) start_time = time.time() regstring += char_to_add end_time = time.time() regstring_times.append(end_time-start_time) plt.plot(numstring_times, label="Numstring") plt.plot(regstring_times, label="Regular strings") plt.legend() plt.show()

It works the way I want it to, but when I tested the time it takes to append to the string and my Numstring class performed way worse. I understood that there would be overhead, but I didn't expect the overall trend to be way worse than concatenating strings in python.

Curiously, comparison times are actually better for Numstrings than regular strings:

I've realized I don't really understand how integer operations in Python are implemented and what the limitations of infinite integer precision are. Could somebody help me understand what I'm missing?

provide the profiling code...

juanpa.arrivillaga
– juanpa.arrivillaga

2020-09-24 23:59:13 +00:00
Commented Sep 24, 2020 at 23:59 — juanpa.arrivillaga
– juanpa.arrivillaga, Commented Sep 24, 2020 at 23:59
@juanpa.arrivillaga just did

Alex Peniz
– Alex Peniz

2020-09-25 00:02:14 +00:00
Commented Sep 25, 2020 at 0:02 — Alex Peniz
– Alex Peniz, Commented Sep 25, 2020 at 0:02

2 revs · Accepted Answer · 2020-09-26 00:29:28Z

5

ints are essentially represented as strings of digits (in a higher base than 10), and most operations on them, including addition and multiplication, take time proportional to the number of digits they contain or worse. Because the base you’re using (10000) doesn’t match the base ints use, operations like multiplying or dividing by the base become a complex operation instead of a simple copying of bytes of memory. So it’s pretty much a less efficient reimplementation of the operations strings already do (which is what you found by benchmarking) and it doesn’t support all of Unicode (which goes up to 0x10FFFF, not 10000).

Hint for a data structure that actually has the properties you’re looking for, though: a deque based on a circular buffer.

edited Sep 26, 2020 at 0:29

community wiki

2 revs
Ry-

Sign up to request clarification or add additional context in comments.

10 Comments

Alex Peniz Over a year ago

I thought of using a deque, but I also wanted O(1) comparison. In my understanding, to compare two strings you have to go through all the characters and see if they're equal. I wanted to use numbers because I thought it was an O(1) operation to compare. I guess to a computer, there's no way to actually get O(1) comparison, since you'll always need to check the bytes

Ry- Over a year ago

@AlexPeniz: It isn’t an O(1) operation to compare arbitrary-size ints. Like the rest of the operations, it’s O(digits). Algorithms often work with integers of bounded size (and integers in other languages are often bounded, even), whose operations can be considered O(1) because of that.

Alex Peniz Over a year ago

got it, that makes a lot of sense. Sad, I spent a fair bit of time on this haha

Ry- Over a year ago

@AlexPeniz: You can get probabilistic O(1) comparison, though, by storing a hash alongside the deque and updating them in sync. The probability of failure can even be negligible, through use of a cryptographic hash (warning: asymptotically fast, but with a big constant factor). It kind of depends on what you’re using it for.

Mark Ransom Over a year ago

@AlexPeniz that's why the first rule of optimization is to profile. Your intuition will often lead you astray.

|

Collectives™ on Stack Overflow

How are integer operations implemented in Python?

1 Answer 1

10 Comments

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

10 Comments

Related