array. The array only supports two operations: indexing and assignment
to an array index.</p>
<p>The best way to think about an array is that it is one continuous block
of bytes in the computer's memory. This block is divided up into <m>n</m>-byte
chunks where <m>n</m> is based on the data type that is stored in the array.
Figure <url href="#fig_array" visual="#fig_array">1</url> illustrates the idea of an array that is sized
to hold six floating point values.</p>
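To make the byte-level picture concrete, here is a small sketch using Python's standard <c>array</c> module, which stores its items in one contiguous block of memory; <c>buffer_info()</c> reports the block's starting address, and <c>itemsize</c> is the chunk size (8 bytes for a C double). The variable names are illustrative, not from the chapter's listings:

```python
from array import array

values = array('d', [0.0] * 6)       # six doubles in one contiguous block
base, length = values.buffer_info()  # (starting address, number of items)

# address of item i = base + i * itemsize; for type code 'd', itemsize is 8
offsets = [base + i * values.itemsize for i in range(length)]
```

Successive addresses differ by exactly <c>itemsize</c> bytes, mirroring the <m>n</m>-byte chunks in the figure.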
<figure align="" xml:id="fig_array">

<p>For example, suppose that our array starts at location <c>0x000040</c>,
which is 64 in decimal. To calculate the location of the object at
position 4 in the array we simply do the arithmetic:
<m>64 + 4 \cdot 8 = 96</m>. Clearly this kind of calculation is
<m>O(1)</m>. Of course this comes with some risks. First, since
the size of an array is fixed, one cannot just add things on to the end of
the array indefinitely without some serious consequences. Second, in
some languages, like C, the bounds of the array are not even checked, so

<p>
<ul>
<li>
<p>Accessing an item at a specific location is <m>O(1)</m>.</p>
</li>
<li>
<p>Appending to the list is <m>O(1)</m> on average, but <m>O(n)</m> in
the worst case.</p>
</li>
<li>
<p>Popping from the end of the list is <m>O(1)</m>.</p>
</li>
<li>
<p>Deleting an item from the list is <m>O(n)</m>.</p>
</li>
<li>
<p>Inserting an item into an arbitrary position is <m>O(n)</m>.</p>
</li>
</ul>
</p>
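These costs can be seen on an ordinary Python list; the values below are hypothetical, and the comments record the complexity class of each operation rather than anything measured:

```python
data = [2, 4, 6, 8]

x = data[2]         # indexing: O(1) offset arithmetic
data.append(10)     # appending: amortized O(1)
last = data.pop()   # popping from the end: O(1)
del data[0]         # deleting: O(n), items after index 0 shift left
data.insert(0, 1)   # inserting at the front: O(n), items shift right
```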

the new value is added to the list at <c>last_index</c>, and <c>last_index</c>
is incremented by one.</p>
<p>The <c>resize</c> method calculates a new size for the array using
<m>2^{size\_exponent}</m>. There are many methods that could be used
for resizing the array. Some implementations double the size of the
array every time as we do here, some use a multiplier of 1.5, and some
use powers of two. Python uses a multiplier of 1.125 plus a constant.
The Python developers designed this strategy as a good tradeoff for
computers of varying CPU and memory speeds. The Python strategy leads to
a sequence of array sizes of <m>0, 4, 8, 16, 24, 32, 40, 52, 64, 76, \ldots</m>.
Doubling the array size leads to a bit more wasted space at any
one time, but is much easier to analyze. Once a new array has been
allocated, the values from the old list must be copied into the new

that in Python objects that are no longer referenced are automatically
cleaned up by the garbage collection algorithm.</p>
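The append-and-resize strategy described above can be sketched as follows. The attribute names (<c>my_array</c>, <c>last_index</c>, <c>size_exponent</c>) follow the convention used in this chapter's listings, but the full listing is not reproduced here, so treat the details as an illustrative assumption rather than the book's exact code:

```python
class DynamicArray:
    """Sketch of a growable array that doubles its capacity when full."""

    def __init__(self):
        self.size_exponent = 0   # capacity will be 2 ** size_exponent
        self.max_size = 0        # current allocated capacity
        self.last_index = 0      # index of the next free slot
        self.my_array = []       # the backing block of storage

    def append(self, val):
        if self.last_index >= self.max_size:  # out of room: grow first
            self.resize()
        self.my_array[self.last_index] = val
        self.last_index += 1

    def resize(self):
        new_size = 2 ** self.size_exponent    # capacities: 1, 2, 4, 8, ...
        new_array = [None] * new_size         # allocate the bigger block
        for i in range(self.last_index):      # copy the old values over
            new_array[i] = self.my_array[i]
        self.my_array = new_array
        self.max_size = new_size
        self.size_exponent += 1
```

Appending past the current capacity triggers <c>resize</c>, which doubles the capacity and copies the old values into the new block.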
<p>Before we move on, let's analyze why this strategy gives us an average
<m>O(1)</m> performance for <c>append</c>. The key is to notice that most
of the time the cost to append an item, <m>c_i</m>, is 1. The only time
that the operation is more expensive is when <c>last_index</c> is a power
of 2. When <c>last_index</c> is a power of 2, the cost to append an
item is <m>O(last\_index)</m>. We can summarize the cost to insert the
<m>i^{th}</m> item as follows:</p>
<math_block docname="Advanced/PythonListsRevisited" nowrap="False" number="True" xml:space="preserve">c_i =
\begin{cases}
i \text{ if } i \text{ is a power of 2} \\
1 \text{ otherwise}
\end{cases}</math_block>
<p>Since the expensive cost of copying <c>last_index</c> items occurs
relatively infrequently, we spread out, or <em>amortize</em>, the
cost of insertion over all of the appends. When we do this, the cost of
any one insertion averages out to <m>O(1)</m>. For example, consider
the case where you have already appended four items. Each of these four
appends costs you just one operation to store in the array that was
already allocated to hold four items. When the fifth item is added, a new
array of size 8 is allocated and the four old items are copied. But now
you have room in the array for four additional low-cost appends.
Mathematically we can show this as follows:</p>
<math_block docname="Advanced/PythonListsRevisited" nowrap="False" number="True" xml:space="preserve">\begin{aligned}
cost_{total} &= n + \sum_{j=0}^{\log_2{n}}{2^j} \\
&= n + 2n \\
&= 3n\end{aligned}</math_block>
<p>The summation in the previous equation may not be obvious to you, so
let's think about it a bit more. The sum goes from zero to <m>\log_2{n}</m>.
The upper bound on the summation tells us how many times we
need to double the size of the array. The term <m>2^j</m> accounts for
the copies that we need to do when the array is doubled. Since the total
cost to append <m>n</m> items is <m>3n</m>, the cost for a single item is
<m>3n/n = 3</m>. Because the cost is a constant, we say that it is
<m>O(1)</m>. This kind of analysis is called <term>amortized analysis</term> and
is very useful in analyzing more advanced algorithms.</p>
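The amortized argument can also be checked numerically. This sketch counts unit costs under the doubling model: one unit per append, plus <c>last_index</c> extra copy units whenever the array is full (the function name is ours, not the book's):

```python
def total_append_cost(n):
    """Total unit cost of n appends under the array-doubling model."""
    cost = 0
    capacity = 1
    for i in range(n):       # i plays the role of last_index
        if i == capacity:    # array is full: double it and copy i items
            capacity *= 2
            cost += i
        cost += 1            # store the new item
    return cost
```

For every <m>n</m> the total stays below <m>3n</m>, so the average cost per append is a constant.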
<p>Next, let us turn to the index operators.
Listing <url href="#lst_arrindex" visual="#lst_arrindex">[lst_arrindex]</url> shows our Python
implementation for index and assignment to an array location. Recall
from the discussion above that the calculation required to find the memory
location of the <m>i^{th}</m> item in an array is a simple <m>O(1)</m>
arithmetic expression. Even languages like C hide that calculation
behind a nice array index operator, so in this case the C and the Python
look very much the same. In fact, in Python it is very difficult to get
            self.my_array[i + 1] = self.my_array[i]
        self.last_index += 1
        self.my_array[idx] = val</pre>
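The right-shift that an insert performs can be exercised in isolation. In this standalone demo, a plain list stands in for the backing array and the values are hypothetical:

```python
data = [10, 20, 30, 40, None]      # one free slot at the end
idx = 0                            # worst case: insert at the front
for i in range(3, idx - 1, -1):    # walk backward, shifting items right
    data[i + 1] = data[i]
data[idx] = 5                      # the slot at idx is now free
print(data)                        # -> [5, 10, 20, 30, 40]
```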
<p>The performance of the insert is <m>O(n)</m> since in the worst case we
want to insert something at index 0 and we have to shift the entire
array forward by one. On average we will only need to shift half of the
array, but this is still <m>O(n)</m>. You may want to go back to
Chapter <url href="#basicds" visual="#basicds">[basicds]</url> and remind yourself how all of these
list operations are done using nodes and references. Neither
implementation is right or wrong; they just have different performance