7

I'm a Python newbie trying to parse a file to make a table of memory allocations. My input file is in the following format:

    48 bytes allocated at 0x8bb970a0
    24 bytes allocated at 0x8bb950c0
    48 bytes allocated at 0x958bd0e0
    48 bytes allocated at 0x8bb9b060
    96 bytes allocated at 0x8bb9afe0
    24 bytes allocated at 0x8bb9af60

My first objective is to make a table that counts the instances of a particular number of byte allocations. In other words, my desired output for the above input would be something like:

    48 bytes -> 3 times
    96 bytes -> 1 times
    24 bytes -> 2 times

(for now, I'm not concerned about the memory addresses)

Since I'm using Python, I thought doing this using a dictionary would be the right way to go (based on about 3 hours' worth of reading Python tutorials). Is that a good idea?

In trying to do this using a dictionary, I decided to make the number of bytes the 'key', and a counter as the 'value'. My plan was to increment the counter on every occurrence of the key. As of now, my code snippet is as follows:

    # Create an empty dictionary
    allocationList = {}

    # Open file for reading
    with open("allocFile.txt") as fp:
        for line in fp:
            # Split the line into a list (using space as delimiter)
            lineList = line.split(" ")

            # Extract the number of bytes
            numBytes = lineList[0];

            # Store in a dictionary
            if allocationList.has_key('numBytes')
                currentCount = allocationList['numBytes']
                currentCount += 1
                allocationList['numBytes'] = currentCount
            else
                allocationList['numBytes'] = 1

    for bytes, count in allocationList.iteritems()
        print bytes, "bytes -> ", count, " times"

With this, I get a syntax error in the 'has_key' call, which leads me to question whether it is even possible to use variables as dictionary keys. All examples I have seen so far assume that keys are available upfront. In my case, I can get my keys only when I'm parsing the input file.

(Note that my input file can run into thousands of lines, with hundreds of different keys)

Thank you for any help you can provide.

2
  • As I see it, you quoted 'numBytes', so you are always referring to the same constant string rather than to the variable. Commented Nov 28, 2011 at 9:15
  • You also omitted the colon at the end of the if allocationList.has_key('numBytes') line and after else, which is a syntax error. Commented Nov 28, 2011 at 9:17

4 Answers

10

Learning a language is as much about the syntax and basic types as it is about the standard library. Python already has a class that makes your task very easy: collections.Counter.

    from collections import Counter

    with open("allocFile.txt") as fp:
        counter = Counter(line.split()[0] for line in fp)

    for bytes, count in counter.most_common():
        print bytes, "bytes -> ", count, " times"
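For anyone who wants to try this without the input file, here is a minimal sketch of the same Counter idea. It assumes Python 3 (so print is a function), and it feeds in the sample lines from the question as a list. The names sample_lines and count_sizes are made up for illustration, and the blank-line check is an addition that the snippet above does not have.

```python
from collections import Counter

# Sample lines copied from the question; a real run would iterate over the file.
sample_lines = [
    "48 bytes allocated at 0x8bb970a0",
    "24 bytes allocated at 0x8bb950c0",
    "48 bytes allocated at 0x958bd0e0",
    "48 bytes allocated at 0x8bb9b060",
    "96 bytes allocated at 0x8bb9afe0",
    "24 bytes allocated at 0x8bb9af60",
]


def count_sizes(lines):
    """Count how often each allocation size (the first field of a line) occurs."""
    # Skip blank lines so that line.split()[0] cannot raise IndexError.
    return Counter(line.split()[0] for line in lines if line.strip())


counter = count_sizes(sample_lines)

# most_common() yields (size, count) pairs, highest count first.
for size, count in counter.most_common():
    print(size, "bytes ->", count, "times")
```

Note that the sizes stay strings here; wrapping the first field in int() would be one way to get numeric keys if they need to be sorted by size later.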

3 Comments

I feel your answer is more true than anyone else's here
+1: If you are only interested in the count, Counter is the way to go. On the other hand, the OP wrote: for now, I'm not concerned about the memory addresses --- I suppose he might sooner or later need a custom solution that goes beyond Counter.
Thank you very much for this solution. I tried it, but it didn't work, because Counter is only available in Python 2.7 and later, and I'm using 2.6.4. But it led me to: stackoverflow.com/questions/3594514/…, and there I found a way to solve my problem. I'm still marking this answer as the solution, because it is probably the best way of solving the problem.
4

You get a syntax error because you are missing the colon at the end of this line:

    if allocationList.has_key('numBytes')
                                         ^

Your approach is fine, but it might be easier to use dict.get() with a default value:

allocationList[numBytes] = allocationList.get(numBytes, 0) + 1 

Since your allocationList is a dictionary and not a list, you might want to choose a different name for the variable.
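Putting the two suggestions together, the whole loop could look roughly like the sketch below. It assumes Python 3 and reads from a list of lines rather than from allocFile.txt; the names count_with_get, allocation_counts, and sample_lines are hypothetical and not taken from the question.

```python
def count_with_get(lines):
    """Return a dict mapping allocation size (as a string) to its count."""
    allocation_counts = {}  # hypothetical name, standing in for allocationList
    for line in lines:
        fields = line.split()
        if not fields:  # ignore blank lines
            continue
        num_bytes = fields[0]
        # get() returns 0 the first time a size is seen,
        # so no explicit membership test is needed.
        allocation_counts[num_bytes] = allocation_counts.get(num_bytes, 0) + 1
    return allocation_counts


# Sample lines copied from the question.
sample_lines = [
    "48 bytes allocated at 0x8bb970a0",
    "24 bytes allocated at 0x8bb950c0",
    "48 bytes allocated at 0x958bd0e0",
    "48 bytes allocated at 0x8bb9b060",
    "96 bytes allocated at 0x8bb9afe0",
    "24 bytes allocated at 0x8bb9af60",
]

for num_bytes, count in count_with_get(sample_lines).items():
    print(num_bytes, "bytes ->", count, "times")
```

On Python 2 the last loop would use iteritems() and the print statement instead, as in the question.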

1 Comment

Thanks. I had no clue about the ":". Just figured out that I also need one at the end of my 'for' statement.
4

The dict.has_key() method was removed in Python 3. To replace it, use the in operator:

    if numBytes in allocationList:  # do not use numBytes as a string, use the variable directly
        # do the stuff

But in your case, you can also replace all the

    if allocationList.has_key('numBytes')
        currentCount = allocationList['numBytes']
        currentCount += 1
        allocationList['numBytes'] = currentCount
    else
        allocationList['numBytes'] = 1

with one line with get:

allocationList[numBytes] = allocationList.get(numBytes, 0) + 1 
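If you would rather keep the explicit if/else from the question, a corrected version using in might look like the sketch below. It assumes Python 3, and the function name count_with_in and the sample list are made up for illustration.

```python
def count_with_in(lines):
    """Count allocation sizes using an explicit membership test."""
    counts = {}
    for line in lines:
        fields = line.split()
        if not fields:  # ignore blank lines
            continue
        num_bytes = fields[0]
        if num_bytes in counts:      # replaces counts.has_key(num_bytes)
            counts[num_bytes] += 1
        else:
            counts[num_bytes] = 1
    return counts


sample = [
    "48 bytes allocated at 0x8bb970a0",
    "24 bytes allocated at 0x8bb950c0",
    "48 bytes allocated at 0x958bd0e0",
]
print(count_with_in(sample))
```

This does the same bookkeeping as the one-line get form, just spelled out; which one reads better is mostly a matter of taste.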

3 Comments

There is no need to set the value twice using setdefault; use dict.get instead.
@FerdinandBeyer: you're right, it was a little bit overkill and useless to use setdefault.
Removed 'has_key' and used 'in'. Thanks for the tip. I was probably reading some outdated tutorials.
1

You most definitely can use variables as dict keys. However, you have a variable called numBytes, but are using a string containing the text "numBytes": that is a string constant, not the variable. It isn't what causes the syntax error, but it is a bug, because every line would update the same single key. Instead, try:

    if numBytes in allocationList:
        # do stuff

Additionally, consider collections.Counter. It is a convenient class for handling exactly the case you're looking at.
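To make the difference between the quoted and unquoted key concrete, here is a small sketch, assuming Python 3. The list sizes and the two dictionary names are invented for the demonstration.

```python
sizes = ["48", "24", "48"]  # made-up sample of parsed sizes

literal_counts = {}
variable_counts = {}
for num_bytes in sizes:
    # Bug from the question: the quoted key is the same string every time.
    literal_counts['numBytes'] = literal_counts.get('numBytes', 0) + 1
    # Fix: the unquoted name uses the variable's current value as the key.
    variable_counts[num_bytes] = variable_counts.get(num_bytes, 0) + 1

print(literal_counts)   # {'numBytes': 3}
print(variable_counts)  # {'48': 2, '24': 1}
```

With the quoted key, everything collapses into a single entry no matter what the file contains, which is why the table would never show more than one row.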
