
A couple of my Python programs aim to:

  1. build a hash table (hence, I'm a dict() addict ;-) ) from information in a "source" text file, and

  2. use that table to modify a "target" file.

My concern is that the "source" files I usually process can be very large (several GB), so parsing takes more than 10 seconds, and I need to run the program many times. In short, it feels wasteful to reload the same large file every time I need to modify a new "target".

My thought is: if I could write the dict() built from the "source" file to disk once, in a format that Python can read back much faster (I'm thinking of something close to its in-memory representation), that would be great.

Is there a possibility to achieve that?

Thank you.

2 Answers


Yes, you can marshal the dict, or you can use pickle. For the difference between the two, especially regarding speed, see this question.
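A minimal sketch of both approaches (the dictionary and filenames here are hypothetical stand-ins for your parsed "source" data):

```python
import marshal
import pickle

# Hypothetical table standing in for the dict parsed from the "source" file.
table = {"key1": [1, 2, 3], "key2": "value"}

# pickle: handles almost any Python object; use the highest protocol for speed.
with open("table.pkl", "wb") as f:
    pickle.dump(table, f, protocol=pickle.HIGHEST_PROTOCOL)
with open("table.pkl", "rb") as f:
    from_pickle = pickle.load(f)

# marshal: typically faster, but limited to built-in types and not
# guaranteed to be portable across Python versions.
with open("table.dat", "wb") as f:
    marshal.dump(table, f)
with open("table.dat", "rb") as f:
    from_marshal = marshal.load(f)

assert from_pickle == table
assert from_marshal == table
```

With either approach you pay the multi-gigabyte parse once, then each later run only deserializes the saved dict, which is much faster than re-parsing the text.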


1 Comment

Also, it works very efficiently (I only tried marshal)!

pickle is the usual solution to such things, but if you see any value in being able to edit the saved data, and if the dictionary uses only simple types such as strings and numbers (nested dictionaries or lists are also OK), you can simply write the repr() of the dictionary to a text file, then parse it back into a Python dictionary using eval() (or, better yet, ast.literal_eval()).
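A short sketch of that repr()/ast.literal_eval() round trip (the dictionary and filename are hypothetical examples, assuming only simple built-in types):

```python
import ast

# Hypothetical dict of simple built-in types (strings, numbers, lists, dicts).
table = {"name": "example", "counts": [1, 2, 3], "nested": {"a": 1}}

# Write a human-readable, hand-editable representation to a text file.
with open("table.txt", "w") as f:
    f.write(repr(table))

# Parse it back. ast.literal_eval() only accepts Python literals,
# so unlike eval() it is safe to use on data you don't fully trust.
with open("table.txt") as f:
    restored = ast.literal_eval(f.read())

assert restored == table
```

The trade-off is exactly as described in the comments below: loading this way is slower than pickle, but the saved file can be inspected and edited in any text editor.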

3 Comments

this won't be very fast though
Faster than parsing it from the original file, slower than pickle.
Marshal fits my needs perfectly. I appreciate your answer though, I might use it in the future. Thanks.
