1

Say I have 250,000 dictionary entries spread across as many files. Each file begins with a line containing the headword. Together they total 2 GB. What is the best way to arrange the information so I can look up a word easily and quickly? Should I make subdirectories a, b, c, etc.? Should I combine several files into larger files?

3
  • 2
    This sort of rapid data access requirement is the exact reason for the existence of databases (or one of the most basic reasons, at least). Consider learning and using a relational database. I recommend Postgres, which is open source and very mature. Commented Oct 21, 2016 at 23:09
  • 2
    The filesystem is a database of sorts, but SQLite is a good choice for this, too. I have a similarly large weather database but cannot use a full RDBMS in the customer's environment. Using the database saves lots of inodes and it's quite fast and reliable. Commented Oct 21, 2016 at 23:18
  • @Wildcard, looking into Postgres, but it seems that learning it involves learning a programming language ... Commented Oct 23, 2016 at 9:39
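A minimal sketch of the SQLite suggestion from the comments above, using Python's standard `sqlite3` module. The table name `entries`, its columns, the function names, and the UTF-8 encoding are assumptions made for illustration; the only detail taken from the question is that the first line of each file is the headword.

```python
import sqlite3
from pathlib import Path


def build_index(src_dir, db_path):
    """Load every entry file in src_dir into one SQLite table keyed by headword."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS entries (headword TEXT PRIMARY KEY, body TEXT)"
    )
    with con:  # run all inserts in a single transaction
        for path in Path(src_dir).iterdir():
            if not path.is_file():
                continue
            # Assumption: files are UTF-8 and the first line is the headword.
            text = path.read_text(encoding="utf-8")
            headword, _, body = text.partition("\n")
            con.execute(
                "INSERT OR REPLACE INTO entries VALUES (?, ?)",
                (headword.strip(), body),
            )
    con.close()


def lookup(db_path, word):
    """Return the entry body for word, or None if the headword is absent."""
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT body FROM entries WHERE headword = ?", (word,)
        ).fetchone()
    finally:
        con.close()
    return row[0] if row else None
```

The primary key on `headword` gives an indexed lookup, and the whole dictionary lives in a single file, so only one inode is used. Keep the database file outside `src_dir` so it is not read as an entry.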

2 Answers

2

You don't say much about what you're trying to do, or what the data is, but here's my idea. It assumes that all words are unique and, since you don't mention it, that efficient use of disk space is not a concern.

Use an ext4 filesystem. Store each word in a separate file in one big directory and let the filesystem find them for you. This is very easy for you (just open the file you want by name), and ext4 indexes directory entries (the dir_index feature), so finding a file by name stays efficient even in a very large directory.

You'll need to make sure that your filesystem has sufficient free inodes. Each file uses one, so you'll need 250,000 free. Check with df -i.
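The same inode check can be done from a script. This is a small sketch assuming a Unix-like system, where Python's `os.statvfs` is available; the function names are hypothetical.

```python
import os


def free_inodes(path):
    """Return the free inodes available to unprivileged users on path's filesystem."""
    return os.statvfs(path).f_favail


def has_room_for(path, n_files):
    """True if the filesystem holding path can take n_files more files."""
    return free_inodes(path) >= n_files
```

For the case in the question this would be called as `has_room_for("/path/to/entries", 250_000)`, with the path being a placeholder. Some filesystems do not use a fixed inode table and may report 0 here, so treat the result as a hint rather than a guarantee.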

Avoid doing ls or other things which have to enumerate the files (such as opening it in a file browser) and performance should be fine.
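As a rough sketch of this lookup, assuming each file is named exactly after its headword (the question only says the headword is the first line, so the naming scheme is an assumption) and is UTF-8 encoded. The function name `read_entry` is hypothetical.

```python
from pathlib import Path


def read_entry(entries_dir, word):
    """Open the file named after word directly; never enumerate the directory."""
    path = Path(entries_dir) / word
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None  # no entry for this headword
```

The filesystem resolves the name, so the cost does not depend on listing the other 250,000 files. Headwords containing `/` or other characters that are unsafe in filenames would need to be escaped first.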

1
  • Enumerating files with ls is pretty quick, if you don't show file types (or colours). It is the stat call on each file to check the file type that takes a long time. Try ls > out.txt or ls | less or ls | cat or ls --color=never to list files without checking file types. Commented Jan 18, 2022 at 12:05
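The same idea applies when listing from a script: reading only the names avoids a stat call per file. A minimal sketch using Python's `os.scandir`; the function name is hypothetical.

```python
import os


def list_headwords(entries_dir):
    """Return the sorted names in entries_dir, reading names only (no per-file stat)."""
    with os.scandir(entries_dir) as it:
        return sorted(entry.name for entry in it)
```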
0

If it's a dictionary like a Python dictionary, convert it to JSON, store it in MongoDB or some other NoSQL database, and access it from there.
