Say I have 250,000 dictionary entries spread across as many files. Each file begins with a line containing the headword. Together they total 2 GB. What is the best way to arrange the information so that I can look up a word easily and quickly? Should I make subdirectories (a, b, c, etc.)? Should I combine several files into larger files?
- This sort of rapid data access requirement is the exact reason for the existence of databases (or one of the most basic reasons, at least). Consider learning and using a relational database. I recommend Postgres, which is open source and very mature. – Wildcard, Oct 21, 2016 at 23:09
- The filesystem is a database of sorts, but SQLite is a good choice for this, too. I have a similarly large weather database but cannot use a full RDBMS in the customer's environment. Using the database saves lots of inodes, and it's quite fast and reliable. – Christopher, Oct 21, 2016 at 23:18
- @Wildcard, I'm looking into Postgres, but it seems that learning it involves learning a programming language ... – Toothrot, Oct 23, 2016 at 9:39
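The SQLite route suggested in the comments above could look roughly like the following. This is only a sketch using Python's built-in sqlite3 module; the table name `entries`, the column names, and the function names `build_index` and `lookup` are made up for illustration. It assumes the entry files are UTF-8 text and, as the question states, that the first line of each file is the headword.

```python
import sqlite3
from pathlib import Path

def build_index(dict_dir, db_path):
    """Load every entry file in dict_dir into one SQLite database."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema: the headword is the primary key, so lookups use an index.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "headword TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    with conn:  # one transaction for all inserts; commits on success
        for path in Path(dict_dir).iterdir():
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8")
            # The first line is the headword; everything after it is the entry body.
            headword, _, body = text.partition("\n")
            conn.execute(
                "INSERT OR REPLACE INTO entries VALUES (?, ?)",
                (headword.strip(), body),
            )
    return conn

def lookup(conn, headword):
    """Return the entry body for headword, or None if it is not in the database."""
    row = conn.execute(
        "SELECT body FROM entries WHERE headword = ?", (headword,)
    ).fetchone()
    return row[0] if row else None
```

Everything then lives in a single database file, which is what saves the inodes mentioned above.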
2 Answers
You don't say much about what you're trying to do, or what the data is, but here's my idea, which assumes all words are unique. It also assumes, since you don't mention it, that efficient use of disk space isn't a concern.
Ext4 filesystem. Store each word in a separate file in one big directory. Let the filesystem find them for you - very easy for you (just open the file you want by name) and the filesystem has an efficient method of finding the files.
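As a rough illustration of "just open the file you want by name", here is a minimal Python sketch. It assumes each file is named exactly after its headword and is UTF-8 text; the function name `lookup` and the directory argument are hypothetical. Headwords containing a `/` would need some escaping scheme, which is not handled here.

```python
from pathlib import Path

def lookup(dict_dir, headword):
    """Return the text of the entry for headword, or None if no such file exists."""
    # Assumption: the file name is the headword itself, all in one big directory.
    path = Path(dict_dir) / headword
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None
```

No directory listing is involved: the filesystem resolves the name directly.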
You'll need to make sure that your filesystem has sufficient free inodes: one per file, so you'll need 250,000 free. Check with df -i.
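The same inode check can be done from a script. This is a sketch using Python's os.statvfs, which is available on Unix only; the function names are hypothetical, and the default of 250,000 comes from the question. Some filesystems do not report a meaningful inode count, so treat the result as a hint.

```python
import os

def free_inodes(path="."):
    """Return the number of free inodes available to unprivileged users
    on the filesystem that contains path."""
    return os.statvfs(path).f_favail

def has_room_for(path=".", needed=250_000):
    """True if the filesystem reports at least `needed` free inodes."""
    return free_inodes(path) >= needed
```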
Avoid doing ls or other things which have to enumerate the files (such as opening it in a file browser) and performance should be fine.
- Enumerating files with ls is pretty quick if you don't show file types (or colours). It's the stat call on each file to check the file type that takes a long time. Try ls > out.txt, ls | less, ls | cat, or ls --color=never to list files without checking file types. – Sam Watkins, Jan 18, 2022 at 12:05
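The point in the comment above carries over to scripts: reading names from the directory is cheap, and it is the per-file stat that costs time. A minimal Python sketch, with a hypothetical function name, that lists names without asking for any file metadata:

```python
import os

def iter_names(dict_dir):
    """Yield the name of every entry in dict_dir without requesting file metadata."""
    with os.scandir(dict_dir) as entries:
        for entry in entries:
            # entry.name comes straight from the directory listing; no stat call is made.
            yield entry.name
```

Calling something like entry.stat() inside the loop would bring the per-file cost back.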
If it's a dictionary like a Python dictionary, convert it to JSON, store it in MongoDB or some other NoSQL database, and access it from there.