Suggestions on a data structure

Question

I've been struggling with finding a suitable data structure that meets these requirements:

The elements of this data structure are organized in blocks. The order of these blocks are somewhat irrelevant, but the elements within the block maintain a certain order.
The number of insertions are not frequent.
Searches are by far more frequent.
Retrieving the index of the element within the data structure is critical.
Once an element of the data structure has been searched for, the next or previous (neightbour) element in the data structure should be easily accessible.

With this in mind, I have these considerations:

A linked or doubly-linked list might be optimal for requirements 1, 2, 4 and 5, but forces a linear search, which compromises req. 3.
A hash table solves req. 3, but as far as I understand poses a problem in req. 5 because in using hashes one loses control on the position of the elements within the data structure.

Developing a hash function that matches the order I need can be tricky because the input keys can be partially random.

A possible solution that I was considering (in C) was to keep an array of pointers to the elements that maintain the insertion order (or the order I need) and then a second array of pointers that sort the elements using a hash function. While the first array can be used to quickly access the elements and finding the neighbours, the second array can be used to quickly search the elements. But somehow I have the impression of overly complicating things and I do not want to reinvent the wheel.

What do you think? Any suggestion would be more than appreciated.

Many thanks

I forgot to add that the insertions can be at the end, middle or beginning. — mgfernan
– mgfernan, Commented Nov 13, 2018 at 13:55
One time I used a double-linked list, with a table containing the addresses of blocks every 50 or 100 links, plus the first and the last one (I didn't loop the list) — Jean-Marc Zimmer
– Jean-Marc Zimmer, Commented Nov 13, 2018 at 13:59
You can implement a hash table inside an array (of structs). The array can be addressed linearly, the (internal) hashtable can be used to find an index. — joop
– joop, Commented Nov 13, 2018 at 14:25
Are the elements sorted or just ordered (meaning if you have 2 distinct elements A and B, can A in some cases appear before B and in some cases after, or will it be definitively less or greater and must thus always be on the same side)? Are you given an insertion position and a block for insertion? If not, how do you determine those things? How do the blocks interact with each other? If you insert something in one block, can that affect any other blocks? How many blocks are there? — Bernhard Barker
– Bernhard Barker, Commented Nov 13, 2018 at 15:42
@Dukeling A will be always less or greater than B (always on the same side). The insertion position is determined on a certain property of the element to be inserted. This property determines the block and the element will be either appended or prepended to the block. Therefore, inserting in a block will affect the position of the blocks that come afterwards. The number of blocks can be known at the beginning (between 2 and 5). — mgfernan
– mgfernan, Commented Nov 14, 2018 at 6:28

dbush · Accepted Answer · 2018-11-13 14:27:01Z

An array would probably be the best data structure in this case.

Inserting into array involves searching for the correct slot for the new element, then shifting everything larger one element to the right using memmove. This can be costly if insertions are frequent, but if they are not frequent as you say then it shouldn't be a problem. You then have O(1) retrieval by index and O(log n) searches.

Maintaining an array of pointers to the actual data is a good move since that means you're only copying pointers instead of full data structures when inserting new elements.

So you have an array containing the data which is only appended to, and array of pointers which is resorted (i.e. find the right spot and shift) with each insert.

kfx · Accepted Answer · 2018-11-13 14:12:57Z

What about a search tree? It's a good fit for all of your requirements except 4.

To deal with this requirement, you can maintain an additional counter for each node. This counter would record the number of nodes in the subtree below the node.

Adding the counter would allow to find the index of the target node while doing the search operation (see here for an example how). It would make the insertion operation more complex, as after inserting a node you would also need to update the counters in all ancestor trees, but since you say you're not going to have many insertions, it should not be a problem.

Lundin · Accepted Answer · 2018-11-13 14:32:49Z

Possibly a "chained hash table", where each index in the hash table is a double-linked list. In your example, I suppose each "block" would be represented by such a double-linked list.

This gives fast search for the block, but relatively slow search of the individual element inside the block. The number of items inside the block matters. However, you will get the next/previous item instantly, and traversing the list from there will also be fast. Linked lists can also be implemented as arrays, which is more friendly to data cache memory than heap allocation of individual nodes.

Alternatively, you could perhaps use a similar hash table, but use a binary search tree for each index. You will get fast searching and it will scale well with lots of data. It will be ever so slightly slower at retriveing the next/previous item, as you have to check if left/right leaves exist, otherwise check parent node.

Collectives™ on Stack Overflow

Suggestions on a data structure

3 Answers 3

Comments

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

Comments

Comments

Comments

Linked

Related