5
\$\begingroup\$

I want to implement a linked list in C that I can add to my personal util library and keep it as generic and modular as possible

This is my header file containing all of the function prototypes and struct definitions:

#ifndef LINKED_LIST_H #define LINKED_LIST_H typedef struct NodeType { void* value; struct NodeType* next; } Node; typedef struct LinkedListType { Node* head; int (*compare)(void*, void*); void (*free) (void*); } LinkedList; LinkedList* linkedList_init(LinkedList* list, int (*compare) (void*, void*), void (*freeData) (void*)); // Initializes the generic functions and head of the list int linkedList_isInitialized(LinkedList* list); // Return 0 if is empty, otherwise return 1 int linkedList_add(LinkedList* list, void* data); // Add a node to the end int linkedList_remove(LinkedList* list, int index); // Remove a node at index int linkedList_insert(LinkedList* list, int index, void* data); // Insert a node at index void* linkedList_get(LinkedList* list, int index); // Return value of node at index, otherwise if index out of bounds return NULL void linkedList_free(LinkedList* list); // Free all the elements in the list #endif 

And this is my implementation

#include <linkedList.h> #define INDEX_OUT_OF_BOUNDS fprintf(stderr, "Index out of bounds \n"); int linkedList_init( LinkedList* list, int (*compare)(void*, void*), void (*freeData)(void*)) { // Set generic functions in LinkedList list->compare = compare; list->freeData = freeData; // Create head node list->head = (Node*)malloc(sizeof(Node)); list->head->next = NULL; return 0; } int linkedList_isInitialized(LinkedList* list) { if(list->head == NULL) { fprintf(stderr, "List is not initialized"); return 1; } return 0; } int linkedList_add(LinkedList* list, void* data) { if(! linkedList_isInitialized(list)) return 1; // Iterate through list starting with head until next node is null Node* node = list->head; while(node->next != NULL) { node = node->next; } // Create new node Node* newNode = (Node*)malloc(sizeof(Node)); newNode->next = NULL; newNode->value = data; // Link new node node->next = newNode; return 0; } int linkedList_remove(LinkedList* list, int index) { if(! linkedList_isInitialized(list)) return 1; if(index < 0) {INDEX_OUT_OF_BOUNDS; return 1;} // Zero index edge case if(index == 0) { Node* nodeToRemove = list->head->next; list->head->next = nodeToRemove->next; free(nodeToRemove); return 0; } // Traverse list and find node int i = 0; Node* previousNode = list->head; while(previousNode->next != NULL && i < index - 1) { previousNode = previousNode->next; i++; } // Check if index was out of bounds if(! internal_indexIsInBounds(i, index - 1)) return 1; Node* nodeToRemove = previousNode->next; previousNode->next = nodeToRemove->next; free(nodeToRemove); return 0; } int linkedList_insert(LinkedList* list, int index, void* data) { if(! linkedList_isInitialized(list)) return 1; if(index < 0) {INDEX_OUT_OF_BOUNDS; return 1;} // Instantiate new node to be inserted Node* newNode = (Node*)malloc(sizeof(Node)); newNode->value = data; // Zero index insertion replaces head->next if(index == 0) { Node* nextNode = list->head->next; list->head->next = newNode; newNode->next = nextNode; return 0; } // Traverse list and find node int i = 0; Node* parentNode = list->head->next; while(parentNode->next != NULL && i < index - 1) { parentNode = parentNode->next; i++; } // Check if index was out of bounds if(! internal_indexIsInBounds(i, index - 1)) return 1; // Insert node after parent node Node* nextNode = parentNode->next; parentNode->next = newNode; newNode->next = parentNode; return 0; } void* linkedList_get(LinkedList* list, int index) { if(! linkedList_isInitialized(list)) return NULL; if(index < 0) {INDEX_OUT_OF_BOUNDS; return NULL;} int i = 0; Node* node = list->head->next; while(node != NULL && i < index) { node = node->next; i++; } if(! internal_indexIsInBounds(i, index)) return NULL; return node->value; } static int internal_indexIsInBounds(int i, int index) { if(index < 0) {INDEX_OUT_OF_BOUNDS; return 1;} if(i < index) { INDEX_OUT_OF_BOUNDS; return 0; } return 1; } 

I want this code to be as clear to read, easy to use, robust, and all around airtight as possible so please nitpick anything that seems dependent or non-modular. I plan to use this is a multitude of projects and I don't want to have to go back very often and deal with implementation or design details of lower level stuff like this.

\$\endgroup\$
11
  • 4
    \$\begingroup\$ Trivial comments: "library" functions are usually mute; returning status codes to the caller. Many LL applications expect to use "stack" (LIFO) operations. A "tail pointer" would speed execution. What will you do when a "double linked list" is the best data model? Sometimes "circular buffers" are needed! Finally, the malloc'd pointers to "generic data" will often be devouring (and fragmenting) memory at an exorbitant rate... Ouch!! \$\endgroup\$ Commented Nov 21, 2024 at 0:44
  • 2
    \$\begingroup\$ Agreed with @Fe2O3: adding extra levels of indirection to enable code-reuse will make your data structures less efficient, possibly very inefficient with much worse cache locality. Linked lists are already problematic for efficiency. If you want code-reuse like this, use C++ where you can extend a struct/class and add your own members to the same struct next to the next pointer. In C, you're still going to need these functions to inline for performance not to suck even more than linked lists already do, so you still need a copy of the code in every project that uses them, or in headers \$\endgroup\$ Commented Nov 21, 2024 at 5:00
  • 2
    \$\begingroup\$ Examine: LinkedList* list, int (*compare)(void*, void*),... Notice the first 'splat' adjacent to the return type. Notice the second 'splat` adjacent to the token 'compare'... Imagine the improved consistency if the "splat-space" were "space-splat" in ALL instances where a 'splat' is required... It's optional (when it's correct), but I recommend using "space-splat" always. (A variable is, first-and-foremost, a pointer. What it points to - the datatype - is secondary...) Just free advice... \$\endgroup\$ Commented Nov 21, 2024 at 5:33
  • 3
    \$\begingroup\$ Please include the unit tests next time. Without seeing the tests, reviewers have no idea what has been confirmed to work, and what code is sitting completely untested. The unit tests also give indication of how you expect the library to be used (and that might inform reviews pointing out mistaken assumptions). \$\endgroup\$ Commented Nov 21, 2024 at 17:00
  • 3
    \$\begingroup\$ And please show the full code. You've omitted linkedList_free() from the implementation file, and that makes it hard for reviewers to fully test the code using Valgrind or similar. \$\endgroup\$ Commented Nov 21, 2024 at 17:01

5 Answers 5

15
\$\begingroup\$

Notes

  • Casting malloc (as well as realloc and calloc) is not necessary in C. Leaving them out will not cause warnings or errors and may suppress warnings that would otherwise be helpful. It is necessary in C++, but that's only relevant if you're using a C++ compiler to compile C code. Some reading on Stack Overflow: Should I cast the result of malloc (in C)?
  • You are not verifying that malloc has succeeded before dereferencing that pointer by checking it against NULL before use. This can lead to undefined behavior. See linkedlist_init for an example of this.
  • In the same function you use the list argument before checking that it is not NULL. You may know that in your current use case a null pointer is never passed in, but can you guarantee that will always be the case?

Linked lists?

From a general design standpoint, consider whether you really want to use linked lists extensively in your code, or if you want a dynamic array, more akin to C++'s std::vector. Linked lists have O(n) random access and must perform a dynamic memory allocation on each insertion, resulting in data that can be widely distributed in memory. A dynamic array can have O(1) access and if designed properly, only O(log n) dynamic memory allocations, and O(1) on de-allocation.

You may wish to watch Bjarne Stroustrup: Why you should avoid Linked Lists. Video appears to be unavailable, but there is a discussion on it on Stack Overflow.

Linked lists!

If you decide you do want to pursue the linked list, then there are some key things to think about.

  • You probably want your list struct to hold a pointer to the last in addition to the first node. This makes appending to the end an O(1) operation rather than O(n).
  • You probably want to make this an opaque data type.
    • This will help ensure that the user of a list cannot break things like that pointer to the last node.
    • If you do this, it's trivial to have your list contain a member describing its length. Any operation that modifies the length updates this member. With this, getting the length of a linked list becomes O(1) rather than O(n).
    • Relevant Stack Overflow reading: What defines an opaque type in C, and when are they necessary and/or useful?
  • Your node holds a pointer to the data it refers to. Do you want your lists to "own" the data they point to, or simply point to it? On freeing a list, should it also free that data?
    • It looks a bit like you were heading in the direction of letting your freeData function pointer decide this.
    • You might implement append and append_copy to handle this issue, with the latter copying data from the value passed in rather than just saving a pointer to it.
\$\endgroup\$
12
\$\begingroup\$

Overview

You never use the compare or freeData members of the linked list why are they there?

Also it obviously does not compile as the header it is free while in the code it is freeData (Copy and paste error?).

You have a lot of code checking the size of the list. Why not add a member to the "LinkedList" to store the number of members?

Bug(s)

You assume malloc() zeros out the memory that is returned. It does not. See the comment about the Init function.

Code Review

Seems a bit generic.

#ifndef LINKED_LIST_H #define LINKED_LIST_H 

I would be worried that another library I include in my project has the same header guard.


If you have "free" are you not also going to need an "allocate" to go with that? Normally these functions come in pairs that need to match.

typedef struct LinkedListType { Node* head; int (*compare)(void*, void*); void (*free) (void*); } LinkedList; 

The name isInitialized does not match the comment (is empty). Why the difference?

int linkedList_isInitialized(LinkedList* list); // Return 0 if is empty, otherwise return 1 

Adding to the end is more commonly referred to as "append".

int linkedList_add(LinkedList* list, void* data); // Add a node to the end 

Why are you creating a node when the list is empty?

int linkedList_init( .... // Create head node list->head = (Node*)malloc(sizeof(Node)); // Note: list->head->value has an indeterminate value. // You could use malloc () to make sure it is zeroed out. // But malloc does not do anything with the memory so // it could potentially have any value. list->head->next = NULL; 

Two issues:

1: This does not match the comments in the header file.

int linkedList_isInitialized(LinkedList* list) { if(list->head == NULL) { fprintf(stderr, "List is not initialized"); return 1; } return 0; } 

2: It is not guaranteed to work. If you have not called Init the value is indeterminate. You can't test to see if it is valid that way.


\$\endgroup\$
5
\$\begingroup\$

(It was suggested some suggestions in comments be posted as a full answer. Here we go...)

Silent functions

To make full use of library functions, those functions should never contain expectations that output messages can/will be read. Code intended to be packaged into a library can return published status codes allowing the calling application code to handle unfavourable situations appropriately. (An error message to stdout, even stderr might not be the right thing to do.)

(Always verify the return code from library or system functions.)

Linked List as a Stack

Without requiring a predefined allocation, the Linked List data structure is useful as a stack.

Code implementing both push() and pop() (and sizeOf(), and getTop()) would be programmer friendly.

Faster append()

Adding to the tail of a long LL would execute faster if the LL object maintained a pointer to the current tail node (as it does to the head node.)

Feature creep

Down the road, the OP may find a need for a "double linked list". Perhaps even a "circular linked list" or a "circular double linked list". The OP should consider how close to fulfilling many/all future needs the current code actually is before calling it a day.

Generic comes at a price

There are a number of Standard Library implementations of malloc() and its cousins. Inside that black box, its easy to imagine some form of linked list code that maintains pointers AND byte counts. On modern hardware, it's likely any heap allocation carries (at least) a 16 byte overhead.

The OP's "generic" Node is, itself, another 16 bytes, without considering any data stored. If that data is, trivially, an 8 byte double stored in another heap allocation (16 more bytes), one begins to see the cost. One double stored in a node of the OP's LL will consume (estimated) 56 bytes of memory. Ouch!

Accessing these nodes will also require both the system library functions and application functions to sequentially skip through their collections of occupied nodes.

Opacity

The current version requires the caller to include a header file that exposes the implementation of the linked list: those struct declarations. This risks coders taking short cuts, writing application code that directly manipulates the mechanics of the linked list. This increases the maintenance headache if/when the library implementation details are ever altered.

Application code should not be able to "see" the inside workings of a library function. (Eg: Without digging, a coder does not know how their malloc()/free() functions actually work; just that they do.)

Odd notes

Examine the function parameter list: LinkedList* list, int (*compare)(void*, void*),... Notice the first 'splat' adjacent to the parameter's datatype. Notice the second 'splat' adjacent to the token compare... Imagine the improved consistency if the "splat-space" were "space-splat" in ALL instances where a 'splat' is required. It's optional (when it's correct), but I recommend using "space-splat" always. (A variable or a returned value will be, first-and-foremost, a pointer. What that pointer points to - the datatype - is secondary...)

There's no apparent thought been given, yet, to multi-threaded code considerations. Everything in its time.

If the OP is thinking of getting-into coding embedded systems, it's worth noting that dynamic memory allocation is strongly NOT recommended. Just something to think about before dreaming of writing the ultimate library LL implementation.

("(Balanced) Binary Trees" and "Tries" are lurking just over the horizon, too...)

\$\endgroup\$
2
  • \$\begingroup\$ Re "space-splat": agreed, but for different reasons. Consider "int* x, y". It looks like x and y are both pointers, but only x is. Whereas if you write it: "int *x, y", the difference is more obvious. \$\endgroup\$ Commented Dec 4, 2024 at 14:44
  • \$\begingroup\$ @SteveMelnikoff Yes, this too! \$\endgroup\$ Commented Dec 4, 2024 at 20:28
4
\$\begingroup\$

Case naming inconsistency.

[Edit 2024 Dec 15]
As pointed out in comments, OP is following a naming convention common in another language. Employing ideas of other languages other than C is useful yet risks misunderstanding when that other language's attributes are not known to the C user.

Code uses:

linkedList.h LINKED_LIST_H LinkedListType { LinkedList; linkedList_init( linkedList_...( 

Be consisted with case and _ use:

linkedList.h LINKEDLIST_H or linkedList_H // Yes its a macro .... linkedList_type { linkedList; linkedList_init( linkedList_...( 

As not a standard header, move to "".

Edit: 20241215
See comment. This depends on how standard this is for OP and considering code export.

//#include <linkedList.h> #include "linkedList.h" 
\$\endgroup\$
5
  • 5
    \$\begingroup\$ I have to disagree. OP is consistent and is using very common conventions (used in Java for example). \$\endgroup\$ Commented Nov 21, 2024 at 5:17
  • 2
    \$\begingroup\$ Choice of <>/"" is not determined by whether header is a Standard header, but rather whether it's a library or program-defined. In this case, a good suggestion is marred by mistaken reasoning. \$\endgroup\$ Commented Nov 21, 2024 at 16:27
  • \$\begingroup\$ @tkausl Setting the case issue aside as used in another language for the moment, do you see injecting the first '_' in LINKED_LIST_H consistent with the other 5? Perhaps LINKEDLIST_H woudl have been more consistent? \$\endgroup\$ Commented Nov 23, 2024 at 15:42
  • \$\begingroup\$ Using snake-case with uppercase is common. Word boundaries are implied using (lower or upper) camel case, when using all-uppercase we instead use underscores. \$\endgroup\$ Commented Nov 23, 2024 at 15:48
  • \$\begingroup\$ @TobySpeight As OP had "add to my personal util library", using <> or ``""` is dependent on how standard this routine is for OP. AS I see it, to export OP's code for others, putting it in "" is preferable. \$\endgroup\$ Commented Dec 15, 2024 at 11:22
1
\$\begingroup\$

Consider a ring

If linkedList_add() is to be used much, consider rather than pointing to the first node, point to the last and have the last point to the first. (End of list is detected when the next is pointing to the first again.)
No separate .last needed in the head node.

This changes O() of linkedList_add() from O(n) to O(1). Other function's O() remain the same.

Use const*

Rather than int (*compare)(void*, void*);, use pointers to const data like

int (*compare)(const void*, const void*);

Certainly the compare function does not need to change the referenced data. This allows more functions to nicely work for comparing.

Future: add apply()

Add a function, perhaps called int apply_to_list(void *state, int (*apply)(void *state, void *value)), that executes executes apply() to each list's node's .data. If (*apply)() returns non-zero, then stop iterating and return that value. Else return 0.

This can be useful for printing, counting, searching, etc.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.