1

I'm writing a simple program in C that finds anagrams of words based on looking them up in a hash table. What I have is an array of hash tables, where words are indexed by their length and hashed within by the summation of their letters, e.g. a = 1, b = 2, etc.

I'm still getting acquainted with dynamic memory management in C, so this question may be quite simple, but I have a functions that allocate and deallocate memory for each hash table (and its inside data).

I originally wrote this program using a single hash table and after running the program through valgrind I found that the memory had been freed correctly and there were no leaks. However, when expanding the program to use an array of hash tables, valgrind began to find possible leaks (though it's reporting them as still reachable). I'm confused as to why the memory isn't being freed correctly in the array of hash tables, although each hash table in the array is being run through the deallocate function that was used originally.

Gist with the full code Full Code

typedef struct Bucket Bucket; typedef struct HashTable HashTable; struct Bucket { char* data; Bucket *next; }; struct HashTable { int size; Bucket **buckets; }; int main(int argc, char const *argv[]) { // Allocate memory for array of hash tables HashTable** hash_array = (HashTable**) malloc(sizeof(HashTable*) * WORD_SIZE); for(i = 0; i < WORD_SIZE; i++) { hash_alloc(&hash_array[i], BUCKET_COUNT); } // Main logic here... // Free memory for(i = 0; i < WORD_SIZE; i++) { hash_dealloc(hash_array[i]); } free(hash_array); return 0; } 

Hash Table Allocation function

void hash_alloc(HashTable** a, unsigned int size) { *a = (HashTable*) malloc(sizeof(HashTable)); (*a)->buckets = (Bucket**) malloc(sizeof(Bucket*) * size); (*a)->size = size; } 

Hash Table Deallocation function

void hash_dealloc(HashTable* a) { int i; Bucket* current, *temp; for(i = 0; i < a->size; i++) { current = a->buckets[i]; while(current != NULL) { temp = current; free(temp->data); current = current->next; free(temp); } free(current); } free(a->buckets); free(a); } 

Add to hash table function

void add_to_hash_array(HashTable** a, char* data) { // Removed some other logic for readability... replace_str(data, "\n", "\0"); newNode->data = strdup(data); newNode->next = currentTable->buckets[index]; currentTable->buckets[index] = newNode; } else { return; } } 

Valgrind output

==39817== 261,120 bytes in 128 blocks are still reachable in loss record 5 of 7 ==39817== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==39817== by 0x400A38: hash_alloc (main.c:73) ==39817== by 0x4008B0: main (main.c:39) ==39817== ==39817== 286,936 bytes in 31,553 blocks are still reachable in loss record 6 of 7 ==39817== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==39817== by 0x4EBAD71: strdup (strdup.c:43) ==39817== by 0x400D4D: add_to_hash_array (main.c:141) ==39817== by 0x400914: main (main.c:51) ==39817== ==39817== 504,848 bytes in 31,553 blocks are still reachable in loss record 7 of 7 ==39817== at 0x4C2B6CD: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==39817== by 0x400D16: add_to_hash_array (main.c:136) ==39817== by 0x400914: main (main.c:51) 
22
  • Could you please rewrite the code without the superfluous casts etc. so as to make it more readable? Commented Jan 17, 2013 at 19:37
  • I've edited some of the irrelevant logic out and left the allocs and deallocs that are important for the question. Hope that helps. Commented Jan 17, 2013 at 19:40
  • Can we see HashTable and Bucket declarations? Commented Jan 17, 2013 at 20:13
  • Sure. I've just edited them in. Commented Jan 17, 2013 at 20:35
  • HashTable** hash_array = (HashTable**) malloc(sizeof(HashTable*) * WORD_SIZE); Why don't use the Hash_alloc() function instead ? Commented Jan 17, 2013 at 20:41

3 Answers 3

1

In generate_anagrams

void generate_anagrams(HashTable* a, char* word) { int index = hash(word); char* word_dup = strdup(word); char* current_word; string_sort(word_dup); printf("%s: %i size: %zi\n", word, index, strlen(word)); Bucket* current = a->buckets[index]; while(current != NULL) { if(current->data) { current_word = strdup(current->data); string_sort(current_word); if(strcmp(word_dup, current_word) == 0) { printf("%s\n", current->data); } } current = current->next; } free(current_word); free(word_dup); } 

you are leaking thestrduped current_words. Move the free(current_word); into the if (current->data).

In add_to_hash_array

void add_to_hash_array(HashTable** a, char* data) { int index = hash(data); HashTable* currentTable = a[strlen(data)]; Bucket* existingNode = hash_lookup(currentTable, data); if(existingNode == NULL) { Bucket *newNode = (Bucket*) malloc(sizeof(Bucket)); if(newNode == NULL) { exit(1); } replace_str(data, "\n", "\0"); newNode->data = strdup(data); newNode->next = currentTable->buckets[index]; currentTable->buckets[index] = newNode; } else { return; } } 

You only remove the newline from data after you have looked it up, but you insert data after removing the newline, so you will not detect duplicates.

In main, you should not

 generate_anagrams(hash_array[strlen(arg_dup) + 1], arg_dup); 

but use hash_array[strlen(arg_dup)] if you remove the newline right at the beginning of add_to_hash_array.

And of course, you should ensure that hash generates no out-of-range indices.

And strncpy(str, str, p-str); is undefined behaviour.

Sign up to request clarification or add additional context in comments.

Comments

1

Moving my comment to an answer because this nearly works.

No need to initialise to NULL before malloc, but any variable pointers that you later free should be set to NULL and only free if not NULL. Also initialise pointer values in your structs to NULL (easier to use calloc and get this automatically) so your guards work properly. I've had a bit more of look over your code and your hashing worries me - you need to make sure your hash function only returns values in the range 0 to BUCKET_COUNT-1. I made a few quick changes of this kind, and fixed your replace_str method and it no longer complains with valgrind.

Now when I run the code I see it finding the correct words to compare with, but the sorting function is not returning a sorted string, so it is not identifying the anagrams. Should be an easy fix.

Edit: I just pasted in the first string sort routine a google search found for me and ran to get the following output:

./a.out trips trips: 82 size: 5 strip stirp sprit spirt 

Comments

1

Ther's no memory leak, but the code has some problems (the replace_str function need to be replaced by a correct version). The output of valgrind on my linux box:

$ valgrind --leak-check=full --show-reachable=yes ./MEMORY_LEAK absurde ==13476== Memcheck, a memory error detector ==13476== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al. ==13476== Using Valgrind-3.6.1-Debian and LibVEX; rerun with -h for copyright info ==13476== Command: ./MEMORY_LEAK absurde ==13476== ==13476== Conditional jump or move depends on uninitialised value(s) ==13476== at 0x8048A9A: hash_lookup (MEMORY_LEAK.c:132) ==13476== by 0x80489BD: add_to_hash_array (MEMORY_LEAK.c:113) ==13476== by 0x804871C: main (MEMORY_LEAK.c:50) ==13476== absurde: 70 size: 7 ==13476== Conditional jump or move depends on uninitialised value(s) ==13476== at 0x8048D1C: generate_anagrams (MEMORY_LEAK.c:169) ==13476== by 0x8048795: main (MEMORY_LEAK.c:56) ==13476== ==13476== Conditional jump or move depends on uninitialised value(s) ==13476== at 0x8048890: hash_dealloc (MEMORY_LEAK.c:81) ==13476== by 0x80487D8: main (MEMORY_LEAK.c:64) ==13476== ==13476== Conditional jump or move depends on uninitialised value(s) ==13476== at 0x4027BC2: free (vg_replace_malloc.c:366) ==13476== by 0x804889C: hash_dealloc (MEMORY_LEAK.c:87) ==13476== by 0x80487D8: main (MEMORY_LEAK.c:64) ==13476== ==13476== ==13476== HEAP SUMMARY: ==13476== in use at exit: 0 bytes in 0 blocks ==13476== total heap usage: 360 allocs, 360 frees, 133,424 bytes allocated ==13476== ==13476== All heap blocks were freed -- no leaks are possible ==13476== ==13476== For counts of detected and suppressed errors, rerun with: -v ==13476== Use --track-origins=yes to see where uninitialised values come from ==13476== ERROR SUMMARY: 65330 errors from 4 contexts (suppressed: 11 from 6) 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.