4

Edit: I have added all files (.c, .h and a Makefile). The program works but has many memory leaks.

http://dl.dropbox.com/u/21926200/Ngrams.tar.gz

How do you iterate through a hash table to call free()? I am wondering if I need to create a separate function to call free() on htable and results or I can do it inside this code. Any ideas? I just need to call free() on htable and results.

h_ptr *htable; int tsize; void new_table(int size) { tsize = size; htable = (h_ptr *) calloc(size, sizeof(h_ptr)); if (!htable) { fprintf(stderr, "Couldn't allocate hash array, exiting\n"); exit(1); } // At the beginning, each element in the hash table // is a NULL pointer for(int i=0; i<size; i++) { htable[i]=NULL; } } // Allocate new element to store in hash table h_ptr new_element(char *s) { h_ptr result = (h_ptr) malloc(sizeof(h_rec)); int wlen = strlen(s); if (wlen > llen) { lstring = s; llen = wlen; lcnt = 1; } else if (wlen == llen) lcnt++; if (!result) { fprintf(stderr, "Couldn't allocate hash element, exiting\n"); exit(1); } result->word = s; result->freq = 1; result->next = NULL; return result; } 

Thanks

4
  • Are you sure to understand what hashtables are? You definitely should explain what h_ptr is... We can only guess .... and then your code seems incorrect... Commented Mar 9, 2013 at 18:38
  • 2
    Learn to use a pool allocator and thank me later. Commented Mar 9, 2013 at 18:44
  • Still missing the header file defining h_ptr Commented Mar 9, 2013 at 18:45
  • Sorry, its a struct. Edited the post Commented Mar 9, 2013 at 18:50

2 Answers 2

3

Before the 'whole program' was provided

Given the information in the new_element() function:

result->word = s; result->freq = 1; result->next = NULL; 

we have to infer (since you didn't include the actual information) that your hash table is an array where the elements are linked lists of individually allocated elements containing an allocated name, a frequency and a pointer to the next item. Therefore, freeing the hash table involves visiting each element of the array in turn, chasing down the linked list, freeing the name and the element.

void free_hashtable(void) { if (htable != 0) { for (int i = 0; i < tsize; i++) { h_ptr next = 0; for (h_ptr curr = htable[i]; curr != 0; curr = next) { next = curr->next; free(curr->word); free(curr); } } free(htable); htable = 0; tsize = 0; } } 

After the 'whole program' was provided

We still don't have the data structures, so we still can't really say what is going on. So, we can add this to the top of the code:

#include <stdio.h> #include <stdlib.h> #include <string.h> typedef struct hash_table_entry { int freq; struct hash_table_entry *next; char *word; } *h_ptr; 

However, one issue that arises when we add that is:

 h_ptr result = (h_ptr) malloc(sizeof(h_rec)); 

The compiler says:

error: 'h_rec' undeclared (first use in this function) 

identifying that line. Now, it might be that your have some typedef like:

typedef struct hash_table_entry h_rec; 

but we should not be having to guess. Please create an SSCCE (Short, Self-Contained, Correct Example) so we do not have to guess.

I compile the code using:

gcc -O3 -g -std=c99 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition ht.c -o ht 

The compiler warns me about:

ht.c:27:6: warning: no previous prototype for ‘lowercase’ [-Wmissing-prototypes] ht.c:41:6: warning: no previous prototype for ‘new_table’ [-Wmissing-prototypes] ht.c:59:7: warning: no previous prototype for ‘new_element’ [-Wmissing-prototypes] ht.c:82:10: warning: no previous prototype for ‘hash_function’ [-Wmissing-prototypes] ht.c:103:7: warning: no previous prototype for ‘save_string’ [-Wmissing-prototypes] ht.c:116:7: warning: no previous prototype for ‘find_element’ [-Wmissing-prototypes] ht.c:142:7: warning: no previous prototype for ‘sort_words’ [-Wmissing-prototypes] ht.c:176:6: warning: no previous prototype for ‘insert_string’ [-Wmissing-prototypes] ht.c:191:6: warning: no previous prototype for ‘init_token’ [-Wmissing-prototypes] ht.c:201:7: warning: no previous prototype for ‘get_word’ [-Wmissing-prototypes] ht.c:224:7: warning: no previous prototype for ‘get_token’ [-Wmissing-prototypes] ht.c:276:6: warning: no previous prototype for ‘word_freq’ [-Wmissing-prototypes] ht.c: In function ‘main’: ht.c:322:5: warning: implicit declaration of function ‘add_string_option’ [-Wimplicit-function-declaration] ht.c:323:5: warning: implicit declaration of function ‘add_int_option’ [-Wimplicit-function-declaration] ht.c:328:5: warning: implicit declaration of function ‘parse_options’ [-Wimplicit-function-declaration] ht.c:329:5: warning: implicit declaration of function ‘show_options’ [-Wimplicit-function-declaration] 

And for the last four names, the linker fails to find them. We need code that doesn't use code that is not available. Reduce your code to an SSCCE (note self-contained!).

I've hacked out the calls to those functions and simplified main accordingly. When run on its own source (./ht < ht.c), it produces:

N-gram size 1 Running analysis... (This can take several minutes or more!) Initializing hash table... Inserting all n-grams into hash table in lowercase form... Sorting all hash table elements according to frequency... Analysis Details: (Top 10 list of n-grams) 82 '=' 30 'int' 27 '0' 23 'if' 23 'char' 22 's' 16 '1' 16 'i' 14 'ls' 14 'h_ptr' Analysis Summary: 817 total n-grams 211 unique n-grams 94 singleton n-grams (occur only once) Most common n-gram (with 82 occurrences) is '=' Longest n-gram (1 have length 16) is 'hash_table_entry' Total time = 0.002225 seconds 

Looks OK...but run the optimized build under valgrind and another story shows up...but I've since concluded that the problem is in the optimized implementation of strlen() in the specific version of GCC I'm using (because, in part, using a different version of GCC on the very same code makes the errors go away, and because the errors were in a call to strlen() which has no business reading 4 bytes starting at offset 4 in an allocation of 5 bytes).

I added a function print_hashtable(). I think that for almost any non-trivial structure, it is worth having a 'print' or 'dump' function available like this.

static void print_hashtable(FILE *fp, const char *tag) { fprintf(fp, "Print hash table: %s\n", tag); fprintf(fp, "Size = %d; Address: %p\n", tsize, htable); if (htable != 0) { for (int i = 0; i < tsize; i++) { printf("Entry %d (%p)\n", i, htable[i]); h_ptr next = 0; for (h_ptr curr = htable[i]; curr != 0; curr = next) { next = curr->next; printf("Word: %s (freq %d; next %p)\n", curr->word, curr->freq, next); } } } } 

I inserted a call to this just before sort_words(). This showed that despite the not dreadfully effective hash algorithm, the data structure was intact.

I also inserted a call to print_hashtable() after sort_words(), and that showed that the hash table was comprehensively destroyed by the sorting process. The sort phase eliminates the hash table as a hash table; the only thing to do is to release the table as whole (free(htable)). Freeing all the table entries has to be done at the foot of the function word_freq():

static void free_hashlist(h_ptr head) { while (head != 0) { h_ptr next = head->next; printf("Free: %d (%s) %p -> %p\n", head->freq, head->word, (void *)head, (void *)next); free(head->word); free(head); head = next; } } static void word_freq(FILE *src, int details, int size) { char *s; h_ptr list_head, ptr; printf(" Initializing hash table...\n"); init_token(src); new_table(size); printf(" Inserting all n-grams into hash table in lowercase form...\n"); while ((s = get_word())) insert_string(s); print_hashtable(stdout, "After reading data"); printf(" Sorting all hash table elements according to frequency...\n"); list_head = sort_words(); if (details > 0) { printf("\nAnalysis Details:\n(Top %i list of n-grams)\n", details); ptr = list_head; while (ptr && details--) { printf("%d\t'%s'\n", ptr->freq, ptr->word); ptr = ptr->next; } } printf("\nAnalysis Summary:\n"); printf("%d total n-grams\n", wcnt); printf("%d unique n-grams\n", ucnt); printf("%d singleton n-grams (occur only once)\n", scnt); printf("Most common n-gram (with %d occurrences) is '%s'\nLongest n-gram (%d have length %d) is '%s'\n", mcnt, mstring, lcnt, llen, lstring); free_hashlist(list_head); } 

And the free_hashtable() code has to be simplified:

static void free_hashtable(void) { if (htable != 0) { free(htable); htable = 0; lcnt = 0; lstring = 0; mcnt = 0; mstring = 0; scnt = 0; tsize = 0; wcnt = 0; ucnt = 0; } } int main(void) { printf("\nRunning analysis... (This can take several minutes or more!)\n"); word_freq(stdin, 10, 1024); printf("Total time = %f seconds\n", (double) clock() / CLOCKS_PER_SEC); free_hashtable(); return 0; } 

In this program, it's not critical to reset the pointers and counts to null/zero. Also, the lstring and mstring pointers are invalidated by free_hashlist(), so arguably those should be nulled/zeroed in that function.

This runs leak-free for me, and with only system-allocated memory still in use at the end.

Sign up to request clarification or add additional context in comments.

9 Comments

But nowhere in his code does the original poster sets htable[i] in new_element so we cannot be sure that the hash-table is an array of linked list buckets.
@BasileStarynkevitch: I think new_element() is used to create an element that is added to the appropriate list in htable, but I agree that it was an inference and I said as much with 'we have to infer'. Let's see what the changes are with the whole program present.
@JonathanLeffler: Great, but we still miss the header file, and I'm getting the impression that the original poster (@RedWorkerGia) don't understand fully what his hash-table should be (or some basics about C programming).
sorry guys, I have added the h file. I should have explained it before. My bad. I do understand C but I mostly work in C++ and never had to deal with memory management like malloc and free. I just call new and it does everything :)
yeah, but its much better and easier than malloc, calloc and free
|
0

First, you don't need to clear each individual htable[i] after having allocated htable thru calloc (because calloc gives you a cleared memory zone or fails). I do suggest using calloc also in new_element for result.

Then, you probably want to have some function to delete your htable. Something like

void delete_htable(void) { if (!htable) return; for (int i=0; i<tsize; i++) free (htable[i]); free (htable); htable = NULL; } 

Remember that you can free a NULL pointer (nothing harmful would happen).

And you probably want to call delete_htable before your exit (just to avoid memory leaks, and make valgrind happy).

At last, you might consider using existing hashing functions in existing libraries. Consider for example Glib (within GTk, but you can use Glib outside of any GTK application). It gives you hash-tables already (and given that these are often used, you may be confident about them). Don't forget that GTK (hence Glib) is a free software library under LGPL2.1 license so you can download its source code, study and improve it. You want to look inside glib/ghash.h header and glib/ghash.c code source files (and some others).

3 Comments

Thank you for your response. I tried the similar thing before but didn't set the htable to NULL. I understand now, since htable is a global pointer variable, we need to set it to NULL after calling free. Thanks bro, u r the man. Any ideas on freeing results. I created a for loop to iterate through the list and call free but it broke my program.
I don't understand why it (and what exactly) has broken your program. Consider compiling with gcc -Wall -g and debugging with gdb and valgrind. Since we don't know what h_ptr is (to what exact struct-ure does it point to) we cannot help much more.
program works, but I need to fix "still reachable" memory leaks. I am using Valgrind

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.