I apologize for updating my answer multiple times. I shouldn't upload my answer when I'm tired.
I'm going to ignore the running time complexity to avoid confusing you.
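If you want to know: counting n words with a dictionary takes O(n) time on average, and sorting the k distinct words for the output adds O(k log k). A tree could keep the words in sorted order as they are counted, but using and explaining a tree might confuse a beginner.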
The simplest way to do and understand this is as follows:

    # you don't need "import os" in this case.
    new_dict = {}

    # This is to open the file to get the count of all words:
    filename = 'abc.txt'
    with open(filename, "r") as fp:
        for line in fp:
            # For this to work, make sure the words don't end with any punctuation.
            # If any words end with a punctuation, take a look into re.sub()
            words = line.split()
            for word in words:
                word = word.lower()
                if word not in new_dict:
                    new_dict[word] = 1
                else:
                    new_dict[word] += 1

    # This is to calculate the count of all words:
    total_words = sum(new_dict.values())

    # This is used to get the probability of each word.
    output_file = 'x.txt'
    with open(output_file, "w") as fs:
        # Each dictionary entry is a pair: {key: value}, here {word: count}
        for key, value in sorted(new_dict.items()):
            probability = value / total_words
            fs.write(key + ": " + str(probability) + "\n")

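The code comments point to re.sub() for words that end with punctuation. Here is a minimal sketch of that idea. The helper name normalize and the regular expression are my own assumptions, not part of the code above; the pattern drops every character that is not a letter, digit, underscore, whitespace, or apostrophe.

```python
import re

def normalize(line):
    """Hypothetical helper: lowercase a line and split it into clean words."""
    # Remove anything that is not a word character, whitespace, or an
    # apostrophe, then split on whitespace as the original code does.
    cleaned = re.sub(r"[^\w\s']", "", line.lower())
    return cleaned.split()

print(normalize("I am going."))           # ['i', 'am', 'going']
print(normalize("Hello, world! Hello?"))  # ['hello', 'world', 'hello']
```

You could then loop over normalize(line) instead of line.split(). Whether to keep apostrophes or hyphens depends on what you want to count as a word.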
Using for x_variable in collection_variable
When you use for x_variable in collection_variable, any code that should run once per item has to sit inside the body of the loop. In this case, I moved everything that uses word inside the for word in words loop so that it runs for every word. Outside the loop, word would only hold the last value it was given.
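To make the loop point concrete, here is a small sketch. The lines list is made-up sample data standing in for the file, so treat it as an assumption for illustration only.

```python
# Sample data standing in for the lines of a file (illustrative only).
lines = ["a b a", "b c"]

counts = {}
for line in lines:
    words = line.split()
    for word in words:
        # Everything that uses `word` is indented under this loop,
        # so it runs once for each word in each line.
        counts[word] = counts.get(word, 0) + 1

print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

If the counting line were dedented out of the inner loop, it would run only once per line and count just the last word of each line.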
When to call file.close()
When you open a file using the with open(...) statement, you don't need to close it explicitly. As soon as you leave the with block, the file is closed for you, even if an error occurs inside the block. However, if you call open(...) without the with statement, then yes, you need to call fs.close() yourself.
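A short sketch of both styles, for comparison. The temporary directory and the sample text are assumptions made so the example can run without touching your real files.

```python
import os
import tempfile

# Write into a temporary directory so no real file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "x.txt")

# With a `with` block, the file is closed when the block is left.
with open(path, "w") as fs:
    fs.write("hello: 0.5\n")
print(fs.closed)  # True

# Without `with`, closing the file is your responsibility.
fp = open(path, "r")
try:
    content = fp.read()
finally:
    fp.close()
print(fp.closed)  # True
```

The try/finally in the second half is roughly what the with statement does for you.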
How sorted(variable) works
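Simple data types such as string, int, float, tuple, and list already define comparison functions, so you can use sorted(variable) on a collection of them. Calling sorted() on a dictionary sorts its keys, and sorted(new_dict.items()) sorts the (key, value) pairs by key, which is what the code above relies on. However, if you have your own data type or object, you need to define the comparison functions (for example __lt__) or pass a key function for sorted() to work. Read more about sorted(variable) in the Python docs.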
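As a rough illustration of sorting your own data type, here is a sketch with a made-up WordCount class (the class and its fields are hypothetical, chosen only for this example). It shows both options: defining __lt__ and passing a key function.

```python
class WordCount:
    """Hypothetical class pairing a word with its count."""

    def __init__(self, word, count):
        self.word = word
        self.count = count

    def __lt__(self, other):
        # sorted() compares items with "<", so this is the method it needs.
        # Order alphabetically by word, then by count.
        return (self.word, self.count) < (other.word, other.count)

items = [WordCount("going", 1), WordCount("am", 2), WordCount("i", 1)]

# Uses __lt__ defined above.
by_word = [wc.word for wc in sorted(items)]
print(by_word)  # ['am', 'going', 'i']

# A key function works even for classes without comparison methods.
by_count = [wc.word for wc in sorted(items, key=lambda wc: wc.count)]
print(by_count)  # ['going', 'i', 'am']

# Built-in types need nothing extra: dictionary items sort by key.
print(sorted({"b": 1, "a": 2}.items()))  # [('a', 2), ('b', 1)]
```

For simple cases the key function is usually less work than adding comparison methods to the class.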
Hope this helps :)