Best way to graph a dictionary with multiple values per key? [closed]

Question

I need to create a scatterplot of a dictionary of DNA sequence IDs and molecular weights. Many of the DNA sequences are ambiguous, so they can have many possible molecular weights (and thus there are many values per key). The dictionary looks something like this, but many of the keys actually have far more values (I've removed some for the sake of brevity).

{'seq_7009': [6236.9764, 6279.027699999999, 6319.051799999999, 6367.049999999999], 'seq_418': [3716.3642000000004, 3796.4124000000006], 'seq_9143_unamb': [4631.958999999999], 'seq_2888': [5219.3359, 5365.4089], 'seq_1101': [4287.7417, 4422.8254]}

I have another function called get_all_weights that generates this dictionary, so I'm trying to call that function and then graph the results. This is what I have so far, based on another post on this site, but it doesn't work:

    import matplotlib.pyplot as plt
    import itertools

    def graph_weights(file_name):
        with open (file_name) as file:
            d = {}
            # Initialize a dictionary and then fill it with the results of the get_all_weights function
            d.update(get_all_weights(file_name))
            for k, v in d.items():
                x = [key for (key,values) in b.items() for _ in range(len(values))]
                y = [val for subl in d.values() for val in subl]
                ax.plot(x, y)
            plt.show()

Does anyone know how I can achieve this? The plot should show the sequence IDs on the x axis and the values on the y axis and it should make it clear that the same value can occur multiple times.

Don't use with since you never use file. And if you did use with, take all the post-processing outside. Close the file as soon as you can.
– Mad Physicist, Commented Jan 10, 2022 at 6:14
Show what you get vs what you want. This is the one case when images are appropriate.
– Mad Physicist, Commented Jan 10, 2022 at 6:15
We don't need to see the generation code. A proper minimal reproducible example would only need d = <first snippet>.
– Mad Physicist, Commented Jan 10, 2022 at 6:18
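
The list-comprehension idea in the question can be made to work once the undefined names (b, ax) are replaced and the unused file handle is dropped, as the comments suggest. The following is only a sketch under assumptions: flatten_for_scatter is a hypothetical helper name, and graph_weights is changed here to take the dictionary itself rather than a file name, so get_all_weights is not needed to try it.

```python
def flatten_for_scatter(weights):
    """Turn {seq_id: [w1, w2, ...]} into two parallel lists of equal length."""
    x, y = [], []
    for seq_id, values in weights.items():
        for value in values:
            x.append(seq_id)  # repeat the ID once for each of its weights
            y.append(value)
    return x, y


def graph_weights(weights):
    """Scatter every weight above its sequence ID (hypothetical rewrite)."""
    # Imported here so the helper above can be used without matplotlib.
    import matplotlib.pyplot as plt

    x, y = flatten_for_scatter(weights)
    fig, ax = plt.subplots(figsize=(15, 5))
    ax.scatter(x, y)  # string x values are drawn on a categorical axis
    ax.set_ylabel("Molecular Weight")
    plt.show()
```

Calling graph_weights(d) with the dictionary from the question should draw one column of points per sequence ID. Keys with an empty list contribute no points and therefore do not appear on the x axis.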

BoomBoxBoy · Accepted Answer · 2022-01-10 16:34:18Z

You can plot each sequence ID against its respective values with the following code.

    import matplotlib.pyplot as plt

    d = {'seq_7009': [6236.9764, 6279.027699999999, 6319.051799999999, 6367.049999999999],
         'seq_418': [3716.3642000000004, 3796.4124000000006],
         'seq_9143_unamb': [4631.958999999999],
         'seq_2888': [5219.3359, 5365.4089],
         'seq_1101': [4287.7417, 4422.8254]}

    plt.figure(figsize=(15,5))
    xlabels = []
    for i, key in enumerate(d):
        # Draw one point per weight, all at the same x position for this key
        if len(d[key])!=0:
            plt.scatter([i+1]*len(d[key]), d[key], c="#396B8B")
        xlabels.append(key)
    # Label positions 1..n with the sequence IDs
    plt.xticks(list(range(1, len(xlabels)+1)), xlabels, rotation='horizontal')
    plt.grid(axis="y")
    plt.title("Molecular Weight by Sequence ID")
    plt.ylabel("Molecular Weight")
    plt.show()
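
The question also asks that repeated values be visible, but identical weights drawn at the same position overlap completely. One option, which is not part of the answer above, is to scale each marker by how often its value occurs. This is a sketch only: marker_sizes, plot_with_multiplicity, and the base size of 40 are illustrative choices, and values are compared with exact float equality.

```python
from collections import Counter


def marker_sizes(values, base=40):
    """Return the distinct values and a marker area proportional to each count."""
    counts = Counter(values)
    unique = sorted(counts)
    sizes = [base * counts[v] for v in unique]
    return unique, sizes


def plot_with_multiplicity(weights):
    """Scatter weights per sequence ID, with larger markers for repeated values."""
    # Imported here so marker_sizes can be used without matplotlib.
    import matplotlib.pyplot as plt

    labels = list(weights)
    fig, ax = plt.subplots(figsize=(15, 5))
    for pos, key in enumerate(labels, start=1):
        unique, sizes = marker_sizes(weights[key])
        if not unique:
            continue  # nothing to draw for a key without weights
        ax.scatter([pos] * len(unique), unique, s=sizes, c="#396B8B")
    ax.set_xticks(list(range(1, len(labels) + 1)))
    ax.set_xticklabels(labels)
    ax.set_ylabel("Molecular Weight")
    ax.grid(axis="y")
    plt.show()
```

A semi-transparent marker (the alpha argument of scatter) is a simpler alternative when only a rough sense of overlap is needed.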

You don't need to add 1 to the range and enumeration, since they are never shown to the user directly.
Thank you so much for the help. As you can probably tell, I'm quite new to Python. I tried this code and it works perfectly as you wrote it, but when I edit it to define the dictionary as get_all_weights(file_name), it throws the error "x and y must be the same size." I'm not sure why.
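
The thread does not show what get_all_weights returns, so the cause of the "x and y must be the same size" error is not confirmed. One possibility is that some values are nested lists (or bare numbers) rather than flat lists, in which case len(d[key]) no longer matches the number of points being plotted. The sketch below, using the hypothetical helpers as_flat_weights and normalize_weights, coerces every value to a flat list of floats before plotting.

```python
from collections.abc import Iterable
from numbers import Real


def as_flat_weights(value):
    """Return value as a flat list of floats, accepting a number or nested iterables."""
    if isinstance(value, Real):
        return [float(value)]
    if isinstance(value, (str, bytes)) or not isinstance(value, Iterable):
        raise TypeError(f"unexpected weight value: {value!r}")
    flat = []
    for item in value:
        flat.extend(as_flat_weights(item))  # recurse into nested containers
    return flat


def normalize_weights(weights):
    """Map every sequence ID to a flat list of floats."""
    return {seq_id: as_flat_weights(value) for seq_id, value in weights.items()}
```

With d = normalize_weights(get_all_weights(file_name)), the plotting code in the answer receives the same shape of data as the hard-coded dictionary. If a TypeError is raised instead, the offending value shown in the message points to what the function actually returns.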
