
I need to create a scatterplot from a dictionary that maps DNA sequence IDs to molecular weights. Many of the DNA sequences are ambiguous, so they can have several possible molecular weights (and thus there are many values per key). The dictionary looks something like this, although many of the keys actually have far more values (I've removed some for the sake of brevity).

    {'seq_7009': [6236.9764, 6279.027699999999, 6319.051799999999, 6367.049999999999],
     'seq_418': [3716.3642000000004, 3796.4124000000006],
     'seq_9143_unamb': [4631.958999999999],
     'seq_2888': [5219.3359, 5365.4089],
     'seq_1101': [4287.7417, 4422.8254]}

I have another function called get_all_weights that generates this dictionary, so I'm trying to call that function and then graph the results. This is what I have so far, based on another post on this site, but it doesn't work:

    import matplotlib.pyplot as plt
    import itertools

    def graph_weights(file_name):
        with open (file_name) as file:
            d = {} # Initialize a dictionary and then fill it with the results of the get_all_weights function
            d.update(get_all_weights(file_name))
            for k, v in d.items():
                x = [key for (key,values) in b.items() for _ in range(len(values))]
                y = [val for subl in d.values() for val in subl]
                ax.plot(x, y)
            plt.show()

Does anyone know how I can achieve this? The plot should show the sequence IDs on the x axis and the values on the y axis and it should make it clear that the same value can occur multiple times.

  • You know you can just do d = get_all_weights(...), right? Commented Jan 10, 2022 at 6:13
  • Don't use with since you never use file. And if you did use with, take all the post-processing outside. Close the file as soon as you can. Commented Jan 10, 2022 at 6:14
  • Show what you get vs what you want. This is the one case when images are appropriate. Commented Jan 10, 2022 at 6:15
  • We don't need to see the generation code. A proper minimal reproducible example would only need d = <first snippet> Commented Jan 10, 2022 at 6:18

1 Answer


You can plot each sequence ID against its respective values with the following code.

    import matplotlib.pyplot as plt

    d = {'seq_7009': [6236.9764, 6279.027699999999, 6319.051799999999, 6367.049999999999],
         'seq_418': [3716.3642000000004, 3796.4124000000006],
         'seq_9143_unamb': [4631.958999999999],
         'seq_2888': [5219.3359, 5365.4089],
         'seq_1101': [4287.7417, 4422.8254]}

    plt.figure(figsize=(15,5))
    xlabels = []
    for i, key in enumerate(d):
        # Draw one column of points per sequence ID at x position i+1
        if len(d[key])!=0:
            plt.scatter([i+1]*len(d[key]), d[key], c="#396B8B")
        # Label every key so the tick positions stay aligned with the columns
        xlabels.append(key)
    plt.xticks(list(range(1, len(xlabels)+1)), xlabels, rotation='horizontal')
    plt.grid(axis="y")
    plt.title("Molecular Weight by Sequence ID")
    plt.ylabel("Molecular Weight")
    plt.show()
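As an alternative to looping over the keys, the dictionary can be flattened into two parallel lists first, which is what the list comprehensions in the question were aiming for. The following is only a sketch: the helper names weights_to_points and plot_weights are made up for illustration, and the translucent markers (alpha) are an assumption about how to make overlapping weights visible. It relies on matplotlib accepting strings as categorical x values in scatter.

```python
def weights_to_points(weights):
    """Flatten {seq_id: [w1, w2, ...]} into parallel x (IDs) and y (weights) lists."""
    x, y = [], []
    for seq_id, values in weights.items():
        for value in values:
            x.append(seq_id)  # repeat the ID once per weight
            y.append(value)
    return x, y


def plot_weights(weights):
    """Draw a single scatter plot with sequence IDs as categorical x values."""
    # Imported here so that weights_to_points can be used without matplotlib.
    import matplotlib.pyplot as plt

    x, y = weights_to_points(weights)
    fig, ax = plt.subplots(figsize=(15, 5))
    # Translucent markers: overlapping weights show up as darker points.
    ax.scatter(x, y, c="#396B8B", alpha=0.6)
    ax.grid(axis="y")
    ax.set_title("Molecular Weight by Sequence ID")
    ax.set_ylabel("Molecular Weight")
    plt.show()


# Sample data copied from the question.
d = {'seq_7009': [6236.9764, 6279.027699999999, 6319.051799999999, 6367.049999999999],
     'seq_418': [3716.3642000000004, 3796.4124000000006],
     'seq_9143_unamb': [4631.958999999999],
     'seq_2888': [5219.3359, 5365.4089],
     'seq_1101': [4287.7417, 4422.8254]}

x, y = weights_to_points(d)  # x and y always have the same length
```

Calling plot_weights(d) then draws the figure. Because x and y are built pair by pair, they cannot end up with different lengths, and a key with an empty list simply contributes no points.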

3 Comments

You don't need to add 1 to the range and enumeration since they are never shown to the user directly
Also, xlabels = list(d.keys())
Thank you so much for the help. As you can probably tell, I'm quite new to Python. I tried this code and it works perfectly as you wrote it, but when I edit it so as to define the dictionary as (get_all_weights(file_name)), it throws the error "x and y must be the same size." Not sure why
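The thread does not establish why the real dictionary triggers "x and y must be the same size". One possible cause, and it is only a guess, is that get_all_weights returns nested values (for example a list of lists), so len(d[key]) no longer equals the number of weights passed to scatter. A minimal sketch for checking and working around that follows; flatten_values and find_nested are hypothetical helpers, not part of the original code.

```python
from collections.abc import Iterable


def flatten_values(values):
    """Recursively flatten nested lists/tuples of numbers into one flat list."""
    flat = []
    for item in values:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            flat.extend(flatten_values(item))
        else:
            flat.append(item)
    return flat


def find_nested(weights):
    """Return the keys whose value count changes after flattening.

    A changed count means len(values) differs from the number of weights,
    which is the situation that makes x and y differ in size.
    """
    return [key for key, values in weights.items()
            if len(flatten_values(values)) != len(values)]


# Made-up example data: 'seq_a' is nested, 'seq_b' is already flat.
nested = {"seq_a": [[1.0, 2.0], [3.0, 4.0]], "seq_b": [5.0]}

bad_keys = find_nested(nested)
flat = {key: flatten_values(values) for key, values in nested.items()}
```

If find_nested returns an empty list for the real dictionary, nesting is not the cause, and printing a few of the values is the next thing to check.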
