Scatter plot with multiple categories so the points don't overlap

Question

I'm trying to plot two sets of data in categories, or at least using string values for the X and Y axis grid points. I've seen some examples like here, but they use a bar graph instead of a scatter plot, and I haven't figured out how to make the approach work for my case. I'd like to be able to add a positive or negative offset to the points based on the trace or on the data associated with each point. For example, it would be ideal if the Up points were moved just above the grid line and the Down points just below it. Right now, as you can see, they overlap:

    import plotly.graph_objs as go
    import pandas as pd

    data = {}
    data['Tx'] = ['A', 'B', 'C', 'D', 'D', 'D', 'E', 'C', 'A', 'E', 'B', 'C', 'A', 'B', 'E']
    data['Rx'] = ['A', 'E', 'C', 'B', 'B', 'E', 'D', 'C', 'B', 'C', 'A', 'B', 'A', 'E', 'D']
    data['Direction'] = ['Up', 'Down', 'Down', 'Down', 'Up', 'Up', 'Up', 'Down',
                         'Up', 'Down', 'Down', 'Up', 'Up', 'Down', 'Up']
    data['Metric'] = [1.2, 3.5, 4.5, 2, 8, 2, 5.6, 7, 9, 1, 5, 2.6, 13, .5, 4.8]

    # copy data to dataframe
    tempDF = pd.DataFrame(columns=list(data.keys()))
    for tempKey in list(data.keys()):
        tempDF[tempKey] = data[tempKey]

    # default to a red up-triangle (symbol 5); Down rows get a blue down-triangle (symbol 6)
    # .loc avoids chained assignment, which may not write back to tempDF
    tempDF['markers'] = len(tempDF)*[5]
    tempDF.loc[tempDF['Direction'] == 'Down', 'markers'] = 6
    tempDF['colors'] = len(tempDF)*['red']
    tempDF.loc[tempDF['Direction'] == 'Down', 'colors'] = 'blue'

    fig = go.Figure()
    for direction in ['Up', 'Down']:
        mask = tempDF['Direction'] == direction  # rows belonging to this trace
        fig.add_trace(
            go.Scatter(
                mode='markers',
                x=tempDF['Tx'][mask],
                y=tempDF['Rx'][mask],
                # x=tempDF['Tx'],
                # y=tempDF['Rx'],
                marker_size=15,
                marker_symbol=tempDF['markers'][mask],  # Triangle-up or down
                marker=dict(
                    color=tempDF['colors'][mask],
                    size=20,
                    line=dict(
                        color='MediumPurple',
                        width=2
                    )
                ),
                name=direction,
                hovertemplate="%{y} <- %{x}<br>count: 5/10<br> Pct: 10 <br>Dir %{name}<extra></extra>"
            )
        )

    # set axis order
    fig.update_layout(
        xaxis={'categoryorder': 'array', 'categoryarray': ['A', 'B', 'C', 'D', 'E']},
        yaxis={'categoryorder': 'array', 'categoryarray': ['A', 'B', 'C', 'D', 'E'][::-1]}
    )
    fig.show()

Edit: As J_H suggested, I was able to map the categories to numerical values and then add an offset to those values to move the points up or down. I did this with the tickvals and ticktext properties of the axis dictionaries (xaxis and yaxis) in the figure layout. Doing so caused another problem with the data shown when hovering over the points, though. If a point falls exactly on an axis value (on 'A', 'B', etc. on the x axis in my example), the hover text reads 'A' or 'B', but if the point is offset, the hover text shows the number rather than the string. To correct this, I needed to use customdata and hovertemplate in the trace properties to set the displayed values back to the original strings. Here's the updated code showing these changes.

    import plotly.graph_objs as go
    import pandas as pd
    import numpy as np

    data = {}
    possibleCategories = ['A', 'B', 'C', 'D', 'E']
    numericalValues = [1, 2, 3, 4, 5]
    offset = .1

    data['Tx'] = ['A', 'B', 'C', 'D', 'D', 'D', 'E', 'C', 'A', 'E', 'B', 'C', 'A', 'B', 'E']
    data['Rx'] = ['A', 'E', 'C', 'B', 'B', 'E', 'D', 'C', 'B', 'C', 'A', 'B', 'A', 'E', 'D']
    data['Direction'] = ['Up', 'Down', 'Down', 'Down', 'Up', 'Up', 'Up', 'Down',
                         'Up', 'Down', 'Down', 'Up', 'Up', 'Down', 'Up']
    data['Metric'] = [1.2, 3.5, 4.5, 2, 8, 2, 5.6, 7, 9, 1, 5, 2.6, 13, .5, 4.8]
    data['yValue'] = len(data['Tx'])*[-1]  # pre allocate numerical value arrays
    data['xValue'] = len(data['Tx'])*[-1]
    data['markers'] = len(data['Tx'])*[5]  # default marker value to be an up arrow
    data['colors'] = len(data['Tx'])*["red"]  # default color to red

    for tempKey in data.keys():
        data[tempKey] = np.array(data[tempKey], dtype="object")  # transform all the lists into numpy arrays

    # create numerical values for the categories. The Y axis will have an offset, but not the x axis
    for i in range(len(data['Tx'])):
        if data['Direction'][i] == 'Up':
            data['yValue'][i] = numericalValues[possibleCategories.index(data['Rx'][i])] + offset
        else:
            data['yValue'][i] = numericalValues[possibleCategories.index(data['Rx'][i])] - offset
        data['xValue'][i] = numericalValues[possibleCategories.index(data['Tx'][i])]

    # set markers and colors
    downIndexs = np.where(data['Direction'] == 'Down')
    data['markers'][downIndexs] = 6
    data['colors'][downIndexs] = "blue"

    # copy data to dataframe
    tempDF = pd.DataFrame(columns=list(data.keys()))
    for tempKey in list(data.keys()):
        tempDF[tempKey] = data[tempKey]

    fig = go.Figure()
    for direction in ['Up', 'Down']:
        mask = tempDF['Direction'] == direction  # rows belonging to this trace
        fig.add_trace(
            go.Scatter(
                mode='markers',
                x=tempDF['xValue'][mask],
                y=tempDF['yValue'][mask],
                # x=tempDF['Tx'],
                # y=tempDF['Rx'],
                marker_size=15,
                marker_symbol=tempDF['markers'][mask],  # Triangle-up or down
                marker=dict(
                    color=tempDF['colors'][mask],
                    size=20,
                    line=dict(
                        color='MediumPurple',
                        width=2
                    )
                ),
                name=direction,
                # carry the original strings so the hover text can show them
                customdata=np.stack((tempDF['Rx'][mask],
                                     tempDF['Tx'][mask],
                                     tempDF['Metric'][mask]), axis=-1),
                hovertemplate="<br>".join([
                    '%{customdata[0]} <- %{customdata[1]}',
                    'metric: = %{customdata[2]}',
                    'Dir: ' + direction,
                    '<extra></extra>'
                ])
            )
        )

    # label the numeric tick positions with the category names
    fig.update_layout(
        xaxis=dict(
            tickmode='array',
            tickvals=numericalValues,
            ticktext=possibleCategories,
            range=[min(numericalValues)-1, max(numericalValues)+1],
            side='top'
        ),
        yaxis=dict(
            tickmode='array',
            tickvals=numericalValues,
            ticktext=possibleCategories,
            range=[max(numericalValues)+1, min(numericalValues)-1]  # reversed so 'A' is at the top
        ),
    )
    fig.show()

Wow. Ordinarily markers are plotted "on" the grid location, but here the artistic effect is the marker is pointing "at" the location. Nice! Very effective graphic communication.
– J_H, Commented Jun 5, 2022 at 14:13

J_H · Accepted Answer · 2022-06-04 01:47:21Z

We wish to avoid plotting one symbol atop another.

> if the Up points were moved up above the grid line and the Down points were moved just below the grid, that would be ideal.

Yes, you are certainly free to do that at the app level, by munging the (x, y) values before passing them to plotly. In your example this amounts to mapping letters to numeric values, tweaking them, and passing them to the library.
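As a rough illustration of that munging step, here is a minimal sketch in plain Python. The names `position` and `nudged_y` and the offset of 0.1 are made up for this example, and whether a positive offset appears above or below the grid line on screen depends on how the y axis is oriented.

```python
# Hypothetical helper names; the 0.1 offset is an arbitrary choice for illustration.
categories = ['A', 'B', 'C', 'D', 'E']

# Map each category label to an evenly spaced numeric position: 'A' -> 1, 'B' -> 2, ...
position = {label: i + 1 for i, label in enumerate(categories)}


def nudged_y(label, direction, offset=0.1):
    """Return the numeric y position, shifted by +offset for 'Up' and -offset otherwise."""
    sign = 1 if direction == 'Up' else -1
    return position[label] + sign * offset


rx = ['A', 'E', 'C']
direction = ['Up', 'Down', 'Down']

# These numeric values are what would be passed to the plotting library as y.
y_values = [nudged_y(r, d) for r, d in zip(rx, direction)]
print(y_values)
```

The axis would then need its tick labels set back to the category names, otherwise it will show the numbers.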

For values that are not already discretized, the more general problem is to find collisions, to find data points p1 & p2 within a small distance d that should be perturbed to make the distance exceed d.

To perform this in linear rather than quadratic time, assuming some reasonable input distribution, it is enough to discretize continuous input values to a desired grid size. This lets us get away with an exact equality test, which is easier than worrying about a distance metric. Store the discretized values in a set, and perturb upon noticing a collision. Use min( ... ) - d and max( ... ) + d so it won't matter which point was above or below.
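One possible reading of that idea, sketched in plain Python. The function name `spread_collisions` and the parameters `grid` and `d` are assumptions for this example, and it uses a dict keyed by grid cell rather than a bare set so it can remember the lowest and highest y already placed in each cell. The first point in a cell stays put; later arrivals are pushed to max + d or min - d.

```python
def spread_collisions(points, grid=1.0, d=0.1):
    """Nudge points that land in an already occupied grid cell.

    points: iterable of (x, y) floats. Returns a new list of (x, y) tuples.
    Each point is visited once, so the pass is linear in the number of points.
    """
    extent = {}  # grid cell -> [lowest y placed so far, highest y placed so far]
    result = []
    for x, y in points:
        # Discretize so that "close together" becomes an exact equality test.
        cell = (round(x / grid), round(y / grid))
        if cell not in extent:
            extent[cell] = [y, y]
            result.append((x, y))
            continue
        lo, hi = extent[cell]
        if hi - y <= y - lo:
            # The top of the occupied band is at least as close: go above max.
            new_y = hi + d
            extent[cell][1] = new_y
        else:
            # Otherwise go below min.
            new_y = lo - d
            extent[cell][0] = new_y
        result.append((x, new_y))
    return result


pts = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0), (2.0, 3.0)]
spread = spread_collisions(pts)
print(spread)
```

With identical points this alternates above and below the original location, so a cell with several collisions fans out in both directions instead of stacking one way.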

If you can use the seaborn library, a swarmplot or stripplot would be the natural approach. Perhaps you're looking for this function: https://plotly.com/python-api-reference/generated/plotly.express.strip.html

EDIT

The ord() function will map characters to ordinal values for you:

    >>> for ch in 'ABC':
    ...     print(ch, ord(ch), ord(ch) - ord('A'))
    ...
    A 65 0
    B 66 1
    C 67 2

I was thinking there would be a way to say 'A' is mapped to 1, 'B' to 2, etc., and then assign values like 1.2 and 0.8 to be above/below the 'A' category, but I haven't been successful in my search for how to do that yet.
I see the ord function will give me the Unicode value of an ASCII character, but is there a way to map the categories for the plot to numerical values? Something like making ['A', 'B', 'C'] map to [0, 2, 5], for example. My 'A', 'B', and 'C' labels might actually be something like 'Node 2.0', 'Node 2.1', 'Node 3.0', but I would still want them evenly spaced out along the axis instead of placed at the numerical values 2.0, 2.1, and 3.0.
Actually, I might have found it. I think I can use something like this to map them the way I want: fig.update_layout(xaxis=dict(tickmode='array', tickvals=[1, 3, 5, 7, 9, 11], ticktext=['One', 'Three', 'Five', 'Seven', 'Nine', 'Eleven'])). I haven't tested this yet, but I'll do so in the morning.
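For arbitrary labels such as 'Node 2.0', a small sketch of the same tickvals/ticktext idea in plain Python follows. The names `to_position` and `axis_settings` are made up here, and the last step (passing the dict to `fig.update_layout`) assumes an existing plotly figure called `fig`, so it is left as a comment.

```python
# Labels that look numeric but should still be evenly spaced along the axis.
labels = ['Node 2.0', 'Node 2.1', 'Node 3.0']

# Evenly spaced tick positions 1, 2, 3, ... one per label.
tickvals = list(range(1, len(labels) + 1))
to_position = dict(zip(labels, tickvals))

# Axis settings that label each numeric tick with its original string.
axis_settings = dict(tickmode='array', tickvals=tickvals, ticktext=labels)
# Assumed usage with an existing figure `fig`:
#     fig.update_layout(xaxis=axis_settings)

# Convert raw category values to the numeric positions that get plotted.
x_raw = ['Node 3.0', 'Node 2.0', 'Node 2.1', 'Node 2.0']
x_numeric = [to_position[label] for label in x_raw]
print(x_numeric)  # [3, 1, 2, 1]
```

Offsets such as +0.1 or -0.1 can then be added to the numeric values without affecting the tick labels.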
