1

I am currently working on a data visualisation project. I need to create a stacked bar chart using Bokeh. The data is sourced from an Excel file that is updated regularly and contains multiple inputs and outputs. The structure of the data can vary, meaning that the number of inputs and outputs can change. My aim is to ensure that the Bokeh chart can adapt to these changes automatically without requiring manual code adjustments. The stacked bar chart should be able to dynamically adjust to changes in the number of inputs and outputs in the Excel file. It should be capable of visualising various combinations of fixed inputs (e.g. input 1 fixed, input 2 fixed, input 3 variable) while displaying all corresponding outputs. Ideally, the solution would automatically read the Excel file, detect the structure of the inputs and outputs, and update the chart accordingly.

I have tried to visualize the stacked bar charts for a set scenario ( 2 Inputs and 3 Outputs). in the example excel file the data was stored in a scheme like this this:

Example Scheme:

Input_1 Input_2 Output_1 Output_2 Output_3
1 1 100 200 200
2 1 150 150 200
3 1 200 100 200
1 2 200 200 100
2 2 150 200 150
3 2 100 200 200
1 3 200 100 200
2 3 200 150 150
3 3 200 200 100

It worked for this static scenario:

import pandas as pd from bokeh.plotting import figure, output_file, show from bokeh.layouts import gridplot, row from bokeh.models import ColumnDataSource data_frame = pd.read_excel("example.xlsx") data_frame['Input_1'] = data_frame['Input_1'].astype(str) data_frame['Input_2'] = data_frame['Input_2'].astype(str) output_file("stacked_bar_charts.html") unique_input_1 = data_frame['Input_1'].unique() unique_input_2 = data_frame['Input_2'].unique() plots_for_input_1 = [] plots_for_input_2 = [] for value in unique_input_1: filtered_data = data_frame[data_frame["Input_1"] == value] filtered_data = filtered_data.sort_values(by='Input_2') source = ColumnDataSource(filtered_data) plot = figure(title=f"Input_1 = {value} fixed", x_range=filtered_data["Input_2"].unique(), height=300, width=500 ) plot.xaxis.axis_label = "Input_2" plot.vbar_stack(stackers=["Output_1", "Output_2", "Output_3"], x="Input_2", width=0.9, color=["orange", "gray", "brown"], source=source, legend_label=["Output 1", "Output 2", "Output 3"] ) plots_for_input_1.append(plot) for value in unique_input_2: filtered_data = data_frame[data_frame['Input_2'] == value] filtered_data = filtered_data.sort_values(by='Input_1') source = ColumnDataSource(filtered_data) plot = figure(title=f"Input_2 = {value} fixed", x_range=filtered_data["Input_1"].unique(), height=300, width=500 ) plot.xaxis.axis_label = "Input_1" plot.vbar_stack(stackers=["Output_1", "Output_2", "Output_3"], x="Input_1", width=0.9, color=["orange", "gray", "brown"], source=source, legend_label=["Output 1", "Output 2", "Output 3"] ) plots_for_input_2.append(plot) grid_for_input_1 = gridplot(plots_for_input_1, ncols=1) grid_for_input_2 = gridplot(plots_for_input_2, ncols=1) final_layout = row(grid_for_input_1, grid_for_input_2) show(final_layout) 

Here are output example pictures to show how I aim to visualize the data:

bokeh example plot 1:

bokeh example plot 1

bokeh example plot 2:

bokeh example plot 2

I was unable to find a dynamic approach that works for changing inputs and outputs, such as varying two inputs while keeping the others constant and visualizing the impacts on all outputs through stacked bar charts. Additionally, the approach should efficiently adapt to data updates without requiring manual code adjustments for each new scenario.

2
  • Your I/O don't match. For example, you claim three rows/subgraphs for Input_2 but it actually has only two unique values. Can you double-check ? Commented Mar 2, 2024 at 17:11
  • You were right! I have fixed the table so that the data matches. Commented Mar 3, 2024 at 9:29

1 Answer 1

1

You can use a groupby inside a nested-loop to plot all the possible I/O combinations :

output_file("stacked_bar_charts.html") grids = [] for inp_col in inp_cols: inplots = [] for name, sub_df in data_frame.groupby(inp_col): for inp_diff in inp_cols.difference([inp_col], sort=False): plot = figure( title=TITLE(inp_col, name), x_range=data_frame[inp_diff].unique(), height=H, width=W, ) plot.xaxis.axis_label = inp_diff _ = plot.vbar_stack( stackers=out_cols, x=inp_diff, width=BAR_WIDTH, color=COLORS, source=ColumnDataSource(sub_df.sort_values(inp_diff)), legend_label=out_cols.tolist(), ) inplots.append(plot) grids.append(gridplot(inplots, ncols=NCOLS)) final_layout = row(*grids) show(final_layout) 

Output (stacked_bar_charts.html) :

enter image description here

Used configuration :

import pandas as pd from bokeh.layouts import gridplot, row from bokeh.models import ColumnDataSource from bokeh.plotting import figure, output_file, show # PD-PREPROCESS data_frame = pd.read_excel("example.xlsx") inp_cols = data_frame.filter(like="Input").columns out_cols = data_frame.filter(like="Output").columns data_frame = data_frame.astype(dict.fromkeys(inp_cols, str)) # BOKEH-CONFIG TITLE = "{} = {} fixed".format COLORS = ["orange", "gray", "brown"] # depends on outputs BAR_WIDTH = 0.9 H, W = 300, 500 NCOLS = 1 
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.