Skip to content

Improve perf infra display for multiple clients#7770

Open
eddyashton wants to merge 2 commits intomicrosoft:mainfrom
eddyashton:basicperf_display_for_multiple_clients
Open

Improve perf infra display for multiple clients#7770
eddyashton wants to merge 2 commits intomicrosoft:mainfrom
eddyashton:basicperf_display_for_multiple_clients

Conversation

@eddyashton
Copy link
Member

Need to run a many-client perf test to stress the blocking behaviour. This improves the display of the stats when we many clients in 2 ways:

  • Per-client colours for (send and receive) throughputs, displayed in stacked bar charts
  • Aggregate printing of running process

That's a big source diff on non-critical code, but here's some screenshots to show the actual diff.

First the baseline, showing there's minimal change to the common single-client display.

1 client

Before After
image image

The legend on the bar charts now says client_0, rather than just recvd, but things are otherwise identical.

But things look really different with many clients.

30 clients

Before After
image image
image image

First row shows the benefit of aggregate printing - instead of dumping 30 indistinguishable lines every second, you print one containing the same information.
Second shows the stacked colours. May not be immediately obvious how to read, but the general principle is that a smooth spectrum is a sign of client fairness, and distinct colour blocks are a sign that some clients are seeing very different results. That's more pronounced in the blocking tests (and is helping me tune the parameters there), but as a rough example independently here's a 4-client run where large write-ahead lets some finish first:

image

The last blue row shows very few sends for clients 1 and 2 - they finished their sends earlier. "Significantly?" is left as an exercise for the reader.

@eddyashton eddyashton requested a review from a team as a code owner March 25, 2026 11:25
Copilot AI review requested due to automatic review settings March 25, 2026 11:25
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates the perf test infrastructure (basicperf.py) to improve readability and interpretability of output when running many clients concurrently, particularly around client status reporting and throughput visualisation.

Changes:

  • Aggregate client “running/completed” status logging when there are many clients.
  • Replace single-series bar charts with per-client stacked bar charts (send/receive/error breakdowns) using per-client colour gradients.
  • Add basic divide-by-zero guards in aggregated statistics calculations.
import datetime
import ccf.ledger
import plotext as plt
from plotext._utility import get_title, colorize, simple_bar_marker, to_rgb
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This imports from plotext._utility, which is a private/internal module (leading underscore) and may change without notice even within the pinned <6 range. If possible, prefer public plotext APIs for title/colour helpers, or isolate this behind a small compatibility wrapper (with a fallback) so future plotext updates don’t break perf runs.

Suggested change
from plotext._utility import get_title, colorize, simple_bar_marker, to_rgb
import importlib
try:
_plotext_utility = importlib.import_module("plotext._utility")
get_title = _plotext_utility.get_title
colorize = _plotext_utility.colorize
simple_bar_marker = _plotext_utility.simple_bar_marker
to_rgb = _plotext_utility.to_rgb
except (ImportError, AttributeError):
def get_title(*args, **kwargs):
# Fallback: no-op title helper
return ""
def colorize(text, *args, **kwargs):
# Fallback: return unmodified text if colorization is unavailable
return text
def simple_bar_marker():
# Fallback: basic block character for bar markers
return "█"
def to_rgb(color):
# Fallback: pass-through; assumes callers can handle a plain color value
return color
Copilot uses AI. Check for mistakes.
Comment on lines 484 to +487
start_send = agg["sendTime"].min()
end_recv = agg["receiveTime"].max()
duration_s = (end_recv - start_send).total_seconds()
throughput = len(agg) / duration_s
throughput = len(agg) / duration_s if duration_s > 0 else 0
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new duration_s > 0 guards won’t prevent failures when there are zero rows in agg (e.g., iterations=0 or all parquet files empty): agg['sendTime'].min()/max() will be None, so (end_recv - start_send) will raise before reaching these checks. Consider an early exit (or explicit validation) when agg.height == 0 (and similarly for per-client overall).

Copilot uses AI. Check for mistakes.
Comment on lines +588 to +592
for cn in client_names:
c_df = agg.filter(pl.col("client") == cn)
cs = (
c_df.with_columns(
((pl.col("sendTime") - start_send) / 1000000)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The per-client chart prep does multiple agg.filter(...) passes per client (and then additional filters for rcvd/errors), which can become expensive for large perf runs. This can be computed in a single Polars pass (eg add second_send/second_recv once, then group_by(['client','second']) with conditional counts) to reduce repeated scans and allocations.

Copilot uses AI. Check for mistakes.
import datetime
import ccf.ledger
import plotext as plt
from plotext._utility import get_title, colorize, simple_bar_marker, to_rgb
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eddyashton colorize() is in the public API, simple_bar_marker is trivial enough that we can pass it directly as a unicode character. to_rgb exists under matplotlib.colors.

get_title is the only one that I can't find an easy non-private equivalent for, I am not sure it's necessary.

end_recv = agg["receiveTime"].max()
duration_s = (end_recv - start_send).total_seconds()
throughput = len(agg) / duration_s
throughput = len(agg) / duration_s if duration_s > 0 else 0
Copy link
Member

@achamayou achamayou Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ever useful to support duration == 0? I think we could probably clamp down to 1.

Copy link
Member

@achamayou achamayou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this, except for the dependency on private APIs, which I think we should clear before moving on.

Also can we export these to the github job summary?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants