YAuB - Micro Benchmark Follow-on

Question

Following some great advice here from Simon, I realized that I had over-engineered things, and that the Task builder methods were a horrible Java 8 abstraction. In Simon's words: "From a usability perspective, this is a bit weird...".

After messing with things a bit more, the original use-case like:

    uBench.addTask(Task.buildCheckedIntTask("Legato Java7", () -> getMaximumBeauty(line), expect));
    uBench.addTask(Task.buildCheckedIntTask("Legato Java8", () -> getMaximumBeauty8(line), expect));
    uBench.addTask(Task.buildCheckedIntTask("Janos Java7", () -> computeMaxBeauty(line), expect));
    uBench.addTask(Task.buildCheckedIntTask("Rolfl Java7", () -> beautyMax7(line), expect));
    uBench.addTask(Task.buildCheckedIntTask("Rolfl Java8Regex", () -> beautyMaxF(line), expect));
    uBench.addTask(Task.buildCheckedIntTask("Rolfl Java8Filter", () -> beautyMax8(line), expect));

Can be drastically simplified if the addTask method takes a Supplier directly (instead of a Task), and a separate Predicate to check the results. The same code can then be expressed as:

    uBench.addTask("Legato Java7", () -> getMaximumBeauty(line), g -> g == 1574);
    uBench.addTask("Legato Java8", () -> getMaximumBeauty8(line), g -> g == 1574);
    uBench.addTask("Janos Java7", () -> computeMaxBeauty(line), g -> g == 1574);
    uBench.addTask("Rolfl Java7", () -> beautyMax7(line), g -> g == 1574);
    uBench.addTask("Rolfl Java8Regex", () -> beautyMaxF(line), g -> g == 1574);
    uBench.addTask("Rolfl Java8Filter", () -> beautyMax8(line), g -> g == 1574);

(where g is a mnemonic for got). That can in turn be simplified to a single predicate:

    Predicate<Integer> check = g -> g == 1574;

and code like:

    uBench.addTask("Legato Java7", () -> getMaximumBeauty(line), check);
    uBench.addTask("Legato Java8", () -> getMaximumBeauty8(line), check);
    uBench.addTask("Janos Java7", () -> computeMaxBeauty(line), check);
    uBench.addTask("Rolfl Java7", () -> beautyMax7(line), check);
    uBench.addTask("Rolfl Java8Regex", () -> beautyMaxF(line), check);
    uBench.addTask("Rolfl Java8Filter", () -> beautyMax8(line), check);

In addition, to support primitive-type operations (instead of having to auto-box them - which may impact performance), there needs to be an implementation specialized for each of int, double, and long too.
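To make the boxing concern concrete, here is a minimal sketch contrasting a generic Supplier/Predicate pair with the int-specialized IntSupplier/IntPredicate pair from java.util.function. It is not UBench code: the names BoxingSketch, runChecked, runCheckedInt and sumTo are made up for illustration.

```java
import java.util.function.IntPredicate;
import java.util.function.IntSupplier;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical demo class; not part of UBench.
final class BoxingSketch {

    // Generic form: an int result is boxed to Integer before the check sees it.
    static <T> boolean runChecked(Supplier<T> task, Predicate<T> check) {
        return check.test(task.get());
    }

    // int-specialized form: the result stays a primitive int throughout.
    static boolean runCheckedInt(IntSupplier task, IntPredicate check) {
        return check.test(task.getAsInt());
    }

    // Stand-in for a method under test.
    static int sumTo(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        boolean boxed = runChecked(() -> sumTo(100), g -> g == 5050);
        boolean primitive = runCheckedInt(() -> sumTo(100), g -> g == 5050);
        System.out.println(boxed + " " + primitive); // prints "true true"
    }
}
```

The lambdas at the two call sites are identical; only the generic version has to produce an Integer object for each result. Whether that cost is visible in a particular benchmark is not something this sketch measures.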

To that end, I have removed the Task class entirely from the public interface, and incorporated it as an internal-only static class. To reduce the code footprint further, I have moved the TaskStats class in as a static nested class as well. This reduces the entire benchmark code to a single Java file, which has a number of advantages when maintaining or distributing the code.

The new usage of the code is (for context, not for review...):

    final String line = "This is a test, including punctuation, and other words"
            + " and numbers like 1, UPPER, and Lower letters";
    IntPredicate expect = (g) -> g == 1574;

    UBench uBench = new UBench("Beautiful");
    uBench.addIntTask("Legato Java7", () -> getMaximumBeauty(line), expect);
    uBench.addIntTask("Legato Java8", () -> getMaximumBeauty8(line), expect);
    uBench.addIntTask("Janos Java7", () -> computeMaxBeauty(line), expect);
    uBench.addIntTask("Rolfl Java7", () -> beautyMax7(line), expect);
    uBench.addIntTask("Rolfl Java8Regex", () -> beautyMaxF(line), expect);
    uBench.addIntTask("Rolfl Java8Filter", () -> beautyMax8(line), expect);

    System.out.println("Warming up");
    uBench.benchMark(5000).stream().forEach(System.out::println);

    System.out.println("\n\nReal runs\n\n");
    uBench.benchMark(10000).stream()
            .sorted(Comparator.comparing(UBench.Stats::get95thPercentile))
            .forEach(System.out::println);

And the UBench code that supports that (the GitHub revision), is:

    package net.tuis.ubench;

    import java.util.Arrays;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.LongSummaryStatistics;
    import java.util.Map;
    import java.util.concurrent.TimeUnit;
    import java.util.function.DoublePredicate;
    import java.util.function.DoubleSupplier;
    import java.util.function.IntPredicate;
    import java.util.function.IntSupplier;
    import java.util.function.LongPredicate;
    import java.util.function.LongSupplier;
    import java.util.function.Predicate;
    import java.util.function.Supplier;
    import java.util.stream.Collectors;
    import java.util.stream.DoubleStream;
    import java.util.stream.IntStream;
    import java.util.stream.LongStream;

    /**
     * The UBench class encompasses a suite of tasks that are to be compared...
     * possibly relative to each other.
     * <p>
     * Each task can be added to the suite. Once you have the tasks you need, then
     * all tasks can be benchmarked according to limits given in the run.
     *
     * @author rolf
     *
     */
    public final class UBench {

        /**
         * Statistics representing the runs in this task.
         * <p>
         * Presents various statistics related to the run times that are useful for
         * interpreting the run performance.
         */
        public static final class Stats {

            private static final double NANOxMILLI = 1000000.0;

            private final long[] results;
            private final long min;
            private final long max;
            private final double average;
            private final String suit;
            private final String name;

            /**
             * Construct statistics based on the nanosecond times of multiple runs.
             *
             * @param name
             *            The name of the task that has been benchmarked
             * @param results
             *            The nano-second run times of each successful run.
             */
            Stats(String suit, String name, long[] results) {
                this.suit = suit;
                this.name = name;
                this.results = results;
                LongSummaryStatistics lss = LongStream.of(results).summaryStatistics();
                min = lss.getMin();
                max = lss.getMax();
                average = lss.getAverage();
            }

            /**
             * Get the raw data the statistics are based off.
             *
             * @return the individual test run times (in nanoseconds, and in order
             *         of execution).
             */
            public long[] getRawData() {
                return Arrays.copyOf(results, results.length);
            }

            /**
             * Summarize the time-progression of the run time for each iteration, in
             * order of execution (in milliseconds).
             * <p>
             * An example helps. If there are 200 results, and a request for 10
             * zones, then return 10 double values representing the average time of
             * the first 20 runs, then the next 20, and so on, until the 10th zone
             * contains the average time of the last 20 runs.
             * <p>
             * This is a good way to see the effects of warm-up times and different
             * compile levels
             *
             * @param zoneCount
             * @return
             */
            public final double[] getZoneTimesMilli(int zoneCount) {
                double[] ret = new double[Math.min(zoneCount, results.length)];
                int perblock = results.length / ret.length;
                int overflow = results.length % ret.length;
                int pos = 0;
                for (int block = 0; block < ret.length; block++) {
                    int count = perblock + (block < overflow ? 1 : 0);
                    int limit = pos + count;
                    long nanos = 0;
                    while (pos < limit) {
                        nanos += results[pos];
                        pos++;
                    }
                    ret[block] = (nanos / NANOxMILLI) / count;
                }
                return ret;
            }

            /**
             * Compute a log-2-based histogram relative to the fastest run in the
             * data set.
             * <p>
             * This gives a sense of what the general shape of the runs are in terms
             * of distribution of run times. The histogram is based on the fastest
             * run.
             * <p>
             * By way of an example, the output: <code>100, 50, 10, 1, 0, 1</code>
             * would suggest that:
             * <ul>
             * <li>100 runs were between 1 times and 2 times as slow as the fastest.
             * <li>50 runs were between 2 and 4 times slower than the fastest.
             * <li>10 runs were between 4 and 8 times slower
             * <li>1 run was between 8 and 16 times slower
             * <li>1 run was between 32 and 64 times slower
             *
             * @return
             */
            public final int[] getHistogramByDoublingFactor() {
                int count = (int) (max / min);
                int[] histo = new int[Integer.numberOfTrailingZeros(Integer.highestOneBit(count)) + 1];
                LongStream.of(results)
                        .mapToInt(t -> Integer.numberOfTrailingZeros(Integer.highestOneBit((int) (t / min))))
                        .forEach(i -> histo[i]++);
                return histo;
            }

            /**
             * Compute the 95<sup>th</sup> percentile of runtimes (in milliseconds).
             * <p>
             * 95% of all runs completed in this time, or faster.
             *
             * @return the millisecond time of the 95<sup>th</sup> percentile.
             */
            public final double get95thPercentile() {
                if (results.length < 100) {
                    return getSlowest();
                }
                long limit = ((results.length + 1) * 95) / 100;
                return LongStream.of(results).sorted().limit(limit).max().getAsLong() / NANOxMILLI;
            }

            /**
             * Compute the average time of all runs (in milliseconds).
             *
             * @return the average time (in milliseconds)
             */
            public final double getAverage() {
                return average / NANOxMILLI;
            }

            /**
             * Compute the slowest run (in milliseconds).
             *
             * @return The slowest run time (in milliseconds).
             */
            public final double getSlowest() {
                return max / NANOxMILLI;
            }

            /**
             * Compute the fastest run (in milliseconds).
             *
             * @return The fastest run time (in milliseconds).
             */
            public final double getFastest() {
                return min / NANOxMILLI;
            }

            @Override
            public String toString() {
                return String.format("Task %s -> %s:\n"
                        + " Iterations : %12d\n"
                        + " Fastest : %12.5fms\n"
                        + " Average : %12.5fms\n"
                        + " 95Pctile : %12.5fms\n"
                        + " Slowest : %12.5fms\n"
                        + " TimeBlock : %s\n"
                        + " FactorHisto : %s\n",
                        suit, name, results.length, getFastest(), getAverage(), get95thPercentile(),
                        getSlowest(), formatMillis(getZoneTimesMilli(10)),
                        formatHisto(getHistogramByDoublingFactor()));
            }

            private String formatHisto(int[] histogramByXFactor) {
                return IntStream.of(histogramByXFactor).mapToObj(i -> String.format("%5d", i))
                        .collect(Collectors.joining(" "));
            }

            private String formatMillis(double[] zoneTimesMilli) {
                return DoubleStream.of(zoneTimesMilli).mapToObj(d -> String.format("%.5fms", d))
                        .collect(Collectors.joining(" "));
            }

            public String getSuit() {
                return suit;
            }

            public String getName() {
                return name;
            }

        }

        private static class NamedTask {

            private final String name;
            private final Task task;

            public NamedTask(String name, Task task) {
                super();
                this.name = name;
                this.task = task;
            }

            public String getName() {
                return name;
            }

            public Task getTask() {
                return task;
            }

        }

        @FunctionalInterface
        private interface Task {
            long time();
        }

        private final Map<String, Task> tasks = new LinkedHashMap<>();
        private final String suiteName;

        public UBench(String suiteName) {
            this.suiteName = suiteName;
        }

        private void putTask(String name, Task t) {
            synchronized (tasks) {
                tasks.put(name, t);
            }
        }

        /**
         * Include a named task (and validator) in to the benchmark.
         * @param name The name of the task. Only one task with any one name is allowed.
         * @param task The task to perform
         * @param check The check of the results from the task.
         */
        public <T> void addTask(String name, Supplier<T> task, Predicate<T> check) {
            putTask(name, () -> {
                long start = System.nanoTime();
                T result = task.get();
                long time = System.nanoTime() - start;
                if (check != null && !check.test(result)) {
                    throw new IllegalStateException(String.format("Task %s failed Result: %s", name, result));
                }
                return time;
            });
        }

        /**
         * Include a named task in to the benchmark.
         * @param name The name of the task. Only one task with any one name is allowed.
         * @param task The task to perform
         */
        public <T> void addTask(String name, Supplier<T> task) {
            addTask(name, task, null);
        }

        /**
         * Include an int-specialized named task (and validator) in to the benchmark.
         * @param name The name of the task. Only one task with any one name is allowed.
         * @param task The task to perform
         * @param check The check of the results from the task.
         */
        public void addIntTask(String name, IntSupplier task, IntPredicate check) {
            putTask(name, () -> {
                long start = System.nanoTime();
                int result = task.getAsInt();
                long time = System.nanoTime() - start;
                if (check != null && !check.test(result)) {
                    throw new IllegalStateException(String.format("Task %s failed Result: %s", name, result));
                }
                return time;
            });
        }

        /**
         * Include an int-specialized named task in to the benchmark.
         * @param name The name of the task. Only one task with any one name is allowed.
         * @param task The task to perform
         */
        public void addIntTask(String name, IntSupplier task) {
            addIntTask(name, task, null);
        }

        /**
         * Include a long-specialized named task (and validator) in to the benchmark.
         * @param name The name of the task. Only one task with any one name is allowed.
         * @param task The task to perform
         * @param check The check of the results from the task.
         */
        public void addLongTask(String name, LongSupplier task, LongPredicate check) {
            putTask(name, () -> {
                long start = System.nanoTime();
                long result = task.getAsLong();
                long time = System.nanoTime() - start;
                if (check != null && !check.test(result)) {
                    throw new IllegalStateException(String.format("Task %s failed Result: %s", name, result));
                }
                return time;
            });
        }

        /**
         * Include a long-specialized named task in to the benchmark.
         * @param name The name of the task. Only one task with any one name is allowed.
         * @param task The task to perform
         */
        public void addLongTask(String name, LongSupplier task) {
            addLongTask(name, task, null);
        }

        /**
         * Include a double-specialized named task (and validator) in to the benchmark.
         * @param name The name of the task. Only one task with any one name is allowed.
         * @param task The task to perform
         * @param check The check of the results from the task.
         */
        public void addDoubleTask(String name, DoubleSupplier task, DoublePredicate check) {
            putTask(name, () -> {
                long start = System.nanoTime();
                double result = task.getAsDouble();
                long time = System.nanoTime() - start;
                if (check != null && !check.test(result)) {
                    throw new IllegalStateException(String.format("Task %s failed Result: %s", name, result));
                }
                return time;
            });
        }

        /**
         * Include a double-specialized named task in to the benchmark.
         * @param name The name of the task. Only one task with any one name is allowed.
         * @param task The task to perform
         */
        public void addDoubleTask(String name, DoubleSupplier task) {
            addDoubleTask(name, task, null);
        }

        /**
         * Benchmark a task until it completes the desired iterations, exceeds the
         * time limit, or reaches stability, whichever comes first.
         *
         * @param iterations
         *            maximum number of iterations to run.
         * @param minStabilityLen
         *            If this many iterations in a row are all within the
         *            maxVariance, then the benchmark ends.
         * @param maxVariance
         *            Expressed as a percent from 0.0 to 100.0, and so on
         * @return the results of all completed tasks.
         */
        public List<Stats> benchMark(final int iterations, final int minStabilityLen, final double maxVariance,
                final long timeLimit, final TimeUnit timeUnit) {
            List<NamedTask> mytasks = getTasks();
            Stats[] ret = new Stats[mytasks.size()];
            int i = 0;
            for (NamedTask task : mytasks) {
                ret[i++] = runTask(task, iterations, minStabilityLen, 1 + (maxVariance / 100.0), timeLimit,
                        timeUnit);
            }
            return Arrays.asList(ret);
        }

        /**
         * Benchmark all tasks until it they complete the desired elapsed time
         *
         * @param iterations
         *            number of iterations to run.
         * @return the results of all completed tasks.
         */
        public List<Stats> benchMark(final long timeLimit, final TimeUnit timeUnit) {
            return benchMark(Integer.MAX_VALUE, 0, 100, timeLimit, timeUnit);
        }

        /**
         * Benchmark all tasks until it they complete the desired iteration count
         *
         * @param iterations
         *            number of iterations to run.
         * @return the results of all completed tasks.
         */
        public List<Stats> benchMark(final int iterations) {
            return benchMark(iterations, 0, 100, 1000, TimeUnit.DAYS);
        }

        private List<NamedTask> getTasks() {
            synchronized (tasks) {
                return tasks.entrySet().stream().map(e -> new NamedTask(e.getKey(), e.getValue()))
                        .collect(Collectors.toList());
            }
        }

        private Stats runTask(final NamedTask ntask, final int iterations, final int minStability,
                final double maxLimit, final long timeLimit, final TimeUnit timeUnit) {
            long[] results = new long[Math.min(iterations, 10000)];
            long[] recents = new long[Math.min(minStability, iterations)];
            int rPos = 0;
            long limit = System.currentTimeMillis() + timeUnit.toMillis(timeLimit);
            for (int i = 0; i < iterations; i++) {
                long res = Math.max(ntask.getTask().time(), 1);
                if (rPos >= results.length) {
                    results = Arrays.copyOf(results, expandTo(results.length));
                }
                if (minStability > 0) {
                    recents[rPos % recents.length] = res;
                }
                results[rPos++] = res;
                if ((timeLimit > 0 && System.currentTimeMillis() >= limit)
                        || (minStability > 0 && rPos >= recents.length && inBounds(recents, maxLimit))) {
                    return new Stats(suiteName, ntask.getName(), Arrays.copyOf(results, rPos));
                }
            }
            return new Stats(suiteName, ntask.getName(), Arrays.copyOf(results, rPos));
        }

        private int expandTo(int length) {
            // add 25% + 100 - limit to Integer.Max
            int toAdd = 100 + (length >> 2);
            toAdd = Math.min(Integer.MAX_VALUE - length, toAdd);
            return toAdd + length;
        }

        @Override
        public String toString() {
            return String.format("%s with tasks: %s", suiteName, tasks.toString());
        }

        /**
         * Compute whether any of the values in times exceed the given bound,
         * realtive to the minimum value in times.
         *
         * @param times
         *            the times to compute the bounds on
         * @param bound
         *            the bound is represented as a value like 1.10 for 10% greater
         *            than the minimum
         * @return true if all values are in bounds.
         */
        private static final boolean inBounds(long[] times, double bound) {
            long min = times[0];
            long max = times[0];
            long limit = (long) (min * bound);
            for (int i = 1; i < times.length; i++) {
                if (times[i] < min) {
                    min = times[i];
                    limit = (long) (min * bound);
                    if (max > limit) {
                        return false;
                    }
                }
                if (times[i] > max) {
                    max = times[i];
                    // new max, is it slower than the worst allowed?
                    if (max > limit) {
                        return false;
                    }
                }
            }
            return true;
        }

    }

Again, I am looking for any and all feedback, but I am particularly interested in usability concerns and API issues.

Community · Accepted Answer · 2017-04-13 12:40:41Z

For now I'd point out mostly usability issues.

Duplicated logic

The input and the expected value variables are repeated in every task:

    uBench.addIntTask("Legato Java7", () -> getMaximumBeauty(line), expect);
    uBench.addIntTask("Legato Java8", () -> getMaximumBeauty8(line), expect);
    uBench.addIntTask("Janos Java7", () -> computeMaxBeauty(line), expect);
    uBench.addIntTask("Rolfl Java7", () -> beautyMax7(line), expect);
    uBench.addIntTask("Rolfl Java8Regex", () -> beautyMaxF(line), expect);
    uBench.addIntTask("Rolfl Java8Filter", () -> beautyMax8(line), expect);

It doesn't really make sense to have to repeat this: when comparing a number of alternative implementations, normally you will use the same input for all of them. Of course, you'll probably want to re-run the same methods with different input/output pairs, but do so one at a time. To clarify even further, I don't see a use case for comparing the result of methodA on inputA with the result of methodB on inputB. Maybe there is such a use case, but I don't think that would be the typical case. I think normally you would want to run methodA, methodB, methodC, ... on inputA, then again run the same methods on inputB, then again on inputC, and so on.

One way to avoid repeatedly specifying the same inputs and outputs to each of the tasks could be to store them inside the benchmark instance, by adding .setInput and .setExpectedOutput methods, and let the tasks share that data. For running the same tasks against several input/output pairs, these methods could take varargs. The run method could validate if the input/output pairs are sane.
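A rough sketch of that idea follows. Everything in it is hypothetical: SharedInputBench is not part of UBench, and setInput, setExpectedOutput, addTask and run are only illustrative names and signatures. The suite stores the input/output pairs once, each task is a plain Function of the input, and run() rejects mismatched pair counts before timing anything.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch; none of these names exist in UBench.
final class SharedInputBench<I, O> {
    private final Map<String, Function<I, O>> tasks = new LinkedHashMap<>();
    private List<I> inputs = Collections.emptyList();
    private List<O> expected = Collections.emptyList();

    @SafeVarargs
    final SharedInputBench<I, O> setInput(I... values) {
        inputs = Arrays.asList(values);
        return this;
    }

    @SafeVarargs
    final SharedInputBench<I, O> setExpectedOutput(O... values) {
        expected = Arrays.asList(values);
        return this;
    }

    SharedInputBench<I, O> addTask(String name, Function<I, O> task) {
        tasks.put(name, task);
        return this;
    }

    /** Runs each task once per input; returns nanosecond timings per task, in input order. */
    Map<String, long[]> run() {
        if (inputs.size() != expected.size()) {
            throw new IllegalStateException("Need exactly one expected output per input");
        }
        Map<String, long[]> timings = new LinkedHashMap<>();
        for (Map.Entry<String, Function<I, O>> e : tasks.entrySet()) {
            long[] nanos = new long[inputs.size()];
            for (int i = 0; i < nanos.length; i++) {
                long start = System.nanoTime();
                O got = e.getValue().apply(inputs.get(i));
                nanos[i] = System.nanoTime() - start;
                if (!Objects.equals(got, expected.get(i))) {
                    throw new IllegalStateException(String.format(
                            "Task %s failed on input %s: got %s", e.getKey(), inputs.get(i), got));
                }
            }
            timings.put(e.getKey(), nanos);
        }
        return timings;
    }

    public static void main(String[] args) {
        SharedInputBench<String, Integer> bench = new SharedInputBench<String, Integer>()
                .setInput("abc", "hello")
                .setExpectedOutput(3, 5)
                .addTask("length", String::length)
                .addTask("codePoints", s -> (int) s.codePoints().count());
        bench.run().forEach((name, nanos) ->
                System.out.println(name + ": " + Arrays.toString(nanos) + " ns"));
    }
}
```

Note that this generic form boxes primitive results, so it trades away the primitive specializations for a smaller API. It also runs each pair only once, where a real benchmark would iterate.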

Task types

It's tedious to have separate .add*Task methods for different return types. It fills your implementation with duplicated logic, and it forces users of the framework into methods returning specific types. How would I benchmark different sorting algorithms that sort collections in-place, with no return value?

I'd recommend taking an approach similar to the one in my framework:

Use instance variables to store the initial data and the computation result
Use a thin wrapper around the methods under test. The wrapper passes the input data to the real methods under test, knows how to get the output from the real methods (which can return any type), and stores the computation result in an instance field
The validator verifies the result that was written to the instance field

The bottom line is: find a solution that does not require prescribed return types. The framework will be easier to use, and the implementation will have less duplicated code (no more addIntTask, addLongTask, ...).
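The wrapper approach described above could look roughly like the following. This is only a sketch of the idea, not code from either framework: VoidTaskSketch, timeChecked, lastResult and the two sort wrappers are hypothetical names. The timing code sees nothing but a Runnable and a BooleanSupplier, so it needs no knowledge of input or return types.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch; not taken from UBench or any other framework.
final class VoidTaskSketch {
    private final int[] input = {5, 3, 9, 1, 7}; // shared initial data, never mutated
    private int[] result;                        // written by each wrapper

    // Thin wrapper: copies the input, sorts the copy in place, stores the result.
    void jdkSort() {
        int[] copy = input.clone();
        Arrays.sort(copy);
        result = copy;
    }

    // Thin wrapper around a hand-written in-place insertion sort.
    void insertionSort() {
        int[] a = input.clone();
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        result = a;
    }

    // Validator: inspects the instance field rather than a return value.
    boolean resultIsSorted() {
        for (int i = 1; i < result.length; i++) {
            if (result[i - 1] > result[i]) {
                return false;
            }
        }
        return true;
    }

    int[] lastResult() {
        return result.clone();
    }

    /** Times one run of a void task, then validates; returns elapsed nanoseconds. */
    static long timeChecked(String name, Runnable task, BooleanSupplier validator) {
        long start = System.nanoTime();
        task.run();
        long nanos = System.nanoTime() - start;
        if (!validator.getAsBoolean()) {
            throw new IllegalStateException("Task " + name + " failed validation");
        }
        return nanos;
    }

    public static void main(String[] args) {
        VoidTaskSketch s = new VoidTaskSketch();
        Map<String, Runnable> tasks = new LinkedHashMap<>();
        tasks.put("JDK sort", s::jdkSort);
        tasks.put("Insertion sort", s::insertionSort);
        tasks.forEach((name, task) ->
                System.out.printf("%s: %d ns%n", name, timeChecked(name, task, s::resultIsSorted)));
    }
}
```

One trade-off is visible here: the measured time includes whatever the wrapper does around the real call (in this sketch, cloning the input), so wrappers have to stay thin for the numbers to be meaningful.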

Too much boilerplate to remember

There's quite a bit of boilerplate to remember to use this framework, especially this part:

    System.out.println("Warming up");
    uBench.benchMark(5000).stream().forEach(System.out::println);

    System.out.println("\n\nReal runs\n\n");
    uBench.benchMark(10000).stream()
            .sorted(Comparator.comparing(UBench.Stats::get95thPercentile))
            .forEach(System.out::println);

Something like this would be nice to achieve the same result:

    uBench.benchMark(5000, "Warming up");
    uBench.benchMark(10000, "Real runs", UBench.Stats::get95thPercentile);

You could still keep the uBench.benchMark(int) version for "power users".
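As a sketch of how such convenience overloads might be layered on top of the existing benchMark(int), the helper below takes the "run N iterations" step as an IntFunction so that it stays self-contained. ReportSketch and both method signatures are assumptions for illustration only. With the real class, the runner could presumably be uBench::benchMark and the sort key UBench.Stats::get95thPercentile.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.function.IntFunction;
import java.util.stream.Collectors;

// Hypothetical helper; not part of UBench.
final class ReportSketch {

    /** Prints the title, runs the benchmark, and prints results in execution order. */
    static <S> List<S> benchMark(IntFunction<List<S>> runner, int iterations, String title) {
        System.out.println(title);
        List<S> stats = runner.apply(iterations);
        stats.forEach(System.out::println);
        return stats;
    }

    /** Same as above, but prints the results sorted by the given key. */
    static <S, K extends Comparable<? super K>> List<S> benchMark(
            IntFunction<List<S>> runner, int iterations, String title,
            Function<? super S, ? extends K> sortKey) {
        System.out.println(title);
        List<S> sorted = runner.apply(iterations).stream()
                .sorted(Comparator.comparing(sortKey))
                .collect(Collectors.toList());
        sorted.forEach(System.out::println);
        return sorted;
    }

    public static void main(String[] args) {
        // Stand-in runner: pretends each "result" is a String and ignores the count.
        IntFunction<List<String>> runner = n -> Arrays.asList("ccc", "a", "bb");
        benchMark(runner, 5000, "Warming up");
        benchMark(runner, 10000, "Real runs", String::length);
    }
}
```

Returning the list from both overloads keeps the raw results available to callers who still want to post-process them.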

Looks promising!

A very important usability feature I see here is that all the functionality is easily accessible and intuitive from a UBench instance. One can easily explore the available features in an IDE using auto-completion on method names and hints on parameter types. This is in contrast with an annotation-driven approach that forces users to remember multiple things: the annotation names, and how to trigger the annotation processor that will run the benchmarks.

The reporting features are also great, and something I will definitely ~~shamelessly steal~~ borrow to improve my alternative framework.

Your comment about the boilerplate reporting is most useful. That needs to be resolved, and your suggestions are good ones. The primitive specializations for the Int/Long/Double variants are unavoidable, unless the system implicitly auto-boxes things, but that would impact performance. The line/expect reuse is an artifact of a bad example, and I don't think it is realistic to have the same input/output for each test. In fact, it is ideal to be able to run the same code for different values too.
– rolfl, Commented Feb 25, 2015 at 0:02
I think I wasn't clear enough about sharing the input/output. I rewrote that part. As for the Int/Long/... variants, if you make the tasks void, then the framework can become unaware of the number of input params and their types. Take a look at the methods annotated with @MeasureTime in here. They are thin wrappers, containing the logic of calling the real tasks. The framework doesn't have to know. You can probably do something similar. Finally, I added a new section at the end about my favorite feature in this framework.
– janos, Commented Feb 25, 2015 at 7:11
@janos A nice thing that rolfl indirectly has in his framework that you lack in yours is the ability to run in a loop for (int i = 0; i < size; i++) { ubench.addTask(..., solveNQueens(i)) }, i.e. a way to perform parametrized benchmarks. For that purpose, it's not really possible to share the input/output, is it?
– Simon Forsberg, Commented Mar 2, 2015 at 13:54
@SimonAndréForsberg I think it's possible, by using multiple sets of input/output data matching the loop index. It might be complicated though. Parameterizing in an ergonomic way is tricky. Btw I didn't give up on my framework (yet), just didn't have time for it. When I do, I might have more insight for Rolf's framework too, and about parameterization issues.
– janos, Commented Mar 2, 2015 at 14:04
