

@zyuiop (Contributor) commented Sep 17, 2025

Fixes an old FIXME.

A very quick wrk benchmark on the axum example suggests roughly a 50% throughput improvement (from 2184 to 3362 requests/sec), with average latency dropping from 19.91 ms to 12.06 ms. Both runs were made with #1939 applied; the effect may differ without that patch.

Before:

```
$ wrk http://localhost:8080/
Running 10s test @ http://localhost:8080/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    19.91ms   58.94ms 716.76ms   92.84%
    Req/Sec     1.10k   342.95     2.09k    72.50%
  21844 requests in 10.00s, 3.37MB read
Requests/sec:   2184.33
Transfer/sec:    345.57KB
```

After:

```
$ wrk http://localhost:8080/
Running 10s test @ http://localhost:8080/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    12.06ms   50.31ms 782.84ms   95.71%
    Req/Sec     1.69k   475.75     3.68k    79.00%
  33619 requests in 10.00s, 5.19MB read
Requests/sec:   3361.60
Transfer/sec:    531.81KB
```
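The "roughly 50%" figure can be checked directly against the wrk numbers. A minimal Rust sketch (the helper `percent_change` is hypothetical and not part of this PR) that recomputes the relative change in requests/sec and in average latency from the two runs:

```rust
/// Relative change from `before` to `after`, in percent.
/// Positive means the value grew, negative means it shrank.
fn percent_change(before: f64, after: f64) -> f64 {
    (after / before - 1.0) * 100.0
}

fn main() {
    // Requests/sec reported by wrk in the "Before" and "After" runs.
    let (rps_before, rps_after) = (2184.33_f64, 3361.60_f64);
    // Average latency in milliseconds reported by wrk in the same runs.
    let (lat_before, lat_after) = (19.91_f64, 12.06_f64);

    // `{:+.1}` always prints the sign and one decimal place.
    println!("throughput: {:+.1}%", percent_change(rps_before, rps_after));
    println!("avg latency: {:+.1}%", percent_change(lat_before, lat_after));
}
```

Running it prints `throughput: +53.9%` and `avg latency: -39.4%`. This covers only the averages from a single 10-second run each; the large latency standard deviations reported by wrk mean a longer or repeated run would be needed for a firmer number.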
@zyuiop (Contributor, Author) commented Sep 17, 2025

This does not seem to work in all cases.

Edit: actually, maybe it does, but then it requires the non-preemptive option.

@zyuiop marked this pull request as draft on September 17, 2025 14:34
@github-actions (Contributor, bot) left a comment

Benchmark Results

| Benchmark | Current: 4f5eba0 | Previous: 40e0c6e | Performance Ratio (current / previous) |
|---|---|---|---|
| startup_benchmark Build Time | 135.62 s | 136.14 s | 1.00 |
| startup_benchmark File Size | 0.90 MB | 0.90 MB | 1.00 |
| Startup Time - 1 core | 0.94 s (±0.03 s) | 0.94 s (±0.02 s) | 1.00 |
| Startup Time - 2 cores | 0.93 s (±0.02 s) | 0.92 s (±0.03 s) | 1.01 |
| Startup Time - 4 cores | 0.94 s (±0.02 s) | 0.96 s (±0.03 s) | 0.98 |
| multithreaded_benchmark Build Time | 133.26 s | 140.97 s | 0.95 |
| multithreaded_benchmark File Size | 1.01 MB | 1.01 MB | 1.00 |
| Multithreaded Pi Efficiency - 2 Threads | 2.52 % (±12.10 %) | 2.17 % (±10.40 %) | 1.16 |
| Multithreaded Pi Efficiency - 4 Threads | 1.59 % (±7.62 %) | 1.51 % (±7.26 %) | 1.05 |
| Multithreaded Pi Efficiency - 8 Threads | 0.73 % (±3.48 %) | 0.77 % (±3.68 %) | 0.95 |
| micro_benchmarks Build Time | 164.36 s | 171.42 s | 0.96 |
| micro_benchmarks File Size | 1.01 MB | 1.01 MB | 1.00 |
| Scheduling time - 1 thread | 3.31 ticks (±15.90 ticks) | 2.77 ticks (±13.29 ticks) | 1.20 |
| Scheduling time - 2 threads | 1.78 ticks (±8.55 ticks) | 1.75 ticks (±8.39 ticks) | 1.02 |
| Micro - Time for syscall (getpid) | 0.20 ticks (±0.96 ticks) | 0.12 ticks (±0.58 ticks) | 1.65 |
| Memcpy speed - (built_in) block size 4096 | 1474.06 MByte/s (±7075.47 MByte/s) | 1816.86 MByte/s (±8720.93 MByte/s) | 0.81 |
| Memcpy speed - (built_in) block size 1048576 | 557.11 MByte/s (±2674.13 MByte/s) | 745.11 MByte/s (±3576.55 MByte/s) | 0.75 |
| Memcpy speed - (built_in) block size 16777216 | 205.90 MByte/s (±988.34 MByte/s) | 219.45 MByte/s (±1053.36 MByte/s) | 0.94 |
| Memset speed - (built_in) block size 4096 | 991.74 MByte/s (±4760.33 MByte/s) | 1875.00 MByte/s (±9000.00 MByte/s) | 0.53 |
| Memset speed - (built_in) block size 1048576 | 1291.16 MByte/s (±6197.55 MByte/s) | 1029.20 MByte/s (±4940.18 MByte/s) | 1.25 |
| Memset speed - (built_in) block size 16777216 | 901.25 MByte/s (±4325.99 MByte/s) | 924.64 MByte/s (±4438.25 MByte/s) | 0.97 |
| Memcpy speed - (rust) block size 4096 | 1363.64 MByte/s (±6545.45 MByte/s) | 1411.76 MByte/s (±6776.47 MByte/s) | 0.97 |
| Memcpy speed - (rust) block size 1048576 | 753.96 MByte/s (±3619.02 MByte/s) | 693.90 MByte/s (±3330.73 MByte/s) | 1.09 |
| Memcpy speed - (rust) block size 16777216 | 214.88 MByte/s (±1031.42 MByte/s) | 219.67 MByte/s (±1054.40 MByte/s) | 0.98 |
| Memset speed - (rust) block size 4096 | 1791.04 MByte/s (±8597.01 MByte/s) | 1791.04 MByte/s (±8597.01 MByte/s) | 1 |
| Memset speed - (rust) block size 1048576 | 1135.46 MByte/s (±5450.21 MByte/s) | 1105.00 MByte/s (±5304.01 MByte/s) | 1.03 |
| Memset speed - (rust) block size 16777216 | 918.31 MByte/s (±4407.90 MByte/s) | 954.96 MByte/s (±4583.81 MByte/s) | 0.96 |
| alloc_benchmarks Build Time | 162.68 s | 157.27 s | 1.03 |
| alloc_benchmarks File Size | 0.97 MB | 0.97 MB | 1.00 |
| Allocations - Allocation success | 2.00 % (±13.86 %) | 2.00 % (±13.86 %) | 1 |
| Allocations - Deallocation success | 1.40 % (±9.69 %) | 1.40 % (±9.67 %) | 1.00 |
| Allocations - Pre-fail Allocations | 2.00 % (±13.86 %) | 2.00 % (±13.86 %) | 1 |
| Allocations - Average Allocation time | 259.34 Ticks (±1797.16 Ticks) | 262.62 Ticks (±1819.84 Ticks) | 0.99 |
| Allocations - Average Allocation time (no fail) | 259.34 Ticks (±1797.16 Ticks) | 262.62 Ticks (±1819.84 Ticks) | 0.99 |
| Allocations - Average Deallocation time | 17.12 Ticks (±118.65 Ticks) | 17.05 Ticks (±118.18 Ticks) | 1.00 |
| mutex_benchmark Build Time | 163.32 s | 159.65 s | 1.02 |
| mutex_benchmark File Size | 1.01 MB | 1.01 MB | 1.00 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 0.32 ns (±2.22 ns) | 0.36 ns (±2.49 ns) | 0.89 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 0.36 ns (±2.49 ns) | 0.38 ns (±2.63 ns) | 0.95 |

This comment was automatically generated by a workflow using github-action-benchmark.
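The Performance Ratio column matches current divided by previous (for example, 991.74 / 1875.00 ≈ 0.53 for the 4096-byte built-in memset). Because most reported spreads are several times larger than the means themselves, a ratio far from 1 does not by itself indicate a regression. A minimal Rust sketch, assuming hypothetical names `Measurement`, `ratio`, and `within_noise`, and a simple noise heuristic that is an assumption here and not github-action-benchmark's own logic:

```rust
/// One cell of the benchmark table: the mean and the (±) spread in parentheses.
struct Measurement {
    mean: f64,
    spread: f64,
}

/// Value of the "Performance Ratio" column: current mean / previous mean.
fn ratio(current: &Measurement, previous: &Measurement) -> f64 {
    current.mean / previous.mean
}

/// Rough heuristic (assumption): treat the change as noise when the
/// difference of the means is smaller than the larger reported spread.
fn within_noise(current: &Measurement, previous: &Measurement) -> bool {
    (current.mean - previous.mean).abs() < current.spread.max(previous.spread)
}

fn main() {
    // "Memset speed - (built_in) block size 4096" row, in MByte/s.
    let current = Measurement { mean: 991.74, spread: 4760.33 };
    let previous = Measurement { mean: 1875.00, spread: 9000.00 };

    println!("ratio = {:.2}", ratio(&current, &previous));
    println!("within noise: {}", within_noise(&current, &previous));
}
```

For that row it prints `ratio = 0.53` and `within noise: true`: the means differ by about 883 MByte/s, well inside the ±9000 MByte/s spread of the previous run.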

