E2E performance testing, profiling, and analysis at Redis Ltd
Filipe Oliveira, Performance Engineer @Redis Ltd
Feb 2022
> whoami
■ Working on continuous performance analysis
■ ~3 years working for Redis Ltd
■ Open Source Contributor (C, Go): github.com/filipecosta90
● improve/develop open source performance/observability tools
■ https://github.com/HdrHistogram/hdrhistogram-go
■ https://github.com/RedisBloom/t-digest-c
> agenda
■ Performance @Redis Ltd
■ The “old behaviour”
■ The do’s and don’ts
■ Our approach
■ TODOs
> performance @Redis

Vanilla Redis (purely OSS project)
1. Foster benchmark and observability standards across the community and vendors
2. Support the contributions of other members to the OSS projects
   a. performance numbers
   b. performance how-tos
   c. or the means to properly assess the performance impact of the change they’re proposing
3. Optimize an industry-leading solution

Redis Ltd
1. Raise awareness for a proactive performance culture in the company
2. Empower developers with performance data about their code
   a. Assurance
   b. Accountability
3. Intercept regressions
4. Understand the baseline performance of new features
5. Improve the product’s performance
6. Bring determinism to performance analysis
Ordinarily, on our company’s core products...

We have...
● extensive automated tests to catch functional failures

...but when we accidentally commit a performance regression, nothing intercepts it*!
> a real case from 2019

Simple request
1. RediSearch minor version bump
2. Required multiple patches
   a. Feedback cycle took us at least 1 day
   b. Prioritized over other projects
   c. Siloed
   d. Jul. 30, Nov. 27, 2019

You can relate to this if...
● your team runs performance tests before releasing
Ordinarily, on our company’s core products...

You can state...
● your team runs performance tests before releasing

...but solving slowdowns just before releasing is...
● dangerous
● time-consuming
● one of the most difficult tasks to estimate
Ordinarily, on our company’s core products...

You can state...
● your team runs performance tests before releasing

...but doing so is just buffering potential issues!
> goal: reduce feedback cycle. avoid silos

Requirements for valid tests
- Stable testing environment
- Deterministic testing tools
- Deterministic outcomes
- Reduced testing/probing overhead
- Reduce tested changes to the minimum

Requirements for acceptance in products
- Acceptable duration
- No manual work
- Actionable items
- Well-defined key performance indicators

Before: CODE REVIEW -> PREVIEW / UNSTABLE -> RELEASE -> MANUAL PERF CHECK
After: CODE REVIEW -> ZERO TOUCH PERF CHECK -> PREVIEW / UNSTABLE -> ZERO TOUCH PERF CHECK -> RELEASE -> ZERO TOUCH PERF CHECK
(a minimal sketch of such a zero-touch check follows below)
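To make “zero touch perf check” concrete, here is a minimal sketch of such a CI gate. The file names, the `ops_per_sec` KPI, and the 5% threshold are hypothetical illustrations, not the actual Redis Ltd implementation.

```python
#!/usr/bin/env python3
"""Minimal sketch of a zero-touch perf check CI gate.

Compares KPIs of the current benchmark run against a stored baseline
and fails the build on a regression beyond a threshold. File names,
KPI names, and the threshold are hypothetical.
"""
import json
import sys

REGRESSION_THRESHOLD = 0.05  # fail on >5% drop of a "higher is better" KPI


def load_kpis(path):
    # Expected shape: {"ops_per_sec": 181000.0, "p50_latency_ms": 0.4, ...}
    with open(path) as f:
        return json.load(f)


def check(baseline, current, higher_is_better=("ops_per_sec",)):
    failures = []
    for kpi in higher_is_better:
        base, cur = baseline.get(kpi), current.get(kpi)
        if base is None or cur is None or base == 0:
            continue  # KPI not tracked in one of the runs: skip, don't guess
        change = (cur - base) / base
        if change < -REGRESSION_THRESHOLD:
            failures.append(f"{kpi}: {base:.0f} -> {cur:.0f} ({change:+.1%})")
    return failures


if __name__ == "__main__":
    failures = check(load_kpis("baseline.json"), load_kpis("current.json"))
    if failures:
        print("PERF CHECK FAILED:")
        for line in failures:
            print("  " + line)
        sys.exit(1)  # non-zero exit blocks the pipeline stage
    print("PERF CHECK PASSED")
```

Run as the last step of a pipeline stage, the non-zero exit code is all that is needed to make the check “zero touch”: no human reads numbers unless the gate trips.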
> this is not new / disruptive

Elastic: https://elasticsearch-benchmarks.elastic.co/#
Lucene: https://home.apache.org/~mikemccand/lucenebench/
> this is not new / disruptive

MongoDB
> our approach (lane A)

Vanilla Redis (purely OSS project)
1. Created an OSS SPEC
   a. https://github.com/redis/redis-benchmarks-specification/
2. Extend the spec and use it
   a. for historical data
   b. for regression analysis
   c. for docs
(an illustrative spec-driven runner is sketched below)

Redis Developers Group: Redis Ltd, AWS, Ericsson, Alibaba, …
[1] join: https://github.com/redis/redis-benchmarks-specification/#joining-the-performance-initiative
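The spec’s value is that benchmark test cases are defined declaratively, so any vendor can reproduce them. As a rough illustration only (the dict below mimics the idea, not the spec’s actual schema), a runner can map such a definition onto the stock redis-benchmark tool:

```python
"""Rough illustration of a spec-driven benchmark runner.

The test-case dict mimics the *idea* of a declarative benchmark
definition; it is NOT the actual redis-benchmarks-specification schema.
Only the redis-benchmark flags used here (-h, -p, -t, -n, -c, --csv)
are standard.
"""
import csv
import io
import subprocess

test_case = {  # hypothetical, simplified definition
    "name": "set-baseline",
    "command": "set",
    "requests": 100_000,
    "clients": 50,
}


def run(defn, host="127.0.0.1", port=6379):
    out = subprocess.run(
        [
            "redis-benchmark",
            "-h", host, "-p", str(port),
            "-t", defn["command"],
            "-n", str(defn["requests"]),
            "-c", str(defn["clients"]),
            "--csv",
        ],
        check=True, capture_output=True, text=True,
    ).stdout
    results = {}
    for row in csv.reader(io.StringIO(out)):
        try:
            results[row[0]] = float(row[1])  # test name -> requests/sec
        except (IndexError, ValueError):
            continue  # header or blank line
    return results
```

A per-commit CI job can then feed these numbers into a regression gate like the one sketched earlier, which is exactly what makes the historical data and regression analysis above possible.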
> our approach (lane B) 15 Redis Ltd 1. Started by the small scale projects a. Redis Module’s 2. Initial focus on OSS perf deployments 3. local and remote triggers 4. Used for testing, profiling a. Regression analysis i. and fix b. Approval of features c. Proactive optimization B [1] https://github.com/RedisTimeSeries/RedisTimeSeries/tree/master/tests/benchmarks [2] https://github.com/RedisLabsModules/redisbench-admin
> our approach (lane B)

[Charts: results tracked by branch, by version, plus scalability analysis]
> our approach (lane B)

nightly: feature* / perf* / v*
(branches matching these patterns trigger the nightly benchmark runs; see the sketch below)
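Assuming the trigger rules are plain glob patterns over branch names (a hypothetical reading of the slide, not the actual trigger code), the gating logic is tiny:

```python
from fnmatch import fnmatch

# Hypothetical: branch-name globs that opt a branch into nightly benchmarks.
NIGHTLY_PATTERNS = ("feature*", "perf*", "v*")


def triggers_nightly(branch: str) -> bool:
    """Return True if the branch matches any nightly benchmark pattern."""
    return any(fnmatch(branch, pat) for pat in NIGHTLY_PATTERNS)


assert triggers_nightly("perf-pipeline-tuning")
assert not triggers_nightly("docs-typo-fix")
```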
> our approach (lane B)

Profiling artifacts produced per run (a perf-based sketch follows):
1. Full process Flame Graph + main thread Flame Graph
2. perf report per dso
3. perf report per dso,sym (w/wout callgraph)
4. perf report per dso,sym,srcline (w/wout callgraph)
5. identical stacks collapsed
6. hotpath callgraph
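For reference, a minimal sketch of how artifacts 2–4 can be generated with stock perf(1), assuming `perf record -g --pid <redis-pid>` has already captured profile data into perf.data; output paths are placeholders, and only the standard --sort keys (dso, symbol, srcline) are relied upon:

```python
"""Minimal sketch: produce per-dso / per-symbol / per-srcline perf reports.

Assumes `perf record -g` has already written perf.data for the target
process, e.g.:  perf record -g --pid <redis-pid> -- sleep 60
"""
import subprocess

SORT_KEYS = [
    "dso",                 # artifact 2: per shared object
    "dso,symbol",          # artifact 3: per symbol
    "dso,symbol,srcline",  # artifact 4: per source line
]

for keys in SORT_KEYS:
    for with_callgraph in (True, False):  # "w/wout callgraph"
        cmd = ["perf", "report", "--stdio", "--sort", keys]
        if not with_callgraph:
            # flatter self-cost view; an approximation of "without callgraph"
            cmd.append("--no-children")
        report = subprocess.run(cmd, capture_output=True, text=True).stdout
        suffix = "callgraph" if with_callgraph else "flat"
        name = f"perf_report_{keys.replace(',', '_')}_{suffix}.txt"
        with open(name, "w") as f:
            f.write(report)
```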
> our approach (lane B)

Analysis: https://github.com/RedisTimeSeries/RedisTimeSeries/issues/793
PR: https://github.com/RedisTimeSeries/RedisTimeSeries/pull/794
Live in progress: https://github.com/RedisTimeSeries/RedisTimeSeries/issues/907
> what we’ve gained

● Drastically reduced the feedback cycle (days -> 1 hour)
● Devs can easily add tests
● Scaled to more, and more challenging, projects
● Finding performance problems/points of improvement is now everyone’s power/responsibility
> what we’ve gained

● A/B test new tech / state-of-the-art HW/SW components
● Continuous, up-to-date numbers for the use cases that matter
● Foster open, unbiased community and cross-company efforts
> what’s next

● Feature parity between the OSS platform and the company platform
● Extend the profiler daemon to eBPF tooling and VTune
   ○ off-CPU analysis
   ○ threading/locking
   ○ vectorization reports

VISIBILITY for points of improvement
> what’s next

● Improve anomaly/regression detection (a simple baseline approach is sketched below)
● Increase OSS / company adoption
   ○ expose data in the docs
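One common baseline for such detection (not necessarily what Redis Ltd ships) is to compare each new sample against the median of a recent window and flag large drops; the window size and tolerance below are arbitrary illustrative values:

```python
"""Sketch of a naive regression/anomaly detector for benchmark history."""
from statistics import median


def is_regression(history, new_value, window=10, tolerance=0.05):
    """history: past ops/sec samples, oldest first; higher is better."""
    recent = history[-window:]
    if len(recent) < 3:
        return False  # not enough data to judge
    baseline = median(recent)  # median resists one-off noisy runs
    return new_value < baseline * (1 - tolerance)


# Example: a stable ~100k ops/sec series, then a 10% drop.
samples = [100_000, 101_500, 99_800, 100_200, 100_900]
assert not is_regression(samples, 99_000)   # within tolerance
assert is_regression(samples, 90_000)       # flagged
```

The median-over-a-window choice is deliberate: a single noisy run neither triggers a false alarm nor poisons the baseline, which matters when the testing environment cannot be made perfectly deterministic.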
> what’s next

● we will share the updates at DevOps Pro EU ’22 (May 2022)
thank you

we’re hiring! performance@redis.com
