Profiling your Java Applica2on @victorrentea ♦ VictorRentea.ro Code: github.com/victorrentea/performance-profiling.git
👉 victorrentea.ro/training-offer 👋 Hi, I'm Victor Rentea 🇷🇴 PhD(CS) Java Champion, 18 years of code, 10 years of teaching Consultant & Trainer for 100+ companies: ❤ Clean Code, Architecture, Unit Tes3ng 🛠 Spring, Hibernate, Reac3ve/WebFlux ⚡ Java Performance, Secure Coding 🔐 Conference Talks on YouTube Founder of European So<ware Cra<ers Community, > 6000 members 🔥 Free 1-hour webinars, a;er work, on clean code, design, tes>ng 👉 victorrentea.ro/community Past events on youtube.com/vrentea Father of 👧👦, servant of a 🐈, weekend gardener 🌼 @victorrentea VictorRentea.ro
3 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community The response *me of your endpoint is too slow 😱 🎉 Congratulations! 🎉 You made it to production, and you're taking serious load! Where to start?
4 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Log timestamps log.info("Before call"); !// 2023-05-23 00:01:50.000 !!... !// suspect code log.info("After call"); !// 2023-05-23 00:01:53.100 !!... Measuring Method Run*me – Old-School Style t1-t0 long t0 = currentTimeMillis(); List<CommentDto> comments = client.fetchComments(loanId); // suspect code long t1 = currentTimeMillis(); System.out.println("Took millis: " + (t1 - t0));
5 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Micrometer @Timed !// wraps the method with an aspect that monitors spent time List<CommentDto> fetchComments(Long id) { !// suspect code } then, in a Spring application GET /actuator/prometheus returns: method_timed_seconds_sum{method="fetchComments",} 100.0 method_timed_seconds_count{method="fetchComments",} 400 è avg = 0.25 s/call sampled again after 5 seconds: method_timed_seconds_sum{method="fetchComments",} 150.0 – 100 = 50 s method_timed_seconds_count{method="fetchComments",} 500 – 400 = 100 calls è avg = 50 s / 100 = 0.5 s/call (spike) time average[ms] 250 500
6 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Micrometer API https://www.baeldung.com/micrometer !// programatic alternative to @Timed meterRegistry.timer("name").record(() -> { !/*suspect code!*/ }); meterRegistry.timer("name").record(t1 - t0, MILLISECONDS); !// accumulator meterRegistry.counter("earnings").increment(purchase); !// current value meterRegistry.gauge("player-count", currentPlayers.size()); !// eg. "95% requests took ≤ 100 ms" @Timed(percentiles = {0.95}, histogram = true)
7 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community §Micrometer records metrics a) your own b) key framework metrics for common issues c) opBonal metrics about Tomcat, executors, ... §The metrics are regularly polled/pushed to other tools for storage - eg. Prometheus §... and later used by (eg.) Grafana to: - Display impressive charts - Aggregate metrics - Raise alerts - ... Some examples è Observing a System with Micrometer
8 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community § hBp_server_requests_seconds_sum{method="GET",uri="/profile/showcase/{id}",} 165.04 hBp_server_requests_seconds_count{method="GET",uri="/profile/showcase/{id}",} 542.0 § tomcat_connec>ons_current_connec>ons{...} 298 ⚠ 98 requests wai>ng in queue tomcat_threads_busy_threads{...} 200 ⚠ = using 100% of the threads => thread pool starva>on? § jvm_gc_memory_allocated_bytes_total 1.8874368E9 jvm_memory_usage_a;er_gc_percent{...} 45.15 ⚠ >30% => increase max heap? § hikaricp_connec>ons_acquire_seconds_sum{pool="HikariPool-1",} 80.37 ⚠ conn pool starva>on? hikaricp_connec>ons_acquire_seconds_count{pool="HikariPool-1",} 551.0 § cache_gets_total{cache="cache1",result="miss",} 0.99 ⚠ 99% cache miss = bad config issue? cache_gets_total{cache="cache1",result="hit",} 0.01 § logback_events_total{level="error",} 1255.0 ⚠ + 1000 than 5 seconds ago => 😱 § my_own_custom_metric xyz Some Metrics available at /actuator/prometheus
9 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community ? Monitoring a single system is easy!
10 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Compared to a Distributed Ecosystem 1st challenge: trace requests A B C D E Monitoring a single system is easy!
11 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community §TraceID header is read from the incoming HTTP request/MQ message - If missing, it is generated by first system in the call chain §TraceID is propagated inside the app: - Methods called (via ThreadLocal) - Any logs printed (via Logback MDC) – cri>cal for distributed log analysis - Upstream operators in a Reactor chain (via ReactorContext in WebFlux) - Async methods submiBed to a DI-managed *TaskExecutor (via a TaskDecorator) §TraceID header is sent in any outgoing request/message - ... sent via a DI-injected bean: @Autowired RestTemplate, RabbitTemplate, ... Distributed Request Tracing h:ps://www.baeldung.com/spring-cloud-sleuth-single-applica<on A B <header> TraceId
12 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community §The TraceId is available for any processing we do §Every system reports the span of :me spent working on a TraceId - For performance, a frac>on of spans are reported (eg 1%) §... to a central system that builds a :me chart of all systems example è Distributed Request Sampling
13 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Distributed Request Sampling with Zipkin
14 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Distributed Request Sampling with Zipkin
15 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community §Repeated API calls, by accident 😱 §API calls in a loop, should be fetched in batches - GET /item/1, GET /item/3, .... §Long chains of REST calls - A à B à C à D à E §System-in-the-middle §Independent API calls than can be parallelized §Long spans not waiAng for other systems => zoom in 🔎 Distributed Processing An*-PaKerns ⚠
16 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community Profiling Blackbox: inter-system communica/on (Zipkin, ...) Graybox: report key framework metrics (Micrometer) Whitebox: profile methods inside JVM (Java Flight Recorderè) app lib-a lib-b methodA() methodB()
17 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community How to find methods that take a long 1me? 💡 Naive idea: Instrument your code with a -javaagent that measures the execuPon of every single executed Java method Adds a huge CPU+MEMORY overhead
18 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community Java Flight Recorder The Tools of the Java Gods
19 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Java Flight Recorder (JFR) Main features: § Capture stack traces of running threads every second § Record internal JVM events about locks, files, sockets, GC ac>vity, ... or your own § Usable in prod: op>mized to add x-low overhead (<2%) § Free since Java 11 Usage: § Local JVM using Java Mission Control or IntelliJ ⬆ § Remote JVM => generate a .jfr file => download => analyze First, enable JFR via -XX:+FlightRecorder, then start the recording: - Con3nuous from startup: -XX:StartFlightRecording=.... - Via command line (SSH): jcmd <JAVAPID> JFR.start - Via endpoints JMX or Actuator - Via tools, eg Glowroot in our experiment
26 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community Flame Graph best way to visualize the output of an execuAon profiler
27 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) JFR samples the stack traces of running threads every second
28 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) Length of each bar is propor?onal to the number of samples (?me) Building a Flame Graph JFR samples the stack traces of running threads every second Flame.dummy(Flame.java:14)
29 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Data is merged by line number 2 samples (33%) 4 samples (66%) Length of each bar is proportional to the number of samples (time) Building a Flame Graph JFR samples the stack traces of running threads every second Flame.dummy(Flame.java:14)
30 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) IN REAL LIFE Unsafe.park(Native Method) ......50 more lines↕...... SocketInputStream.socketRead0(Native Method) ......40 more lines↕...... 423 210 After recording 1 minute of heavy parallel load: .... 100 lines of Aspects, Spring, Tomcat ... Data is merged by line number Length of each bar is proportional to the number of samples (time) Building a Flame Graph JFR samples the stack traces of running threads every second BLOCK NETWORK
31 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community ProfiledApp Spring Boot 2.7 http://localhost:8080 Load Tests Java app (Gatling.io) 23 parallel requests (closed world model) PostgreSQL in a Local Docker localhost:5432 WireMock in a Local Docker hDp://localhost:9999 remote API responses with 40ms delay Glowroot.org -javaagent:glowroot.jar 👉 http://localhost:4000 Experiment Environment ToxiProxy github.com/Shopify/toxiproxy installed locally localhost:55432 +3ms network latency JFR
32 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community EXPERIMENT
33 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community Glowroot.org §Extremely simple application profiler - Add VM arg: -javaagent:/path/to/glowroot.jar - Open http://localhost:4000 §Monitors - Methods Profiling, displayed as Flame Graphs - SQLs - API calls - Typical bottlenecks - Data per endpoint §Free - Probably not production-ready, more for learning/debugging
35 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Split Points ß 85% of the query time is spent acquiring a JDBC Connection à (aka connection starvation issue) Actual waiting for SQL server JDBC Connection Pool Starvation
36 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community False Split due to Just-In-Time Compiler (JIT) Optimization The Same Method is called via 2 paths !! Because JIT optimized a reflection call (1)NativeMethodAccessorImpl è (2)GeneratedMethodAccessor Fix: Warmup JVM: record in prod, or re-run load test
37 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Time-consuming Aspects By default, TransactionInterceptor (@Transactional) acquires a connection before entering the annotated method :382 :388
38 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community Lock Contention Automatic analysis by Java Mission Control of the recorded .jfr file: https://wiki.openjdk.org/display/jmc/Releases ß Waiting to enter a synchronized method à (native C code is not recorded by JFR)
39 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community HashSet<Long> largeSet = ...; ß ~10K items largeSet.remove(13); ß constant time O(1) List<Long> largerList = ...; ß ~10K items largeSet.removeAll(largerList); * (N x remove) * Expected to run super-fast O(size(list)) * In Production, with 10.000 items ... 2 SECONDS !!!! WHY?!! Gory Library QUIRKS
40 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community Flamegraph shows: HashSet<Long> largeSet = ...; ß ~10K items largeSet.remove(13); ß constant time O(1) List<Long> largerList = ...; ß ~10K items largeSet.removeAll(largerList); * 2 seconds are lost doing ArrayList.contains !! Gory Library QUIRKS
41 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Gory Library QUIRKS HashSet<Long> largeSet = ...; ß ~10K items largeSet.remove(13); ß constant time O(1) List<Long> largerList = ...; ß ~10K items largeSet.removeAll(largerList); * If the collection C passed as parameter is larger then current HashSet, iterate the elements of the HashSet and remove them if C.contains(e) = O(N), since in our case, C was a List
43 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community Best Feature of Java 21 LTS? 🎉 🎉 🎉 GA on 19 sep 2023 🎉 🎉 🎉 Virtual Threads
44 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community JFR recording of ❤Virtual Threads❤ in a Spring Boot 3 app running on Java 19/20
45 VictorRentea.ro a training by Join the European Software Crafters community at victorrentea.ro/community JFR Profiling – Lessons Learned ✅ "Sees" inside unknown code / library ❌ Can't see into naAve C code ⚠ 1 sample/second è requires load tests or ran in prod ✅ Can be used in produc5on (x-low overhead) ⚠ Tuning requires paAence, expectaAons, and library awareness ⚠ Sampling Biases and JIT-influenced if profiling CPU-bound code ❌ Cannot follow async calls/reacAve chains (change threads) ✅ Can record Virtual Threads[JEP-444 in Java 21 LTS] and thread forking done using Structured Concurrency[JEP-428 in Java19+]
47 VictorRentea.ro a training by Join the European So3ware Cra3ers community at victorrentea.ro/community Conclusions §Zoom in on the performance issue via distributed tracing (Zipkin) §Study default metrics collected by Micrometer & expose more §If bo_leneck is sAll unclear, or you can't use above tools ⬆, record prod or load-tests using JFR - Load the .jfr recording in IntelliJ or in JMC for automa6c analysis §OpAmize code & config, measuring the benefit vs. risk/pain
Profiling your Java Applica2on victor.rentea@gmail.com ♦ ♦ @victorrentea ♦ VictorRentea.ro Find the code at: https://github.com/victorrentea/performance-profiling.git Join us! victorrentea.ro/community

Profiling your Java Application

  • 1.
    Profiling your JavaApplica2on @victorrentea ♦ VictorRentea.ro Code: github.com/victorrentea/performance-profiling.git
  • 2.
    👉 victorrentea.ro/training-offer 👋 Hi,I'm Victor Rentea 🇷🇴 PhD(CS) Java Champion, 18 years of code, 10 years of teaching Consultant & Trainer for 100+ companies: ❤ Clean Code, Architecture, Unit Tes3ng 🛠 Spring, Hibernate, Reac3ve/WebFlux ⚡ Java Performance, Secure Coding 🔐 Conference Talks on YouTube Founder of European So<ware Cra<ers Community, > 6000 members 🔥 Free 1-hour webinars, a;er work, on clean code, design, tes>ng 👉 victorrentea.ro/community Past events on youtube.com/vrentea Father of 👧👦, servant of a 🐈, weekend gardener 🌼 @victorrentea VictorRentea.ro
  • 3.
    3 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community The response *me of your endpoint is too slow 😱 🎉 Congratulations! 🎉 You made it to production, and you're taking serious load! Where to start?
  • 4.
    4 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Log timestamps log.info("Before call"); !// 2023-05-23 00:01:50.000 !!... !// suspect code log.info("After call"); !// 2023-05-23 00:01:53.100 !!... Measuring Method Run*me – Old-School Style t1-t0 long t0 = currentTimeMillis(); List<CommentDto> comments = client.fetchComments(loanId); // suspect code long t1 = currentTimeMillis(); System.out.println("Took millis: " + (t1 - t0));
  • 5.
    5 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Micrometer @Timed !// wraps the method with an aspect that monitors spent time List<CommentDto> fetchComments(Long id) { !// suspect code } then, in a Spring application GET /actuator/prometheus returns: method_timed_seconds_sum{method="fetchComments",} 100.0 method_timed_seconds_count{method="fetchComments",} 400 è avg = 0.25 s/call sampled again after 5 seconds: method_timed_seconds_sum{method="fetchComments",} 150.0 – 100 = 50 s method_timed_seconds_count{method="fetchComments",} 500 – 400 = 100 calls è avg = 50 s / 100 = 0.5 s/call (spike) time average[ms] 250 500
  • 6.
    6 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Micrometer API https://www.baeldung.com/micrometer !// programatic alternative to @Timed meterRegistry.timer("name").record(() -> { !/*suspect code!*/ }); meterRegistry.timer("name").record(t1 - t0, MILLISECONDS); !// accumulator meterRegistry.counter("earnings").increment(purchase); !// current value meterRegistry.gauge("player-count", currentPlayers.size()); !// eg. "95% requests took ≤ 100 ms" @Timed(percentiles = {0.95}, histogram = true)
  • 7.
    7 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community §Micrometer records metrics a) your own b) key framework metrics for common issues c) opBonal metrics about Tomcat, executors, ... §The metrics are regularly polled/pushed to other tools for storage - eg. Prometheus §... and later used by (eg.) Grafana to: - Display impressive charts - Aggregate metrics - Raise alerts - ... Some examples è Observing a System with Micrometer
  • 8.
    8 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community § hBp_server_requests_seconds_sum{method="GET",uri="/profile/showcase/{id}",} 165.04 hBp_server_requests_seconds_count{method="GET",uri="/profile/showcase/{id}",} 542.0 § tomcat_connec>ons_current_connec>ons{...} 298 ⚠ 98 requests wai>ng in queue tomcat_threads_busy_threads{...} 200 ⚠ = using 100% of the threads => thread pool starva>on? § jvm_gc_memory_allocated_bytes_total 1.8874368E9 jvm_memory_usage_a;er_gc_percent{...} 45.15 ⚠ >30% => increase max heap? § hikaricp_connec>ons_acquire_seconds_sum{pool="HikariPool-1",} 80.37 ⚠ conn pool starva>on? hikaricp_connec>ons_acquire_seconds_count{pool="HikariPool-1",} 551.0 § cache_gets_total{cache="cache1",result="miss",} 0.99 ⚠ 99% cache miss = bad config issue? cache_gets_total{cache="cache1",result="hit",} 0.01 § logback_events_total{level="error",} 1255.0 ⚠ + 1000 than 5 seconds ago => 😱 § my_own_custom_metric xyz Some Metrics available at /actuator/prometheus
  • 9.
    9 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community ? Monitoring a single system is easy!
  • 10.
    10 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Compared to a Distributed Ecosystem 1st challenge: trace requests A B C D E Monitoring a single system is easy!
  • 11.
    11 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community §TraceID header is read from the incoming HTTP request/MQ message - If missing, it is generated by first system in the call chain §TraceID is propagated inside the app: - Methods called (via ThreadLocal) - Any logs printed (via Logback MDC) – cri>cal for distributed log analysis - Upstream operators in a Reactor chain (via ReactorContext in WebFlux) - Async methods submiBed to a DI-managed *TaskExecutor (via a TaskDecorator) §TraceID header is sent in any outgoing request/message - ... sent via a DI-injected bean: @Autowired RestTemplate, RabbitTemplate, ... Distributed Request Tracing h:ps://www.baeldung.com/spring-cloud-sleuth-single-applica<on A B <header> TraceId
  • 12.
    12 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community §The TraceId is available for any processing we do §Every system reports the span of :me spent working on a TraceId - For performance, a frac>on of spans are reported (eg 1%) §... to a central system that builds a :me chart of all systems example è Distributed Request Sampling
  • 13.
    13 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Distributed Request Sampling with Zipkin
  • 14.
    14 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Distributed Request Sampling with Zipkin
  • 15.
    15 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community §Repeated API calls, by accident 😱 §API calls in a loop, should be fetched in batches - GET /item/1, GET /item/3, .... §Long chains of REST calls - A à B à C à D à E §System-in-the-middle §Independent API calls than can be parallelized §Long spans not waiAng for other systems => zoom in 🔎 Distributed Processing An*-PaKerns ⚠
  • 16.
    16 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Profiling Blackbox: inter-system communica/on (Zipkin, ...) Graybox: report key framework metrics (Micrometer) Whitebox: profile methods inside JVM (Java Flight Recorderè) app lib-a lib-b methodA() methodB()
  • 17.
    17 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community How to find methods that take a long 1me? 💡 Naive idea: Instrument your code with a -javaagent that measures the execuPon of every single executed Java method Adds a huge CPU+MEMORY overhead
  • 18.
    18 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Java Flight Recorder The Tools of the Java Gods
  • 19.
    19 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Java Flight Recorder (JFR) Main features: § Capture stack traces of running threads every second § Record internal JVM events about locks, files, sockets, GC ac>vity, ... or your own § Usable in prod: op>mized to add x-low overhead (<2%) § Free since Java 11 Usage: § Local JVM using Java Mission Control or IntelliJ ⬆ § Remote JVM => generate a .jfr file => download => analyze First, enable JFR via -XX:+FlightRecorder, then start the recording: - Con3nuous from startup: -XX:StartFlightRecording=.... - Via command line (SSH): jcmd <JAVAPID> JFR.start - Via endpoints JMX or Actuator - Via tools, eg Glowroot in our experiment
  • 20.
    26 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Flame Graph best way to visualize the output of an execuAon profiler
  • 21.
    27 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) JFR samples the stack traces of running threads every second
  • 22.
    28 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Flame.g(Flame.java:24) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.entry(Flame.java:17) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) Length of each bar is propor?onal to the number of samples (?me) Building a Flame Graph JFR samples the stack traces of running threads every second Flame.dummy(Flame.java:14)
  • 23.
    29 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) Data is merged by line number 2 samples (33%) 4 samples (66%) Length of each bar is proportional to the number of samples (time) Building a Flame Graph JFR samples the stack traces of running threads every second Flame.dummy(Flame.java:14)
  • 24.
    30 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Thread.sleep(Native Method) Thread.sleep(Native Method) Flame.f(Flame.java:21) Flame.g(Flame.java:24) Flame.entry(Flame.java:17) Flame.entry(Flame.java:18) Flame.dummy(Flame.java:14) 2 samples (33%) 4 samples (66%) IN REAL LIFE Unsafe.park(Native Method) ......50 more lines↕...... SocketInputStream.socketRead0(Native Method) ......40 more lines↕...... 423 210 After recording 1 minute of heavy parallel load: .... 100 lines of Aspects, Spring, Tomcat ... Data is merged by line number Length of each bar is proportional to the number of samples (time) Building a Flame Graph JFR samples the stack traces of running threads every second BLOCK NETWORK
  • 25.
    31 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community ProfiledApp Spring Boot 2.7 http://localhost:8080 Load Tests Java app (Gatling.io) 23 parallel requests (closed world model) PostgreSQL in a Local Docker localhost:5432 WireMock in a Local Docker hDp://localhost:9999 remote API responses with 40ms delay Glowroot.org -javaagent:glowroot.jar 👉 http://localhost:4000 Experiment Environment ToxiProxy github.com/Shopify/toxiproxy installed locally localhost:55432 +3ms network latency JFR
  • 26.
    32 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community EXPERIMENT
  • 27.
    33 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Glowroot.org §Extremely simple application profiler - Add VM arg: -javaagent:/path/to/glowroot.jar - Open http://localhost:4000 §Monitors - Methods Profiling, displayed as Flame Graphs - SQLs - API calls - Typical bottlenecks - Data per endpoint §Free - Probably not production-ready, more for learning/debugging
  • 28.
    35 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Split Points ß 85% of the query time is spent acquiring a JDBC Connection à (aka connection starvation issue) Actual waiting for SQL server JDBC Connection Pool Starvation
  • 29.
    36 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community False Split due to Just-In-Time Compiler (JIT) Optimization The Same Method is called via 2 paths !! Because JIT optimized a reflection call (1)NativeMethodAccessorImpl è (2)GeneratedMethodAccessor Fix: Warmup JVM: record in prod, or re-run load test
  • 30.
    37 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Time-consuming Aspects By default, TransactionInterceptor (@Transactional) acquires a connection before entering the annotated method :382 :388
  • 31.
    38 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Lock Contention Automatic analysis by Java Mission Control of the recorded .jfr file: https://wiki.openjdk.org/display/jmc/Releases ß Waiting to enter a synchronized method à (native C code is not recorded by JFR)
  • 32.
    39 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community HashSet<Long> largeSet = ...; ß ~10K items largeSet.remove(13); ß constant time O(1) List<Long> largerList = ...; ß ~10K items largeSet.removeAll(largerList); * (N x remove) * Expected to run super-fast O(size(list)) * In Production, with 10.000 items ... 2 SECONDS !!!! WHY?!! Gory Library QUIRKS
  • 33.
    40 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Flamegraph shows: HashSet<Long> largeSet = ...; ß ~10K items largeSet.remove(13); ß constant time O(1) List<Long> largerList = ...; ß ~10K items largeSet.removeAll(largerList); * 2 seconds are lost doing ArrayList.contains !! Gory Library QUIRKS
  • 34.
    41 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Gory Library QUIRKS HashSet<Long> largeSet = ...; ß ~10K items largeSet.remove(13); ß constant time O(1) List<Long> largerList = ...; ß ~10K items largeSet.removeAll(largerList); * If the collection C passed as parameter is larger then current HashSet, iterate the elements of the HashSet and remove them if C.contains(e) = O(N), since in our case, C was a List
  • 35.
    43 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community Best Feature of Java 21 LTS? 🎉 🎉 🎉 GA on 19 sep 2023 🎉 🎉 🎉 Virtual Threads
  • 36.
    44 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community JFR recording of ❤Virtual Threads❤ in a Spring Boot 3 app running on Java 19/20
  • 37.
    45 VictorRentea.ro a trainingby Join the European Software Crafters community at victorrentea.ro/community JFR Profiling – Lessons Learned ✅ "Sees" inside unknown code / library ❌ Can't see into naAve C code ⚠ 1 sample/second è requires load tests or ran in prod ✅ Can be used in produc5on (x-low overhead) ⚠ Tuning requires paAence, expectaAons, and library awareness ⚠ Sampling Biases and JIT-influenced if profiling CPU-bound code ❌ Cannot follow async calls/reacAve chains (change threads) ✅ Can record Virtual Threads[JEP-444 in Java 21 LTS] and thread forking done using Structured Concurrency[JEP-428 in Java19+]
  • 38.
    47 VictorRentea.ro a trainingby Join the European So3ware Cra3ers community at victorrentea.ro/community Conclusions §Zoom in on the performance issue via distributed tracing (Zipkin) §Study default metrics collected by Micrometer & expose more §If bo_leneck is sAll unclear, or you can't use above tools ⬆, record prod or load-tests using JFR - Load the .jfr recording in IntelliJ or in JMC for automa6c analysis §OpAmize code & config, measuring the benefit vs. risk/pain
  • 39.
    Profiling your JavaApplica2on victor.rentea@gmail.com ♦ ♦ @victorrentea ♦ VictorRentea.ro Find the code at: https://github.com/victorrentea/performance-profiling.git Join us! victorrentea.ro/community