Java in Flames
Flame graphs: Visualization of profiled software
M. Isuru Tharanga Chrishantha Perera
Technical Lead at WSO2, Co-organizer of Java Colombo Meetup
Profiling Software ● Profiling can help you to analyze the performance of your applications and improve poorly performing sections in your code
Java Profiling Tools Available in JDK ● Java VisualVM ● Java Mission Control
Other Java Profiling Tools ● JProfiler - A commercially licensed Java profiling tool developed by ej-technologies ● Honest Profiler - A sampling JVM profiler without the safepoint sample bias ● Async Profiler - Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events
Java Profiling Tools Survey by RebelLabs in 2016: http://pages.zeroturnaround.com/RebelLabs-Developer-Productivity-Report-2016.html
Attitude toward performance work Survey by RebelLabs in 2017: https://zeroturnaround.com/rebellabs/developer-productivity-survey-2017/
Measuring Methods for CPU Profiling Sampling: Monitor running code externally and check which code is executed Instrumentation: Include measurement code into the real code
Sampling main() foo() bar()
Instrumentation main() foo() bar()
Sampling vs. Instrumentation
Sampling:
● Overhead depends on the sampling interval
● Can see execution hotspots
● Can miss methods that return faster than the sampling interval
Instrumentation:
● Precise measurement of execution times
● More data to process
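To make the instrumentation side concrete, here is a minimal sketch that wraps measurement code around the real code and accumulates call counts and elapsed time per name. The class `TimingInstrumentation` and its methods are hypothetical names made up for this illustration; real profilers usually inject equivalent code through bytecode instrumentation rather than hand-written wrappers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical hand-rolled instrumentation: measurement code wrapped around the real code. */
public class TimingInstrumentation {
    // Per-name statistics: [0] = number of calls, [1] = total elapsed nanoseconds.
    // Not thread-safe; this sketch assumes single-threaded use.
    private final Map<String, long[]> stats = new LinkedHashMap<>();

    /** Runs the body and records one call plus its elapsed time under the given name. */
    public <T> T timed(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            long[] s = stats.computeIfAbsent(name, k -> new long[2]);
            s[0]++;
            s[1] += elapsed;
        }
    }

    /** Number of recorded calls for the name, or 0 if it was never measured. */
    public long callCount(String name) {
        long[] s = stats.get(name);
        return s == null ? 0 : s[0];
    }

    /** Total recorded nanoseconds for the name, or 0 if it was never measured. */
    public long totalNanos(String name) {
        long[] s = stats.get(name);
        return s == null ? 0 : s[1];
    }

    public static void main(String[] args) {
        TimingInstrumentation t = new TimingInstrumentation();
        long sum = 0;
        for (int i = 0; i < 1000; i++) {
            final int n = i;
            sum += t.timed("square", () -> n * n);
        }
        System.out.println("square: calls=" + t.callCount("square")
                + ", totalNanos=" + t.totalNanos("square") + ", sum=" + sum);
    }
}
```

Every call is measured, which is why instrumentation is precise but produces more data: the timing code itself runs on every invocation, and for very short methods that overhead can dominate the measurement.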
How Do Profilers Work?
● Generic profilers rely on the JVMTI spec
● JVMTI only offers stack trace collection at safepoints
● Some profilers use the AsyncGetCallTrace method, an OpenJDK-internal API call that allows stack traces to be collected outside safepoints
Safepoints
● A safepoint is a moment in time when a thread’s data, its internal state and representation in the JVM are, well, safe for observation by other threads in the JVM.
● A thread checks for a pending safepoint at these points:
  ○ Between every 2 bytecodes (interpreter mode)
  ○ Backedge of non-’counted’ loops
  ○ Method exit
  ○ JNI call exit
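The "counted loop" distinction is easier to see in code. The sketch below shows the two loop shapes; the class name `LoopShapes` is hypothetical. It assumes HotSpot's usual behaviour, where a loop with an `int` induction variable and a fixed stride can be treated as counted by the JIT compiler (which has historically omitted the safepoint poll on its back edge), while a loop with a `long` induction variable is not treated as counted. Exact behaviour depends on the JVM version and flags, so treat this as an illustration of the shapes, not a guarantee.

```java
/** Hypothetical illustration of "counted" and non-"counted" loop shapes. */
public class LoopShapes {

    /** int induction variable, fixed stride: the shape the JIT can treat as a counted loop. */
    public static long sumCounted(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    /** long induction variable: assumed not to be treated as counted, so the back-edge poll stays. */
    public static long sumNonCounted(int[] data) {
        long sum = 0;
        for (long i = 0; i < data.length; i++) {
            sum += data[(int) i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        // Both loops compute the same result; only the safepoint polling may differ.
        System.out.println(sumCounted(data) + " " + sumNonCounted(data));
    }
}
```

For a safepoint-biased profiler this matters because samples can only land where a poll exists, so time spent inside a long-running counted loop may be attributed to the next poll location instead.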
Flame Graphs ● “Flame graphs are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately.” ● Developed by Brendan Gregg, an industry expert in computing performance and cloud computing. ● Flame Graphs can be generated using https://github.com/brendangregg/FlameGraph ○ This creates an interactive SVG http://www.brendangregg.com/flamegraphs.html
Flame Graph Example
Flame Graph: Definition ● The x-axis shows the stack profile population, sorted alphabetically ● The y-axis shows stack depth ○ The top edge shows what is on-CPU, and beneath it is its ancestry ● Each rectangle represents a stack frame. ● Box width is proportional to the total time a function was profiled directly or its children were profiled
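The width rule can be sketched with the folded stack format, where each line is a root-first, semicolon-separated stack followed by a sample count. The class `FrameWidths` below is a hypothetical illustration, not code from the FlameGraph repository: it computes, for every frame position, the number of samples in which that frame or any of its children appeared, which is the quantity the box width is proportional to.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: inclusive sample counts per frame from folded stacks. */
public class FrameWidths {

    /** Maps each stack prefix (root-first, ';'-separated) to its inclusive sample count. */
    public static Map<String, Long> inclusiveCounts(List<String> foldedLines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String raw : foldedLines) {
            String line = raw.trim();
            int sep = line.lastIndexOf(' ');
            if (sep < 0) {
                continue; // skip lines without a sample count
            }
            long samples = Long.parseLong(line.substring(sep + 1));
            StringBuilder path = new StringBuilder();
            for (String frame : line.substring(0, sep).split(";")) {
                if (path.length() > 0) {
                    path.append(';');
                }
                path.append(frame);
                // Every ancestor on the path is credited with this stack's samples.
                counts.merge(path.toString(), samples, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data with a single root frame.
        List<String> folded = Arrays.asList("main;foo 3", "main;foo;baz 2", "main;bar 1");
        Map<String, Long> counts = inclusiveCounts(folded);
        long total = counts.get("main");
        counts.forEach((path, n) ->
                System.out.printf("%-14s %d samples, %.0f%% of width%n", path, n, 100.0 * n / total));
    }
}
```

In this data `main;foo` is on-CPU directly in 3 samples and through `baz` in 2 more, so its box spans 5 of the 6 samples even though `foo` itself is the top edge in only 3.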
Types of Flame Graphs ● CPU - see which code-paths are hot (busy on-CPU) ● Memory - Memory Leak (and Growth) ● Off-CPU - Time spent by processes and threads when they are not running on-CPU ● Hot/Cold - both CPU and Off-CPU ● Differential - compare before and after flame graphs
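The idea behind a differential flame graph can be sketched as a per-stack subtraction of two folded profiles. The class `FoldedDiff` and its methods are hypothetical names for this illustration, and the sketch only computes the deltas; it does not reproduce the output format of any FlameGraph tool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: per-stack sample deltas between a "before" and an "after" profile. */
public class FoldedDiff {

    /** Parses folded lines ("frame;frame;frame count") into a stack-to-count map. */
    public static Map<String, Long> parse(List<String> foldedLines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String raw : foldedLines) {
            String line = raw.trim();
            int sep = line.lastIndexOf(' ');
            if (sep < 0) {
                continue; // skip lines without a sample count
            }
            counts.merge(line.substring(0, sep), Long.parseLong(line.substring(sep + 1)), Long::sum);
        }
        return counts;
    }

    /** Returns after minus before for every stack seen in either profile. */
    public static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> result = new TreeMap<>();
        after.forEach((stack, n) -> result.merge(stack, n, Long::sum));
        before.forEach((stack, n) -> result.merge(stack, -n, Long::sum));
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        Map<String, Long> before = parse(Arrays.asList("main;a 5", "main;b 2"));
        Map<String, Long> after = parse(Arrays.asList("main;a 3", "main;c 4"));
        delta(before, after).forEach((stack, d) -> System.out.println(stack + " " + d));
    }
}
```

A positive delta marks a code path that got hotter after the change, a negative delta one that got cooler. Raw counts are only comparable when both profiles were taken with a similar number of samples.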
Why do we need Flame Graphs? ● Finding out why CPUs are busy is an important task when troubleshooting performance issues ● Can use a sampling profiler to see which code-paths are hot. ● Usually a profiler will dump a lot of data with thousands of lines ● Flame Graph can simply visualize the stack traces output of a sampling profiler.
Naive Profiling: Taking Thread Dumps ● “A thread dump is a snapshot of the state of all threads that are part of the process.” ● The state of the thread is represented with a stack trace. ● A thread can be in only one state at a given point in time. ● You can take thread dumps at regular intervals to do “Naive Java Profiling”
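The same naive approach can be sketched inside a single JVM with `Thread.getAllStackTraces()`, which returns a snapshot of every live thread's stack. The class `NaiveSampler` is a hypothetical illustration: it takes a few snapshots at a fixed interval and counts identical stacks in folded form. Like thread dumps, it should be assumed to observe threads only at safepoints.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: naive in-process sampling with repeated stack snapshots. */
public class NaiveSampler {

    /** Folds one stack trace into a root-first, ';'-separated line of Class.method frames. */
    public static String fold(StackTraceElement[] stack) {
        StringBuilder sb = new StringBuilder();
        // Stack traces list the most recent frame first, so walk the array backwards.
        for (int i = stack.length - 1; i >= 0; i--) {
            if (sb.length() > 0) {
                sb.append(';');
            }
            sb.append(stack[i].getClassName()).append('.').append(stack[i].getMethodName());
        }
        return sb.toString();
    }

    /** Takes the given number of snapshots, intervalMillis apart, counting identical stacks. */
    public static Map<String, Integer> sample(int samples, long intervalMillis) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int n = 0; n < samples; n++) {
            for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                if (stack.length > 0) {
                    counts.merge(fold(stack), 1, Integer::sum);
                }
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop sampling early if interrupted
                break;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints "stack count" lines, one per distinct stack.
        sample(5, 100).forEach((stack, count) -> System.out.println(stack + " " + count));
    }
}
```

The counts include waiting and sleeping threads as well as running ones, which is one reason this style of profiling is called naive.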
Sample program to profile ● Get Sample “highcpu” program from https://github.com/chrishantha/sample-java-programs ● mvn clean install ● cd highcpu ● java -jar target/highcpu.jar --help
Flame Graph with Thread Dumps
i=0; while (( i++ < 30 )); do jstack $(pgrep -f highcpu) >> out.jstacks; sleep 2; done
cat out.jstacks | $FLAMEGRAPH_DIR/stackcollapse-jstack.pl > out.stacks-folded
cat out.stacks-folded | $FLAMEGRAPH_DIR/flamegraph.pl > jstack_flamegraph.svg
firefox jstack_flamegraph.svg
Flame Graph with Thread Dumps
Flame Graph with Thread Dumps (Without Thread Names)
[Figure annotations: the top edge shows the methods directly on-CPU; visually compare lengths; ancestry; code path; branches]
Flame Graphs with Java Flight Recordings ● We can generate CPU Flame Graphs from a Java Flight Recording ● Program is available at GitHub: https://github.com/chrishantha/jfr-flame-graph ● The program uses the (unsupported) JMC Parser
Java Flight Recorder (JFR)
● A profiling and event collection framework built into the Oracle JDK
● Gathers low-level information about the JVM and application behaviour with very low performance overhead (less than 2%)
● Always-on profiling in production environments
● Engine was released with Java 7 update 4
● Commercial feature in Oracle JDK
● A main tool in Java Mission Control (since Java 7 update 40)
Generating a Flame Graph using JFR dump
● JFR has Method Profiling Samples
  ○ You can view those in the “Hot Methods” and “Call Tree” tabs
● A Flame Graph can be generated using these Method Profiling Samples
● Use the following JVM options to improve the accuracy of the JFR Method Profiler:
  ○ -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints
Profiling the Sample Program
● Get a Profiling Recording
  ○ java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=delay=10s,duration=1m,name=Profiling,filename=highcpu_profiling.jfr,settings=profile -jar target/highcpu.jar --hashing-algo SHA-512 --hashing-workers 20 --math-workers 10
Tree View (in JFR)
Using jfr-flame-graph
create_flamegraph.sh -f highcpu_profiling.jfr -i > jfr_flamegraph.svg
Java Mixed-Mode Flame Graphs
● With Java profilers, we can get information about the Java process only.
● With Java Mixed-Mode Flame Graphs, however, we can see how much CPU time is spent in Java methods, system libraries and the kernel.
● Mixed-mode means that the Flame Graph shows profile information from both system code paths and Java code paths.
Linux Perf (perf_events) ● System profiler ● Userspace + Kernel
Installing “perf_events” on Ubuntu ● On terminal, type perf ● sudo apt install linux-tools-common ● sudo apt install linux-tools-generic
The Problem with Java and Perf
● perf needs the Java symbol table. The JVM doesn’t preserve frame pointers by default.
● Run sample program
  ○ java -jar target/highcpu.jar --hashing-algo SHA-512 --hashing-workers 20 --math-workers 10 --exit-timeout 300
● Run perf record
  ○ sudo perf record -F 99 -g -p `pgrep -f highcpu` -- sleep 60
● Display trace output
  ○ sudo perf script
No Java Frames!
Preserving Frame Pointers in JVM
● Run the Java program with the JVM flag "-XX:+PreserveFramePointer"
  ○ java -XX:+PreserveFramePointer -jar target/highcpu.jar --hashing-algo SHA-512 --hashing-workers 20 --math-workers 10 --exit-timeout 300
● This flag is available only in JDK 8 update 60 and above.
● Some frames may still be missing compared to Flame Graphs generated from JFR or jstack because of “inlining”.
● You can reduce the amount of inlining if you need to see more frames in the profile.
  ○ For example, -XX:InlineSmallCode=500
How to generate Java symbol table
● Use a Java agent to generate method mappings to use with the Linux `perf` tool
  ○ Clone & Build https://github.com/jvm-profiling-tools/perf-map-agent
● Create symbol map
  ○ ./create-java-perf-map.sh `pgrep -f highcpu`
● You can also use the “jmaps” tool in the FlameGraph repository to create symbol files for all Java processes.
  ○ export AGENT_HOME=/home/isuru/performance/git-projects/perf-map-agent
  ○ sudo perf record -F 499 -a -g -- sleep 30;sudo -E $FLAMEGRAPH_DIR/jmaps
● Let Java “warm up” before generating symbol maps.
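What a symbol map gives perf can be sketched as an address lookup. perf reads map files whose lines have the form `START SIZE symbolname`, with the start address and size in hexadecimal. The class `PerfMapLookup` is a hypothetical illustration of resolving an instruction address against such lines; the addresses and symbol names in the example are made up, and the exact symbol naming depends on the agent's options.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: resolving an address against perf map lines ("START SIZE symbolname"). */
public class PerfMapLookup {

    private static final class Symbol {
        final long start;
        final long size;
        final String name;

        Symbol(long start, long size, String name) {
            this.start = start;
            this.size = size;
            this.name = name;
        }
    }

    // Symbols keyed by start address so the nearest preceding symbol can be found quickly.
    private final TreeMap<Long, Symbol> byStart = new TreeMap<>();

    public PerfMapLookup(List<String> lines) {
        for (String line : lines) {
            // Limit the split to 3 parts because symbol names may contain spaces.
            String[] parts = line.trim().split(" ", 3);
            if (parts.length < 3) {
                continue; // skip malformed lines
            }
            long start = Long.parseUnsignedLong(parts[0], 16);
            long size = Long.parseUnsignedLong(parts[1], 16);
            byStart.put(start, new Symbol(start, size, parts[2]));
        }
    }

    /** Returns the symbol whose [start, start + size) range contains the address, or null. */
    public String resolve(long address) {
        Map.Entry<Long, Symbol> entry = byStart.floorEntry(address);
        if (entry == null) {
            return null;
        }
        Symbol s = entry.getValue();
        return address - s.start < s.size ? s.name : null;
    }

    public static void main(String[] args) {
        // Made-up map content for illustration only.
        PerfMapLookup map = new PerfMapLookup(Arrays.asList(
                "7f0000001000 40 Lcom/example/Foo;::bar",
                "7f0000002000 10 Interpreter"));
        System.out.println(map.resolve(0x7f0000001010L));
    }
}
```

The map is a snapshot of where JIT-compiled code lived when it was written. Methods compiled or recompiled afterwards are missing or stale, which is the reason for letting the JVM warm up before generating it.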
Generate Java Mixed-Mode Flame Graph
● Run perf and create symbol map
  ○ export AGENT_HOME=/home/isuru/performance/git-projects/perf-map-agent
  ○ sudo perf record -F 499 -a -g -- sleep 30;sudo -E $FLAMEGRAPH_DIR/jmaps
● Generate Flame Graph
  ○ sudo perf script -F comm,pid,tid,cpu,time,event,ip,sym,dso,trace |
    stackcollapse-perf.pl --pid | grep java-`pgrep -f highcpu` |
    flamegraph.pl --color=java --hash --width 1080 > java-mixed-mode.svg
  ○ firefox java-mixed-mode.svg
Java Mixed-Mode Flame Graph
Java Mixed-Mode Flame Graph for Netty
Java Mixed-Mode Flame Graph
● Helps to understand Java CPU usage
● With Flame Graphs, we can see both Java and system profiles
● Can profile GC as well
Thank you! Any questions?
