Java Performance & Profiling
M. Isuru Tharanga Chrishantha Perera
Associate Technical Lead at WSO2
Co-organizer of Java Colombo Meetup
What is Profiling?
Here is what Wikipedia says: In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.
Source: https://en.wikipedia.org/wiki/Profiling_(computer_programming)
What is Profiling? (continued)
Here is what Wikipedia says: Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.
Source: https://en.wikipedia.org/wiki/Profiling_(computer_programming)
Measuring Performance
We need a way to measure performance:
o To understand how the system behaves
o To see performance improvements after doing any optimizations
There are two key performance metrics:
o Latency
o Throughput
What is Throughput?
Throughput measures the number of requests that a server processes during a specific time interval (e.g. per second).
Throughput is calculated using the equation:
Throughput = number of requests / time to complete the requests
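The throughput equation can be sketched in a few lines of Java. This is only an illustration: the class name, the method name, and the figures (12,000 requests in 60 seconds) are hypothetical and do not come from the slides.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative only: throughput = number of requests / time to complete the requests. */
class ThroughputExample {

    /** Returns requests per second for a run that took elapsedNanos nanoseconds. */
    static double requestsPerSecond(long requests, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        double elapsedSeconds = elapsedNanos / 1_000_000_000.0;
        return requests / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical run: 12,000 requests completed in 60 seconds.
        long elapsedNanos = TimeUnit.SECONDS.toNanos(60);
        System.out.println(requestsPerSecond(12_000, elapsedNanos) + " requests/sec");
    }
}
```

In a real test, the elapsed time would typically be taken from System.nanoTime() around the whole run rather than from a fixed value.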
What is Latency?
Latency measures the end-to-end processing time for an operation.
Tuning Java Applications
We want high throughput and low latency.
There is a tradeoff between throughput and latency: with more concurrent users, throughput typically increases, but the average latency also increases.
Throughput and Latency Graphs
Source: https://www.infoq.com/articles/Tuning-Java-Servers
Latency Distribution
When measuring latency, it's important to look at the latency distribution: min, max, average, median, 75th percentile, 98th percentile, 99th percentile, etc.
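As a rough sketch of why the distribution matters, the following Java snippet computes a few of these statistics with the nearest-rank percentile method. The class name and the sample latencies are made up for illustration, and other percentile definitions (for example, interpolating ones) would give slightly different values.

```java
import java.util.Arrays;

/** Illustrative latency statistics using the nearest-rank percentile method. */
class LatencyStats {

    /** Returns the p-th percentile (0 < p <= 100) of an ascending-sorted array. */
    static long percentile(long[] sortedMillis, double p) {
        if (sortedMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        // Nearest rank: the smallest sample with at least p% of samples at or below it.
        int rank = (int) Math.ceil(p / 100.0 * sortedMillis.length);
        return sortedMillis[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical latencies in milliseconds; one slow request forms a long tail.
        long[] samples = {12, 11, 13, 12, 14, 11, 12, 13, 12, 950};
        long[] sorted = samples.clone();
        Arrays.sort(sorted);

        double avg = Arrays.stream(sorted).average().orElse(0);
        System.out.println("min    = " + sorted[0] + " ms");
        System.out.println("max    = " + sorted[sorted.length - 1] + " ms");
        System.out.println("avg    = " + avg + " ms");
        System.out.println("median = " + percentile(sorted, 50) + " ms");
        System.out.println("p75    = " + percentile(sorted, 75) + " ms");
        System.out.println("p99    = " + percentile(sorted, 99) + " ms");
    }
}
```

With these made-up samples, the single slow request pulls the average far above the median, which is why the average alone can hide what most users actually experience.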
Long-tail latencies
Long-tail latencies occur when high percentiles have values much greater than the average latency.
Source: https://engineering.linkedin.com/performance/who-moved-my-99th-percentile-latency
Latency Numbers Every Programmer Should Know
Operation                             Time (ns)      Time (us)   Time (ms)  Notes
L1 cache reference                    0.5 ns
Branch mispredict                     5 ns
L2 cache reference                    7 ns                                  14x L1 cache
Mutex lock/unlock                     25 ns
Main memory reference                 100 ns                                20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy          3,000 ns       3 us
Send 1K bytes over 1 Gbps network     10,000 ns      10 us
Read 4K randomly from SSD*            150,000 ns     150 us                 ~1GB/sec SSD
Read 1 MB sequentially from memory    250,000 ns     250 us
Round trip within same datacenter     500,000 ns     500 us
Read 1 MB sequentially from SSD*      1,000,000 ns   1,000 us    1 ms       ~1GB/sec SSD, 4X memory
Disk seek                             10,000,000 ns  10,000 us   10 ms      20x datacenter roundtrip
Read 1 MB sequentially from disk      20,000,000 ns  20,000 us   20 ms      80x memory, 20X SSD
Send packet CA->Netherlands->CA       150,000,000 ns 150,000 us  150 ms
Why do we need Profiling?
o Improve throughput (maximizing the transactions processed per second)
o Improve latency (minimizing the time taken for each operation)
o Find performance bottlenecks
Java Just-In-Time (JIT) compiler
Java code is usually compiled into platform-independent bytecode (class files).
The JVM loads the class files and executes the Java bytecode via the Java interpreter.
Even though this bytecode is usually interpreted, it might also be compiled into native machine code by the JVM's Just-In-Time (JIT) compiler.
Java Just-In-Time (JIT) compiler (continued)
Unlike a conventional ahead-of-time compiler, the JIT compiler compiles the bytecode only when required.
With the JIT compiler, the JVM monitors the methods executed by the interpreter and identifies the "hot methods" for compilation.
After identifying the hot methods, the JVM compiles their bytecode into more efficient native code.
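The effect of hot-method compilation can often be observed with a small experiment like the one below. It is a sketch, not a rigorous benchmark: the class and method names are hypothetical, and the timings vary by JVM, hardware, and flags, so no particular output should be expected. On HotSpot, running it with the -XX:+PrintCompilation flag prints a line for each method as it is compiled.

```java
/** Illustrative warm-up experiment: repeated calls make sumOfSquares a "hot method". */
class HotMethodDemo {

    /** A small method that is cheap to call and easy for the JIT compiler to optimize. */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    /** Calls sumOfSquares repeatedly and returns the elapsed time in nanoseconds. */
    static long timeBatch(int calls, int n) {
        long start = System.nanoTime();
        long sink = 0;
        for (int c = 0; c < calls; c++) {
            sink += sumOfSquares(n);
        }
        long elapsed = System.nanoTime() - start;
        if (sink == 42) {
            System.out.println(sink); // use the result so the loop is not optimized away
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Early rounds usually include interpreted execution; later rounds are
        // often faster once the JVM has compiled the hot method.
        for (int round = 1; round <= 5; round++) {
            System.out.println("round " + round + ": " + timeBatch(20_000, 100) + " ns");
        }
    }
}
```

For real measurements, a benchmarking harness that handles warm-up explicitly is a safer choice than hand-written timing loops.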
JITWatch
The JITWatch tool can analyze the compilation logs generated with the "-XX:+LogCompilation" flag (a diagnostic option, so it must be used together with "-XX:+UnlockDiagnosticVMOptions").
The logs generated by LogCompilation are XML-based and contain a lot of information about JIT compilation. Hence these files can be very large.
https://github.com/AdoptOpenJDK/jitwatch
Java Profiling Tools
Survey by RebelLabs in 2015: http://pages.zeroturnaround.com/RebelLabs---All-Report-Landers_Developer-Productivity-Report-2015.html
Java Profiling Tools
Java VisualVM - Available in JDK
Java Mission Control - Available in JDK
JProfiler - A commercially licensed Java profiling tool developed by ej-technologies
Profiling Applications with Java VisualVM
CPU Profiling: Profile the performance of the application.
Memory Profiling: Analyze the memory usage of the application.
Java Mission Control
o A set of powerful tools running on the Oracle JDK to monitor and manage Java applications
o Free for development use (Oracle Binary Code License)
o Available in the JDK since Java 7 update 40
o Supports plugins
o Two main tools:
  o JMX Console
  o Java Flight Recorder
Measuring Methods for CPU Profiling
Sampling: Monitor the running code externally and periodically check which code is being executed.
Instrumentation: Insert measurement code into the real code.
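To make the idea of instrumentation concrete, here is a minimal hand-written sketch in Java. Real profilers usually inject this kind of measurement code automatically (for example, through bytecode instrumentation); the timed helper, the map names, and the parse method below are all hypothetical and exist only for this illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Illustrative manual instrumentation: measurement code wrapped around the real code. */
class ManualInstrumentation {

    /** Number of calls recorded per instrumented name. */
    static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    /** Total elapsed nanoseconds recorded per instrumented name. */
    static final Map<String, LongAdder> TOTAL_NANOS = new ConcurrentHashMap<>();

    /** Runs body, recording one call and its elapsed time under the given name. */
    static <T> T timed(String name, Supplier<T> body) {
        long start = System.nanoTime();
        try {
            return body.get();
        } finally {
            long elapsed = System.nanoTime() - start;
            CALLS.computeIfAbsent(name, k -> new LongAdder()).increment();
            TOTAL_NANOS.computeIfAbsent(name, k -> new LongAdder()).add(elapsed);
        }
    }

    /** The "real code": a hypothetical method that has been instrumented. */
    static int parse(String text) {
        return timed("parse", () -> Integer.parseInt(text.trim()));
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            parse(" " + i + " ");
        }
        System.out.println("parse: calls=" + CALLS.get("parse").sum()
                + ", totalNanos=" + TOTAL_NANOS.get("parse").sum());
    }
}
```

Every call is measured, which gives exact call counts, but the wrapper itself adds work to each call and produces more data to process.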
Sampling vs. Instrumentation
Sampling:
o Overhead depends on the sampling interval
o Can see execution hotspots
o Can miss methods that return faster than the sampling interval
Instrumentation:
o Precise measurement of execution times
o More data to process
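The sampling approach can be sketched with the standard Thread.getStackTrace() method. This is a toy illustration, not how Java Flight Recorder or any particular profiler is implemented; the class name, the busyWork method, and the interval values are assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sampling profiler: periodically inspects another thread's stack. */
class SamplingSketch {

    static volatile boolean running = true;
    static volatile double sink;

    /** Hypothetical workload that keeps the target thread busy until stopped. */
    static void busyWork() {
        double x = 0;
        while (running) {
            x += Math.sqrt(x + 1);
            sink = x;
        }
    }

    /** Takes the given number of samples and counts the top frame of each non-empty one. */
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> hits = new TreeMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            Thread.sleep(intervalMillis); // a shorter interval gives more detail and more overhead
        }
        return hits;
    }

    public static void main(String[] args) throws InterruptedException {
        running = true;
        Thread worker = new Thread(SamplingSketch::busyWork, "worker");
        worker.setDaemon(true);
        worker.start();

        Map<String, Integer> hits = sample(worker, 100, 5);
        running = false;
        worker.join();

        // Methods with the most hits are the likely hotspots.
        hits.forEach((method, count) -> System.out.println(count + "  " + method));
    }
}
```

A method that starts and returns between two samples never shows up in the counts, which is the blind spot listed above.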
Sampling vs. Instrumentation (continued)
o Java VisualVM uses both sampling and instrumentation
o Java Flight Recorder uses sampling
o JProfiler supports both sampling and instrumentation
Problems with Profiling
o Runtime overhead
o Interpretation of the results can be difficult
o Identifying the "crucial" parts of the software
o Identifying potential performance improvements
Flame Graphs
Flame graphs are a visualization of profiled software, allowing the most frequent code-paths to be identified quickly and accurately.
Brendan Gregg created the open source program to generate flame graphs: https://github.com/brendangregg/FlameGraph
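The FlameGraph tooling consumes "folded" stacks: one line per unique stack, with frames separated by semicolons and followed by a space and a sample count. The Java sketch below shows how raw stack samples could be collapsed into that format; the class name and the sample stacks are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative collapsing of stack samples into folded "frame;frame;frame count" lines. */
class FoldedStacks {

    /** Each sample lists its frames from the root (outermost) to the leaf (innermost). */
    static List<String> fold(List<List<String>> samples) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> frames : samples) {
            counts.merge(String.join(";", frames), 1, Integer::sum);
        }
        List<String> lines = new ArrayList<>();
        counts.forEach((stack, count) -> lines.add(stack + " " + count));
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical samples, as a sampling profiler might collect them.
        List<List<String>> samples = List.of(
                List.of("main", "handle", "parse"),
                List.of("main", "handle", "parse"),
                List.of("main", "handle", "render"),
                List.of("main", "gc"));
        // Prints:
        // main;gc 1
        // main;handle;parse 2
        // main;handle;render 1
        fold(samples).forEach(System.out::println);
    }
}
```

In the rendered graph, the width of each box is proportional to how many samples contain that frame at that position in the stack, so wide boxes point to the most frequent code paths.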
Java CPU Flame Graphs
Help to understand Java CPU usage.
With flame graphs, we can see both Java and system profiles.
Can profile GC as well.
Flame Graphs with Java Flight Recordings
We can generate CPU flame graphs from a Java Flight Recording.
The program is available on GitHub: https://github.com/chrishantha/jfr-flame-graph
The program uses the (unsupported) JMC Parser.
Benchmarking Tools
Apache JMeter
ApacheBench (ab)
wrk - an HTTP benchmarking tool
Does profiling matter?
Yes! Most performance issues are in the application code.
Early performance testing is key. Fix problems while developing.
Thank you!
