Application Context: We are running an application based on spring-boot:3.2.0 that exposes REST APIs to clients via spring-boot-starter-web, integrates with our datastore via org.springframework.boot:spring-boot-starter-data-elasticsearch, layers an in-memory cache on top using com.github.ben-manes.caffeine:caffeine, and consumes messages with org.springframework.kafka:spring-kafka.
Issue: Every night at exactly 00:00 the thread count of our application spikes, and afterwards very few of those threads terminate. As a result the thread count grows incrementally every day, and pods sometimes end up restarting (we suspect OOM, since memory usage climbs along with the thread count).
Observation: During this window there is little deviation in the throughput of the REST requests or of the message-consuming pipelines.
Attempt: We analyzed thread dumps of the application, comparing dumps taken on two different days. The most visible difference was a significantly larger number of threads in the TIMED_WAITING state.
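To make the day-over-day comparison less manual, a snapshot like the following could be taken in-process (a minimal sketch, not part of our current setup; class and method names are illustrative):

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative helper: snapshots all live threads and counts how many are
 * TIMED_WAITING, grouped by thread-name prefix (trailing digits stripped),
 * so daily snapshots can be diffed without reading full dumps by hand.
 */
public class ThreadStateSnapshot {

    /** Returns a map of thread-name prefix -> number of TIMED_WAITING threads. */
    public static Map<String, Integer> timedWaitingByPrefix() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getState() == Thread.State.TIMED_WAITING) {
                // e.g. "http-nio-8080-exec-61" -> "http-nio-8080-exec-"
                String prefix = t.getName().replaceAll("\\d+$", "");
                counts.merge(prefix, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        timedWaitingByPrefix().forEach((name, n) -> System.out.println(name + " " + n));
    }
}
```

Logging this map once a minute around midnight would show exactly which pool's threads are growing.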
The stack trace for these threads looks like:

"http-nio-8080-exec-61" #37660 [37664] daemon prio=5 os_prio=0 cpu=38462.67ms elapsed=46889.40s tid=0x00007f2e0c0557b0 nid=37664 waiting on condition [0x00007f2ddf2fe000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@21/Native Method)
	- parking to wait for <0x0000000684f26058> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21/LockSupport.java:269)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@21/AbstractQueuedSynchronizer.java:1758)
	at java.util.concurrent.LinkedBlockingQueue.poll(java.base@21/LinkedBlockingQueue.java:460)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:99)
	at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:33)
	at org.apache.tomcat.util.threads.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1113)
	at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1176)
	at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
	at java.lang.Thread.run(java.base@21/Thread.java:1583)

Help Wanted: From here we are unsure which direction to investigate. We would appreciate hints from the community on:
- Could one of the libraries mentioned above have some default configuration that runs an internal task in the JVM exactly at 00:00?
- Based on the stack trace in the dump, how can we determine which component created the object that so many threads are waiting on?
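For the second question, one way to gather more data is the standard java.lang.management API, which can resolve the lock each parked thread is waiting on and, if it is held, the owning thread (a hedged sketch; it would have to run inside the target JVM, e.g. from a debug-only endpoint, and the class name here is illustrative):

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/**
 * Illustrative diagnostic: for each TIMED_WAITING thread, print the lock
 * identity (class name + identity hash, matching the <0x...> line in a
 * thread dump) and the owning thread, if any.
 */
public class ParkedLockInspector {

    public static void report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // (true, true) -> include locked monitors and ownable synchronizers
        for (ThreadInfo ti : mx.dumpAllThreads(true, true)) {
            if (ti.getThreadState() == Thread.State.TIMED_WAITING) {
                LockInfo lock = ti.getLockInfo();
                if (lock != null) {
                    System.out.printf("%s waits on %s (owner: %s)%n",
                            ti.getThreadName(), lock, ti.getLockOwnerName());
                }
            }
        }
    }

    public static void main(String[] args) {
        report();
    }
}
```

Note that for pool workers parked in a queue poll (as in the stack trace above), the owner is typically null because no thread holds the ConditionObject; that in itself would indicate the extra threads are idle pool workers rather than threads blocked on contention.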
Note: I can share the thread dump if that helps, and further information as required. As I said, this happens every single night.
Update: The web and Tomcat properties overridden for this application are:
spring.servlet.multipart.max-file-size=10MB
spring.servlet.multipart.max-request-size=10MB
server.servlet.context-path=/places/api
server.tomcat.accesslog.enabled=true
server.tomcat.accesslog.suffix=.log
server.tomcat.accesslog.prefix=access_log
server.tomcat.accesslog.file-date-format=.yyyy-MM-dd
server.tomcat.accesslog.pattern=%h %l %t "%r" %s %b "%{Referer}i" "%{User-Agent}i" %D
server.tomcat.accesslog.rotate=true
server.tomcat.basedir=/var/log
server.tomcat.accesslog.directory=places
server.tomcat.accesslog.max-days=7


Comment: http-nio-8080-exec? Does the application have any non-default configuration regarding spring-web or Tomcat? And in which timezone is 00:00 — the user timezone of the JVM, or UTC?

Reply: The timezone is IST (UTC+05:30). There are very few non-default configurations; I have added them to the question.
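Since "midnight" for schedulers and log rotation is usually evaluated in the JVM's default zone unless overridden (e.g. with -Duser.timezone), a quick check like the following can confirm which zone the JVM actually runs in (a small illustrative snippet, not part of the application):

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Prints the JVM's default timezone and the current zoned time,
// to verify whether the 00:00 spike aligns with this zone or UTC.
public class ZoneCheck {
    public static void main(String[] args) {
        System.out.println("default zone: " + ZoneId.systemDefault());
        System.out.println("now: " + ZonedDateTime.now());
    }
}
```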