
I have a weird issue with a Java process that gets stuck (once or twice per day) and only recovers after I execute:

jstack -F ${PID} 

While the Java process is stuck, if I try to take a thread dump with jcmd, I receive an AttachNotSupportedException.

I only managed to take a thread dump using jstack -F, and only with a JDK whose version matched the JRE that was used to start the Java process.

The only thing I can think of is that maybe the OS scheduler is not granting CPU time to the Java process, and running jstack -F somehow forces it to be scheduled again?

Any feedback will be appreciated.

Regards,

Cristi

UPDATE-1

Today it happened again. The first thing I checked was the memory usage on that box (99.1%). Afterwards I ran jmap -heap, and the process resumed without any issues after the heap inspection. The jmap -heap output is attached.

jmap -heap 7703
Attaching to process ID 7703, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.162-b12

using thread-local object allocation.
Parallel GC with 2 thread(s)

Heap Configuration:
   MinHeapFreeRatio         = 0
   MaxHeapFreeRatio         = 100
   MaxHeapSize              = 536870912 (512.0MB)
   NewSize                  = 89128960 (85.0MB)
   MaxNewSize               = 178782208 (170.5MB)
   OldSize                  = 179306496 (171.0MB)
   NewRatio                 = 2
   SurvivorRatio            = 8
   MetaspaceSize            = 21807104 (20.796875MB)
   CompressedClassSpaceSize = 1073741824 (1024.0MB)
   MaxMetaspaceSize         = 17592186044415 MB
   G1HeapRegionSize         = 0 (0.0MB)

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 143130624 (136.5MB)
   used     = 73244792 (69.85167694091797MB)
   free     = 69885832 (66.64832305908203MB)
   51.1733897003062% used
From Space:
   capacity = 17825792 (17.0MB)
   used     = 8176960 (7.79815673828125MB)
   free     = 9648832 (9.20184326171875MB)
   45.871510225183826% used
To Space:
   capacity = 17825792 (17.0MB)
   used     = 0 (0.0MB)
   free     = 17825792 (17.0MB)
   0.0% used
PS Old Generation
   capacity = 243269632 (232.0MB)
   used     = 23534032 (22.443801879882812MB)
   free     = 219735600 (209.5561981201172MB)
   9.674052534432247% used

25964 interned Strings occupying 2759784 bytes.

UPDATE-2

After having GC logs enabled, when the process freezes, this is the tail of the GC log.

2020-09-02T06:51:11.286+0000: 86020.549: Total time for which application threads were stopped: 0.0001978 seconds, Stopping threads took: 0.0000666 seconds
2020-09-02T06:51:11.286+0000: 86020.550: Application time: 0.0000610 seconds
2020-09-02T06:51:11.286+0000: 86020.550: Total time for which application threads were stopped: 0.0001793 seconds, Stopping threads took: 0.0000589 seconds
2020-09-02T06:51:11.287+0000: 86020.550: Application time: 0.0003371 seconds
2020-09-02T06:51:11.287+0000: 86020.550: Total time for which application threads were stopped: 0.0001749 seconds, Stopping threads took: 0.0000283 seconds
2020-09-02T06:51:11.287+0000: 86020.550: Application time: 0.0001277 seconds
2020-09-02T06:51:11.287+0000: 86020.550: Total time for which application threads were stopped: 0.0001554 seconds, Stopping threads took: 0.0000364 seconds
2020-09-02T06:51:11.287+0000: 86020.551: Application time: 0.0000400 seconds
2020-09-02T06:51:11.287+0000: 86020.551: Total time for which application threads were stopped: 0.0001082 seconds, Stopping threads took: 0.0000158 seconds
2020-09-02T06:51:11.288+0000: 86020.552: Application time: 0.0010649 seconds
2020-09-02T06:51:11.288+0000: 86020.552: Total time for which application threads were stopped: 0.0001945 seconds, Stopping threads took: 0.0000571 seconds
2020-09-02T06:51:11.289+0000: 86020.552: Application time: 0.0001078 seconds
2020-09-02T06:51:11.289+0000: 86020.552: Total time for which application threads were stopped: 0.0001852 seconds, Stopping threads took: 0.0000336 seconds
2020-09-02T06:51:11.289+0000: 86020.552: Application time: 0.0000366 seconds
2020-09-02T06:51:11.289+0000: 86020.552: Total time for which application threads were stopped: 0.0000910 seconds, Stopping threads took: 0.0000180 seconds
2020-09-02T06:51:11.289+0000: 86020.552: Application time: 0.0000412 seconds
2020-09-02T06:51:11.289+0000: 86020.553: Total time for which application threads were
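For reference, a stopped-time log in this format is produced on a JDK 8 HotSpot JVM (25.162-b12, as shown in the jmap output above) by safepoint-related GC logging flags. The exact flag set used here is not shown in the question, so the following launch line is an assumption that would yield "Application time" and "Total time for which application threads were stopped" entries like those above:

```shell
# Assumed JDK 8 GC-logging flags; app.jar and the log path are placeholders.
# -XX:+PrintGCApplicationStoppedTime  -> "Total time for which application threads were stopped" lines
# -XX:+PrintGCApplicationConcurrentTime -> "Application time" lines
java -Xloggc:/var/log/app/gc.log \
     -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -XX:+PrintGCApplicationConcurrentTime \
     -jar app.jar
```

Notably, the sub-millisecond pause times in the tail above suggest GC itself was not stalling the application when it froze.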
  • It looks like I am also able to resume the process by sending kill -SIGCONT ${PID}, a sign that maybe the kernel sent a SIGSTOP to the process, perhaps due to lack of resources. The machines where I see this issue are under heavy load. Commented Aug 11, 2020 at 16:20
  • At first it looked like the issue from this question: stackoverflow.com/questions/34251580/… But it seems that bug was fixed in the kernel version I am using: [root@hostname /]# rpm -q --changelog kernel-2.6.32-754.29.1.el6.x86_64 | grep 'get_futex_key_refs' - [kernel] futex: Ensure get_futex_key_refs() always implies a barrier (Larry Woodman) [1167405] Commented Aug 11, 2020 at 17:23
  • We've been struggling against a similar issue, and this was indeed a kernel bug. Upgrading to Linux 4.x helped. Commented Aug 14, 2020 at 22:22
  • One possibility could be that a full GC is happening at that time; you may want to switch the GC to CMS. Also take a thread dump at that moment and check whether any code is creating too many objects, or whether there is a memory leak and some objects are not being collected. Commented Aug 21, 2020 at 8:35
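Following up on the SIGCONT observation in the comments above, a quick way to confirm whether something actually stopped the process is to read its state from /proc before resuming it: a state of "T" means the process is stopped (e.g. by SIGSTOP). A minimal Linux-only sketch; the PID argument is a placeholder:

```shell
#!/bin/sh
# Report a process's scheduler state and send SIGCONT only if it is
# actually stopped. Defaults to the current shell's PID for demonstration.
pid="${1:-$$}"
state=$(awk '/^State:/ {print $2}' "/proc/${pid}/status")
echo "PID ${pid} state: ${state}"
if [ "${state}" = "T" ]; then
    kill -CONT "${pid}"            # resume a stopped process
    echo "Sent SIGCONT to ${pid}"
fi
```

If the state is "T" before jstack -F or jmap runs, that would support the SIGSTOP theory rather than a JVM-internal hang.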

1 Answer


If you receive AttachNotSupportedException while taking a thread dump with jcmd, try running jcmd under the same user the Java process runs as. See com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
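A minimal sketch of this suggestion, assuming PID 7703 from the question (the user name it prints depends on your system): jcmd attaches to the target JVM over a UNIX domain socket that the JVM creates, so the attaching process must run as the same OS user as the JVM.

```shell
#!/bin/sh
# Look up which user owns the target Java process, then run jcmd as that
# user. Defaults to the current shell's PID for demonstration.
pid="${1:-$$}"
owner=$(ps -o user= -p "${pid}" | tr -d ' ')
echo "process ${pid} is owned by ${owner}"
# The jcmd invocation is commented out since it requires a running JVM:
#   sudo -u "${owner}" jcmd "${pid}" Thread.print
```

Thread.print is the jcmd subcommand that asks the JVM for a thread dump; when the attach mechanism itself is wedged (as in this question), jstack -F remains the fallback since it uses the serviceability agent instead of the attach socket.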
