This is a follow-up question to "How can I connect to a remote machine?" and the excellent answer provided by ybeltukov.
My system is now configured as described in the answer with my MacBook running the FrontEnd and a Linux server running the remote kernels. SSH and a VPN connect the two. Provided the network link is maintained at all times, it works very well.
However, if the connection is severed for any reason, the screen sessions on the remote machine continue to exist and MathKernel remains on the process list, but my FrontEnd cannot reconnect. On evaluating a cell, the FrontEnd waits about thirty seconds and then gives up on the evaluation. No error is produced in the notebook and the value of Out[] is not updated. All the non-native symbols in the FrontEnd turn a dark blue colour (not the undefined-symbol blue, but a new dark blue that I hadn't seen before).
In the Messages window, I receive the following error: "The kernel remote.kernel.com failed to connect to the front end. (Error = MLECONNECT). You should try running the kernel connection outside the front end."
Running screen -wipe on the remote machine shows that multiple copies of the "Respawner" screen session have been created.
If I send kill -15 to the MathKernel PIDs on the remote machine, the problem is solved in the sense that evaluating a cell causes a new MathKernel instance to be created on the remote machine and a proper connection to be established. However, I then have to run my 30-hour computation again.
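For reference, the manual inspection and recovery steps just described can be wrapped in a few shell helpers. This is only a sketch of what I run by hand: the function names (count_sessions, list_kernels, stop_kernels) are hypothetical, and it assumes the sessions are named "Respawner" and the kernel process is named MathKernel, as on my system. It does not solve the reconnection problem; stop_kernels still discards the running computation.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the original setup.

# Count screen sessions named "$1" in `screen -ls` output read from stdin.
count_sessions() {
    # Session lines look like "<tab>12345.Respawner<tab>(Detached)".
    # grep -c exits non-zero when the count is 0, hence the `|| true`.
    grep -c "[0-9]\.$1[[:space:]]" || true
}

# List MathKernel processes (PID and name) owned by the current user.
list_kernels() {
    pgrep -u "$(id -u)" -l MathKernel
}

# Ask every MathKernel owned by the current user to exit.
# SIGTERM is the same signal as `kill -15`; any running computation is lost.
stop_kernels() {
    pkill -TERM -u "$(id -u)" MathKernel
}

# Typical use on the remote machine:
#   screen -ls | count_sessions Respawner
#   list_kernels
#   stop_kernels
```

A count greater than one from count_sessions corresponds to the duplicated "Respawner" sessions mentioned above.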
Is there a reliable way to reconnect to remote kernels?
I believe the problem stems from changing the value of $ParentLink on the remote machine. Once $ParentLink is set, the console stops responding to commands. I can see this if I connect to the screen session with a console to inspect it.
    Mathematica 10.0 for Linux x86 (64-bit)
    Copyright 1988-2014 Wolfram Research, Inc.

    In[1]:= $ParentLink=LinkCreate["[email protected],[email protected]",LinkMode->Connect,LinkProtocol->"TCPIP"];
    $ParentLink=LinkCreate["[email protected],[email protected]",LinkMode->Connect,LinkProtocol->"TCPIP"];
    $ParentLink=LinkCreate["[email protected],[email protected]",LinkMode->Connect,LinkProtocol->"TCPIP"];
    $ParentLink=LinkCreate["[email protected],[email protected]",LinkMode->Connect,LinkProtocol->"TCPIP"];

As you can see, the kernel ignores the further instructions passed to it after the first one, so the link back to the FrontEnd at its new location is never established. The first link is created successfully, but perhaps setting it disconnects the console from the kernel.
Is this how $ParentLink is supposed to behave?
DumpSave) just after a long calculation.