Nagios: CRITICAL - Socket timeout after 10 seconds

Question

I've been running Nagios for about two years, but recently this problem started appearing with one of my services.

I'm getting

CRITICAL - Socket timeout after 10 seconds

for a check_http -H my.host.com -f follow -u /abc/def check, which used to work fine. No other services are reporting this problem. The remote site is up and healthy, and running wget http://my.host.com/abc/def from the Nagios server downloads the response just fine. Running check_http -H my.host.com -f follow also works, i.e. things only break when I use the -u argument. I also tried passing a different user agent string, which made no difference, and increasing the timeout, with no luck. I tried -v, but all I get is:

GET /abc/def HTTP/1.0
User-Agent: check_http/v1861 (nagios-plugins 1.4.11)
Connection: close
Host: my.host.com

CRITICAL - Socket timeout after 10 seconds

... which does not tell me what's going wrong.

Any ideas how I could resolve this?

Thanks!

Have you tried adding -4 or -6 to the check_http options? I've had this problem before where I had to force IPv4 for a check.
– Starfish, Commented Nov 13, 2011 at 8:07
Thanks, I gave it a try. With -4 I get the same error. With -6 I get: Name or service not known HTTP CRITICAL - Unable to open TCP socket
– fulv, Commented Nov 14, 2011 at 9:37
Can you post the output of your wget? I'm assuming since you are using follow that the target URL does a redirection.
– Starfish, Commented Nov 15, 2011 at 19:17
The -f follow might not really be necessary in this case. I just have it as part of the command I use for all my services, because some of them do redirect.
– fulv, Commented Nov 17, 2011 at 7:06
Here is the output from wget (with some obfuscation):

--2011-11-16 23:04:34-- my.host.com/abc/def
Resolving my.host.com... 174.xxx.yyy.zzz
Connecting to my.host.com|174.xxx.yyy.zzz|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6324686 (6.0M) [text/html]
Saving to: def'

100%[==========================================================================================>] 6,324,686 5.97M/s in 1.0s

2011-11-16 23:04:36 (5.97 MB/s) - acr' saved [6324686/6324686]
– fulv, Commented Nov 17, 2011 at 7:09

Yuck · Accepted Answer · 2011-12-20 13:20:02Z

18

Try using the -N option of check_http.

I ran into similar problems, and in my case the web server didn't terminate the connection after sending the response (https was working, http wasn't). check_http tries to read from the open socket until the server closes the connection. If that doesn't happen then the timeout occurs.

The -N option tells check_http to receive only the header, but not the content of the page / document.
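To illustrate the behaviour described above, here is a minimal Python sketch. It is not check_http's actual code; the toy server, the function names, and the 0.5 second timeout are all assumptions made for the demonstration. The server sends a complete response but never closes the socket, so a client that reads until the connection closes times out, while a client that stops after the headers (roughly what -N does) succeeds.

```python
import socket
import threading


def serve_without_closing(listener, hold):
    """Toy server (hypothetical): send a full response, then keep the socket open."""
    conn, _ = listener.accept()
    conn.recv(4096)  # read the (small) request
    conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello")
    hold.wait()      # simulate a server that does not terminate the connection
    conn.close()


def fetch(port, headers_only, timeout=0.5):
    """Return 'OK' or 'TIMEOUT' for one request against 127.0.0.1:port."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as s:
        s.sendall(b"GET / HTTP/1.0\r\nHost: localhost\r\n\r\n")
        data = b""
        try:
            while True:
                if headers_only and b"\r\n\r\n" in data:
                    return "OK"      # header block complete, stop reading
                chunk = s.recv(4096)
                if not chunk:
                    return "OK"      # server closed the connection
                data += chunk
        except socket.timeout:
            return "TIMEOUT"         # still waiting for the server to close


def run_demo(headers_only):
    """Start the toy server on a free port and run one fetch against it."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    hold = threading.Event()
    server = threading.Thread(target=serve_without_closing, args=(listener, hold))
    server.start()
    try:
        return fetch(listener.getsockname()[1], headers_only)
    finally:
        hold.set()       # let the server thread close its side and exit
        server.join()
        listener.close()


if __name__ == "__main__":
    print("read until close:", run_demo(headers_only=False))
    print("headers only:    ", run_demo(headers_only=True))
```

Raising the timeout does not help in the first case, because the client is waiting for an end-of-stream that never arrives.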

edited Dec 20, 2011 at 13:20 by Yuck

answered Dec 20, 2011 at 13:18 by rwf

4 Comments

fulv Over a year ago

Thank you, finally my service is not in "PROBLEM" state anymore!

Cosimo Over a year ago

Cheers for the solution. However, connections not being terminated is a sign of a possible problem in the stack. Can the OP comment on what change triggered it, if known?

Vegard Over a year ago

Had the same problem and it was due to an "optimising" network appliance.

domi27 Over a year ago

Information for Check_MK users: in WATO this option is named "Don't wait for document body". It fixed the issue for me too.

sweetfa · Answer · 2014-04-15 00:52:43Z

I tracked my problem down to the security providers configured in the most recent version of OpenSUSE.

From what I gathered on other web pages, the cause appears to be an attempt to use the TLSv2 protocol, which either does not work correctly or is missing something in the default configuration that it needs in order to work.

To overcome the problem I commented out the security provider in question from the JRE security configuration file.

#security.provider.10=sun.security.pkcs11.SunPKCS11

The number after security.provider. may be different in your configuration, but the SunPKCS11 provider is the one at issue.

This configuration is normally found in

$JAVA_HOME/lib/security/java.security

of the JRE that you are using.
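If you would rather script the change than edit the file by hand, a minimal Python sketch could look like the following. The function name and the sample text are hypothetical, and it only transforms text that you have already read from the file. It also does not renumber the remaining entries: I believe the JRE reads the security.provider.N entries in sequence, so if the disabled provider is not the last one, the later entries may need renumbering by hand.

```python
def comment_out_provider(config_text, provider="sun.security.pkcs11.SunPKCS11"):
    """Comment out active security.provider.N lines that reference `provider`.

    Lines that are already commented out, and all other lines, are left as-is.
    """
    result = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("security.provider.") and provider in stripped:
            result.append("#" + line)
        else:
            result.append(line)
    return "\n".join(result)


if __name__ == "__main__":
    # Illustrative sample only; the entries in a real java.security file will differ.
    sample = (
        "security.provider.9=sun.security.smartcardio.SunPCSC\n"
        "security.provider.10=sun.security.pkcs11.SunPKCS11"
    )
    print(comment_out_provider(sample))
```

Keep a backup of the original file before writing the result back.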

Fabio Pedrazzoli Grazioli · Answer · 2014-06-02 09:41:33Z

Fixed with this command definition in nrpe.cfg (on Debian 6.0 Squeeze, using nagios-nrpe-server):

command[check_http]=/usr/lib/nagios/plugins/check_http -H localhost -p 8080 -N -u /login?from=%2F
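For reference, the %2F in that URL is just a percent-encoded "/" in the query string, and -N is the option that stops check_http from waiting for the body. A small Python sketch that assembles the same argument list is shown below; the helper name and its parameters are hypothetical, and it does no shell quoting.

```python
from urllib.parse import urlencode


def check_http_args(host, port, path, query=None, headers_only=True,
                    plugin="/usr/lib/nagios/plugins/check_http"):
    """Build a check_http argument list (hypothetical helper, no shell quoting)."""
    # urlencode percent-encodes reserved characters, so "/" becomes "%2F".
    url = path + ("?" + urlencode(query) if query else "")
    args = [plugin, "-H", host, "-p", str(port)]
    if headers_only:
        args.append("-N")  # receive only the header, not the document body
    args += ["-u", url]
    return args


if __name__ == "__main__":
    args = check_http_args("localhost", 8080, "/login", {"from": "/"})
    print("command[check_http]=" + " ".join(args))
```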

ElementalStorm · Answer · 2017-03-02 17:53:19Z

For whoever is interested, I stumbled on this problem too, and the cause ended up being mod_itk on the web server.

A patch is available, although it does not seem to be included in the current CentOS or Debian packages:

https://lists.err.no/pipermail/mpm-itk/2015-September/000925.html

Sven Eberth · Answer · 2021-06-22 23:24:59Z

In my case the /etc/postfix/main.cf file was not configured correctly. My mail relay server was not defined, and the configuration was also very restrictive. I had to add:

relayhost = mailrelay.ext.example.com
smtpd_relay_restrictions = permit_mynetworks permit_sasl_authenticated defer_unauth_destination
