7

I've been running Nagios for about two years, but recently this problem started appearing with one of my services.

I'm getting

CRITICAL - Socket timeout after 10 seconds 

for a check_http -H my.host.com -f follow -u /abc/def check, which used to work fine. No other services report this problem. The remote site is up and healthy, and running wget http://my.host.com/abc/def from the Nagios server downloads the response without trouble. Running check_http -H my.host.com -f follow also works, so the check only breaks when I add the -u argument. Passing a different user agent string made no difference, and neither did increasing the timeout. I also tried -v, but all I get is:

GET /abc/def HTTP/1.0
User-Agent: check_http/v1861 (nagios-plugins 1.4.11)
Connection: close
Host: my.host.com

CRITICAL - Socket timeout after 10 seconds

... which does not tell me what's going wrong.

Any ideas how I could resolve this?

Thanks!

  • Have you tried adding -4 or -6 to the check_http options? I've had this problem before where I had to force IPv4 for a check. Commented Nov 13, 2011 at 8:07
  • Thanks, I gave it a try. With -4 I get the same error. With -6 I get: Name or service not known HTTP CRITICAL - Unable to open TCP socket Commented Nov 14, 2011 at 9:37
  • Can you post the output of your wget? I'm assuming since you are using follow that the target URL does a redirection. Commented Nov 15, 2011 at 19:17
  • The -f follow might not really be necessary in this case, I just have it part of the command I use for all my services, because some of them do redirect. Commented Nov 17, 2011 at 7:06
  • Here is the output from wget (with some obfuscation):
    --2011-11-16 23:04:34-- my.host.com/abc/def
    Resolving my.host.com... 174.xxx.yyy.zzz
    Connecting to my.host.com|174.xxx.yyy.zzz|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 6324686 (6.0M) [text/html]
    Saving to: def'
    100%[==========================================================================================>] 6,324,686 5.97M/s in 1.0s
    2011-11-16 23:04:36 (5.97 MB/s) - acr' saved [6324686/6324686]
    Commented Nov 17, 2011 at 7:09

5 Answers

18

Try using the -N option of check_http.

I ran into similar problems, and in my case the web server didn't terminate the connection after sending the response (https was working, http wasn't). check_http tries to read from the open socket until the server closes the connection. If that doesn't happen then the timeout occurs.

The -N option tells check_http to receive only the header, but not the content of the page / document.
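To see whether a server behaves this way, the request from the question can be replayed by hand. The following is a minimal Python sketch, not how check_http is implemented: `probe_http` is a hypothetical helper name, and the request mirrors the headers shown in the question's -v output. It reads the response headers first, then keeps reading until the server closes the socket or the timeout expires.

```python
import socket


def probe_http(host, path="/", port=80, timeout=10.0):
    """Send an HTTP/1.0 GET and report whether the server closes the connection.

    Returns (status_line, closed_by_server). A timeout while waiting for the
    headers themselves is not handled here and propagates to the caller.
    """
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)

        # Phase 1: read until the blank line that ends the header block.
        # This is roughly the point at which a headers-only check can stop.
        data = b""
        while b"\r\n\r\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
        status_line = data.split(b"\r\n", 1)[0].decode("latin-1")

        # Phase 2: keep reading until end-of-stream. If the server never
        # closes the socket, recv() eventually raises a timeout instead.
        closed_by_server = True
        try:
            while sock.recv(65536):
                pass
        except socket.timeout:
            closed_by_server = False

    return status_line, closed_by_server
```

Calling `probe_http("my.host.com", "/abc/def")` returns the status line plus a flag. If the flag is `False`, the headers arrived but the server left the connection open, which is the situation where reading only the headers avoids the timeout.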

4 Comments

Thank you, finally my service is not in "PROBLEM" state anymore!
Cheers for the solution. However, connections not being terminated is a sign of a possible problem in the stack. Can the OP comment on what change triggered it, if known?
Had the same problem and it was due to an "optimising" network appliance.
Information for Check_MK users: in WATO this option is named "Don't wait for document body". It fixed the issue for me too.
1

I tracked my problem down to the security providers configured in the most recent version of OpenSUSE.

From what other web pages summarize, it appears to be caused by an attempt to use the TLSv2 protocol, which either does not work correctly or is missing something in the default configuration.

To overcome the problem I commented out the security provider in question from the JRE security configuration file.

#security.provider.10=sun.security.pkcs11.SunPKCS11 

The number in the security.provider.N key may be different in your configuration, but the SunPKCS11 provider is the one at issue.

This configuration is normally found in

$JAVA_HOME/lib/security/java.security 

of the JRE that you are using.
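Because the provider number varies between installations, it can help to search the file for the provider class rather than for a fixed key. The following is a small Python sketch under that assumption; `find_pkcs11_providers` is a hypothetical helper, and it only inspects text that you pass in (for example, the contents of java.security).

```python
import re


def find_pkcs11_providers(java_security_text):
    """Return (key, is_commented_out) for each SunPKCS11 provider entry."""
    # Matches lines such as "security.provider.10=..." with an optional
    # leading "#" that marks the entry as commented out.
    pattern = re.compile(r"^\s*(#?)\s*(security\.provider\.\d+)\s*=\s*(\S+)")
    results = []
    for line in java_security_text.splitlines():
        match = pattern.match(line)
        if match and match.group(3).startswith("sun.security.pkcs11.SunPKCS11"):
            results.append((match.group(2), match.group(1) == "#"))
    return results
```

An entry reported with `False` is still active; one reported with `True` has already been commented out.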

0

Fixed with this command definition in nrpe.cfg (on Debian 6.0 Squeeze using nagios-nrpe-server):

command[check_http]=/usr/lib/nagios/plugins/check_http -H localhost -p 8080 -N -u /login?from=%2F 

0

For anyone interested: I ran into this problem too, and the cause turned out to be mod_itk on the web server.

A patch is available, although it does not seem to be included in the current CentOS or Debian packages:

https://lists.err.no/pipermail/mpm-itk/2015-September/000925.html

0

In my case the /etc/postfix/main.cf file was misconfigured. My mail relay was not defined, and the relay restrictions were also too strict. I had to add:

relayhost = mailrelay.ext.example.com
smtpd_relay_restrictions = permit_mynetworks permit_sasl_authenticated defer_unauth_destination
