Watching the network traffic with Wireshark reveals that the second and subsequent requests for the URL fail with an internal server error.
The initial request looks like this:
    GET /group/comp.soft-sys.math.mathematica/about HTTP/1.1
    User-agent: Mathematica/8.0.1.0.0 PM/1.3.1
    Host: groups.google.com
and yields a successful response with the following headers:
    HTTP/1.1 200 OK
    Pragma: no-cache
    Expires: Fri, 01 Jan 1990 00:00:00 GMT
    Cache-Control: no-cache, must-revalidate
    Set-Cookie: NID=65=tsm2...ijos;Domain=.google.com;Path=/;Expires=Sun, 05-May-2013 17:41:45 GMT;HttpOnly
    P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
    Content-Type: text/html; charset=ISO-8859-1
    Set-Cookie: PREF=ID=59d3...eBf7; expires=Mon, 03-Nov-2014 17:41:45 GMT; path=/; domain=.google.com
    X-Content-Type-Options: nosniff
    Date: Sat, 03 Nov 2012 17:41:45 GMT
    Server: GWS-GRFE/0.50
    X-XSS-Protection: 1; mode=block
    X-Frame-Options: SAMEORIGIN
    Transfer-Encoding: chunked
The second and subsequent requests include the returned cookies:
    GET /group/comp.soft-sys.math.mathematica/about HTTP/1.1
    User-agent: Mathematica/8.0.1.0.0 PM/1.3.1
    Host: groups.google.com
    Cookie: $Version=0; NID=65=tsm2...ijos; $Path=/; $Domain=.google.com
    Cookie: $Version=0; PREF=ID=59d3...eBf7; $Path=/; $Domain=.google.com
This elicits a response that indicates a server error:
    HTTP/1.1 500 Internal Server Error
    Pragma: no-cache
    Expires: Fri, 01 Jan 1990 00:00:00 GMT
    Cache-Control: no-cache, must-revalidate
    Content-Type: text/html; charset=UTF-8
    X-Content-Type-Options: nosniff
    Date: Sat, 03 Nov 2012 17:39:13 GMT
    Server: GWS-GRFE/0.50
    Content-Length: 1100
    X-XSS-Protection: 1; mode=block
    X-Frame-Options: SAMEORIGIN
The body of the response is a generic "server error" message that offers no additional clues:
    The server encountered an error and could not complete your request.
    If the problem persists, please report your problem and mention this error message and the query that caused it.
    That's all we know.
After the Mathematica kernel has been restarted, the Import command once again issues an HTTP request without cookies, and that request succeeds. So it appears that the presence of the cookies in the request confuses the server. This may be deliberate behaviour meant to discourage site access by robots. Perhaps an intermediate cache is struggling with the no-cache, must-revalidate header. Or maybe it is simply a programming error in the server.
Neither Import nor FetchURL offers a way to control request cookies. Restarting the kernel is a drastic way to work around this problem. Calling RestartPacletManager[] is somewhat less drastic, but it is still heavy-handed.
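For completeness, the RestartPacletManager[] route can be wrapped in a small helper. This is only a sketch: importWithoutCookies is a hypothetical name, and it assumes that RestartPacletManager[] is reachable without a context prefix and that restarting the Paclet Manager discards the retained cookies, as the behaviour described above suggests.

```mathematica
(* importWithoutCookies is a hypothetical helper, not a built-in function. *)
(* Assumption: restarting the Paclet Manager drops the cookies it retained *)
(* from earlier requests, so the following Import goes out cookie-free.    *)
importWithoutCookies[url_String, args___] := (
  RestartPacletManager[];
  Import[url, args]
)
```

Every call pays the cost of a Paclet Manager restart, so this is only reasonable for occasional requests.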
Workaround
We could just bypass Import altogether and go directly to JLink:
    Needs["JLink`"]

    httpGet[url_String] :=
      JavaBlock@Module[{http, get},
        http = JavaNew["org.apache.commons.httpclient.HttpClient"];
        get = JavaNew["org.apache.commons.httpclient.methods.GetMethod", url];
        http@executeMethod[get];
        get@getResponseBodyAsString[]
      ]
Unlike the Paclet Manager, this code creates a fresh HttpClient for each call, so cookies are never retained and the request can be issued successfully any number of times.
    httpGet["http://groups.google.com/group/comp.soft-sys.math.mathematica/about"]
    (* <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" ...
       <title>comp.soft-sys.math.mathematica | Google Groups</title> ... *)

    httpGet["http://groups.google.com/group/comp.soft-sys.math.mathematica/about"]
    (* <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" .... *)
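Since the original failure showed up as an HTTP 500 status, it may be worth checking the status code instead of returning the body unconditionally. The following is a minimal sketch along the same lines as httpGet. The name httpGetChecked is hypothetical, and the sketch assumes that the Commons HttpClient 3.x classes bundled with Mathematica are on the J/Link class path, as they are for httpGet.

```mathematica
Needs["JLink`"]

(* httpGetChecked is a hypothetical variant of httpGet. *)
(* It returns the response body on HTTP 200 and $Failed otherwise. *)
httpGetChecked[url_String] :=
  JavaBlock@Module[{http, get, status, body},
    (* A fresh client per call, so no cookies carry over between requests *)
    http = JavaNew["org.apache.commons.httpclient.HttpClient"];
    get = JavaNew["org.apache.commons.httpclient.methods.GetMethod", url];
    (* executeMethod returns the HTTP status code as an integer *)
    status = http@executeMethod[get];
    body = get@getResponseBodyAsString[];
    (* Hand the connection back to the client's connection manager *)
    get@releaseConnection[];
    (* SameQ also yields False if status is not an integer, e.g. after a Java exception *)
    If[status === 200, body, $Failed]
  ]
```

It is called in the same way as httpGet, for example httpGetChecked["http://groups.google.com/group/comp.soft-sys.math.mathematica/about"].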