
Thanks for reading.

I've run into an annoying problem and could really use some help. I'm using HttpComponents (the new version of the former HttpClient) in Java to open URLs and scrape their contents, using multiple threads to improve performance.

Here is the problem:

1. Threads share one HttpClient

1) Definition:

```java
private static final ThreadSafeClientConnManager cm = new ThreadSafeClientConnManager();
private static HttpHost proxy = new HttpHost("127.0.0.1", 8086, "http");
private static DefaultHttpClient http = new DefaultHttpClient(cm);
```

2) In my init function:

```java
cm.setMaxTotal(100);
http.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);
```

3) My thread function:

```java
public static String getUrl(String url, String Chareset) {
    HttpGet get = new HttpGet(url);
    get.setHeader("Content-Type", "text/html");
    get.setHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.0; .NET CLR 1.1.4322; .NET CLR 2.0.50215;)");
    get.setHeader("Accept-Charset", Chareset + ";q=0.7,*;q=0.7"); // e.g. "utf-8;q=0.7,*;q=0.7"
    get.getParams().setParameter("http.socket.timeout", new Integer(CONNECTION_TIMEOUT)); // 20000
    String result = "";
    try {
        HttpResponse response = http.execute(get);
        if (response.getStatusLine().getStatusCode() != 200) { // HttpStatus.SC_OK
            System.err.println("HttpGet Method failed: " + response.getStatusLine());
        }
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            result = EntityUtils.toString(entity);
            EntityUtils.consume(entity);
            entity = null;
        }
    } catch (java.net.SocketException ee) {
        ee.printStackTrace();
        Logger.getLogger(DBManager.class.getName()).log(Level.SEVERE, null, ee);
    } catch (IOException e) {
        Logger.getLogger(DBManager.class.getName()).log(Level.SEVERE, null, e);
    } finally {
        get.abort();
        get = null;
    }
    return result;
}
```

4) I then create 10 threads that call getUrl(), but after about 1000 loops this happens:

**HttpGet Method failed: HTTP/1.0 503 Service Unavailable** 

But when I open the same URL in IE through the same proxy, it loads successfully, so the proxy itself seems fine.

So what's wrong?

2. Then I moved the creation of the HttpClient into the getUrl() function, so the threads no longer share one, like this:

```java
public static String getUrl(String url, String Chareset) {
    HttpGet get = new HttpGet(url);
    get.setHeader("Content-Type", "text/html");
    get.setHeader("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.0; .NET CLR 1.1.4322; .NET CLR 2.0.50215;)");
    get.setHeader("Accept-Charset", Chareset + ";q=0.7,*;q=0.7"); // e.g. "utf-8;q=0.7,*;q=0.7"
    get.getParams().setParameter("http.socket.timeout", new Integer(CONNECTION_TIMEOUT)); // 20000
    DefaultHttpClient http = new DefaultHttpClient(cm); // threads don't share it
    http.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);
    String result = "";
    try {
        HttpResponse response = http.execute(get);
        if (response.getStatusLine().getStatusCode() != 200) { // HttpStatus.SC_OK
            System.err.println("HttpGet Method failed: " + response.getStatusLine());
        }
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            result = EntityUtils.toString(entity);
            EntityUtils.consume(entity);
            entity = null;
        }
    } catch (java.net.SocketException ee) {
        ee.printStackTrace();
        Logger.getLogger(DBManager.class.getName()).log(Level.SEVERE, null, ee);
    } catch (IOException e) {
        Logger.getLogger(DBManager.class.getName()).log(Level.SEVERE, null, e);
    } finally {
        get.abort();
        get = null;
        http = null; // clean almost all the resources
    }
    return result;
}
```

After about 600 loops across the 10 threads, something else goes wrong:

**Exception in thread "Thread-11" java.lang.OutOfMemoryError: Java heap space**

The exception occurs on the `result = EntityUtils.toString(entity);` line.

So I really need some help.

Thanks!

  • It turns out that I am being denied. In order not to be denied, I have to use the second way: create a new HttpClient in every loop of every thread. And Java's gc() is too slow, so I think sleep() may be a good idea, or I can restart the program with a flag to indicate the progress. Thanks for all the answers! Commented Mar 2, 2012 at 1:03
  • 1
    I finally figured it out.This way works.And somewhere else has memory leak.Sorry... Commented Mar 2, 2012 at 10:13

2 Answers


503 means Service Unavailable, so the service is down or refusing you. It could be that you are hitting the same service over and over, and it either fails under that load or deliberately denies you service because of it.

The second error is quite clear: no more memory because you used it all. Either your program is leaking memory, or you should increase your heap size using -Xmx256m, -Xmx512m, -Xmx1G, etc. There are tons of answers on SO for these issues.
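If the server is rejecting you under burst load, one common mitigation (which the asker also arrives at with sleep()) is to space requests out. This is only an illustrative sketch, not part of the original code; the class and method names (`ThrottledFetcher`, `awaitSlot`) and the 200 ms gap are made up for the example:

```java
// Minimal sketch: enforce a minimum gap between outgoing requests across
// all threads, so the remote service sees a steady trickle instead of a burst.
public class ThrottledFetcher {
    private static final long DELAY_MS = 200; // minimum gap between requests (illustrative)
    private static long lastRequest = 0;

    // Call this before each HTTP request, from any thread.
    public static synchronized void awaitSlot() {
        long wait = lastRequest + DELAY_MS - System.currentTimeMillis();
        if (wait > 0) {
            try {
                Thread.sleep(wait);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        lastRequest = System.currentTimeMillis();
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 3; i++) {
            awaitSlot(); // first call is immediate, later calls wait for their slot
        }
        long elapsed = System.currentTimeMillis() - start;
        // Three slots with a 200 ms gap should take at least roughly 400 ms.
        System.out.println(elapsed >= 350);
    }
}
```

Each worker thread would call `awaitSlot()` just before `http.execute(get)`; the `synchronized` keyword makes the bookkeeping safe across threads.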


9 Comments

Firstly, I can open the URL through IE and the same proxy at the same time while the program is running, so why isn't it denying me there? As for the second problem, I know it's out of memory because I used it all; I just don't know why. The getUrl() function is almost all the code, so please tell me where my program is leaking memory instead of telling me something I already know, OK?
Different clients get different responses. Just because IE works doesn't mean HttpClient should: different sessions, different agents, etc. You should definitely not rely on IE to conclude whether it works. Also, maybe the service on the other end goes down under 1000 requests, while just a few of them is no issue.
You see the User-Agent field? I fake the browser on my PC there, and opening the URL in that browser works too!
A User-Agent string does not make you a real browser. You are definitely not reusing the browser's cookies, for example, so you are not sharing the same session. Also, as I said, maybe a few requests are OK for the server but 1000 are not.
I tried Firefox and IE to open the same URL while this program is running, so I think it's not denying me.

The answer given by Guillaume sounds perfectly reasonable to me. As far as your second problem is concerned, the reason for the OutOfMemoryError is quite simple: DefaultHttpClient objects are very expensive, and by creating a new instance for each and every request you deplete your system resources much faster. Besides, EntityUtils#toString should generally be avoided for anything other than simple tests. One should consume HTTP response messages as a content stream without buffering the entire response body in memory.
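To illustrate the streaming idea: instead of `EntityUtils.toString(entity)`, read the `InputStream` from `HttpEntity#getContent` in fixed-size chunks and cap how much you keep. The sketch below uses a plain `InputStream` standing in for the one HttpClient would return; `BoundedReader`, `readBounded`, and the `MAX_BYTES` cap are illustrative names invented for this example, not HttpClient API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedReader {
    // Keep at most 1 MB of any single response body (illustrative limit).
    static final int MAX_BYTES = 1024 * 1024;

    // Read the stream in 8 KB chunks, stopping once the cap is reached,
    // so a huge response cannot blow up the heap.
    public static String readBounded(InputStream in, String charset) throws IOException {
        byte[] buf = new byte[8192];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n;
        while ((n = in.read(buf)) != -1 && out.size() < MAX_BYTES) {
            out.write(buf, 0, n);
        }
        return out.toString(charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for entity.getContent() from an HTTP response.
        InputStream fake = new ByteArrayInputStream("hello world".getBytes("UTF-8"));
        System.out.println(readBounded(fake, "UTF-8")); // prints "hello world"
    }
}
```

In the real code, `response.getEntity().getContent()` would be passed where the fake stream is, and the stream should be closed (or the entity consumed) in a `finally` block so the connection returns to the pool.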

5 Comments

"One should consume HTTP response messages as a content stream without buffering the entire response body in memory." Then what should I do? entity.getContent()? Thanks.
I doubt that answer. All the URLs are different; there are millions of them, and every thread opens a different one. If URL A denies me, why would URL B deny me too? And I used IE and Firefox to open one of the URLs through the same proxy at the same time, and it succeeded. So I think the logic is right; maybe there's something I should do to clean up the resources after I've opened each URL.
@Rusty: yes, you should use the InputStream returned by HttpEntity#getContent and read only enough data to get the work done.
@Rusty: I suspect if you open ten instances of IE and script them to execute 1000 requests in a tight loop, you will start seeing 503 as well. I double-checked your code snippet (1) and could not spot any issues with resource deallocation.
Thanks, pal. I finally figured it out: I am being denied. In order not to be denied, I have to use the second way: create a new HttpClient in every loop of every thread. And Java's gc() is too slow, so I think sleep() may be a good idea, or I can restart the program with a flag to indicate progress. Thanks anyway.
