3

I'm building this program in Visual Studio 2010 using C# and .NET 4.0. The goal is to use threads and a queue to improve performance.

I have a list of urls I need to process.

string[] urls = { url1, url2, url3 /* , ... */ }; // up to 50 URLs

I have a function that takes each URL and processes it.

public void processUrl(string url) { /* some operation */ }

Originally, I created a for loop to go through each URL.

for (int i = 0; i < urls.Length; i++) processUrl(urls[i]);

The method works, but the program is slow because it goes through the URLs one after another.

So the idea is to use threading to reduce the time, but I'm not too sure how to approach that.

Say I want to create 5 threads to process at the same time.

When I start the program, it will start processing the first 5 URLs. When one is done, the program starts processing the 6th URL; when another one is done, the program starts processing the 7th URL, and so on.

The problem is, I don't know how to actually create a 'queue' of urls and be able to go through the queue and process.

Can anyone help me with this?

-- EDIT at 1:42PM --

I ran into another issue when I was running 5 at the same time.

The processUrl function involves writing to a log file. If several calls time out at the same time, they write to the same log file at the same time, and I think that's throwing an error.

I'm assuming that's the issue because the error message I got was "The process cannot access the file 'data.log' because it is being used by another process."

6
  • Parallel.For is probably going to be more useful in your case. Note that having "more" threads does not necessarily make your program faster. Commented Jul 5, 2013 at 16:16
  • Rewrite processUrl as an async method, await on async methods for IO and fire off your requests in parallel. No explicit thread code required. Commented Jul 5, 2013 at 16:22
  • async and await are available only from .NET Framework 4.5 onwards. Commented Jul 5, 2013 at 16:26
  • @spender Would be far nicer, but would also require upgrading to VS 2012 :( Commented Jul 5, 2013 at 16:26
  • @srsyogesh You can do it in .NET 4 with the async targeting pack, but still requires VS 2012, not 2010... Commented Jul 5, 2013 at 16:27

3 Answers

2

The simplest option would be to just use Parallel.ForEach. Provided processUrl is thread safe, you could write:

Parallel.ForEach(urls, processUrl); 

I wouldn't suggest restricting to 5 threads (the scheduler will automatically scale normally), but this can be done via:

Parallel.ForEach(urls, new ParallelOptions { MaxDegreeOfParallelism = 5}, processUrl); 

That being said, URL processing is, by its nature, typically IO bound, and not CPU bound. If you could use Visual Studio 2012, a better option would be to rework this to use the new async support in the language. This would require changing your method to something more like:

public async Task ProcessUrlAsync(string url)
{
    // Use await with async methods in the implementation...
}

You could then use the new async support in the loop:

// Create an enumerable of tasks; the async operations start when Task.WhenAll enumerates it
var tasks = urls.Select(url => ProcessUrlAsync(url));
await Task.WhenAll(tasks); // "Await" until they all complete

6 Comments

Can be simplified even further to Parallel.ForEach(urls, processUrl). I also believe the asker is using .NET 4.0, so will need to use the async targeting pack if they wished to adopt the async pattern
@Lukazoid Yes - I mentioned that in there - Just changed the call, though - good suggestion
@Reed - how would I know if my processUrl is thread safe? I'm kinda new to threading so not quite sure how that works.
@sora0419 You'd need to make sure it doesn't use any "state" that's shared with other types or methods. That's the main issue - if it works without touching any other fields/properties/etc, it's potentially okay. (Not sure what it actually does...)
@Reed processUrl basically takes in a URL and returns the text content of that URL. Is it possible to keep track of how many URLs actually go through successfully? (Some URLs might time out sometimes.)
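One way to count successful URLs, sketched under a few assumptions: `UrlCounter` and `CountSuccesses` are hypothetical names, `process` stands in for the asker's `processUrl`, and a failure (such as a timeout) is assumed to surface as an exception. The counter is shared between threads, so it is updated with `Interlocked.Increment` rather than `++`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class UrlCounter
{
    // Returns how many URLs were processed without throwing.
    // Hypothetical helper: 'process' stands in for processUrl.
    public static int CountSuccesses(IEnumerable<string> urls, Action<string> process, int maxParallel)
    {
        int succeeded = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(urls, options, url =>
        {
            try
            {
                process(url);
                // Interlocked makes the increment safe across threads.
                Interlocked.Increment(ref succeeded);
            }
            catch (Exception)
            {
                // Assumption: any exception (for example a timeout) counts as a failure.
            }
        });

        return succeeded;
    }
}
```

A call such as `UrlCounter.CountSuccesses(urls, processUrl, 5)` would then return the number of URLs that completed without an exception. If processUrl reports failure some other way (a null result, a status code), the check inside the loop would need to change accordingly.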
1

Use Parallel.ForEach with MaxDegreeOfParallelism set to the number of threads you want (or leave it unset and let .NET do the work for you):

ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 5;
Parallel.ForEach(urls, parallelOptions, url => { processUrl(url); });

2 Comments

Is there a way to set the fire time as well? There's a "write to log" inside the processUrl function, and if several calls time out, they write the timeout message to the same log file at the same time, and I think that throws an error.
Logging is a different story. No, there isn't a way to set the fire time, because the threads are going to fire simultaneously. If you update the question with some code showing what you are trying to do with logging, I will update my answer accordingly.
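A common way to avoid the "file is being used by another process" error when several threads of the same program log to one file is to serialize the writes with a lock. The sketch below is not the asker's logging code: `SafeLog`, `Write`, and the timestamp format are hypothetical, and it only protects against threads inside one process, not against separate programs writing to the same file.

```csharp
using System;
using System.IO;

public static class SafeLog
{
    // One shared lock object guards every write made through this class.
    private static readonly object LogLock = new object();

    // Appends one line to the file; only one thread can be inside the lock at a time.
    public static void Write(string path, string message)
    {
        lock (LogLock)
        {
            File.AppendAllText(path, DateTime.Now.ToString("o") + " " + message + Environment.NewLine);
        }
    }
}
```

Inside processUrl, a direct file write could then be replaced by something like `SafeLog.Write("data.log", "Timeout: " + url)`. Threads that hit a timeout at the same moment simply wait their turn instead of opening the file concurrently.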
0

If you really want to create threads to accomplish your task instead of using parallel execution:

Suppose that I want one thread for each URL:

string[] urls = {"url1", "url2", "url3"}; 

I just start a new Thread instance for each URL (or for each group of 5 URLs):

foreach (var thread in urls.Select(url => new Thread(() => DownloadUrl(url)))) thread.Start(); 

And the method to download your URL:

private static void DownloadUrl(string url) { Console.WriteLine(url); } 
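For the behaviour described in the question (5 threads, each picking up the next URL as soon as it finishes one), a fixed set of worker threads can drain a shared thread-safe queue. This is a minimal sketch rather than part of the answer above: `UrlWorkerPool` and `ProcessAll` are hypothetical names, and `process` stands in for `DownloadUrl` or `processUrl`. It uses `ConcurrentQueue<T>`, which is available in .NET 4.0.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class UrlWorkerPool
{
    // Runs 'process' on every URL using a fixed number of worker threads.
    // Blocks until the queue is empty and all workers have finished.
    public static void ProcessAll(IEnumerable<string> urls, Action<string> process, int workerCount)
    {
        // Thread-safe FIFO queue holding the URLs that still need processing.
        var queue = new ConcurrentQueue<string>(urls);
        var workers = new List<Thread>();

        for (int i = 0; i < workerCount; i++)
        {
            var worker = new Thread(() =>
            {
                string url;
                // Each worker keeps taking the next URL until none are left.
                while (queue.TryDequeue(out url))
                {
                    try
                    {
                        process(url);
                    }
                    catch (Exception ex)
                    {
                        // An unhandled exception on a thread would end the program,
                        // so report it and move on to the next URL.
                        Console.Error.WriteLine("Failed: " + url + " (" + ex.Message + ")");
                    }
                }
            });
            workers.Add(worker);
            worker.Start();
        }

        // Wait for every worker to finish.
        foreach (var thread in workers)
        {
            thread.Join();
        }
    }
}
```

Calling `UrlWorkerPool.ProcessAll(urls, DownloadUrl, 5)` would process up to 50 URLs with at most 5 running at once. `Parallel.ForEach` with `MaxDegreeOfParallelism = 5` gives a similar effect with less code; the explicit version is mainly useful when direct control over the threads is wanted.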

5 Comments

Would likely be better to use the TPL, or even the thread pool, than to fire off a Thread manually per item...
Creating a new thread might be a costly operation; you can reuse threads from the thread pool using the ThreadPool.QueueUserWorkItem method.
Yes, I agree with the use of ThreadPool, but if he will be using a fixed number of threads (in the example just 5), we can avoid it.
@gustavodidomenico post specified "//up to 50 urls" - Besides, if you're not used to threading, I think the TPL is simpler to use than Thread ;)
I agree with you as well. But read my first line of the post. And I said five threads not five URL's.
