
I have software that loads a list of up to 10,000 URLs which are used to scrape insurance prices for my website.

At the moment a single thread loads each URL from the list in turn and fetches the data. What I want is to run 20-30 requests at a time. What's the best way to launch 20-30 threads at once whilst looping through the URLs from the text file?

  • You could load the entire list in one go and hand off a chunk of it (say 50 URLs) to each new thread that you spawn, until you reach the maximum thread count (say 20 threads). Tweak the numbers as necessary. Commented Feb 28, 2012 at 14:34
  • I would probably scale back your ambitions: with that number of outbound requests, any website will rack up bandwidth charges at an astronomical rate. Commented Feb 28, 2012 at 14:40
  • May be of interest: stackoverflow.com/questions/8853907/… Commented Feb 28, 2012 at 14:45
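
A minimal sketch of the chunking idea from the first comment, under a few assumptions: the list is split into chunks of 50 URLs, a fixed set of 20 threads each keep taking the next chunk from a shared `ConcurrentQueue` until none are left, and the example.com URLs and the `Scrape` method are hypothetical placeholders for the real text file and scraping code. It uses C# 9 top-level statements to stay self-contained.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for File.ReadAllLines(path).
string[] urls = Enumerable.Range(1, 1000)
    .Select(n => $"http://example.com/quote/{n}")
    .ToArray();

const int chunkSize = 50;  // URLs handed to a thread at a time
const int maxThreads = 20; // number of worker threads

// Split the list into chunks of up to 50 URLs and queue them up.
var chunks = new ConcurrentQueue<string[]>();
for (int i = 0; i < urls.Length; i += chunkSize)
    chunks.Enqueue(urls.Skip(i).Take(chunkSize).ToArray());

int processed = 0;
var workers = new List<Thread>();
for (int t = 0; t < maxThreads; t++)
{
    // Each worker keeps taking the next chunk until the queue is empty.
    var thread = new Thread(() =>
    {
        while (chunks.TryDequeue(out var chunk))
        {
            foreach (var url in chunk)
            {
                Scrape(url);
                Interlocked.Increment(ref processed);
            }
        }
    });
    workers.Add(thread);
    thread.Start();
}

// Block until every worker has run out of chunks.
foreach (var worker in workers)
    worker.Join();

Console.WriteLine($"Processed {processed} URLs");

// Hypothetical placeholder for the real scraping code.
static void Scrape(string url) => Thread.Sleep(1);
```

Using a shared queue means a thread that finishes its chunk early simply picks up another one, so no thread sits idle while work remains.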

2 Answers


Take a look at the Task Parallel Library and especially the Parallel.ForEach method.
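
As a sketch of how `Parallel.ForEach` could be combined with the chunking idea from the comments: `Partitioner.Create(0, urls.Length, 50)` (from `System.Collections.Concurrent`) splits the index range into blocks of up to 50, and `ParallelOptions.MaxDegreeOfParallelism` caps how many blocks run at once. The example.com URLs and `Scrape` are hypothetical stand-ins, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the URL list read from the text file.
string[] urls = Enumerable.Range(1, 1000)
    .Select(n => $"http://example.com/quote/{n}")
    .ToArray();

int processed = 0;
var options = new ParallelOptions { MaxDegreeOfParallelism = 20 };

// Partitioner.Create(0, n, 50) yields index ranges [0,50), [50,100), ...
Parallel.ForEach(Partitioner.Create(0, urls.Length, 50), options, range =>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        Scrape(urls[i]);
        Interlocked.Increment(ref processed);
    }
});

Console.WriteLine($"Processed {processed} URLs");

// Hypothetical placeholder for the real scraping code.
static void Scrape(string url) => Thread.Sleep(1);
```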

If you are on .NET 4, take a look at the TPL (Task Parallel Library) and something like the following:

    using System.Diagnostics;
    using System.IO;
    using System.Threading.Tasks;

    const string path = @"c:\urls.txt";
    string[] urls = File.ReadAllLines(path);
    var options = new ParallelOptions() { MaxDegreeOfParallelism = 20 };
    Parallel.ForEach(urls, options, url =>
    {
        // Call your scraper here
        Debug.WriteLine(url);
    });
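
Since scraping mostly waits on the network, one alternative not covered by the answers (it needs .NET 4.5 or later for async/await; this sketch also assumes C# 9 top-level statements) is to throttle asynchronous requests with `SemaphoreSlim` instead of dedicating a thread to each request. `FetchAsync` is a hypothetical placeholder that only simulates a request with `Task.Delay`; real code would call something like `HttpClient.GetStringAsync`. The counters exist only to show that no more than 20 requests are in flight at once.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the URL list read from the text file.
string[] urls = Enumerable.Range(1, 1000)
    .Select(n => $"http://example.com/quote/{n}")
    .ToArray();

using var throttle = new SemaphoreSlim(20); // at most 20 requests in flight
var gate = new object();
int inFlight = 0, maxInFlight = 0, processed = 0;

var tasks = urls.Select(async url =>
{
    await throttle.WaitAsync(); // wait for a free slot
    lock (gate)
    {
        inFlight++;
        maxInFlight = Math.Max(maxInFlight, inFlight);
    }
    try
    {
        await FetchAsync(url);
        lock (gate) { processed++; }
    }
    finally
    {
        lock (gate) { inFlight--; }
        throttle.Release(); // hand the slot to the next waiting URL
    }
});

await Task.WhenAll(tasks);
Console.WriteLine($"Fetched {processed} URLs, peak concurrency {maxInFlight}");

// Hypothetical placeholder: simulates a request instead of hitting the network.
static Task FetchAsync(string url) => Task.Delay(5);
```

On .NET Framework, outgoing HTTP connections to the same host are also limited by `ServicePointManager.DefaultConnectionLimit` (typically 2 outside ASP.NET), so that setting may need raising before 20 simultaneous requests to one site actually happen.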
