Ben

request.get() is an async call, so a regular for loop will not work: in your original code, callback() gets called after every successful request.get(). You need something that lets you control the flow, such as async.each(), async.eachLimit() or async.eachSeries(), so that callback() is called only once, after all requests have finished.
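To see why this fixes the problem, here is a minimal, dependency-free sketch of the counting idea behind async.each(): start every task, count completions, and call the final callback once, when the last task reports back (or on the first error). This is not the async library's actual source; `eachAll` is a hypothetical name, and the usage replaces request.get() with a hypothetical `fakeGet()` stub built on setTimeout.

```javascript
// Hypothetical stand-in for request.get(): calls back asynchronously
// after a short delay. Not part of the original answer.
function fakeGet(url, cb) {
  setTimeout(function () {
    cb(null, 'body of ' + url);
  }, 10);
}

// Illustrative sketch (not the async library's code): runs `iterator`
// for every item and calls `done` exactly once, either after every
// item's callback has fired or immediately on the first error.
function eachAll(items, iterator, done) {
  var remaining = items.length;
  var finished = false;
  if (remaining === 0) return done(null);
  items.forEach(function (item) {
    iterator(item, function (err) {
      if (finished) return; // ignore callbacks that arrive after done() ran
      if (err) {
        finished = true;
        return done(err);
      }
      remaining -= 1;
      if (remaining === 0) {
        finished = true;
        done(null);
      }
    });
  });
}

// Usage: collect bodies for three URLs, then report once.
var bodies = [];
eachAll(['a', 'b', 'c'], function (url, cb) {
  fakeGet(url, function (err, body) {
    if (!err) bodies.push(body);
    cb(null);
  });
}, function (err) {
  console.log('done once, collected', bodies.length, 'bodies');
});
```

A plain for loop has no equivalent of the `remaining` counter, which is why the final callback fires once per response instead of once overall.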

I would recommend async.eachLimit() over async.each() in this scenario, because it throttles the number of request.get() calls in flight so you don't flood the server with too many requests at once. In the example below I use 5 as the maximum number of requests processed concurrently, but you can change it to whatever your server can handle:

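```javascript
var async = require('async');
var request = require('request');
var cheerio = require('cheerio');
var iconv = require('iconv-lite'); // assuming iconv-lite, which provides iconv.decode()

function getNewsTitles(targets, subs, callback) {
  // Process at most 5 URLs at a time.
  async.eachLimit(targets, 5, function (current, eachCb) {
    request.get({ uri: current, encoding: null }, function (err, response, body) {
      if (!err && response.statusCode === 200) {
        // encoding: null gives a raw Buffer, which is decoded from EUC-KR here.
        var $ = cheerio.load(iconv.decode(body, 'EUC-KR'));
        var subject = $('.articleSubject a');
        for (var i = 0; i < subject.length; i++) {
          subs.push(subject[i].attribs.title);
        }
        if (subs.length === (targets.length - 2) * 20 + 2) {
          // when error or not, below log shows one time.
          console.log('doubt here too');
        }
      }
      // Failed requests are skipped; eachCb must be called for every
      // iteration of async.eachLimit(), success or not.
      eachCb(null);
    });
  }, function (err) {
    callback(null); // all items have been processed, call this callback only once
  });
}
```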
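Roughly, the limit of 5 passed to async.eachLimit() above means at most 5 iterator functions are waiting on a response at any moment; each time one calls eachCb(), the next URL is started. The sketch below illustrates that idea without dependencies. It is an approximation, not the library's implementation; `eachLimitSketch` is a hypothetical name, and the usage simulates requests with setTimeout instead of calling request.get().

```javascript
// Illustrative sketch (not the async library's code): runs `iterator`
// over `items` with at most `limit` calls in flight, and calls `done`
// exactly once when all items finish or on the first error.
function eachLimitSketch(items, limit, iterator, done) {
  var nextIndex = 0; // index of the next item to start
  var running = 0;   // iterator calls that have not called back yet
  var completed = 0; // iterator calls that have called back
  var finished = false;

  if (items.length === 0) return done(null);

  // Start new tasks until the limit is reached or items run out.
  function launch() {
    while (running < limit && nextIndex < items.length && !finished) {
      var item = items[nextIndex++];
      running += 1;
      iterator(item, onTaskDone);
    }
  }

  function onTaskDone(err) {
    if (finished) return;
    running -= 1;
    completed += 1;
    if (err) {
      finished = true;
      return done(err);
    }
    if (completed === items.length) {
      finished = true;
      return done(null);
    }
    launch(); // a slot freed up, so start the next item
  }

  launch();
}

// Usage: simulated requests (hypothetical URLs), tracking peak concurrency.
var inFlight = 0;
var maxInFlight = 0;
var urls = ['u1', 'u2', 'u3', 'u4', 'u5', 'u6', 'u7', 'u8'];
eachLimitSketch(urls, 5, function (url, cb) {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  setTimeout(function () {
    inFlight -= 1;
    cb(null);
  }, 10);
}, function (err) {
  console.log('all done; max concurrent requests:', maxInFlight);
});
```

With a synchronous iterator this sketch recurses once per completed item, so it is meant for illustration only.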
