How should I optimise my time in making requests

    # Each URL needs a trailing comma; adjacent string literals without
    # one are silently concatenated into a single string.
    link = [
        'http://youtube.com/watch?v=JfLt7ia_mLg',
        'http://youtube.com/watch?v=RiYRxPWQnbE',
        'http://youtube.com/watch?v=tC7pBOPgqic',
        'http://youtube.com/watch?v=3EXl9xl8yOk',
        'http://youtube.com/watch?v=3vb1yIBXjlM',
        'http://youtube.com/watch?v=8UBY0N9fWtk',
        'http://youtube.com/watch?v=uRPf9uDplD8',
        'http://youtube.com/watch?v=Coattwt5iyg',
        'http://youtube.com/watch?v=WaprDDYFpjE',
        'http://youtube.com/watch?v=Pm5B-iRlZfI',
        'http://youtube.com/watch?v=op3hW7tSYCE',
        'http://youtube.com/watch?v=ogYN9bbU8bs',
        'http://youtube.com/watch?v=ObF8Wz4X4Jg',
        'http://youtube.com/watch?v=x1el0wiePt4',
        'http://youtube.com/watch?v=kkeMYeAIcXg',
        'http://youtube.com/watch?v=zUdfNvqmTOY',
        'http://youtube.com/watch?v=0ONtIsEaTGE',
        'http://youtube.com/watch?v=7QedW6FcHgQ',
        'http://youtube.com/watch?v=Sb33c9e1XbY',
    ]

I have a list of 15-20 links taken from the first page of YouTube search results. The task is to get the likes, dislikes and view count for each video URL, and this is what I have done so far:

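    import threading
    import time

    import bs4
    import requests


    def parse(url, i, arr):
        """Fetch one video page and store (likes, dislikes, views, url) in arr[i]."""
        req = requests.get(url)
        soup = bs4.BeautifulSoup(req.text, "lxml")  # or "html5lib"
        # find() returns None when the element is missing, hence AttributeError.
        try:
            likes = int(soup.find("button", attrs={"title": "I like this"}).getText().replace(",", ""))
        except (AttributeError, ValueError):
            likes = 0
        try:
            dislikes = int(soup.find("button", attrs={"title": "I dislike this"}).getText().replace(",", ""))
        except (AttributeError, ValueError):
            dislikes = 0
        try:
            view = int(soup.find("div", attrs={"class": "watch-view-count"}).getText().split()[0].replace(",", ""))
        except (AttributeError, IndexError, ValueError):
            view = 0
        arr[i] = (likes, dislikes, view, url)
        time.sleep(0.3)  # note: this pause adds 0.3 s to every thread


    def parse_list(link):
        """Start one thread per URL and wait for all of them to finish."""
        arr = len(link) * [0]
        threadarr = len(link) * [0]
        a = time.perf_counter()  # time.clock() was removed in Python 3.8
        for i in range(len(link)):
            threadarr[i] = threading.Thread(target=parse, args=(link[i], i, arr))
            threadarr[i].start()
        for i in range(len(link)):
            threadarr[i].join()
        print(time.perf_counter() - a)
        return arr


    arr = parse_list(link)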

I now get the populated result array in about 6 seconds. Is there a faster way to build the array (arr), so that it takes considerably less than 6 seconds?

The first 4 elements of my array look like this, to give you a rough idea:

    [(105, 11, 2836, 'http://youtube.com/watch?v=JfLt7ia_mLg'),
     (32, 18, 5420, 'http://youtube.com/watch?v=RiYRxPWQnbE'),
     (45, 3, 7988, 'http://youtube.com/watch?v=tC7pBOPgqic'),
     (106, 38, 4968, 'http://youtube.com/watch?v=3EXl9xl8yOk')]

Thanks in advance :)

If your code works but you are looking for improvements, you should ask your question on Code Review. (Commented Aug 25, 2017 at 5:47)

2 Answers

I would use a multiprocessing Pool object for this particular case.

    import requests
    import bs4
    from multiprocessing import Pool, cpu_count

    links = [
        'http://youtube.com/watch?v=JfLt7ia_mLg',
        'http://youtube.com/watch?v=RiYRxPWQnbE',
        'http://youtube.com/watch?v=tC7pBOPgqic',
        'http://youtube.com/watch?v=3EXl9xl8yOk',
    ]


    def parse_url(url):
        """Fetch one video page and return (likes, dislikes, views, url)."""
        req = requests.get(url)
        soup = bs4.BeautifulSoup(req.text, "lxml")  # or "html5lib"
        # find() returns None when the element is missing, hence AttributeError.
        try:
            likes = int(soup.find("button", attrs={"title": "I like this"}).getText().replace(",", ""))
        except (AttributeError, ValueError):
            likes = 0
        try:
            dislikes = int(soup.find("button", attrs={"title": "I dislike this"}).getText().replace(",", ""))
        except (AttributeError, ValueError):
            dislikes = 0
        try:
            view = int(soup.find("div", attrs={"class": "watch-view-count"}).getText().split()[0].replace(",", ""))
        except (AttributeError, IndexError, ValueError):
            view = 0
        return (likes, dislikes, view, url)


    if __name__ == "__main__":  # needed on platforms that spawn worker processes
        # cpu_count must be called: Pool(cpu_count) raises a TypeError.
        with Pool(cpu_count()) as pool:  # number of processes
            data = pool.map(parse_url, links)  # this is where your results are

This is cleaner as you only have one function to write and you end up with exactly the same results.
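
Because the work here is mostly waiting on HTTP responses rather than using the CPU, a thread pool is one possible alternative to a process pool. The sketch below is not from the original answer: it swaps in concurrent.futures.ThreadPoolExecutor from the standard library, and the names map_concurrently and fake_parse are hypothetical. fake_parse stands in for parse_url so that the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def map_concurrently(worker, items, max_workers=8):
    """Apply worker to every item on a thread pool, keeping input order."""
    items = list(items)
    if not items:
        return []
    # Never start more threads than there are items to process.
    workers = min(max_workers, len(items))
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # executor.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(executor.map(worker, items))


def fake_parse(url):
    """Hypothetical stand-in for parse_url: returns zeros, no network I/O."""
    return (0, 0, 0, url)
```

In real use, something like map_concurrently(parse_url, links) would return one tuple per URL in the same order as links. Whether this beats a process pool depends on how much time goes into HTML parsing compared with waiting on the network, so it is worth timing both.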


1 Comment

Error: TypeError: '<' not supported between instances of 'method' and 'int' (this is raised when Pool receives cpu_count itself instead of the result of calling cpu_count()).

This is not a speed-up as such, but it lets your script drop the try/except blocks. Their cost is likely small next to the time spent on the network requests, though removing them makes the code shorter.

    import requests
    from bs4 import BeautifulSoup

    # links is the list of video URLs defined earlier
    for url in links:
        response = requests.get(url).text
        soup = BeautifulSoup(response, "html.parser")
        for item in soup.select("div#watch-header"):
            # [0] raises IndexError if a selector matches nothing
            view = item.select("div.watch-view-count")[0].text
            likes = item.select("button[title~='like'] span.yt-uix-button-content")[0].text
            dislikes = item.select("button[title~='dislike'] span.yt-uix-button-content")[0].text
            print(view, likes, dislikes)

3 Comments

The try/except blocks are somewhat necessary in my program, since some videos have their likes and dislikes hidden.
But the links you provided above work fine without them. I tested it.
Is that video among the ones you pasted above?
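
The case raised in these comments (a video whose counts are hidden) can also be handled with an explicit check instead of a try/except block. The following is only a sketch under stated assumptions: parse_count is a hypothetical helper, and it assumes the count text looks like "2,836 views" or "105", as in the question's sample output.

```python
def parse_count(text, default=0):
    """Convert text such as '2,836 views' to an int, or return default.

    Hypothetical helper: text may be None (element missing), empty,
    or start with something that is not a number.
    """
    if not text:
        return default
    parts = text.split()
    if not parts:
        return default
    digits = parts[0].replace(",", "")
    # isdecimal() guards int() against non-numeric input.
    return int(digits) if digits.isdecimal() else default
```

With BeautifulSoup, soup.find(...) returns None when nothing matches, so the text can be passed as element.getText() if element is not None else None, and a missing or hidden count simply falls back to the default.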
