I am trying to host a web-scraping function on AWS Lambda and am running into WebDriver errors with Selenium. Could someone show me how to add the chromedriver.exe file and how to get the path to work in an AWS Lambda function? This is the portion of my function that deals with Selenium:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from webdriver_manager.chrome import ChromeDriverManager
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import Select
    from selenium.webdriver.chrome.service import Service
    import pandas as pd
    import mysql.connector
    from sqlalchemy import create_engine

    url = 'https://covid19criticalcare.com/pharmacies/'
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.maximize_window()
    driver.get(url)
    wait = WebDriverWait(driver, 5)

  1. I tried creating a Lambda layer with the chromedriver.exe file.

  2. I followed this guide (https://dev.to/awscommunity-asean/creating-an-api-that-runs-selenium-via-aws-lambda-3ck3), but I couldn't add headless Chromium because its file size pushed me over my function's size limit (my pandas and numpy dependency layers already take up most of the space).

  3. I tried driver = webdriver.Chrome(with a path variable) and tried different paths, but I wasn't sure what the beginning of the path would be since it's on a Lambda function.

2 Answers

20

I've been struggling to add Selenium to AWS Lambda for the last couple of days. I have a web-scraping function (it uses Selenium and the Google API) that extracts data from a website and writes the output to a Google spreadsheet. Let me explain what I did step by step and how I finally succeeded, so you don't have to struggle with it as much as I did:

1- I tried to add Selenium as a layer, as described here: https://www.youtube.com/watch?v=jWqbYiHudt8. I succeeded in adding Selenium, but the deployment package ended up over 250 MB (the Lambda quotas are discussed here: How to increase the maximum size of the AWS lambda deployment package (RequestEntityTooLargeException)?), so it did not work.

2- To get around the deployment package size, a good option is to deploy the function as a container image (10 GB deployment package size limit). Here is a good explanation of deploying as a container image: https://cloudbytes.dev/snippets/run-selenium-in-aws-lambda-for-ui-testing#using-the-github-repository-directly . I tried it, but I was not able to deploy as described because of missing or wrong webdrivers (the shell script seems to be wrong).

3- Finally, I was able to publish my Selenium function as a Docker image, as described here: https://github.com/umihico/docker-selenium-lambda.
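To make the container route a bit more concrete, here is a minimal sketch of what a handler inside such an image might look like. It is not taken from the linked repository: the two paths and the `CHROME_BINARY` / `CHROMEDRIVER_PATH` environment variable names are assumptions and depend on where your Dockerfile unpacks Chrome and chromedriver. The flags are ones commonly used for running Chrome without a display, and the profile directory goes under the temp directory because `/tmp` is the only writable location in Lambda.

```python
import os
import tempfile

# Hypothetical locations: adjust to wherever your Dockerfile places the files.
CHROME_BINARY = os.environ.get("CHROME_BINARY", "/opt/chrome/chrome")
CHROMEDRIVER_PATH = os.environ.get("CHROMEDRIVER_PATH", "/opt/chromedriver")


def chrome_arguments(scratch_dir):
    """Return flags commonly used to run Chrome in a display-less container."""
    return [
        "--headless",
        "--no-sandbox",
        "--disable-gpu",
        "--disable-dev-shm-usage",
        "--window-size=1280,1696",  # replaces maximize_window() in headless mode
        f"--user-data-dir={scratch_dir}",
    ]


def build_driver():
    # Imported lazily so this module can be loaded without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.binary_location = CHROME_BINARY
    # mkdtemp() creates the profile directory under the system temp dir (/tmp on Lambda).
    for argument in chrome_arguments(tempfile.mkdtemp()):
        options.add_argument(argument)
    service = Service(executable_path=CHROMEDRIVER_PATH)
    return webdriver.Chrome(service=service, options=options)


def handler(event=None, context=None):
    """Lambda entry point: load the page from the question and return its title."""
    driver = build_driver()
    try:
        driver.get("https://covid19criticalcare.com/pharmacies/")
        return {"title": driver.title}
    finally:
        driver.quit()  # always release the browser process
```

Note that this passes an explicit driver path instead of calling `ChromeDriverManager().install()`, so nothing has to be downloaded when the function runs.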

There are lots of discussions about which versions work with which. The most important thing with Selenium is to be careful that the package version, the browser version, and the driver version match when deploying to AWS Lambda.
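One way to catch a mismatch early is to compare the major versions reported by the browser and the driver before deploying. The sketch below is only an illustration: the function names are made up for this example, and the sample strings stand in for what running the Chrome binary and chromedriver with `--version` would print in your image.

```python
import re


def major_version(version_output):
    """Extract the major version from text such as 'ChromeDriver 107.0.5304.18'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_compatible(chrome_output, chromedriver_output):
    """Return True when browser and driver report the same major version."""
    return major_version(chrome_output) == major_version(chromedriver_output)


if __name__ == "__main__":
    # Illustrative strings only, not output captured from a real image.
    print(versions_compatible("Google Chrome 107.0.5304.87", "ChromeDriver 107.0.5304.18"))
    print(versions_compatible("Google Chrome 107.0.5304.87", "ChromeDriver 106.0.5249.61"))
```

The first call prints `True` and the second prints `False`, since only the leading number of each version string is compared.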


2 Comments

Can I know how to change the linux-chrome version so that it can be used with chromedriver 107.0.5304.18? (the zip link in your Dockerfile)
It's not the case that Selenium requires Dockerising your Lambda! I'll add an alternative approach as a separate answer; I got a zipped size of 70 MB. Previously it was 44 MB (I'm in the midst of an upgrade now and not too clear yet on why the size increased). I think chrome-aws-lambda is a missing piece of the puzzle here: npmjs.com/package/chrome-aws-lambda
1

I created a guide for building a serverless architecture on AWS using SAM. The example I used is a web scraper that uses Selenium to scrape a website, writes the data to a CSV file, and stores it in an S3 bucket.

In case someone finds it useful: https://medium.com/@karthiks3000/aws-serverless-architecture-with-sam-part-1-7d22203c10bd

The post that deals with adding Selenium to a Lambda is here: https://medium.com/@karthiks3000/aws-serverless-architecture-with-sam-part-4-688873f5742
