
I am using a remote selenium webdriver to perform some tests. At some point, however, I need to download a file and check its contents.

I am using the remote webdriver as follows (in python):

    import os
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

    PROXY = ...

    prefs = {
        "profile.default_content_settings.popups": 0,
        "download.prompt_for_download": "false",
        "download.default_directory": os.getcwd(),
    }
    chrome_options = Options()
    chrome_options.add_argument("--disable-extensions")
    chrome_options.add_experimental_option("prefs", prefs)

    webdriver.DesiredCapabilities.CHROME['proxy'] = {
        "httpProxy": PROXY,
        "ftpProxy": PROXY,
        "sslProxy": PROXY,
        "noProxy": None,
        "proxyType": "MANUAL",
        "class": "org.openqa.selenium.Proxy",
        "autodetect": False
    }

    driver = webdriver.Remote(
        command_executor='http://aaa.bbb.ccc:4444/wd/hub',
        desired_capabilities=DesiredCapabilities.CHROME)

With a 'normal' webdriver I am able to download the file without issues on the local computer. Then I can use the testing code to e.g. verify the content of the downloaded file (which can change depending on test parameters). It is not a test of the download itself, but I need a way to verify the contents of the generated file ...

But how can I do that using a remote webdriver? I have not found anything helpful anywhere...

  • What's the issue you are facing? Any error log? In case your browser runs on a remote host (due to the node setup), you might want to check the write permissions of the browser's default download directory. Also, you can set this per driver via browser.download.dir for a Firefox profile and download.default_directory for Chrome options. Commented Nov 2, 2017 at 8:10
  • @ekostadinov: Please see updated question; I added the complete options I am using, including the download-directory options... Commented Nov 2, 2017 at 9:28
  • You haven't answered the question of what is the issue you are facing. Commented Nov 6, 2017 at 16:09
  • I need to get the file to the place where it can be accessed by the test script... Commented Nov 7, 2017 at 13:57
  • I feel like you need a shared drive to store those downloaded files. Commented Nov 10, 2017 at 8:40

7 Answers


The Selenium API doesn't provide a way to get a file downloaded on a remote machine.

But it's still possible with Selenium alone depending on the browser.

With Chrome, the downloaded files can be listed by navigating to chrome://downloads/ and retrieved with an <input type="file"> element injected into the page:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    import os, time, base64

    def get_downloaded_files(driver):

        if not driver.current_url.startswith("chrome://downloads"):
            driver.get("chrome://downloads/")

        return driver.execute_script(
            "return downloads.Manager.get().items_ "
            "  .filter(e => e.state === 'COMPLETE') "
            "  .map(e => e.filePath || e.file_path); ")


    def get_file_content(driver, path):

        elem = driver.execute_script(
            "var input = window.document.createElement('INPUT'); "
            "input.setAttribute('type', 'file'); "
            "input.hidden = true; "
            "input.onchange = function (e) { e.stopPropagation() }; "
            "return window.document.documentElement.appendChild(input); ")

        elem._execute('sendKeysToElement', {'value': [ path ], 'text': path})

        result = driver.execute_async_script(
            "var input = arguments[0], callback = arguments[1]; "
            "var reader = new FileReader(); "
            "reader.onload = function (ev) { callback(reader.result) }; "
            "reader.onerror = function (ex) { callback(ex.message) }; "
            "reader.readAsDataURL(input.files[0]); "
            "input.remove(); "
            , elem)

        if not result.startswith('data:'):
            raise Exception("Failed to get file content: %s" % result)

        return base64.b64decode(result[result.find('base64,') + 7:])


    capabilities_chrome = {
        'browserName': 'chrome',
        # 'proxy': {
        #     'proxyType': 'manual',
        #     'sslProxy': '50.59.162.78:8088',
        #     'httpProxy': '50.59.162.78:8088'
        # },
        'goog:chromeOptions': {
            'args': [
            ],
            'prefs': {
                # 'download.default_directory': "",
                # 'download.directory_upgrade': True,
                'download.prompt_for_download': False,
                'plugins.always_open_pdf_externally': True,
                'safebrowsing_for_trusted_sources_enabled': False
            }
        }
    }

    driver = webdriver.Chrome(desired_capabilities=capabilities_chrome)
    #driver = webdriver.Remote('http://127.0.0.1:5555/wd/hub', capabilities_chrome)

    # download a pdf file
    driver.get("https://www.mozilla.org/en-US/foundation/documents")
    driver.find_element_by_css_selector("[href$='.pdf']").click()

    # list all the completed remote files (waits for at least one)
    files = WebDriverWait(driver, 20, 1).until(get_downloaded_files)

    # get the content of the first file remotely
    content = get_file_content(driver, files[0])

    # save the content in a local file in the working directory
    with open(os.path.basename(files[0]), 'wb') as f:
        f.write(content)
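The last lines of get_file_content slice the FileReader result at 'base64,'. As a sketch only (decode_data_url is a hypothetical helper, not part of Selenium), that decoding can be moved into a small function that is easy to unit-test and fails loudly when the async script passed back an error message instead of a data URL:

```python
import base64


def decode_data_url(result):
    """Decode a 'data:<mime>;base64,<payload>' string produced by FileReader.readAsDataURL.

    Hypothetical helper: raises ValueError when the script returned something
    else, e.g. the error message handed to the callback by reader.onerror.
    """
    if not isinstance(result, str) or not result.startswith("data:"):
        raise ValueError("Failed to get file content: %r" % (result,))
    # split the header ('data:application/pdf;base64') from the payload
    header, sep, payload = result.partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("Not a base64 data URL: %r" % (header,))
    return base64.b64decode(payload)
```

Inside get_file_content, the final check and return could then be replaced by return decode_data_url(result).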

With Firefox, the files can be listed and retrieved directly by calling the browser API from a script, after switching the context to "chrome":

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    import os, time, base64

    def get_file_names_moz(driver):
        driver.command_executor._commands["SET_CONTEXT"] = ("POST", "/session/$sessionId/moz/context")
        driver.execute("SET_CONTEXT", {"context": "chrome"})
        files = driver.execute_async_script("""
            var { Downloads } = Components.utils.import('resource://gre/modules/Downloads.jsm', {});
            Downloads.getList(Downloads.ALL)
                .then(list => list.getAll())
                .then(entries => entries.filter(e => e.succeeded).map(e => e.target.path))
                .then(arguments[0]);
            """)
        # switch back to the page context before returning
        driver.execute("SET_CONTEXT", {"context": "content"})
        return files

    def get_file_content_moz(driver, path):
        driver.execute("SET_CONTEXT", {"context": "chrome"})
        result = driver.execute_async_script("""
            var { OS } = Cu.import("resource://gre/modules/osfile.jsm", {});
            OS.File.read(arguments[0]).then(function(data) {
                var base64 = Cc["@mozilla.org/scriptablebase64encoder;1"].getService(Ci.nsIScriptableBase64Encoder);
                var stream = Cc['@mozilla.org/io/arraybuffer-input-stream;1'].createInstance(Ci.nsIArrayBufferInputStream);
                stream.setData(data.buffer, 0, data.length);
                return base64.encodeToString(stream, data.length);
            }).then(arguments[1]);
            """, path)
        driver.execute("SET_CONTEXT", {"context": "content"})
        return base64.b64decode(result)

    capabilities_moz = {
        'browserName': 'firefox',
        'marionette': True,
        'acceptInsecureCerts': True,
        'moz:firefoxOptions': {
            'args': [],
            'prefs': {
                # 'network.proxy.type': 1,
                # 'network.proxy.http': '12.157.129.35', 'network.proxy.http_port': 8080,
                # 'network.proxy.ssl': '12.157.129.35', 'network.proxy.ssl_port': 8080,
                'browser.download.dir': '',
                'browser.helperApps.neverAsk.saveToDisk': 'application/octet-stream,application/pdf',
                'browser.download.useDownloadDir': True,
                'browser.download.manager.showWhenStarting': False,
                'browser.download.animateNotifications': False,
                'browser.safebrowsing.downloads.enabled': False,
                'browser.download.folderList': 2,
                'pdfjs.disabled': True
            }
        }
    }

    # launch Firefox
    # driver = webdriver.Firefox(capabilities=capabilities_moz)
    driver = webdriver.Remote('http://127.0.0.1:5555/wd/hub', capabilities_moz)

    # download a pdf file
    driver.get("https://www.mozilla.org/en-US/foundation/documents")
    driver.find_element_by_css_selector("[href$='.pdf']").click()

    # list all the downloaded files (waits for at least one)
    files = WebDriverWait(driver, 20, 1).until(get_file_names_moz)

    # get the content of the first downloaded file in the list
    content = get_file_content_moz(driver, files[0])

    # save the content in a local file in the working directory
    with open(os.path.basename(files[0]), 'wb') as f:
        f.write(content)
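Both Firefox helpers switch to the "chrome" context and have to switch back to "content" even when the privileged script fails. A sketch of a context manager that guarantees this, assuming the SET_CONTEXT command has already been registered on the remote driver's command executor as in get_file_names_moz above (moz_chrome_context is a hypothetical name). A local webdriver.Firefox instance already has a context() method for this; the helper is only meant for webdriver.Remote.

```python
from contextlib import contextmanager


@contextmanager
def moz_chrome_context(driver):
    """Run the enclosed block in Firefox's privileged 'chrome' context.

    Assumes 'SET_CONTEXT' was registered beforehand, e.g.
    driver.command_executor._commands["SET_CONTEXT"] = (
        "POST", "/session/$sessionId/moz/context")
    """
    driver.execute("SET_CONTEXT", {"context": "chrome"})
    try:
        yield driver
    finally:
        # always return to the page ('content') context, even if the block raised
        driver.execute("SET_CONTEXT", {"context": "content"})
```

With it, the body of get_file_content_moz could become a with moz_chrome_context(driver): block around the execute_async_script call.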

19 Comments

I know how to download a file using the web driver. But this was not my question...
@Alex, your question is how to download a file with a remote instance of the driver which is exactly what I provided. In the example the file is downloaded on the remote machine and retrieved on the client machine. With the content of the file on the client side you can easily validate the file.
Sorry I misread parts of your answer. But in my case I also need to set a proxy in those desiredCapabilities. Currently the setup is something like: PROXY = "some.proxy.ch:80" webdriver.DesiredCapabilities.CHROME['proxy'] = { "httpProxy":PROXY, "ftpProxy":PROXY, "sslProxy":PROXY, "noProxy":None, "proxyType":"MANUAL", "class":"org.openqa.selenium.Proxy", "autodetect":False } Will this work with your suggested solution?
@Savvy, my bad the command is not implemented in the Java client. Have a look at Take full page screenshot for an example on how to call a command.
@FlorentB. It looks like it's not working anymore

Webdriver:

If you are using a local WebDriver, your code uses the Selenium client and server code to communicate with a browser instance on the same machine. The downloaded files are stored on the local machine and can be accessed directly from languages like Java, Python, .NET, Node.js, etc.

Remote WebDriver [Selenium-Grid]:

If you are using RemoteWebDriver, you are using the Grid concept. The main purpose of the Grid is to distribute your tests over multiple machines or virtual machines (VMs). Your code uses the Selenium client to communicate with the Selenium Grid server, which passes the instructions to a registered node with the specified browser. From there, the Grid node passes the instructions through the browser-specific driver to the browser instance. The downloads are therefore saved to the file system (hard disk) of that node, but users don't have access to the file system of the virtual machines where the browser is running.

  • If JavaScript could access the file, we could convert it to a base64 string and return it to the client code. But for security reasons, JavaScript in a web page is not allowed to read files from the disk.

  • If the Selenium Grid hub and nodes are on the same system and on a public network, you can change the download path to a publicly served folder such as ../Tomcat/webapps/Root/CurrentTimeFolder/file.pdf. You can then access the file directly through its public URL.

For example, downloading a file (here, the geckodriver binary) from Tomcat's ROOT folder through its URL:

    System.out.println("FireFox Driver Path « "+ geckodriverCloudRootPath);
    File temp = File.createTempFile("geckodriver", null);
    temp.setExecutable(true);
    FileUtils.copyURLToFile(new URL( geckodriverCloudRootPath ), temp);

    System.setProperty("webdriver.gecko.driver", temp.getAbsolutePath() );
    capabilities.setCapability("marionette", true);
  • If the Selenium Grid hub and node are not on the same system, you may not be able to get the downloaded file, because the Grid hub will typically be on a public network (WAN) and the node on the organisation's private network (LAN).
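If the node's download folder is published by a web server as described above, the test script only needs an HTTP GET to retrieve the file. A minimal standard-library sketch; the base URL and file name are placeholders for whatever your web server actually exposes, and fetch_download is a hypothetical helper name:

```python
import urllib.parse
import urllib.request


def fetch_download(base_url, file_name, timeout=30):
    """Fetch file_name from a web-served download folder and return its bytes.

    base_url is an assumption: the public URL of the folder that the browser
    on the Grid node saves its downloads into.
    """
    # percent-encode the name so spaces and similar characters survive in the URL
    url = base_url.rstrip("/") + "/" + urllib.parse.quote(file_name)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

The returned bytes can then be checked by the test exactly as if the file had been downloaded locally.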

You can change the browser's download path to a specific folder on the hard disk using the code below.

    String downloadFilepath = "E:\\download";
    HashMap<String, Object> chromePrefs = new HashMap<String, Object>();
    chromePrefs.put("profile.default_content_settings.popups", 0);
    chromePrefs.put("download.default_directory", downloadFilepath);

    ChromeOptions options = new ChromeOptions();
    options.setExperimentalOption("prefs", chromePrefs);
    options.addArguments("--test-type");
    options.addArguments("--disable-extensions"); //to disable browser extension popup

    DesiredCapabilities cap = DesiredCapabilities.chrome();
    cap.setCapability(CapabilityType.ACCEPT_SSL_CERTS, true);
    cap.setCapability(ChromeOptions.CAPABILITY, options);
    RemoteWebDriver driver = new ChromeDriver(cap);


@FlorentB's answer for Chrome works up to Chrome version 79. For newer versions, the get_downloaded_files function needs to be updated, because downloads.Manager is no longer accessible. This updated version should work with previous versions as well.

    def get_downloaded_files(driver):

        if not driver.current_url.startswith("chrome://downloads"):
            driver.get("chrome://downloads/")

        return driver.execute_script(
            "return document.querySelector('downloads-manager') "
            "  .shadowRoot.querySelector('#downloadsList') "
            "  .items.filter(e => e.state === 'COMPLETE') "
            "  .map(e => e.filePath || e.file_path || e.fileUrl || e.file_url); ")

1 Comment

For anyone doing this Dec 2023 onwards, you need to replace e.state === 'COMPLETE' with e.state === 2. Thanks for the pointless change Chrome team.
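Given the comment above that newer Chrome builds report the state as the number 2 rather than 'COMPLETE', one way to support both is to return the raw state and path of every item and filter in Python. This is a sketch that assumes the downloads list items still expose the state and path fields used in the answer; is_complete, completed_paths and ITEMS_SCRIPT are hypothetical names:

```python
# the older string value and the numeric value reported by newer Chrome builds
COMPLETE_STATES = ("COMPLETE", 2)

ITEMS_SCRIPT = (
    "return document.querySelector('downloads-manager')"
    "  .shadowRoot.querySelector('#downloadsList').items"
    "  .map(e => ({state: e.state,"
    "              path: e.filePath || e.file_path || e.fileUrl || e.file_url}));"
)


def is_complete(state):
    return state in COMPLETE_STATES


def completed_paths(items):
    """Keep only the paths of finished downloads from [{'state': ..., 'path': ...}, ...]."""
    return [item["path"] for item in items
            if is_complete(item.get("state")) and item.get("path")]


def get_downloaded_files(driver):
    if not driver.current_url.startswith("chrome://downloads"):
        driver.get("chrome://downloads/")
    return completed_paths(driver.execute_script(ITEMS_SCRIPT))
```

An empty result is falsy, so the function can still be passed to WebDriverWait(...).until as in the original answer.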

This is just the Java version of @Florent's answer above. With a lot of guidance from him and some digging and tweaking, I was finally able to get it to work in Java. I figured I could save other people some time by laying it out here.

Firefox

First, we need to create a custom Firefox driver, because we need the SET_CONTEXT command, which is not implemented in the Java client (as of Selenium 3.141.59).

    public class CustomFirefoxDriver extends RemoteWebDriver {

        public CustomFirefoxDriver(URL RemoteWebDriverUrl, FirefoxOptions options) throws Exception {
            super(RemoteWebDriverUrl, options);
            CommandInfo cmd = new CommandInfo("/session/:sessionId/moz/context", HttpMethod.POST);
            Method defineCommand = HttpCommandExecutor.class.getDeclaredMethod("defineCommand", String.class, CommandInfo.class);
            defineCommand.setAccessible(true);
            defineCommand.invoke(super.getCommandExecutor(), "SET_CONTEXT", cmd);
        }

        public Object setContext(String context) {
            return execute("SET_CONTEXT", ImmutableMap.of("context", context)).getValue();
        }
    }

The code below retrieves the content of a downloaded .xls file and saves it as a file (temp.xls) in the directory where the Java class is run. In Firefox this is fairly straightforward, as we can use the browser API.

    public String getDownloadedFileNameBySubStringFirefox(String Matcher) {

        String fileName = "";

        ((CustomFirefoxDriver) driver).setContext("chrome");

        String script = "var { Downloads } = Components.utils.import('resource://gre/modules/Downloads.jsm', {});" +
                "Downloads.getList(Downloads.ALL).then(list => list.getAll())" +
                ".then(entries => entries.filter(e => e.succeeded).map(e => e.target.path))" +
                ".then(arguments[0]);";

        String fileNameList = js.executeAsyncScript(script).toString();
        // strip the surrounding square brackets and check each path separately
        String[] names = fileNameList.substring(1, fileNameList.length() - 1).split(",");
        for (String name : names) {
            if (name.trim().contains(Matcher)) {
                fileName = name.trim();
                break;
            }
        }

        ((CustomFirefoxDriver) driver).setContext("content");

        return fileName;
    }

    public void getDownloadedFileContentFirefox(String fileIdentifier) {

        String filePath = getDownloadedFileNameBySubStringFirefox(fileIdentifier);
        ((CustomFirefoxDriver) driver).setContext("chrome");

        String script = "var { OS } = Cu.import(\"resource://gre/modules/osfile.jsm\", {});" +
                "OS.File.read(arguments[0]).then(function(data) {" +
                "var base64 = Cc[\"@mozilla.org/scriptablebase64encoder;1\"].getService(Ci.nsIScriptableBase64Encoder);" +
                "var stream = Cc['@mozilla.org/io/arraybuffer-input-stream;1'].createInstance(Ci.nsIArrayBufferInputStream);" +
                "stream.setData(data.buffer, 0, data.length);" +
                "return base64.encodeToString(stream, data.length);" +
                "}).then(arguments[1]);" ;

        Object base64FileContent = js.executeAsyncScript(script, filePath);
        // switch back to the page context once the file has been read
        ((CustomFirefoxDriver) driver).setContext("content");

        try {
            Files.write(Paths.get("temp.xls"), DatatypeConverter.parseBase64Binary(base64FileContent.toString()));
        } catch (IOException i) {
            System.out.println(i.getMessage());
        }
    }

Chrome

We need to employ a different approach to achieve the same goal in Chrome. We append an input file element to the Downloads page and pass the file location to this element. Once this element points to our required file, we use it to read its content.

    public String getDownloadedFileNameBySubStringChrome(String Matcher) {
        String file = "";
        //The script below returns the list of files as a list of the form '[$FileName1, $FileName2...]'
        // with the most recently downloaded file listed first.
        String script = "return downloads.Manager.get().items_.filter(e => e.state === 'COMPLETE').map(e => e.file_url);" ;
        if(!driver.getCurrentUrl().startsWith("chrome://downloads/")) {
            driver.get("chrome://downloads/");
        }
        String fileNameList = js.executeScript(script).toString();
        //Removing square brackets
        fileNameList = fileNameList.substring(1, fileNameList.length() -1);
        String [] fileNames = fileNameList.split(",");
        for(int i=0; i<fileNames.length; i++) {
            if(fileNames[i].trim().contains(Matcher)) {
                file = fileNames[i].trim();
                break;
            }
        }

        return file;
    }


    public void getDownloadedFileContentChrome(String fileIdentifier) {

        //This causes the user to be navigated to the Chrome Downloads page
        String fileName = getDownloadedFileNameBySubStringChrome(fileIdentifier);
        //Remove "file://" from the file path
        fileName = fileName.substring(7);

        String script = "var input = window.document.createElement('INPUT'); " +
                "input.setAttribute('type', 'file'); " +
                "input.setAttribute('id', 'downloadedFileContent'); " +
                "input.hidden = true; " +
                "input.onchange = function (e) { e.stopPropagation() }; " +
                "return window.document.documentElement.appendChild(input); " ;
        WebElement fileContent = (WebElement) js.executeScript(script);
        fileContent.sendKeys(fileName);

        String asyncScript = "var input = arguments[0], callback = arguments[1]; " +
                "var reader = new FileReader(); " +
                "reader.onload = function (ev) { callback(reader.result) }; " +
                "reader.onerror = function (ex) { callback(ex.message) }; " +
                "reader.readAsDataURL(input.files[0]); " +
                "input.remove(); " ;

        String content = js.executeAsyncScript(asyncScript, fileContent).toString();
        int fromIndex = content.indexOf("base64,") +7 ;
        content = content.substring(fromIndex);

        try {
            Files.write(Paths.get("temp.xls"), DatatypeConverter.parseBase64Binary(content));
        } catch (IOException i) {
            System.out.println(i.getMessage());
        }
    }

I needed this setup because my test suite was running on a Jenkins server, and the Selenium Grid hub and node setup it pointed to was running in Docker containers (https://github.com/SeleniumHQ/docker-selenium) on a different server. Once again, this is just a Java translation of @Florent's answer above. Please refer to it for more info.

2 Comments

Where is the js object coming from?
I found it: IJavaScriptExecutor js = (IJavaScriptExecutor)driver; (in the Java bindings: JavascriptExecutor js = (JavascriptExecutor) driver;)

I found this article on Medium. It references another tutorial that may help.

https://lindajosiah.medium.com/python-selenium-docker-downloading-and-saving-files-ebb9ab8b2039

I am using a docker image for the python download script and a docker stack for the selenium hub.

Source: https://github.com/SeleniumHQ/docker-selenium/blob/trunk/docker-compose-v2.yml

    version: '2'
    services:
      chrome:
        image: selenium/node-chrome:4.8.1-20230306
        shm_size: 2gb
        depends_on:
          - selenium-hub
        environment:
          - SE_EVENT_BUS_HOST=selenium-hub
          - SE_EVENT_BUS_PUBLISH_PORT=4442
          - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
        ports:
          - "6900:5900"
        networks:
          - scraper-service
        volumes:
          - ./downloads:/home/seluser/Downloads # <= link a local directory to the downloads location

      selenium-hub:
        image: selenium/hub:4.8.1-20230306
        ports:
          - "4442:4442"
          - "4443:4443"
          - "4444:4444"
        networks:
          - scraper-service

    networks:
      scraper-service:
        external: true

Then I set the download directory in my Python script.

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", {
        "download.default_directory": "/home/seluser/Downloads/",  # <= link to the downloads location
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing_for_trusted_sources_enabled": False,
        "safebrowsing.enabled": False
    })

    chrome = webdriver.Remote(
        command_executor='http://selenium-hub:4444/wd/hub',
        options=options)

You can really set any external volume you want.
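With the volume mapped, the test script sees the files in ./downloads on the host, but a download may still be in progress when the click returns. A hedged polling sketch: it assumes Chrome's convention of writing unfinished downloads with a .crdownload suffix, and wait_for_download and its existing parameter (file names present before the click) are hypothetical:

```python
import os
import time


def wait_for_download(directory, timeout=30.0, poll=0.5, existing=frozenset()):
    """Return the path of a newly finished download in `directory`, or raise TimeoutError.

    Names in `existing` are ignored. While any new '*.crdownload' file exists
    (assumed Chrome partial-download suffix), the download counts as unfinished.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(directory) if os.path.isdir(directory) else []
        new = [n for n in names if n not in existing]
        partial = [n for n in new if n.endswith(".crdownload")]
        done = [n for n in new if not n.endswith(".crdownload")]
        if done and not partial:
            # pick the most recently modified finished file
            newest = max(done, key=lambda n: os.path.getmtime(os.path.join(directory, n)))
            return os.path.join(directory, newest)
        if time.monotonic() >= deadline:
            raise TimeoutError("no finished download in %s after %.1fs" % (directory, timeout))
        time.sleep(poll)
```

Taking existing = set(os.listdir("./downloads")) just before the click keeps files left over from earlier runs from being picked up.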

1 Comment

Consider the volume permissions of the container and the host (or other containers). Selenium runs with a specific user ID and group ID. Once that is fixed, this solution works with Compose. Thx.

This works with php-webdriver (PHP) for Chrome, as of 2020:

    $downloaddir = "/tmp/";
    $host = 'http://ipaddress:4444/wd/hub';
    try {
        $options = new ChromeOptions();
        $options->setExperimentalOption("prefs",["safebrowsing.enabled" => "true", "download.default_directory" => $downloaddir]);
        $options->addArguments( array("disable-extensions",'safebrowsing-disable-extension-blacklist','safebrowsing-disable-download-protection') );
        $caps = DesiredCapabilities::chrome();
        $caps->setCapability(ChromeOptions::CAPABILITY, $options);
        $caps->setCapability("unexpectedAlertBehaviour","accept");
        $driver = RemoteWebDriver::create($host, $caps);
        $driver->manage()->window()->setPosition(new WebDriverPoint(500,0));
        $driver->manage()->window()->setSize(new WebDriverDimension(1280,1000));
        $driver->get("https://file-examples.com/index.php/sample-documents-download/sample-rtf-download/");
        sleep(1);
        $driver->findElement(WebDriverBy::xpath("//table//tr//td[contains(., 'rtf')]//ancestor::tr[1]//a"))->click();
        sleep(1);
        $driver->get('chrome://downloads/');
        sleep(1);
        // $inject = "return downloads.Manager.get().items_.filter(e => e.state === 'COMPLETE').map(e => e.filePath || e.file_path); ";
        $inject = "return document.querySelector('downloads-manager').shadowRoot.querySelector('downloads-item').shadowRoot.querySelector('a').innerText;";
        $filename = $driver->executeScript(" $inject" );
        echo "File name: $filename<br>";
        $driver->executeScript(
            "var input = window.document.createElement('INPUT'); ".
            "input.setAttribute('type', 'file'); ".
            "input.hidden = true; ".
            "input.onchange = function (e) { e.stopPropagation() }; ".
            "return window.document.documentElement.appendChild(input); "
        );
        $elem1 = $driver->findElement(WebDriverBy::xpath("//input[@type='file']"));
        $elem1->sendKeys($downloaddir.$filename);
        $result = $driver->executeAsyncScript(
            "var input = arguments[0], callback = arguments[1]; ".
            "var reader = new FileReader(); ".
            "reader.onload = function (ev) { callback(reader.result) }; ".
            "reader.onerror = function (ex) { callback(ex.message) }; ".
            "reader.readAsDataURL(input.files[0]); ".
            "input.remove(); "
            , [$elem1]);
        $coding = 'base64,';
        $cstart = strpos( $result, 'base64,' );
        if ( $cstart !== false )
            $result = base64_decode(substr( $result, $cstart + strlen($coding) ));
        echo "File content: <br>$result<br>";
        $driver->quit();
    } catch (Exception $e) {
        echo 'Caught exception: ', $e->getMessage(), "\n";
    }

1 Comment

While it’s acceptable to provide code-only answers, it’s often more useful for the community if you can also provide an explanation of the code and help people understand how it solves the problem. That can reduce the number of follow-up questions, and help new developers understand the underlying concepts. Would you mind updating your question with additional detail?

If for some reason (pylint) you would like to avoid accessing a protected member (elem._execute), then the line:

elem._execute('sendKeysToElement', {'value': [ path ], 'text': path}) 

in @FlorentB's answer can be rewritten as:

elem.parent.execute('sendKeysToElement', {'value': [ path ], 'text': path, 'id': elem.id}) 

Source: https://github.com/SeleniumHQ/selenium/blob/trunk/py/selenium/webdriver/remote/webelement.py, lines 703, 708, and 727
