4

I want to automate a task on my Debian server that can only be done through a website (after logging in). There is no public API available, so I can't use one.

Is there a way to do so? I thought about a text-based browser or something similar.

7
  • You might want to check out Perl's WWW::Mechanize module. Commented May 24, 2013 at 11:10
  • We would need to know how the login is performed, what the task is, and what the webpage looks like (simple HTML, PHP, JavaScript, etc.). Your question is not answerable in its current form. Commented May 24, 2013 at 11:11
  • @terdon play.google.com/apps/publish ;) I guess the issue is that a file upload is involved. Commented May 24, 2013 at 11:15
  • curl has file upload posts, why not just cron a curl script? Commented May 24, 2013 at 11:40
  • @lynks Would this work? I know what curl is, but not much more. Commented May 24, 2013 at 12:33

3 Answers

4

Have a look at WWW::Mechanize (examples at http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/Examples.pod). It treats the web page as an object and makes all of its elements accessible via methods.

For example

    $m->get("https://lists.ccs.neu.edu/bin/admindb/$listname");
    $m->set_visible( $password );
    $m->click;

There are ports for (at least) Ruby and Python, too.
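Part of what Mechanize automates is collecting a form's fields, hidden ones included, before submitting it. As a rough illustration of that step only, here is a minimal sketch using nothing but the Python 3 standard library. The names `FormFieldParser`, `extract_fields` and the sample form (with its `csrf_token`, `user` and `password` fields) are made up for this example and are not taken from any real site or from Mechanize itself.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs of <input> elements, hidden ones included."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute start out empty.
            self.fields[name] = attributes.get("value") or ""


def extract_fields(html):
    """Return a dict of all named <input> fields found in the HTML text."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical login form, standing in for the page you would download.
SAMPLE = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="user">
  <input type="password" name="password">
</form>
"""

fields = extract_fields(SAMPLE)
fields["user"] = "alice"        # placeholder credentials
fields["password"] = "s3cret"
print(fields)
```

The resulting dict is what you would send back in the login POST: the hidden values are returned unchanged and only the visible fields are filled in.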

1
  • You can use the Firefox Live HTTP Headers plugin to record a session between your browser and the website, so that you understand the full path interaction, too. What cookies do you need to store? Present? What forms are called with what variables and hidden variables, etc. Once you have all that, it should be achievable to automate the task. Commented May 24, 2013 at 12:30
2

You can run Selenium on a headless installation on your server, e.g. by programming the actions in Python using pyvirtualdisplay.

pyvirtualdisplay allows you to use an Xvfb, Xephyr or Xvnc screen, so you can take screenshots (or take a remote peek to see what is going on).


On Ubuntu 12.04 install:

    sudo apt-get install python-pip tightvncserver xtightvncviewer
    sudo pip install selenium pyvirtualdisplay

and run the following (this is using the newer Selenium2 API, the older API is still available as well):

    import subprocess

    from pyvirtualdisplay import Display
    from selenium import webdriver


    def browse_it(port=None):
        browser = webdriver.Firefox()
        browser.get('http://unix.stackexchange.com/questions')
        # Print the title of every question listed on the page.
        for question in browser.find_elements_by_class_name('question-hyperlink'):
            print(question.text)
        if port:
            print('--------\nconnect using:\n vncviewer ' +
                  'localhost:{}\nand click the xmessage to quit'.format(port))
            # Block until the xmessage window is dismissed.
            subprocess.call(['xmessage', 'click to quit'])
        browser.quit()


    def browse_it_hidden(rfbport=5904):
        # Run the browser inside a virtual Xvnc display.
        with Display(backend='xvnc', rfbport=str(rfbport)) as disp:
            browse_it(rfbport)


    if __name__ == '__main__':
        browse_it_hidden()

The xmessage keeps the browser from quitting; in a testing environment you would not want this. You can also call browse_it() directly to test in the foreground.

The results of Selenium's find_element.....() calls do not offer things like selecting the parent of an element you just found, something you might expect from HTML parsing packages (I read somewhere that this is on purpose). These limitations can be a hassle if you scrape pages you have no control over. When testing your own site, just make sure every element you want to test is generated with an id or a unique class, so it can be selected without hassle.

2
  • This sounds interesting. Is there any documentation for getting started? Commented May 24, 2013 at 15:34
  • 1
    @Leandros I extended this with an example. If you search for more, look for examples using the newer Selenium2 API, like the one I used here. There are an awful lot of Selenium (1) examples around that still work, but they take somewhat more effort to set up. Commented May 25, 2013 at 13:03
1

You could use any of:

  • Perl with WWW::Mechanize, or even roll your own using their HTTPClient
  • Selenium/WebDriver
  • a Google Chrome or Firefox Extension (existing or one that you write)
  • a shell script using curl and wget (you'll need to save and resend session data)
  • HtmlUnit
  • ...

Basically any language that lets you query a networked resource would do...
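To make the "save and resend session data" point concrete, here is a minimal sketch in Python 3 using only the standard library. It assumes a plain form-based login that sets a session cookie; the function names, URLs and field names are placeholders invented for this example, and a site that relies on JavaScript would need a real browser instead.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar); the opener stores cookies and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(url, fields):
    """Build a form-encoded POST request from a dict of form fields."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    # Supplying a request body makes urllib send a POST instead of a GET.
    return urllib.request.Request(url, data=data)


def login_and_fetch(login_url, fields, target_url):
    """Log in, then fetch a page that needs the session cookie."""
    opener, _jar = make_session()
    with opener.open(build_login_request(login_url, fields)) as response:
        response.read()  # the session cookie is now stored in the jar
    with opener.open(target_url) as response:
        return response.read()
```

A call would look like `login_and_fetch("https://example.com/login", {"user": "alice", "password": "s3cret"}, "https://example.com/private")`, with all three values replaced by whatever the real site expects. Run from cron, this covers the same ground as a curl script that keeps a cookie file.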
