I want to automate a task which can only be done on a website (with prior login) on my debian server. There is no public API available, so I can't use one.
Is there a way to do so? I thought about a text-based browser or something similar.
I want to automate a task which can only be done on a website (with prior login) on my debian server. There is no public API available, so I can't use one.
Is there a way to do so? I thought about a text-based browser or something similar.
Have a look at WWW::Mechanize (Examples at http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/Examples.pod). It takes your webpage as object and makes all elements accessible via methods.
For example
$m->get("https://lists.ccs.neu.edu/bin/admindb/$listname"); $m->set_visible( $password ); $m->click; There are ports for (al least) ruby and python, too.
Live HTTP Headers plugin to record a session between your browser and the website, so that you understand the full path interaction, too. What cookies do you need to store? Present? What forms are called with what variables and hidden variables, etc. Once you have all that, it should be achievable to automate the task. You can run Selenium on a headless installation on your server, e.g. by programming the actions in python using pyvirtualdisplay.
pyvirtualdisplay allows you to use a xvfb, xepher or xvnc screen so you can do screenshot (or take a remote peek to see what is going on).
On Ubuntu 12.04 install:
sudo apt-get install python-pip tightvncserver xtightvncviewer sudo pip install selenium pyvirtualdisplay and run the following (this is using the newer Selenium2 API, the older API is still available as well):
import subprocess from pyvirtualdisplay import Display from selenium import webdriver def browse_it(port=None): browser = webdriver.Firefox() page = browser.get('http://unix.stackexchange.com/questions') for question in browser.find_elements_by_class_name('question-hyperlink'): print question.text if port: print '--------\nconnect using:\n vncviewer ' + \ 'localhost:{}\nand click the xmessage to quit'.format(port) subprocess.call(['xmessage', 'click to quit']) browser.quit() def browse_it_hidden(rfbport=5904): with Display(backend='xvnc', rfbport=str(rfbport)) as disp: browse_it(rfbport) if __name__ == '__main__': browse_it_hidden() The xmessage prevents the browser to quit, in testing environments you would not want this. You can also call browse_it() directly to test in the foreground.
The results of Selenium's find_element.....() do not provide things like selecting the parent element of an element you just found. Something that you might expect from HTML parsing packages (I read somewhere this is on purpose). These limitations can be kind of hassle if you do scraping of pages you have no control over. When testing your own site, just make sure you generate all of the elements that you want to test with an id or unique class so they can be selected without hassle.
You could use either of:
Basically any language that lets you query a networked resource would do...