I am trying to scrape the "Traits" table from https://www.ebi.ac.uk/gwas/genes/SAMD12 (the URL will change depending on what I need, but the page structure stays the same).
The problem is that my knowledge of web scraping is quite limited, and I can't get this table using the basic BeautifulSoup workflow I've seen so far.
Here's my code:
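
    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.ebi.ac.uk/gwas/genes/SAMD12'
    page = requests.get(url)
    # Parse the response so that `soup` is defined for the lookups below
    soup = BeautifulSoup(page.content, 'html.parser')

I'm looking for the "efotrait-table":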

    efotrait = soup.find('div', id='efotrait-table-loading')
    print(efotrait.prettify())

    <div class="row" id="efotrait-table-loading" style="margin-top:20px">
     <div class="panel panel-default" id="efotrait_panel">
      <div class="panel-heading background-color-primary-accent">
       <h3 class="panel-title">
        <span class="efotrait_label">
         Traits
        </span>
        <span class="efotrait_count badge available-data-btn-badge">
        </span>
       </h3>
       <span class="pull-right">
        <span class="clickable" onclick="toggleSidebar('#efotrait_panel span.clickable')" style="margin-left:25px">
         <span class="glyphicon glyphicon-chevron-up">
         </span>
        </span>
       </span>
      </div>
      <div class="panel-body">
       <table class="table table-striped borderless" data-export-types="['csv']" data-filter-control="true" data-flat="true" data-icons="icons" data-search="true" data-show-columns="true" data-show-export="true" data-show-multi-sort="false" data-sort-name="numberAssociations" data-sort-order="desc" id="efotrait-table">
       </table>
      </div>
     </div>
    </div>

Specifically, this one:

    soup.select('table#efotrait-table')[0]

    <table class="table table-striped borderless" data-export-types="['csv']" data-filter-control="true" data-flat="true" data-icons="icons" data-search="true" data-show-columns="true" data-show-export="true" data-show-multi-sort="false" data-sort-name="numberAssociations" data-sort-order="desc" id="efotrait-table">
    </table>

As you can see, the table's content doesn't show up. On the website, there's an option to save the table as CSV. It would be great if I could get that download link somehow, but when I click on the link to copy it, I get "javascript:void(0)" instead. I haven't studied JavaScript; should I?
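
As a sanity check that the table really is empty in the HTML returned by `requests` (rather than just hidden by CSS), the rows can be counted directly. This is only a minimal sketch using the standard library: `TableRowCounter` and `count_table_rows` are hypothetical helper names, and `sample` is a trimmed stand-in for the response shown above, not the real page.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Count <tr> elements inside the <table> that has a given id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0      # > 0 while we are inside the target table
        self.found = False  # True once the target table has been seen
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth > 0:
                # Nested table inside the target table
                self.depth += 1
            elif dict(attrs).get("id") == self.table_id:
                self.found = True
                self.depth = 1
        elif tag == "tr" and self.depth > 0:
            self.rows += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def count_table_rows(html, table_id):
    """Return the number of rows in the table, or None if the table is absent."""
    parser = TableRowCounter(table_id)
    parser.feed(html)
    return parser.rows if parser.found else None


# Trimmed stand-in for the static HTML above (most attributes omitted)
sample = '<div class="panel-body"><table id="efotrait-table"> </table></div>'
print(count_table_rows(sample, "efotrait-table"))
```

On the trimmed sample this prints 0, which matches what BeautifulSoup shows: the `<table>` element is present in the static HTML, but its rows are not.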
The table is hidden, and even if it weren't, I would need to interactively select more rows per page to see the whole table (the URL doesn't change when I do that, so I can't get the full table that way either).
I would like to know a way to access this table programmatically. Getting the raw, unstructured data is enough; I can handle the minor details of organizing the table afterwards. Any pointers on how to do that (or on what I should study) would be greatly appreciated.
Thanks in advance