Rich Context API integrations for federating discovery services and metadata exchange across multiple scholarly infrastructure providers.
Development of the Rich Context knowledge graph uses this library to:
- identify dataset links to research publications
- locate open access publications
- reconcile journal references
- reconcile author profiles
- reconcile keyword taxonomy
This library has been guided by collaborative work on community building and metadata exchange to improve scholarly infrastructure, which took place at the 2019 Rich Context Workshop.
Prerequisites:
- Python 3.x
- Beautiful Soup
- Biopython.Entrez
- Crossref Commons
- Dimensions CLI
- Requests
- Requests-Cache
- Selenium
- xmltodict
To install from PyPI:

```bash
pip install richcontext.scholapi
```

If you install directly from this Git repo, be sure to install the dependencies as well:

```bash
pip install -r requirements.txt
```

Then copy the configuration file template `rc_template.cfg` to `rc.cfg` and populate it with your credentials.
NB: be careful not to commit the rc.cfg file in Git since by definition it will contain sensitive data, e.g., your passwords.
Parameters used in the configuration file include:
| parameter | value |
|---|---|
| chrome_exe_path | path/to/chrome.exe |
| core_apikey | CORE API key |
| dimensions_password | Dimensions API password |
| elsevier_api_key | Elsevier API key |
| email | personal email address |
| orcid_secret | ORCID API key |
| repec_token | RePEc API token |
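As a quick pre-flight check, the sketch below reports which of these parameters are missing or left blank in `rc.cfg`. It assumes the file is INI-style and readable by Python's `configparser`, with values under a `[DEFAULT]` section; check `rc_template.cfg` for the actual layout. The helpers `missing_parameters` and `missing_from_file` are hypothetical and not part of `richcontext.scholapi`.

```python
import configparser
from typing import List

# Parameter names copied from the table above; not read from the library itself.
EXPECTED_KEYS = [
    "chrome_exe_path",
    "core_apikey",
    "dimensions_password",
    "elsevier_api_key",
    "email",
    "orcid_secret",
    "repec_token",
]


def missing_parameters(cfg_text: str, section: str = "DEFAULT") -> List[str]:
    """Return expected parameters that are absent or blank in `section`.

    Assumes an INI-style config; the "DEFAULT" section name is an assumption.
    """
    # interpolation=None so values containing "%" (e.g. passwords) are read verbatim
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return [
        key
        for key in EXPECTED_KEYS
        if not parser.get(section, key, fallback="").strip()
    ]


def missing_from_file(path: str = "rc.cfg") -> List[str]:
    """Read a config file from disk and report its missing parameters."""
    with open(path, encoding="utf-8") as handle:
        return missing_parameters(handle.read())
```

A blank entry is not necessarily a problem: you only need credentials for the APIs you intend to call.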
Download the ChromeDriver webdriver for the Chrome browser to enable use of Selenium; it will run in "headless" mode.
For a good (though slightly dated) tutorial for installing and testing Selenium on Ubuntu Linux, see: https://christopher.su/2015/selenium-chromedriver-ubuntu/
```python
from richcontext import scholapi as rc_scholapi

# initialize the federated API access
schol = rc_scholapi.ScholInfraAPI(config_file="rc.cfg", logger=None)
source = schol.openaire

# search parameters for example publications
title = "Deal or no deal? The prevalence and nutritional quality of price promotions among U.S. food and beverage purchases."

# run it...
if source.has_credentials():
    response = source.title_search(title)

    # report results
    if response.message:
        # error case
        print(response.message)
    else:
        print(response.meta)
        source.report_perf(response.timing)
```

First, be sure that you're testing the source code in this repo and not an installed copy of the library.
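One way to check this, using only the standard library, is to ask Python which file `richcontext.scholapi` would be imported from and confirm it points into your Git checkout rather than `site-packages`. The helpers `module_origin` and `is_from_checkout` are hypothetical, not part of the library.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[Path]:
    """Return the file Python would import `name` from, or None if it is not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # raised when a parent package (e.g. "richcontext") is not importable
        return None
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def is_from_checkout(name: str, repo_root: str = ".") -> bool:
    """True if `name` resolves to a file inside `repo_root`, i.e. your local source tree."""
    origin = module_origin(name)
    if origin is None:
        return False
    return Path(repo_root).resolve() in origin.parents
```

Run from the repository root, `is_from_checkout("richcontext.scholapi")` should be `True`; if it is `False`, the tests would exercise an installed copy, which you can remove with `pip uninstall richcontext.scholapi`.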
Then run unit tests on the APIs for which you have credentials and generate a coverage report:
```bash
coverage run -m unittest discover
```

Then create GitHub issues in the relevant submodules for any failed tests.
Also, you can generate a coverage report and upload that via:
```bash
coverage report
bash <(curl -s https://codecov.io/bash) -t @.cc_token
```

Test coverage reports can be viewed at https://codecov.io/gh/Coleridge-Initiative/RCApi
APIs used to retrieve metadata:
- PubMed family
- Scholix family
- OA family
- Misc.
See the coding examples in the test.py unit test for usage patterns per supported API.
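To query several integrations at once, one approach is to repeat the pattern from the usage example over a collection of sources. This minimal sketch relies only on the calls that appear in that example (`has_credentials()`, `title_search()`, `response.message`, `response.meta`). The helper `federated_title_search` is hypothetical, and which sources you pass in is up to you; check `test.py` for the attribute name of each supported API.

```python
from typing import Any, Dict, Mapping, Optional


def federated_title_search(
    sources: Mapping[str, Any], title: str
) -> Dict[str, Optional[Dict[str, Any]]]:
    """Run one title search against several sources.

    Returns {name: {"meta": ...}} on success, {name: {"error": ...}} on failure,
    and {name: None} for sources skipped because no credentials are configured.
    """
    results: Dict[str, Optional[Dict[str, Any]]] = {}
    for name, source in sources.items():
        if not source.has_credentials():
            results[name] = None  # nothing configured in rc.cfg for this API
            continue
        response = source.title_search(title)
        if response.message:
            # same convention as the usage example: a non-empty message means failure
            results[name] = {"error": response.message}
        else:
            results[name] = {"meta": response.meta}
    return results
```

For instance, with `schol` and `title` from the usage example, `federated_title_search({"openaire": schol.openaire}, title)` returns a dictionary keyed by `"openaire"`. Keeping errors per source means one failing API does not hide results from the others.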
ChromeDriver
If you encounter an exception about the ChromeDriver version, for example:
```
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 78
```

Then check the version number of your installed Chrome browser and download the matching version of ChromeDriver from https://chromedriver.chromium.org/downloads
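This check can be scripted. The sketch below compares the major version numbers that Chrome and ChromeDriver report via `--version`, since ChromeDriver releases are numbered to match the Chrome major version they support. The executable names (for example `google-chrome` and `chromedriver` on Linux) are assumptions that vary by platform, and the helpers are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def version_output(executable: str) -> Optional[str]:
    """Run `<executable> --version` and return its stdout, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


def major_version(text: Optional[str]) -> Optional[int]:
    """Extract the major version from text such as 'Google Chrome 78.0.3904.108'."""
    if not text:
        return None
    match = re.search(r"(\d+)\.\d+", text)
    return int(match.group(1)) if match else None


def versions_match(chrome_text: Optional[str], driver_text: Optional[str]) -> bool:
    """True when Chrome and ChromeDriver report the same major version."""
    chrome = major_version(chrome_text)
    return chrome is not None and chrome == major_version(driver_text)
```

For example, `versions_match(version_output("google-chrome"), version_output("chromedriver"))` returns `False` when either binary is missing from `PATH` or the major versions differ.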
For more background about open access publications see:
Piwowar H, Priem J, Larivière V, Alperin JP, Matthias L, Norlander B, Farley A, West J, Haustein S. 2017.
The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles
PeerJ Preprints 5:e3119v1
https://doi.org/10.7287/peerj.preprints.3119v1
If you'd like to contribute, please see our listings of good first issues.
For info about joining the AI team working on Rich Context, see https://github.com/Coleridge-Initiative/RCGraph/blob/master/SKILLS.md
Contributors: @ceteri, @IanMulvany, @srand525, @ernestogimeno, @lobodemonte, plus many thanks for the inspiring 2019 Rich Context Workshop notes by @metasj, and guidance from @claytonrsh, @Juliaingridlane.