
I like to use org to take notes on PDFs. I find Zotero a bit "scattered" and too linked to the papers themselves rather than just being notes.

However, manually extracting information from PDFs can be a pain. Is there a way to pull metadata for a PDF from its URL and insert it into org-mode?

The metadata I want is the title, authors and institution, and probably a citation.

Progress so far

  • Tried the papis CLI, but was put off by the "backend" concept for different paper sources.
  • Considered bibtex, but it feels like it takes a bit of setting up and I'm not a heavyweight user yet.
  • This answer mentioned cb2bib, which could be wrapped to do what I want. However, it needs to be built from source.
  • I'm looking into zotero-cli to see if it'll do what I want.
  • What metadata are you looking for? Isn't it enough to store the URL as a link in the Org mode file? Commented Jul 14, 2023 at 14:15
  • I added this information. I guess most of it would be in a citation to the pdf. The title is useful to search against and to remember when talking to other people. People in academia are weird about citations, sometimes I am lazy and quote one author and feel guilty. Commented Jul 14, 2023 at 14:18
  • org-ref (github.com/jkitchin/org-ref) provides a lot of tools for working with pdf metadata, particularly for academic writing via bibtex etc. Commented Jul 14, 2023 at 14:57
  • Nowadays, Org mode comes with org-cite which might or might not help (I have not used it). I remembered org-ref but I see that @Tyler has already recommended it. I'm not sure what you mean by "getting a citation": just unformatted text, containing the metadata you mentioned? IME, "citations" are entries in a database, so they need to conform to the database schema. Commented Jul 14, 2023 at 15:35
  • The usual tools for reference management in Emacs include org-ref, Zotero (which can be configured to create a local bibtex database), and github.com/tmalsburg/helm-bibtex. It sounds like you want something more lightweight and customized to your workflow. I suspect this may end up being more work than what it would take to use one of the existing tools, so it might be worth taking a look. I think ivy-bibtex, which is part of helm-bibtex, is likely the simplest option. Commented Jul 14, 2023 at 15:49

1 Answer

Okay... I was pretty keen to get this set up, so I spent a while fiddling.

Zotero

As Tyler says, I think Zotero with helm-bibtex is the way to go, and this still seems pretty lightweight. There's a little bit of magic needed to get Zotero to create a bibtex file and keep it up to date; for this you can use zotero better bibtex.

  • Install Zotero
  • Download better bibtex for zotero. Firefox will try to open the xpi extension itself, so right-click on the download link and click "save as".
  • Go to Tools > Addons and add the better bibtex extension that you downloaded
  • Restart zotero
  • Go to your library. Right-click on a collection, click export, set the exporter to better bibtex and tick "keep updated". Export to ~/references.bib
  • Install helm-bibtex
  • Set the bibliography path to the file you exported: (setq bibtex-completion-bibliography '("~/references.bib"))
  • helm-bibtex can then open references from your bibliography. If you press tab in helm you can also insert citations
  • To insert a citation directly (by making it the first, and so default, action) you can use

(defun my-helm-bibtex-cite ()
  (interactive)
  ;; Work on a copy so the global helm source keeps its usual actions.
  (let ((helm-source-bibtex (copy-alist helm-source-bibtex)))
    (helm-add-action-to-source
     "Reference 4" 'helm-bibtex-insert-reference helm-source-bibtex 0)
    (helm-bibtex)))

Including URLs

This works well enough... unless you want URLs. The code for creating cites is all rather hard-coded, so the version below is a bit "monkey patch'y". A similar approach could be used for custom interfaces.

Zotero does not include URLs by default, because apparently bibtex clients don't support URLs. So you need to enable an option in better bibtex in zotero.

(defun my-helm-bibtex-cite-with-url ()
  (interactive)
  (let ((helm-source-bibtex (copy-alist helm-source-bibtex)))
    (helm-add-action-to-source
     "Reference" 'my-helm-bibtex-insert-reference-and-url helm-source-bibtex 0)
    (helm-bibtex)))

;; hack won't work for multiple entries
(helm-bibtex-helmify-action my-bibtex-completion-insert-reference-and-url
                            my-helm-bibtex-insert-reference-and-url)

(defun my-bibtex-completion-insert-reference-and-url (keys)
  "Insert references for entries in KEYS."
  (let* ((refs (--map
                (s-word-wrap fill-column
                             (concat "\n- " (bibtex-completion-apa-format-reference it)))
                keys)))
    (insert
     (s-format "${url} ${doi}" 'bibtex-completion-apa-get-value
               (bibtex-completion-get-entry (car keys)))
     "\n" (s-join "\n" refs) "\n")))

User experience

Copying and pasting a URL is wonderfully simple. Things aren't so simple with zotero... though it's not too bad.

  • You can add the Zotero Connector to your toolbar in firefox by right-clicking (no shortcut unfortunately).
  • You still need to look up the entry once you've added it. But the bibtex file seems to be updated pretty much instantly and you will often only need to type a few characters.

Not using zotero

Before this I also did it in a command line way... looking at the time spent, I think it was quicker to do than using zotero. But the zotero solution has a few benefits (zotero is kept up to date, files are downloaded, zotero has quite a nice interface for some things, zotero allows annotation, etc.).

This script (called cite-fetch) will fetch citations from arxiv (and only arxiv). There is also a way of getting citations in bibtex format from crossref, but arxiv citations appeared to be missing there.

#!/usr/bin/python3
import argparse
import subprocess
import tempfile
from pathlib import Path
from subprocess import PIPE
from urllib.parse import urlparse, urlunparse

import requests

# Template for bibtexconv, stored next to this script.
HERE = Path(__file__).parent
EXPORT = HERE / "simple.bibtexconv"

parser = argparse.ArgumentParser()
parser.add_argument("url")
parser.add_argument(
    "-f",
    "--format",
    type=str,
    help="What format to output in. (Defaults to bibtex)",
    choices=("simple", "bibtex"),
    default="bibtex",
)
args = parser.parse_args()

# Rewrite arxiv.org/pdf/<id> or arxiv.org/abs/<id> to arxiv.org/bibtex/<id>.
parts = urlparse(args.url)
if parts.path.startswith("/pdf/"):
    new_path = Path("/bibtex") / Path(parts.path).relative_to(Path("/pdf"))
elif parts.path.startswith("/abs/"):
    new_path = Path("/bibtex") / Path(parts.path).relative_to(Path("/abs"))
else:
    raise ValueError(args.url)

bibtex_url = urlunparse(parts._replace(path=str(new_path)))
response = requests.get(bibtex_url)
response.raise_for_status()  # fail loudly rather than print an error page
content = response.text

if args.format == "bibtex":
    print(content)
elif args.format == "simple":
    with tempfile.TemporaryDirectory() as temp_dir:
        bib_file = Path(temp_dir) / "bib.bib"
        with open(bib_file, "w") as stream:
            stream.write(content)
        with open(EXPORT) as stream:
            template = stream.read()
        # bibtexconv reads the export template on stdin.
        p = subprocess.Popen(
            ["bibtexconv", str(bib_file)], stdin=PIPE, stdout=PIPE, stderr=PIPE
        )
        output, _ = p.communicate(template.encode("utf-8"))
        print(args.url)
        print(output.decode("utf8"))
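The URL rewriting is the only arxiv-specific part of cite-fetch, so it can be pulled out into a function and checked without touching the network. This is a minimal sketch rather than part of the original script: the name arxiv_bibtex_url is made up, the paper id in the usage note is a placeholder, and stripping a trailing ".pdf" is my assumption about how some PDF links look.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse, urlunparse


def arxiv_bibtex_url(url: str) -> str:
    """Map an arXiv /abs/ or /pdf/ URL to the matching /bibtex/ URL.

    Hypothetical helper, not part of cite-fetch as posted.
    """
    parts = urlparse(url)
    path = PurePosixPath(parts.path)
    for prefix in ("/abs", "/pdf"):
        try:
            paper_id = str(path.relative_to(prefix))
        except ValueError:
            continue  # path does not start with this prefix
        # Assumption: a trailing ".pdf" is not part of the paper id.
        if paper_id.endswith(".pdf"):
            paper_id = paper_id[: -len(".pdf")]
        new_path = str(PurePosixPath("/bibtex") / paper_id)
        return urlunparse(parts._replace(path=new_path))
    raise ValueError(f"not an arXiv abs/pdf URL: {url}")
```

For example, arxiv_bibtex_url("https://arxiv.org/abs/2301.00001") gives "https://arxiv.org/bibtex/2301.00001", and anything that is not an abs or pdf link raises ValueError, as in the script.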

You can then use cite-fetch from Emacs: copy a URL to the clipboard and run this command to insert a citation.

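(require 'subr-x) ; for string-trim on older Emacs versions

(defun my-insert-cite (url)
  ;; Read the URL from the clipboard (pbpaste on macOS, xclip on Linux)
  ;; and strip the trailing newline the shell adds.
  (interactive
   (list (string-trim (shell-command-to-string "pbpaste || xclip -o"))))
  ;; Quote the URL so characters such as ? and & survive the shell.
  (insert (shell-command-to-string
           (format "cite-fetch -f simple %s" (shell-quote-argument url)))))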
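For papers that have a DOI, one way to get bibtex back (possibly the crossref route mentioned above, though I can't be sure it is the same one) is DOI content negotiation: request https://doi.org/<doi> with an Accept header of application/x-bibtex. The sketch below uses only the standard library; the function names are made up and the DOI in the usage note is a placeholder.

```python
import urllib.request
from urllib.parse import quote


def doi_bibtex_request(doi: str) -> urllib.request.Request:
    """Build a request asking the DOI resolver for a BibTeX record."""
    url = "https://doi.org/" + quote(doi, safe="/")
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi: str, timeout: float = 10.0) -> str:
    """Fetch the BibTeX text for DOI (needs network access)."""
    with urllib.request.urlopen(doi_bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_bibtex("10.1000/xyz123") with a real DOI in place of the placeholder should return a bibtex entry as text, which could be fed to bibtexconv in the same way as the arxiv output.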

Alternatives to helm-bibtex

It looks like there are a few alternatives to helm-bibtex that could similarly read a citation file generated by better bibtex in zotero.

org-citar is one choice. It can also read CSL citation libraries, which seems like it might be a more common standard.
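To give an idea of what a CSL library holds, here is a rough sketch that turns one CSL-JSON item into an org heading with the title, authors and URL from the question. The function name, the property names and the sample entry are all made up for illustration, and it only handles authors stored as given/family pairs.

```python
def csl_item_to_org(item: dict) -> str:
    """Render one CSL-JSON item as an org heading with a property drawer."""
    authors = ", ".join(
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in item.get("author", [])
    )
    lines = [f"* {item.get('title', 'Untitled')}", ":PROPERTIES:"]
    if authors:
        lines.append(f":AUTHORS: {authors}")
    if item.get("URL"):
        lines.append(f":URL: {item['URL']}")
    lines.append(":END:")
    return "\n".join(lines)


# Hypothetical entry, in the shape CSL-JSON uses for title, author and URL.
sample = {
    "id": "doe2023",
    "title": "An Example Paper",
    "author": [
        {"family": "Doe", "given": "Jane"},
        {"family": "Roe", "given": "Richard"},
    ],
    "URL": "https://example.org/paper.pdf",
}
print(csl_item_to_org(sample))
```

A whole library exported as CSL JSON is a list of such items, so json.load followed by a loop over the list would give one heading per paper.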

  • The helm-bibtex code to parse bibtex all seems a little ad hoc and regexpy. So to do complex things I think either using a python library or converting to bibjson might be the way to go. Commented Jul 22, 2023 at 23:00
