PDFx - Extract references and metadata from PDF documents

Reading over many scientific papers and its references recently, I thought it would be great to be able to download all the references at once… This inspired me to write a little tool to do just that, and now it’s done and released under the Apache open source license:

https://github.com/metachris/pdfx

Features

Extract references and metadata from a given PDF
Detects pdf, url, arxiv and doi references
Fast, parallel download of all referenced PDFs
Find broken hyperlinks (using the -c flag)
Output as text or JSON (using the -j flag)
Extract the PDF text (using the --text flag)
Use as command-line tool or Python package
Compatible with Python 2 and 3
Works with local and online pdfs

source