So far, I have always maintained my publication list manually as a LaTeX file, adding new publications as they were accepted or made public. However, doing so was an extremely tedious and frankly dull task, all the more so because some application processes require you to submit separate lists that include all publications, refereed (i.e. peer-reviewed) only, first-author only, first-author peer-reviewed only, and other combinations.
Therefore, I recently thought about how to streamline generating my publication list. My main aim was to automate it as much as possible so that I could produce an up-to-date list quickly whenever needed. Another goal was to be able to customise the selection of publications that appear in the list, particularly regarding first authorship or coauthorship and peer-reviewed status. These requirements suggested that a programmatic approach might be best. Perhaps I could write a short script to build the list.
Indeed, I found a good solution that utilises a Python library to query the NASA Astrophysics Data System (ADS) database and automatically generates LaTeX code from the results. In particular, I used the `ads` Python library. It is available from its GitHub repository and through PyPI. All you need to do is run a command similar to `pip install ads` to install it on your system.
Then follow the quickstart tutorial in the `ads` documentation on Read the Docs, register a user account on the NASA ADS website, and create a new API key. Copy the API token string into the file that the library reads on your local machine (`~/.ads/dev_key`), or set it in the `ADS_DEV_KEY` environment variable, and you are ready to query the ADS database using the `ads` Python library!
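Before running any queries, it can be handy to check that a token is actually in place. The following is only a minimal sketch: `find_ads_token` is a hypothetical helper of my own, not part of the `ads` library, and it simply looks in the two places mentioned above.

```python
import os
from pathlib import Path


def find_ads_token(key_file=None):
    """Return the configured ADS API token, or None if there is none.

    Hypothetical helper, not part of the ads library. It checks the
    ADS_DEV_KEY environment variable first, then the key file
    (~/.ads/dev_key unless another path is given).
    """
    token = os.environ.get("ADS_DEV_KEY", "").strip()
    if token:
        return token

    if key_file is None:
        key_file = Path.home() / ".ads" / "dev_key"
    key_file = Path(key_file)

    if key_file.is_file():
        # Only the first line is used; surrounding whitespace is dropped.
        lines = key_file.read_text().splitlines()
        if lines and lines[0].strip():
            return lines[0].strip()

    return None
```

Calling `find_ads_token()` at the top of a script and stopping early when it returns `None` gives a clearer error than a failed query later on.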
Here is some example code that shows how to query ADS for all publications on which I am an author or coauthor, published between the (arbitrary) years 2010 and 2026 inclusive.
from ads import SearchQuery
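query_str = (
    # Parentheses group the two author spellings so that the year
    # restriction applies to both of them.
    '(=author:("Jankowski, Fabian") OR =author:("Jankowski, F.")) =year:2010-2026'
)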
query = SearchQuery(
    q=query_str,
    fl=[
        "id",
        "bibcode",
        "author",
        "title",
        "volume",
        "issue",
        "page",
        "page_range",
        "pub",
        "pubdate",
        "year",
    ],
    max_pages=10000,  # retrieve all papers
    sort="pubdate",
)
papers = list(query)
The query string `q` uses the same syntax as the ADS web interface. A leading equals sign requests an exact match by switching off ADS's synonym and name-variant expansion. The list given in the `fl` parameter tells the `ads` library which publication attributes to download. Most of those are documented on ADS. By default, the library only downloads one page of 50 entries, which in this case are the 50 most recent publications. To download the data for all publications, you must set the `max_pages` parameter to a number high enough to include all requested publications. 10k pages at 50 entries per page, i.e. up to 500,000 entries, are definitely enough for me for the foreseeable future. ;-)
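If you would rather pick a tighter value, the arithmetic is simple. This is just an illustrative sketch; `pages_needed` is a made-up helper name, and the page size of 50 is the default mentioned above.

```python
import math

ROWS_PER_PAGE = 50  # default page size mentioned in the text


def pages_needed(n_papers, rows=ROWS_PER_PAGE):
    """Smallest number of pages that covers n_papers entries."""
    return math.ceil(n_papers / rows)


print(pages_needed(120))       # 3
print(10_000 * ROWS_PER_PAGE)  # 500000 entries covered by max_pages=10000
```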
As ADS only allows you to make a limited number of queries per time interval, it is helpful to check how many queries the `ads` library made and how many you still have left in the current interval. You can do that like this.
print(query.response.get_ratelimits())
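The raw output is not very readable, so you may want to turn it into a short message. The sketch below makes an assumption that you should verify against your own output: that the returned dictionary has the keys `limit`, `remaining`, and `reset`, with `reset` being a Unix timestamp. `summarise_ratelimits` is a hypothetical helper, not part of the `ads` library.

```python
from datetime import datetime, timezone


def summarise_ratelimits(limits):
    """Turn a rate-limit dictionary into a one-line summary.

    Assumes the keys 'limit', 'remaining' and 'reset' (a Unix timestamp);
    check these against what get_ratelimits() prints for you.
    """
    total = int(limits["limit"])
    remaining = int(limits["remaining"])
    reset = datetime.fromtimestamp(int(limits["reset"]), tz=timezone.utc)
    return (
        f"{remaining} of {total} queries left; "
        f"resets at {reset:%Y-%m-%d %H:%M} UTC"
    )


# Made-up example values, only to show the call.
print(summarise_ratelimits({"limit": "5000", "remaining": "4990", "reset": "1700000000"}))
```

With a real query you would pass in `query.response.get_ratelimits()` instead of the made-up dictionary.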
Once you have the list of papers, you can pre-process them as required and output their properties in your publication list.
print(f"Total number of papers: {len(papers)}")
# Sort by publication date, newest first.
papers.sort(key=lambda x: x.pubdate, reverse=True)
for i, item in enumerate(papers):
    # Generate your publication list output here.
    ...
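The pre-processing step is where the first-author and refereed-only selections from the introduction can happen. Here is a rough sketch of how that might look. It assumes that you additionally request the `property` field in `fl` (ADS marks peer-reviewed papers with `REFEREED` there) and that author names come back in "Surname, Given" form. The function names are my own, and the demo uses made-up stand-in objects rather than real query results.

```python
from types import SimpleNamespace


def is_first_author(paper, surname):
    """True if the first entry in the author list starts with 'surname,'."""
    authors = paper.author or []
    return bool(authors) and authors[0].lower().startswith(surname.lower() + ",")


def is_refereed(paper):
    """True if the paper's property list contains 'REFEREED'.

    Assumes 'property' was included in the fl list of the query.
    """
    return "REFEREED" in (getattr(paper, "property", None) or [])


def select_papers(papers, surname, first_author_only=False, refereed_only=False):
    """Return the subset of papers that passes the requested filters."""
    selected = []
    for paper in papers:
        if first_author_only and not is_first_author(paper, surname):
            continue
        if refereed_only and not is_refereed(paper):
            continue
        selected.append(paper)
    return selected


# Made-up stand-ins for query results, only to demonstrate the filters.
demo = [
    SimpleNamespace(author=["Jankowski, F.", "Doe, J."], property=["REFEREED", "ARTICLE"]),
    SimpleNamespace(author=["Doe, J.", "Jankowski, F."], property=["REFEREED"]),
    SimpleNamespace(author=["Jankowski, Fabian"], property=["EPRINT_OPENACCESS"]),
]

print(len(select_papers(demo, "Jankowski", first_author_only=True)))
print(len(select_papers(demo, "Jankowski", refereed_only=True)))
print(len(select_papers(demo, "Jankowski", first_author_only=True, refereed_only=True)))
```

Calling `select_papers` once per required combination gives you all the separate lists from a single query.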
Each publication in the `papers` list is an object with various attributes and methods. The `fl` list above already contains some of the most interesting publication attributes. Namely, those are: `author` (full authors list), `title` (publication title), `volume` (journal volume), `issue` (journal issue), `page` (start page of the publication in the journal's issue), `page_range` (start to end page range), `pub` (journal), and `pubdate` (publication date).
Based on those attributes, generating a nicely formatted entry for each paper in the publication list is reasonably straightforward. How you should format the list depends on the requested information and your personal taste.
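To give a flavour of what that could look like, here is one possible way to turn a paper into a LaTeX `\item`. Treat it as a sketch under a few assumptions: that fields such as `title` and `page` may arrive as lists (so only the first element is used), that `pubdate` starts with the four-digit year, and that truncating long author lists to "et al." suits your taste. All function names are my own, and the example paper is made up.

```python
from types import SimpleNamespace

# Characters that need a backslash in ordinary LaTeX text.
LATEX_SPECIALS = {"&": r"\&", "%": r"\%", "#": r"\#", "_": r"\_", "$": r"\$"}


def escape_latex(text):
    """Escape a handful of LaTeX special characters."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(text))


def first_or_empty(value):
    """Return the first element of a list-like field, or the value itself."""
    if isinstance(value, (list, tuple)):
        return value[0] if value else ""
    return value or ""


def format_authors(authors, max_authors=3):
    """Join author names, truncating long lists with 'et al.'."""
    if len(authors) > max_authors:
        return "; ".join(authors[:max_authors]) + " et al."
    return "; ".join(authors)


def format_entry(paper, max_authors=3):
    """Build a single LaTeX \\item line from a paper's attributes."""
    authors = escape_latex(format_authors(paper.author or [], max_authors))
    title = escape_latex(first_or_empty(paper.title))
    year = (paper.pubdate or "")[:4]
    # Skip whichever of journal, volume and page is missing.
    fields = (paper.pub, paper.volume, paper.page)
    details = ", ".join(
        escape_latex(first_or_empty(f)) for f in fields if first_or_empty(f)
    )
    return rf"\item {authors} ({year}). \emph{{{title}}}. {details}."


# Made-up example paper, only to show the output format.
example = SimpleNamespace(
    author=["Doe, J.", "Roe, R."],
    title=["Pulsars & friends"],
    pub="MNRAS",
    pubdate="2020-05-00",
    volume="493",
    page=["1234"],
)
print(format_entry(example))
```

Inside the loop above you would call `format_entry(item)` and write the result into an `enumerate` or `itemize` environment in your LaTeX file.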
I hope this helps!
Cheers.