A simple Python script to make a literature table

Geeky post again – no math this time, but computer code.

I’m sure people have done this before, but the following problem seemed a nice opportunity to practice my Python skills with a small script. When I read a scientific article I usually look out for the following elements:

  • Innovation: what does the study do that others haven’t done before?
  • Method: what method did they use?
  • Data: where did they get their data from?
  • Results: what are the main results?
  • Relevance: who benefits from this research, and how?

I also like to place the research in one of the four quadrants in this post. I find it helpful to make an overview of these questions in a table:

1st author | Year | Journal | Quadrant | Innovation | Method | Data | Results | Relevance
--- | --- | --- | --- | --- | --- | --- | --- | ---
Kompas | 2005 | Journal of Productivity Analysis | 4 | Estimates efficiency gains quota trade for Southeast Trawl Fishery, AU | Stochastic frontier analysis | AFMA and ABARE survey data | ITQs gave efficiency gains | Policy debate on ITQs
Kompas | 2006 | Pacific Economic Bulletin | 3 | Estimates optimal effort levels and allocation across species | Multifleet, multispecies, multiregion bioeconomic model | SPC data | Effort reduction needed; optimal stocks larger than BMSY | Policy debate on MEY

But here’s the problem: I usually make my notes in a bibtex file (as a good geek should), which looks like this:

@ARTICLE{Kompas2006PacEconBull,
author = {Kompas, T. and Che, T.N.},
title = {Economic profit and optimal effort in the Western and Central Pacific tuna fisheries},
journal = {Pacific Economic Bulletin},
year = {2006},
volume = {21},
pages = {46-62},
number = {3},
data = {SPC data},
innovation = {Estimates optimal effort levels and allocation across species},
quadrant = {3},
keywords = {tuna; bioeconomic model; optimisation; Pacific},
method = {Multifleet, multispecies, multiregion bioeconomic model},
results = {Effort reduction needed; optimal stocks larger than BMSY},
relevance = {Policy debate on MEY}
}

@ARTICLE{Kompas2005JProdAnalysis,
author = {Kompas, Tom and Che, Tuong Nhu},
title = {Efficiency gains and cost reductions from individual transferable quotas: A stochastic cost frontier for the Australian South East fishery},
journal = {Journal of Productivity Analysis},
year = {2005},
volume = {23},
pages = {285-307},
number = {3},
quadrant = {3},
data = {AFMA and ABARE survey data},
innovation = {Estimates efficiency gains quota trade for Southeast Trawl Fishery, AU},
keywords = {individual transferable quotas; stochastic cost frontier; fishery efficiency; ITQs},
method = {Stochastic frontier analysis},
relevance = {Policy debate on ITQs.},
results = {ITQs gave efficiency gains}
}

I don’t want to copy it all by hand, so I wrote this little script in Python to convert all entries in the bibtex file to a csv file:

import csv
from bibtexparser.bparser import BibTexParser
from operator import itemgetter

def readFirstAuthor(inpList, num):
    # Family name of the first author: everything before the first comma
    author1 = ""
    x = inpList[num]['author']
    for j in x:
        if j != ',':
            author1 += j
        else:
            break
    return author1

def selectDict(inpList, name):
    # Keep only journal articles that have `name` among the authors
    outObj = []
    for i in range(len(inpList)):
        if name in inpList[i]['author'] and \
                inpList[i]['type'] == 'article':
            outObj.append(inpList[i])
    return outObj

def selectFieldsDict(inpList, fieldNames):
    # Reduce every entry to the requested fields; missing fields become 'blank'
    outObj = []
    for i in range(len(inpList)):
        temp = {}
        for n in fieldNames:
            if n == 'author':
                author1 = readFirstAuthor(inpList, i)
                temp['author'] = author1
            else:
                if n in inpList[i]:
                    temp[n] = inpList[i][n]
                else:
                    temp[n] = 'blank'
        outObj.append(temp)
    return outObj

fieldnames = ['author', 'year', 'journal', 'quadrant',
              'innovation', 'method', 'data', 'results', 'relevance']

# Parse the BibTeX file into a list of dictionaries, one per entry
with open('BibTexFile.bib', 'r') as bibfile:
    bp = BibTexParser(bibfile)

record_list = bp.get_entry_list()

dictSelection = selectDict(record_list, 'Kompas')

fieldSelection = selectFieldsDict(dictSelection, fieldnames)

# Years are strings, so this is a string sort (fine for four-digit years)
test = sorted(fieldSelection, key=itemgetter('year'))

# 'wb' is the Python 2 idiom for csv files;
# on Python 3 use open('output.csv', 'w', newline='') instead
test_file = open('output.csv', 'wb')
csvwriter = csv.DictWriter(test_file, delimiter=',',
                           fieldnames=fieldnames)
# Header row, then one row per selected article
csvwriter.writerow(dict((fn, fn) for fn in fieldnames))
for row in test:
    csvwriter.writerow(row)
test_file.close()

If you are a Python developer: any comments on this are welcome. I’m sure it’s not perfect.
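
For comparison, here is a minimal sketch of the same pipeline for Python 3 that uses only the standard library. It is not the script above. The regular-expression parser (`parse_entries`, `FIELD_RE`) is a stand-in for bibtexparser and assumes one `name = {value}` field per line and no `@` inside a value. `SAMPLE_BIB`, `first_author`, `build_rows` and `write_csv` are names made up for illustration.

```python
import csv
import io
import re

# Hypothetical sample input: two trimmed entries in the style of the notes file.
SAMPLE_BIB = """
@ARTICLE{Kompas2006PacEconBull,
author = {Kompas, T. and Che, T.N.},
journal = {Pacific Economic Bulletin},
year = {2006},
quadrant = {3},
method = {Multifleet, multispecies, multiregion bioeconomic model}
}

@ARTICLE{Kompas2005JProdAnalysis,
author = {Kompas, Tom and Che, Tuong Nhu},
journal = {Journal of Productivity Analysis},
year = {2005},
method = {Stochastic frontier analysis}
}
"""

FIELDNAMES = ['author', 'year', 'journal', 'quadrant', 'method']

# Assumption: every field sits on its own line as "name = {value}".
FIELD_RE = re.compile(r'^\s*(\w+)\s*=\s*\{(.*)\},?\s*$')


def parse_entries(text):
    """Split the text on '@' and collect each entry's fields into a dict."""
    entries = []
    for chunk in text.split('@')[1:]:
        lines = chunk.splitlines()
        entry = {'type': lines[0].split('{')[0].strip().lower()}
        for line in lines[1:]:
            match = FIELD_RE.match(line)
            if match:
                entry[match.group(1).lower()] = match.group(2)
        entries.append(entry)
    return entries


def first_author(author_field):
    """Family name of the first author in 'Last, First and Last, First' form."""
    return author_field.split(' and ')[0].split(',')[0].strip()


def build_rows(entries, name, fieldnames):
    """Keep articles by `name`, reduce them to `fieldnames`, sort by year."""
    rows = []
    for entry in entries:
        if entry.get('type') != 'article' or name not in entry.get('author', ''):
            continue
        # Missing fields become 'blank', as in the script above
        row = {fn: entry.get(fn, 'blank') for fn in fieldnames}
        if 'author' in fieldnames:
            row['author'] = first_author(entry['author'])
        rows.append(row)
    return sorted(rows, key=lambda r: r['year'])


def write_csv(rows, fieldnames, handle):
    """Write a header line followed by one line per row."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


rows = build_rows(parse_entries(SAMPLE_BIB), 'Kompas', FIELDNAMES)
buffer = io.StringIO()  # stands in for a real file
write_csv(rows, FIELDNAMES, buffer)
print(buffer.getvalue())
```

To write to disk instead of a string buffer, pass `open('output.csv', 'w', newline='')` as the handle. For real BibTeX files a proper parser remains the safer choice, since this sketch ignores multi-line fields, quoted values and string macros.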

Why (for the time being) I’m sticking with R

I’m a big fan of open source software. OK, I know the Dutch have a reputation for being stingy but let’s face it: much of the software we use in economics (Stata, Matlab, Maple) is terribly expensive. So the only time I can use these programs is at the office (which, I admit, should be considered a healthy thing). To be able to work on my laptop when I’m at home (or in a hotel room, or in an airplane, for that matter) I try to work as much as I can with their open source equivalents.

One of the programmes I’ve been using is R (a horrible name to Google for by the way), but in a sort of on-and-off way. It is less user-friendly and much slower than Matlab, and it offers fewer possibilities for statistical analysis than Stata. So I’m still fiddling around with programming languages like C++ (probably even faster than Matlab, but rabidly user-hostile) and Python (more user-friendly than C++, and perhaps as fast as Matlab) for calculations.

Slowly, however, I’m coming round to R, in my teaching as well as in my research, for a number of reasons:

  • Marine biologists use it a lot, and using the same software helps the communication – it also makes it more likely that you can ask a close colleague how this @#%! package works.
  • By the same token: some of my students, i.e. those who have taken marine ecology modelling courses, know it already.
  • I can use it in my environmental valuation classes (statistics) as well as in my resource economics classes (modelling), so that again, some students in one course know it from another course I’m teaching.
  • It seems that R finally has a decent package to do conditional logit and probit (or, as others call them, alternative-specific multinomial logit and probit).

If only they could make it a lot faster, because it is too slow for value function iteration.
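
To illustrate why value function iteration is so demanding, here is a minimal sketch in plain Python. Everything in it is an assumption chosen for illustration: a logistic fish stock, a square-root payoff from harvest, the parameter values, and the name `value_function_iteration`. The point is the nested loops: every iteration visits every stock level and tries every feasible next-period stock, so an interpreted language pays its per-operation overhead many thousands of times.

```python
import math


def value_function_iteration(grid, beta=0.95, r=0.5, K=1.0,
                             tol=1e-6, max_iter=1000):
    """Iterate V(x) = max over x' of sqrt(f(x) - x') + beta * V(x') on a grid.

    f(x) = x + r*x*(1 - x/K) is the stock after logistic growth and
    f(x) - x' is the harvest. `grid` must be increasing and start at 0,
    which guarantees that zero escapement is always feasible.
    """
    n = len(grid)
    value = [0.0] * n
    policy = [0] * n
    for iteration in range(max_iter):
        new_value = [0.0] * n
        for i, x in enumerate(grid):            # loop over current stock
            available = x + r * x * (1.0 - x / K)
            best, best_j = -math.inf, 0
            for j, x_next in enumerate(grid):   # loop over next-period stock
                harvest = available - x_next
                if harvest < 0:
                    break  # grid is increasing, so the rest is infeasible too
                candidate = math.sqrt(harvest) + beta * value[j]
                if candidate > best:
                    best, best_j = candidate, j
            new_value[i] = best
            policy[i] = best_j
        # Largest change in the value function between two iterations
        gap = max(abs(a - b) for a, b in zip(new_value, value))
        value = new_value
        if gap < tol:
            break
    return value, policy, iteration + 1


grid = [i / 50 for i in range(51)]  # stock levels from 0 to K = 1
value, policy, n_iter = value_function_iteration(grid)
print(n_iter, value[-1], grid[policy[-1]])
```

Even this toy problem with 51 grid points needs a few hundred iterations of roughly a thousand evaluations each before the value function settles. Realistic models with several state variables multiply that count quickly.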