Using Python for authorship attribution in Renaissance drama.
About a third of the plays written for the London theatres of 400 years
ago -- when Shakespeare was writing -- were published anonymously, so we
don't know who wrote them. For the last century, investigators have
counted things in these plays in an attempt to decide who wrote them.
Such inquiries have been helped by advances in technology. We can now
store these texts electronically and automate much of the counting.
These advances have led to a refinement in methods for attributing
authorship, and scholars now have a battery of reliable tests for
determining the author or authors of a piece of writing.
Scholars who specialise in authorship attribution tend to give their
attention almost exclusively to Shakespeare's writing and few use
automated means to produce their results. This talk -- and the programs
behind it -- addresses both these facts. It will show how authorship
attribution, by a number of methods, can be automated in Python and
yield worthwhile results for the study of Renaissance drama. Using
Python's built-in data structures and a few libraries (such as pandas,
numpy, and math), we can write programs that find likely candidate
authors for writing from the period.
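
As a rough illustration of what such a program might look like, here is a
minimal sketch using only the standard library. It is not the method used
in the talk: it assumes one common approach, comparing the relative
frequencies of a few function words by cosine similarity. The word list
and the names profile, cosine_similarity and rank_candidates are
hypothetical, chosen only for this example.

```python
import math
import re
from collections import Counter

# Hypothetical marker words for illustration; a real study would choose
# and justify its own feature set.
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "that", "with", "but"]


def profile(text):
    """Return the relative frequency of each function word in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(anonymous_text, candidates):
    """Rank candidate authors by similarity to the anonymous text.

    candidates maps an author's name to a sample of their known writing.
    Returns (name, score) pairs, most similar first.
    """
    target = profile(anonymous_text)
    scores = {
        name: cosine_similarity(target, profile(sample))
        for name, sample in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling rank_candidates with the text of an anonymous play and a
dictionary of known samples returns the candidates ordered from most to
least similar. A high score only flags a likely candidate; it is not
proof of authorship.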
Discovering more about authorship in the Renaissance is vital to our
understanding of the period. Slowly, ideas that placed Shakespeare alone
as a solitary genius are being replaced by models of a more
collaborative theatre industry, where people co-authored plays more
frequently than previously thought. This talk will help to continue this
work and will show how researchers can automate their endeavours in
a way that others can replicate and understand with only a little