Research Question!

Suppose I give you a book but don’t tell you the title or what kind of book it is supposed to be. Could you figure out the genre just by reading it? Probably; we all have some sense of what makes a fantasy novel different from an autobiography. I am curious whether a computer could similarly distinguish between genres of writing. Specifically, could one use topic modeling to classify a book?

Suppose we develop an algorithm that rates the “genre-similarity” of the works in a huge literary corpus. What genres would we find there? Would they match up with the genres we are familiar with, or would the algorithm group together novels that we would consider very different? I imagine that we would find some weird clusters of books, since this is essentially topic modeling on a larger scale, and the topics we discovered when working with topic modeling were sometimes coherent to a human reader and sometimes not. Still, I think it would be pretty cool to see the final product of this type of analysis: a big picture of a huge number of literary works, grouped together by features of their language.
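
To make the idea of a "genre-similarity" score a little more concrete, here is a minimal sketch in Python. It is not topic modeling: it swaps in something much simpler, cosine similarity between raw word counts, just to show what comparing books by features of their language could look like. The snippets are toy sentences I made up to stand in for whole books, and the names `word_counts` and `cosine_similarity` are hypothetical helpers, not part of any library.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words (a crude bag-of-words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Invented toy snippets standing in for whole books (not real data).
books = {
    "fantasy_a": "the wizard raised his staff and the dragon fled the castle",
    "fantasy_b": "a dragon guarded the castle until the wizard broke the spell",
    "memoir_a": "i was born in a small town and my mother worked at the school",
}

vectors = {title: word_counts(text) for title, text in books.items()}
same_genre = cosine_similarity(vectors["fantasy_a"], vectors["fantasy_b"])
cross_genre = cosine_similarity(vectors["fantasy_a"], vectors["memoir_a"])

print(f"fantasy_a vs fantasy_b: {same_genre:.2f}")
print(f"fantasy_a vs memoir_a:  {cross_genre:.2f}")
```

On these toy snippets the two fantasy texts score higher with each other than with the memoir, but a good part of that comes from common words like "the". A real attempt would probably need to drop or down-weight such words, and would replace the raw counts with topic proportions from an actual topic model before grouping the books.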