Language Log is one of my favorite daily blog reads, and today it had, among other things, a post about an online “term mining” tool (Termine) that analyzes text to attempt to identify multi-word terms (e.g., technical terms) within the text. The intended target for this tool is biomedical documents (the default setting for the online tool presumes that you’re submitting such a document), but I thought it might be fun to run the Book of Mormon (1830 text) through it. I tracked down an online version of the 1830 text (with no modern versification, etc.), saved it as a text file on my hard drive, then uploaded it to the online tool (changing the POS Tagger setting to “Tree Tagger 3.1”, described as “more suited to generic text”).
The results are after the jump. The “Score” (value to the right) is not just a straight frequency count but also indicates how often the term (or portions thereof) appears in other terms, which is why you end up with fractional values for some terms. It’s explained in more detail at the website for the tool (where it’s referred to as the “C-Value”).
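For the curious, here is a rough Python sketch of how a C-value-style score can be computed. It follows the C-value formula as it is usually stated (term length weight times frequency, minus the average frequency of the longer candidate terms that contain the term); whether Termine computes it exactly this way is an assumption on my part. The function names and the toy counts are invented for illustration and are not taken from Termine's output.

```python
import math
from collections import defaultdict


def is_sublist(short, long):
    """Return True if the tuple `short` occurs contiguously inside `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def c_values(freq):
    """Compute a C-value-style score for each candidate term.

    freq maps a term (a tuple of words) to its raw frequency count.
    Assumed formula (not confirmed against Termine's implementation):
      not nested: log2(len(a)) * f(a)
      nested:     log2(len(a)) * (f(a) - mean of f(b) over longer terms b containing a)
    """
    terms = list(freq)

    # For each term, collect the longer candidate terms that contain it.
    containers = defaultdict(list)
    for a in terms:
        for b in terms:
            if len(b) > len(a) and is_sublist(a, b):
                containers[a].append(b)

    scores = {}
    for a in terms:
        weight = math.log2(len(a))  # 1.0 for two-word terms, 0.0 for single words
        if containers[a]:
            nested_mean = sum(freq[b] for b in containers[a]) / len(containers[a])
            scores[a] = weight * (freq[a] - nested_mean)
        else:
            scores[a] = weight * freq[a]
    return scores


# Toy counts, made up purely to show where fractional scores come from.
toy_counts = {
    ("chief", "judge"): 10,
    ("great", "chief", "judge"): 3,
    ("chief", "judge", "elect"): 2,
}
scores = c_values(toy_counts)
print(scores[("chief", "judge")])  # 7.5
```

In the toy data, the two-word term occurs 10 times but is also contained in two longer candidates with counts 3 and 2, so the average of those (2.5) is subtracted and the score comes out fractional. That averaging step is the likely source of the fractional values in the table.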
The results are interesting, if a bit mixed. Termine obviously assumes that it is parsing modern technical English and so trips up a bit when trying to parse archaic English in the Book of Mormon. So, for example, the single most frequent ‘term’ is “thou hast” (with “thou art” coming in at #6). Likewise, sequences such as “thy <noun>”, “o <noun>”, “<verb> ye/thou” and “art <word>” are mistakenly identified as terms and show up in the list.
(Also, for reasons that totally baffle me, in Termine’s results all the terms based on kings’ names [“King Benjamin”] have the word “king” replaced with “kingbolt” [e.g., “kingbolt benjamin”]. I have no idea why, since it’s certainly not in the text, and I may contact the tool’s authors to find out what’s happening. I’ve changed those back in the table below.)
There are no great surprises in the results, though there are some interesting terms high on the list. After the top ones that you would expect (“Lord God”, “Holy Ghost” and “Jesus Christ”) comes “beloved brethren”. And the socio-political nature of the Book of Mormon is reflected in the next two top terms: “judgment seat” and “chief judge”. Interestingly, from what I can tell, “judgment seat” is used strictly as a religious term (“judgment seat of God/Christ”) in the personal writings, sermons, and editorial comments of Nephi1, Jacob, Mormon2 and Moroni, and strictly as a political term in the sections in between (Alma 1 through 3 Nephi 7).
A bit further down the list is “foolish traditions”, which only appears within a subsection of the book of Alma (chapters 8, 21, 30 and 31) but is used by four different groups/individuals (the people of Ammonihah, an Amalekite [probably “Amlicite”; cf. this article] living among the Lamanites, the antichrist Korihor, and the Zoramites) in describing the Christ-centered Nephite religious beliefs. Apparently, it was a popular, if short-lived, derogatory phrase among those not of or opposed to the Nephite “Church of God”. I also have to wonder if the phrase wasn’t originally used or popularized by Alma2 and the sons of King Mosiah during their rebellious phase, since all four recorded incidents of its use are aimed at Alma2 or one of the sons of Mosiah. (To be strictly accurate, Korihor uses “foolish traditions” with some other leaders, then talks about “silly traditions” to Alma2.) On the other hand, the Nephites tend to refer to the traditions of the Lamanites as “incorrect” or sometimes “wicked”, so there may be a bit of tit-for-tat going on here.
Anyway, have fun! The complete table is after the jump. ..bruce..