Assembling the Text
Wednesday, June 24, 2009 at 8:52PM
Given the obvious difficulty in working with either the manuscript or published text-form, I'm fortunate that the subject of this textual analysis is so readily available in digital form.
The first step in assembling the text into analysable form within Tinderbox is to acquire my copy of the text. Given the significant investment Christians have made in making this text available, that is readily achieved. (I'm looking forward to the day when every published text is as readily available.)
Nipping over to Bible Gateway begins to get us close. Close because while my purpose is to analyse the scriptural text, this passage still has translator's headings, versification, footnotes and cross-references embedded in the text. Most of these have to be removed.
First John, Chapter 1, English Standard Version, as transmitted from Bible Gateway.
I copied and pasted this chapter, as well as each subsequent chapter, of First John into Microsoft Word.
First John, Chapter 1, English Standard Version, as pasted into Microsoft Word 2003 running in Windows XP Pro under Parallels v4 on Mac OS X 10.5.7.
Microsoft Word is invaluable in this regard, because it is the only non-programmatic way I know of searching and replacing text using regular expressions (Word calls them wildcards). The full list of regular expressions used is as follows:
| Search | Replace |
| \(?\) | <space> |
| \(??\) | <space> |
| \[?\] | <space> |
First John, Chapter 1, English Standard Version, in Microsoft Word 2003 having cross-references and footnotes removed.
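For readers who would rather script this step, here is a minimal Python sketch of the same three replacements. It assumes that Word's `?` wildcard means "any single character", so the patterns match a parenthesised one- or two-character marker and a bracketed one-character marker. The function name and the placement of the markers in the sample are my own invention for illustration.

```python
import re

# Word wildcard "?" matches any single character, so the three searches are:
#   \(?\)   a parenthesised single character, e.g. "(A)"
#   \(??\)  a parenthesised pair of characters, e.g. "(AB)"
#   \[?\]   a bracketed single character, e.g. "[a]"
MARKERS = re.compile(r"\(.{1,2}\)|\[.\]")


def strip_markers(text: str) -> str:
    """Replace each cross-reference or footnote marker with one space."""
    return MARKERS.sub(" ", text)


if __name__ == "__main__":
    # Marker positions here are made up, not copied from Bible Gateway.
    sample = "That which was from(A) the beginning,(B) which we have heard[a]"
    print(strip_markers(sample))
```

As in Word, each marker becomes a space, so doubled spaces are left behind for a later clean-up pass.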
At this point, I save the document into a different version, because I'm going to take a slight detour. While I definitely want to get the text into Tinderbox, I'd first like to do a quick and coarse corpus analysis; because, hey, I'm curious.
Quick Lexical Study
I want to rank the lexical items in order of use, so that I can get a quick check on the thematic focus of the author. The thing to do is to cleanse the text of anything that is not a lexical item.
Step one is to manually delete the headings. It's a short text, so it doesn't take too long. (It's quicker than figuring out how to make Microsoft Word return all paragraphs that do not contain a verse number. If I were examining, say, the book of Isaiah, I might just stop to figure this out.)
| Search | Replace |
| [0-9] | <space> |
| --- | <space> |
| , | <space> |
| . | <space> |
| " | <space> |
| ^p | <space> |
Then run this replacement multiple times until it returns a result of zero replacements.
| Search | Replace |
| <space><space> | <space> |
Finally, switch off wildcard search, and run this replacement.
| Search | Replace |
| ? | <space> |
Select the entire text, and toggle Shift+F3 until it is all lowercased, to obtain the following text.
First John, Chapter 1, English Standard Version, in Microsoft Word 2003 having been purged of all non-lexical items.
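The cleansing passes above can be approximated in a few lines of Python. This is a sketch under stated assumptions: I read the `---` row as standing for an em dash (the code handles both forms), I treat curly quotes the same as straight ones, and I trim leading and trailing spaces, which the Word steps do not mention. It removes only the characters listed in the tables, so any other punctuation survives just as it would in Word.

```python
import re

# Digits, commas, full stops, double quotes, question marks, em dashes
# (or a literal "---") and line breaks are the non-lexical items to remove.
NON_LEXICAL = re.compile(r'[0-9,."?\u201c\u201d\u2014]|---|\r?\n')


def cleanse(text: str) -> str:
    """Reduce a passage to lowercase words separated by single spaces."""
    text = NON_LEXICAL.sub(" ", text)
    # One regex pass does what the repeated double-space replacement does in Word.
    text = re.sub(r" {2,}", " ", text)
    return text.strip().lower()


if __name__ == "__main__":
    print(cleanse('5 This is the message, "God is light."\n6 If we say?'))
```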
Here's the part where it stops feeling like manual labour.
I searched for all spaces, replacing them with paragraph marks.
| Search | Replace |
| <space> | ^p |
Then I sorted all paragraphs alphabetically, to get a complete list of all uses of each word, in alphabetical order.
Copying this entire text and pasting it into Excel gives us the power of pivot tables to automatically count the number of uses of each word.
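The split, sort and pivot-table steps collapse into very little code if you go the programmatic route. The sketch below uses Python's standard `collections.Counter` and assumes the input has already been cleansed to lowercase words separated by spaces. The function name is hypothetical.

```python
from collections import Counter


def count_words(cleansed: str) -> Counter:
    """Count how often each word occurs in space-separated text."""
    return Counter(cleansed.split())


if __name__ == "__main__":
    counts = count_words("if we say we have no sin we deceive ourselves")
    # Alphabetical listing, like the sorted column fed to the pivot table.
    for word, n in sorted(counts.items()):
        print(n, word)
```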
The following image shows four areas of working. Column A contains the complete listing of all words copied from Microsoft Word. Columns D and E show the pivot table counts. I copied the pivot table into Columns G and H (Paste Special>Values) so that I could work with it. I manually stemmed the words so that abide, abides, abiding all get included in the one count. I then re-sorted Columns J and K in descending numeric order. Finally, I was able to begin tagging the words according to parts of speech, in order to get at my target: lexical item counts.
Lexical Analysis of the words of First John, in Microsoft Excel 2003, running in Windows XP Pro under Parallels v4 on Mac OS X 10.5.7.
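The manual stemming and re-sorting can be sketched the same way. Only the abide, abides, abiding grouping comes from my spreadsheet. The stem table, the helper names and the sample counts below are invented for illustration and are not the real figures.

```python
from collections import Counter

# Hand-built stem table: each variant maps to the form it is counted under.
STEMS = {"abides": "abide", "abiding": "abide"}


def merge_stems(counts: Counter, stems: dict) -> Counter:
    """Fold the counts of variant forms into the count of their stem."""
    merged = Counter()
    for word, n in counts.items():
        merged[stems.get(word, word)] += n
    return merged


def ranked(counts: Counter) -> list:
    """Sort by descending count, then alphabetically so ties are stable."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up counts, purely to show the mechanics.
    raw = Counter({"abide": 2, "abides": 3, "abiding": 1, "love": 4})
    for word, n in ranked(merge_stems(raw, STEMS)):
        print(n, word)
```

Keeping the stem table as plain data preserves the flexibility of the spreadsheet approach: the grouping stays a human judgment that can be revised at any time.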
Results of Lexical Analysis
Most Frequent Verbs in First John
46 love
34 know
23 abide
?? sin [ NOTE 1 ]
13 commandment [ NOTE 2 ]
11 hear
11 keep
10 testify
[ NOTE 1: I'll have to look more closely to see how many instances refer to sinning (verb), sin (as state) or sin (as thing). ]
[ NOTE 2: This is a nominalised verb; I'll have to look at the usages to determine the extent to which it should be treated as a verb, e.g. "keep the commandment" is verbal, whereas "when we sin against the holy commandments" would be a nominal use. ]
Most Frequent Nouns or Pronouns in First John
83 we
66 god
57 you
51 him
45 he
42 his
39 us
23 world
22 son
?? sin [ NOTE 1 ]
15 brother
15 our
14 children
13 father
13 spirit
13 jesus
Most Frequent Adjectives in First John
7 darkness [ NOTE 3 ]
6 beloved [ NOTE 4 ]
6 eternal
6 evil [ NOTE 4 ]
6 light [ NOTE 4 ]
[ NOTE 3: A nominalised adjective. I'm interested in tagging these into concepts, not just parts of speech. ]
[ NOTE 4: I'll have to read the context to determine whether the usages are adjectival, nominal or verbal. ]
Tentative Conclusions
My gloss on the above analysis is that I might expect the themes in this text to be exceptionally relationally focused. Almost every one of the top nouns is relational; virtually all the top verbs are too; and the adjectives draw strong contrasts between good and evil, so I might expect the text to thematise distinctly polar relationships, with some behavioural uses too.
Interlude
In the previous section, I could have described my process as simply cleansing the document and then running Athelstan's excellent MonoConc Pro!
But then not everyone reading this would be able to roll their own at home. Nor would I have a chance to point out the distinct benefits of digitising what can be a quite laborious activity. Of course, that is my purpose here: to build a case for digitising certain practices that today occur manually. And just as important: notice that if I had used MonoConc Pro, I wouldn't have the power to freely stem or colour-code the results. Flexibility in digital tools matters as much as having digital tools at all, for if digital tools have insufficient flexibility, the knowledge worker is forced to route around the damage ...
First John, English Standard Version, as a cleansed corpus file in Athelstan's MonoConc Pro, running in Windows XP Pro under Parallels v4 on Mac OS X 10.5.7.
Assembling the text in Tinderbox
Turning back to my saved copy of First John in Microsoft Word, I manually highlighted individual chapters, and ran an interesting but simple search and replace on each successive chapter.
| Search | Replace |
| (<[0-9][0-9]>) | ^p1 John 1:\1^t |
| (<[0-9]>) | ^p1 John 1:\1^t |
The first statement says to find all two-digit numbers that comprise an entire word, and replace them with a paragraph mark, followed by the text "1 John 1:" (first John, chapter one), then the verse number found by the search, then a tab. The second statement is just like the first, only it finds single-digit numbers. For each subsequent chapter, the chapter number in the replacement text changes accordingly.
The paragraph marks and tabs are very important for Tinderbox's processing.
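The same versification can be sketched in Python. Here one pattern, `\b(\d{1,2})\b`, stands in for both Word searches, on the assumption that Word's `<` and `>` mark the start and end of a word. The function name and its parameters are hypothetical. Like the Word version, it will also catch any other one- or two-digit number standing alone in the text.

```python
import re

VERSE_NUMBER = re.compile(r"\b(\d{1,2})\b")


def versify(chapter_text: str, chapter: int, book: str = "1 John") -> str:
    """Put each verse on its own line as '<book> <chapter>:<verse><TAB>text'."""
    return VERSE_NUMBER.sub(
        lambda m: f"\n{book} {chapter}:{m.group(1)}\t", chapter_text
    )


if __name__ == "__main__":
    print(versify("5 This is the message 6 If we say", chapter=1))
```

Passing the chapter number as an argument replaces the manual step of highlighting each chapter and editing the replacement text.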
First John, English Standard Version, fully versified, ready for import into Tinderbox.
I selected this entire text, and then pasted it into a Tinderbox note.
First John, English Standard Version, fully versified, in a Tinderbox note.
Now the work gets easy, because I simply use Tinderbox's Explode function, to generate a note for each verse in the entire text.
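To make the idea concrete outside Tinderbox, here is a rough Python analogue of splitting the versified text into one record per verse. It is not Tinderbox's Explode feature or its API. It only assumes that a line break delimits each verse and that a tab separates the reference from the verse text. `VerseNote` and `explode` are names I have made up.

```python
from typing import NamedTuple


class VerseNote(NamedTuple):
    title: str  # e.g. "1 John 1:5"
    text: str   # the verse itself


def explode(versified: str) -> list[VerseNote]:
    """Split tab-delimited, one-verse-per-line text into notes."""
    notes = []
    for line in versified.splitlines():
        if "\t" not in line:
            continue  # skip blank lines and anything without a reference
        title, body = line.split("\t", 1)
        notes.append(VerseNote(title.strip(), body.strip()))
    return notes


if __name__ == "__main__":
    sample = "\n1 John 1:5\t This is the message \n1 John 1:6\t If we say"
    for note in explode(sample):
        print(note.title, "|", note.text)
```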
From a textual analysis point of view, having the text split into verses doesn't make a lot of sense. It would be better to break the verses into clause complexes (i.e. sentences) and clauses. But from a biblical studies perspective, verses are a usual method of demarcating textual components.
Tinderbox Outline, displaying one note for each verse in First John.
This image obviously reveals some post-processing that I've performed on the verses in the range 1 John 1:5 through 1:10. I'll share that with you in the next installment.
For now though, I'd just like to draw your attention to a few concordance-like word searches in Tinderbox.
Results stemming from Tinderbox agents.
Here is the abide agent.
Tinderbox agent, demonstrating searching for abide, abides, abiding.
You may care to notice two small details.
Firstly, I've selected each note in the Tinderbox file containing a verse, and made it inherit from a prototype I've called Biblical Verse. This prototype contains absolutely no behaviour. I put it in place purely as a type marker. Of course, sometime in the future, if I want biblical verses to exhibit some look or behaviour, I have a ready facility for achieving that.
Secondly, the order of the aliases is set to SiblingOrder, which therefore conforms it to the order of the biblical text. This is particularly important for sorting verse 10 after verse 9, rather than immediately after verse 1, as a plain alphabetical sort would.
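The ordering problem is easy to reproduce in code. The following Python sketch is not Tinderbox agent syntax. It imitates a concordance-style search over a dictionary of verses and sorts the hits by chapter and verse as numbers, where a plain string sort would put 1:10 before 1:9. The function names are hypothetical and the verse texts are invented placeholders.

```python
import re


def verse_key(ref: str) -> tuple[int, int]:
    """Turn '1 John 1:10' into (1, 10) so verses sort numerically."""
    chapter, verse = ref.rsplit(" ", 1)[1].split(":")
    return int(chapter), int(verse)


def find_verses(notes: dict[str, str], pattern: str) -> list[str]:
    """Return references of verses matching pattern, in textual order."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = [ref for ref, text in notes.items() if rx.search(text)]
    return sorted(hits, key=verse_key)


if __name__ == "__main__":
    # Placeholder texts, not quotations from the ESV.
    notes = {
        "1 John 2:10": "... abides ...",
        "1 John 2:6": "... abiding ...",
        "1 John 2:7": "... no match here ...",
    }
    print(find_verses(notes, r"\babid(e|es|ing)\b"))
    print(sorted(notes))  # plain string sort, for comparison
```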
Next Article: Inductive Analysis