Introduction
GettysburgCFL is designed to support the coding of texts across a range of humanities disciplines, particularly sociological and linguistic textual analysis. While GettysburgCFL is not completely generalised, it is a well-conceived prototype that can be adapted to fit a range of textual coding needs. It demonstrates the Tinderbox community's best currently known practice for coding texts within Tinderbox.
You can download the GettysburgCFL Tinderbox document.
You can read the user documentation.
This article describes the design of the mechanisms so that you can adapt them to your own textual coding needs. It is part of a larger series exploring the range of practices involved in textual analysis, the extent to which the current Tinderbox affordances support the required practices, and what further affordances may benefit textual analysts involved in analyzing legal and literary texts.
Footnote tool
GettysburgCFL anticipates analysts importing the text into Tinderbox as a single note. That single note is decomposed into constituent parts (paragraphs, sentences, clauses, phrases, words, morphemes) using a combination of the Tinderbox Explode tool (for paragraphs) and the Tinderbox Footnote tool.
See the user documentation for details.
Codes
The codes that you wish to assign to units of text should be created as notes inside the Codes section. Each code consists of:
- Name attribute: Set to the human-readable name of the code.
- CodeName attribute: Set to the machine-readable name of the code. In my case, I'm outputting HTML-like code fragments, so the machine-readable names conform to the subset of characters that are valid in HTML codes.
- Prototype: For the subset of your codes that are mutually exclusive, you may optionally choose to make the note realizing your code a prototype. If you do choose to use a prototype, assign a Color or BorderColor that visually represents the code in your mind.
Codes can form a hierarchy. You can see the way I have formed a hierarchy within the Ideational set of codes.
You may only use prototype inheritance when your codes are mutually exclusive. If a subset of codes are mutually exclusive, you can use prototype inheritance on that subset; don't apply inheritance to the remainder. If your mutually exclusive subset forms the dominant coding priority, then it definitely makes sense to use prototype inheritance.
Prototype inheritance is achieved by using both links and rules, as described below.
Links
GettysburgCFL uses several typed links. The link types are leveraged by rules to automate the assignment of prototypes and code fragments to the notes. Because of this, the name of each typed link must match the name used within the rules.
GettysburgCFL uses the following typed links:
- •Ideational—the dominant coding priority in GettysburgCFL.
- •CodeLink—all codes that are not guaranteed to be mutually exclusive with the dominant coding priority are assigned using the type CodeLink.
- •StartCodes—the beginning of a code assigned to a note where the closing code for the syntagm does not exist within the note. This allows for decomposition of the document to suit the dominant coding priority, while allowing other codes to be assigned across multiple notes.
- •EndCodes—used to link the note that ends the coded syntagm that was begun using the •StartCodes link type assigned to a different note.
- •ContainedCodes—used to indicate that the syntagm held by the note participates in a syntagm that spans notes, but this particular note neither begins nor ends the syntagm.
You may wish to generalize the •Ideational name, which is specific to a coding system derived from Systemic Functional Linguistics. If you do, it should probably be called something like •PrototypedLink. If you do change the name of this link type, you also have to change the rules written in the note named TextPart. (The rules are stored in the note's text, not in the note's Rule attribute.)
Rules
All the rules that activate the links are stored in a single note: TextPart. Here's an explanation of each rule.
$Prototype=links.outbound.•Ideational.$Name;
This rule takes the name of the Code, and assigns it as the Prototype of the note being coded. It assumes that the codes to which •Ideational links are assigned are Prototypes.
$Codes=$Codes+links.outbound.•Ideational.$CodeName;
$Codes=$Codes+links.outbound.•CodeLink.$CodeName;
These rules find the machine-readable code fragment held by each linked Code and assign it to the set attribute $Codes held in the coded note. The first statement does this for the dominant coding priority; the second does it for all other codes.
$StartCodes=$StartCodes+links.outbound.•StartCodes.$CodeName;
$ContainedCodes=$ContainedCodes+links.outbound.•ContainedCodes.$CodeName;
$EndCodes=$EndCodes+links.outbound.•EndCodes.$CodeName;
These rules take the codes assigned through each of the typed links and assign them to the matching set attribute held by the note.
$AllCodes=$StartCodes+$Codes+$ContainedCodes+$EndCodes;
This is a convenience rule for the analyst. It combines all the different types of codes into a single set attribute.
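To make the data flow concrete, here is a minimal Python sketch of what the rules above compute for a single coded note. It is a model of the behaviour, not Tinderbox action code: the `Code` and `Note` classes, the `apply_rules` function and the example codes are all hypothetical, and it assumes at most one •Ideational link per note.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Code:
    name: str       # models $Name, the human-readable name
    code_name: str  # models $CodeName, the machine-readable name


@dataclass
class Note:
    title: str
    # Outbound links grouped by link type: {"•Ideational": [Code, ...], ...}
    links: dict = field(default_factory=dict)
    prototype: str = ""
    codes: set = field(default_factory=set)
    start_codes: set = field(default_factory=set)
    contained_codes: set = field(default_factory=set)
    end_codes: set = field(default_factory=set)
    all_codes: set = field(default_factory=set)


def linked_code_names(note, link_type):
    """Collect the code_name of every Code reached by links of one type."""
    return {code.code_name for code in note.links.get(link_type, [])}


def apply_rules(note):
    """Model of the TextPart rules; each rule adds to a set, never removes."""
    ideational = note.links.get("•Ideational", [])
    if ideational:  # assumption: at most one •Ideational link per note
        note.prototype = ideational[0].name
    note.codes |= linked_code_names(note, "•Ideational")
    note.codes |= linked_code_names(note, "•CodeLink")
    note.start_codes |= linked_code_names(note, "•StartCodes")
    note.contained_codes |= linked_code_names(note, "•ContainedCodes")
    note.end_codes |= linked_code_names(note, "•EndCodes")
    note.all_codes = (note.start_codes | note.codes
                      | note.contained_codes | note.end_codes)


# Hypothetical example: one note coded as an Actor that also opens a Rheme.
actor = Code(name="Actor", code_name="Nominal_Actor")
rheme = Code(name="Rheme", code_name="Rheme")
note = Note(title="our fathers",
            links={"•Ideational": [actor], "•StartCodes": [rheme]})
apply_rules(note)
```

After `apply_rules(note)`, the note's prototype is the human-readable name of the •Ideational code, while the set attributes hold only machine-readable names.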
Rule Assignment
It is essential that the rules encoded in TextPart are assigned to all the notes containing source text fragments. GettysburgCFL achieves this through the AssignTextPartRule agent. The agent's query is:
descendedFrom(Gettysburg Address)
Unless you're analyzing the Gettysburg Address, you will have to change the name of the note, and therefore the name used in this query. If you have multiple documents, you will need to change the scope of the query to ensure that all notes containing text fragments are retrieved by the query.
The agent's action is:
$Rule=$Text(TextPart)
which simply means to take the text from the TextPart note and assign it to the Rule attribute for all the notes retrieved by the query.
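The agent's behaviour can be sketched in Python as a query followed by an action. This is only a model under stated assumptions: the `Note` class and the `descendants` and `run_agent` functions are hypothetical, and the sketch assumes that the container note itself is not matched by the query.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    name: str
    text: str = ""
    rule: str = ""
    children: list = field(default_factory=list)


def descendants(note):
    """Yield every note below `note` in the outline, excluding `note` itself."""
    for child in note.children:
        yield child
        yield from descendants(child)


def run_agent(container, rule_source):
    """Model of the agent: query the descendants, then copy the rule text."""
    matched = list(descendants(container))
    for n in matched:
        n.rule = rule_source.text  # models $Rule=$Text(TextPart)
    return matched


# Hypothetical outline: one sentence note holding two text fragments.
text_part = Note("TextPart",
                 text="$AllCodes=$StartCodes+$Codes+$ContainedCodes+$EndCodes;")
sentence = Note("Sentence 1",
                children=[Note("our fathers"), Note("brought forth")])
address = Note("Gettysburg Address", children=[sentence])
matched = run_agent(address, text_part)
```

Every note at any depth below the container receives the same rule text, which is why editing the TextPart note is enough to change the rules everywhere.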
Why didn't I just use prototype inheritance?
When I perform textual analysis using Tinderbox, I typically assign the decomposed text notes a Prototype named something like, "SourceText." I use it to hold attributes that I want all my source notes to share.
GettysburgCFL does not conform to that pattern, because I am concerned that the assignment and reassignment of prototypes may cause inconsistent inheritance across all the syntagm fragments. Should that occur, the mechanism is likely to break down.
Notionally, this shouldn't occur if all code-prototypes inherited from the SourceText. (Much like all Smalltalk classes inherit from Object.) But I hadn't conceived of that design in time to apply it to this document; and, one might still run into problems, especially if you get adventurous trying to automate additional facets of the analytical mechanism. Also, I have no particular default Prototype set in this document, which would be essential were you wanting to use prototype inheritance.
Cleanup
When you code a note, then change your mind about the applicability of the coding, you potentially leave the old code recorded within the note's attributes, even though you've removed the links. To clean these up:
- Enable the Cleanup_RunOnceThenSwitchOff agent.
- Select File > Update now.
- Switch off the agent.
The agent completely clears out all code fragment fields in all source text notes. This allows the code fragment fields to be rebuilt from the current links only.
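The reason stale codes survive is that each rule only ever adds to its set attribute. A minimal Python model of that behaviour, and of the cleanup, looks like this; `run_rule` and the example code names are hypothetical, and the set union stands in for the `$Codes=$Codes+...` rule.

```python
def run_rule(codes, linked_code_names):
    """Model of $Codes=$Codes+links...: a union that never removes anything."""
    return codes | set(linked_code_names)


# The analyst links the note to two codes, and the rule runs.
codes = run_rule(set(), ["Theme", "Nominal_Actor"])

# The analyst removes the Nominal_Actor link, and the rule runs again.
codes_after_unlink = run_rule(codes, ["Theme"])

# Cleanup: clear the attribute, then let the rule rebuild it from current links.
codes_after_cleanup = run_rule(set(), ["Theme"])
```

In this model `codes_after_unlink` still contains `Nominal_Actor`, while `codes_after_cleanup` contains only `Theme`.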
Nakakoji template
This is the Nakakoji template I'm using to export the text.
^value(format($StartCodes, "", "<", ">", ""))^^value(format($Codes, "", "<", ">", ""))^^if(^children^)^^children(/TEMPLATES2/•LGOutput)^^else^ ^title^^endIf^^value(format($Codes, "", "</", ">", ""))^^value(format($EndCodes, "", "</", ">", ""))^
It creates text output along these lines:
<Clause_Independent> <Circumstance_Temporal><Theme> Fourscore and seven years ago</Circumstance_Temporal></Theme><Rheme><Nominal_Actor> our fathers</Nominal_Actor><Process_Relational_Existence> brought forth</Process_Relational_Existence><Circumstance_Locative> on this continent</Circumstance_Locative><Nominal_Goal> a new nation</Nominal_Goal></Rheme> </Clause_Independent>
The core of the Nakakoji template is this decision:
^if(^children^)^^children(/TEMPLATES2/•LGOutput)^^else^ ^title^^endIf^
It asks: "Are you a leaf node, or not? If you have children, I'll go and see what they want me to do. But if you're a leaf node, I'll output your text."
The wrapping code before the core decision is:
^value(format($StartCodes, "", "<", ">", ""))^^value(format($Codes, "", "<", ">", ""))^
It says: "Let me format all your StartCodes and Codes fragments as HTML-like tags." The matching segment at the back does the same thing, except that it issues the closing HTML-like tags for the Codes and EndCodes fragments.
Template Summary: The template says, "I will descend the Tinderbox outline containing the text. All the codes assigned at any level of rank are output into the Nakakoji, but only leaf nodes emit text."
For this reason, a text can be coded at varying levels of depth, but any one branch must be coded at the same level of depth to ensure no text is skipped.
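The recursion in the template can be modelled with a short Python function. This is a sketch rather than a reimplementation of Tinderbox export: the `Note` class and the `render` function are hypothetical, tags are emitted in list order, and any whitespace that Tinderbox inserts between children is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    title: str
    start_codes: list = field(default_factory=list)
    codes: list = field(default_factory=list)
    end_codes: list = field(default_factory=list)
    children: list = field(default_factory=list)


def open_tags(names):
    return "".join(f"<{name}>" for name in names)


def close_tags(names):
    return "".join(f"</{name}>" for name in names)


def render(note):
    """Emit tags at every level, but emit text only for leaf notes."""
    if note.children:
        body = "".join(render(child) for child in note.children)
    else:
        body = " " + note.title  # the template puts a space before ^title^
    return (open_tags(note.start_codes) + open_tags(note.codes) + body
            + close_tags(note.codes) + close_tags(note.end_codes))


# A Rheme that starts on the first fragment and ends on the last one.
clause = Note("Sentence 1", codes=["Clause_Independent"], children=[
    Note("our fathers", start_codes=["Rheme"], codes=["Nominal_Actor"]),
    Note("brought forth", codes=["Process_Relational_Existence"]),
    Note("a new nation", codes=["Nominal_Goal"], end_codes=["Rheme"]),
])
output = render(clause)
```

Note that the title of the parent note, "Sentence 1", never appears in `output`. That is the behaviour behind the warning above: text held by a note that has children is skipped.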
Next article: Visualising Textual Analyses
Fact-checking Nick Carr's ass
Nick Carr really needs a fact checker. He recently opined:
He challenges his readers to:
So I did. And I found Nick to be wrong. Quite wrong.
Problematic aspects of Carr’s piece
It’s a delicious irony that the man who asserts that the internet is shallowing our brains demonstrates his thesis with a slapdash thought piece rife with holes. Here are just some of the holes in Carr’s piece:
So, I’ve taken up Nick’s challenge. I compared my “information expenditure” in 2009 with my father’s “information expenditure” in 1985. When I did, I found Carr’s assertion mistaken. Information expenditure is significantly less than it was 24 years ago.
Evidence
Subjects: Loryn Jenkins (2009), Elwyn Jenkins (1985)
I have interpreted Carr’s opening assertion as meaning: “As a proportion of gross annual income, never before in history have people paid as much for information as they do today.”
I chose to compare my family’s 2009 “information expenditures” with those of my parental family in 1985. While any single point-in-time comparison is merely anecdotal, this is as valid as a single-point comparison can be, owing to the similarity in our respective stages in life, responsibilities in life, level of education and mental outlook.
Scope
Carr’s claim is about the cost of information, but his summary challenges the reader only to calculate subscription costs. Despite including information access costs, he doesn’t include the costs of the computer hardware or software with which that information is also accessed. Nor does he include information products directly purchased.
While I would argue that information per se excludes information access, communication and entertainment, I have opted to follow Carr’s challenge and include them all:
Information gathering
My information comes directly from my family’s Quicken record of expenditures from 2009. My father’s information comes from his recollection of expenditures and income in 1985, gathered during a telephone call this evening.
The data
In 1985, Elwyn Jenkins’ family spent around 32% of gross annual income on “information;” whereas in 2009, my family spent around 10% of gross annual income on “information.” Here’s what the break-down looks like.
From Nick Carr’s original list of categories, I don’t subscribe to cable TV, satellite radio, Netflix, wifi hotspot access or TiVo. In addition to his list, I included the purchase of books, movies (theatre), DVDs, computer hardware and peripherals, computer software, hand-held game consoles, fax, domain names, website and children’s books. Here’s the break-down as a percentage of my gross annual income.
Information Expenditures by Loryn Jenkins’ Family, 2009, as a percentage of gross income
The information categories in my father’s 1985 expenditure have been selected to be analogous with the classes of expenditure in my 2009 data.
Information Expenditures by Elwyn Jenkins’ Family, 1985, as a percentage of gross income
Commentary - Elwyn Jenkins’ expenditure
I now know why my mother and father knelt beside their bed every week, with hand-written family budget and cash projections, identifying which bills to pay and which expenditures to make. After taking taxation into account, the family in which I grew up spent an astounding 45% of disposable income on “information.”
Long-distance calling in Australia in the mid-1980s was very expensive. My family had a rather outsized budget for that because my father organised weekly conference calls, frequently including long-distance guest speakers. Moreover, accessing bulletin board services incurred timed long-distance calls.
In 1985, my father’s computing purchases included a “high-resolution” dot matrix printer and a Commodore 128D, which had a 64-character wide color screen. Dad purchased the Commodore 128D for himself, handing down the Commodore 64 to me and my brother. (Hey, we had fun typing in machine code from magazine print-outs to get our own games.)
I recall the evening an Encyclopaedia Britannica sales rep called on my parents. I remember him explaining the educational value of the Encyclopaedia Britannica, and my father choosing the three-year subscription plan to pay for the enormous cost of those books, so that his children could access the information to support our education. At the time, an encyclopedia set was an essential investment for any educated family.
My father reports purchasing an absurdly high number of books: two to three per week. I suspect this data may be inflated somewhat. Nevertheless, even if he only purchased half the reported volume, it doesn’t materially affect this analysis.
You may have cause to question the inclusion of fuel in this list. Living in Albany, Western Australia, my father was a 4.5-hour drive from the nearest university. In pursuing his B.Ed. (Honours) course, he was forced to drive to university weekly. In following Nick Carr’s inclusion of information access costs, it’s only fair to include my father’s information access costs too.
Commentary - Loryn Jenkins’ expenditure
My list of “information expenditure” includes many more categories than my father’s does, and lacks a few categories that appear on his.
The most notable absent category is the encyclopedia—I will simply never purchase an encyclopedia for my family. Not because my family is less well educated; on the contrary, my children have far greater access to information than I ever did as a child.
Also absent is postage. My mother sent a letter to her mother in New Zealand every week, because she couldn’t often afford to place an international call. If I want to communicate now, I can quite freely afford to contact anyone, anywhere in the world, via phone, Skype or email, as I did in communicating with my brother, who lived in Atlanta, GA from 1998 through 2009.
Categories that did not exist on my father’s list include cell phone services, online computer backup, music, hand-held game consoles, console games, fax, domain name purchases and website operating costs.
Analysis
The 1985 expenditures and the 2009 expenditures have been reported as a percentage of gross family income. You might wonder whether the comparison is distorted by a CTO (of a small software development company) pulling in greater income than a high-school teacher in the mid-1980s.
To answer that question, I quickly Alt-tabbed to Safari, and entered a Google search, “cpi index australia 1985”. The results page returned a link to a website that allowed me to specify my starting and ending years, and then produced the CPI inflation index (quarterly and annually) plotted between 1985 and 2009. (Something my father would never have attempted to do on a rainy Saturday evening in 1985.)
The answer is that in inflation-adjusted terms, my father’s reported 1985 income is 35% greater than my 2009 income. In comparing costs in inflation-adjusted terms, my father’s 1985 expenditures are 421% of my 2009 expenditures. Turning that around, I spend only 24% of what my father did on “information expenditure” at the same stage in life.
So whether you examine it as a percentage of gross annual income, or in inflation-adjusted terms, my father outspent me on information by a factor of between 3 and 4.
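The arithmetic behind those two comparisons can be checked with a few lines of Python. This is a rough sketch that uses only the rounded figures quoted in this article (32%, 10% and 35%), so its results differ slightly from the 421% and 24% reported above, presumably because those were calculated from unrounded data. The variable names are my own.

```python
# Rounded figures quoted in the article.
father_share = 0.32   # 1985 information spend as a share of gross income
my_share = 0.10       # 2009 information spend as a share of gross income
income_ratio = 1.35   # father's inflation-adjusted income relative to mine

# Comparison 1: share of gross income.
share_ratio = father_share / my_share          # about 3.2

# Comparison 2: inflation-adjusted spend.
# spend = share * income, so the spend ratio is share ratio * income ratio.
real_spend_ratio = share_ratio * income_ratio  # about 4.3

# Turning that around: my spend as a fraction of my father's.
my_fraction = 1 / real_spend_ratio             # about 0.23
```

The two ratios, roughly 3.2 and 4.3 on rounded inputs, are the source of the "between 3 and 4 times" summary.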
While my father lived in a small country town in the most isolated state in Australia, and needed to drive to the nearest city for his education, I live in a far-flung city on the global stage. I mean this in physical, psychological and communication terms. Psychologically and in communication terms, I’m closer to Mark Bernstein (Boston), Mark Anderson (London), Dave Winer (Berkeley, California), Joel Spolsky (New York), Paul Graham (Boston) and Slava Pestov than I am to my brother-in-law, who lives just two hours’ drive from my house and works in the mining industry. In terms of my capacity to gain knowledge, I can acquire university-level information from MIT OpenCourseWare, iTunesU, Google Scholar and Google Books. I can reach out and communicate with researchers and programmers and authors and other knowledge creators regardless of where they are located.
Conclusion
Given that my father, at the same stage in life, outspent me by a factor of between 3 and 4 in his purchase of information, information access, information-based entertainment and communications, Nick Carr’s charge that we spend more on information than ever before is proven false.
Furthermore, the quality of information, information access, information-based entertainment and communications is far greater in 2009 than in 1985. My father in 1985 had limited capacity to communicate with peers internationally. He had no capacity to publish information to thousands of people in 70 different countries. He had no capacity in 1985 to pull CPI indexes at will on a Saturday evening. He had no accounting software with which to summarise his expenditure. He had no spreadsheet software with which to calculate the numbers I’m marshalling here. He had no capacity to stroll down memory lane by looking through a satellite view of his home-town, like I’ve just done (my bike route from home to school). And he certainly had no capacity to communicate this all directly to you, gentle reader.
So I agree with Carr that the quantity of information available, being consumed, and being produced, is much greater now. But I thoroughly disagree with the assertion that “Never before in history have people paid as much for information as they do today.” Wrong. Quite wrong.
My father paid 3 to 4 times more in 1985 than I do in 2009, for more limited access to information and with no capacity to unilaterally publish information like I’m doing right now. Information abounds.