Sunday, Jul 26, 2009

Coding with Footnotes and Links—The Mechanics

Introduction


GettysburgCFL is designed to support the coding of texts across a range of humanities disciplines, with a particular focus on sociological and linguistic textual analysis. While GettysburgCFL is not completely generalised, it is a well-conceived prototype that can be adapted to fit a range of textual coding needs. It demonstrates the Tinderbox community's current best known practice for coding texts within Tinderbox.

You can download the GettysburgCFL Tinderbox document.
You can read the user documentation.

This article describes the design of the mechanisms so that you can adapt them to your own textual coding needs. It is part of a larger series exploring the range of practices involved in textual analysis, the extent to which the current Tinderbox affordances support the required practices, and what further affordances may benefit textual analysts involved in analyzing legal and literary texts.

Footnote tool


GettysburgCFL anticipates analysts importing the text into Tinderbox as a single note. That single note is decomposed into constituent parts (paragraphs, sentences, clauses, phrases, words, morphemes) using a combination of the Tinderbox Explode tool (for paragraphs) and the Tinderbox Footnote tool.

See the user documentation for details.

Codes


The codes that you wish to assign to units of text should be created as notes inside the Codes section. Each code consists of:

  1. Name attribute: Set to the human-readable name of the code.
  2. CodeName attribute: Set to the machine-readable name of the code. In my case, I'm outputting HTML-like code fragments, so the machine-readable names conform to the subset of characters that are valid in HTML codes.
  3. Prototype: For the subset of your codes that are mutually exclusive, you may optionally choose to make the note realizing your code a prototype. If you do choose to use a prototype, assign a Color or BorderColor that visually represents the code in your mind.

Codes can form a hierarchy. You can see the way I have formed a hierarchy within the Ideational set of codes.

You may only use prototype inheritance when your codes are mutually exclusive. If a subset of codes is mutually exclusive, you can use prototype inheritance on that subset; don't apply inheritance to the remainder. If your mutually exclusive subset forms the dominant coding priority, then it definitely makes sense to use prototype inheritance.

Prototype inheritance is achieved by using both links and rules, as described below.
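
One way to see why mutual exclusivity matters: a note holds a single Prototype value, so two prototyped codes linked to the same note would compete for it. The sketch below is a Python model of that constraint, not Tinderbox action code. The function name and the decision to raise an error are assumptions made for illustration; what Tinderbox itself does with conflicting links is not modeled.

```python
def choose_prototype(prototyped_code_names):
    """Return the one prototype a coded note can take, or "" if it has none.

    Hypothetical helper: it models the single-prototype constraint only.
    """
    unique = sorted(set(prototyped_code_names))
    if len(unique) > 1:
        # Two different prototyped codes cannot both fill the one slot.
        raise ValueError("codes are not mutually exclusive: " + ", ".join(unique))
    return unique[0] if unique else ""


single = choose_prototype(["Actor"])   # one prototyped code: fine
unset = choose_prototype([])           # no prototyped code: nothing to assign
try:
    choose_prototype(["Actor", "Goal"])
    conflict = False
except ValueError:
    conflict = True                    # two prototyped codes compete
```

Codes that can overlap with one another are therefore better left out of the prototyped subset.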

Links


GettysburgCFL uses several typed links. The rules leverage the link types to automate the assignment of prototypes and code fragments to the notes. Because of this, the name of each typed link must match the name used within the rules.

GettysburgCFL uses the following typed links:

  • •Ideational—the dominant coding priority in GettysburgCFL.
  • •CodeLink—all codes that are not guaranteed to be mutually exclusive with the dominant coding priority are assigned using the type CodeLink.
  • •StartCodes—the beginning of a code assigned to a note where the closing code for the syntagm does not exist within the note. This allows for decomposition of the document to suit the dominant coding priority, while allowing other codes to be assigned across multiple notes.
  • •EndCodes—used to link the note that ends the coded syntagm that was begun using the •StartCodes link type assigned to a different note.
  • •ContainedCodes—used to indicate that the syntagm held by the note participates in a syntagm that spans notes, but this particular note neither begins nor ends the syntagm.
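
To make the three spanning link types concrete, here is a small Python sketch (a model, not Tinderbox code) that works out which link type each note would receive when one code spans several sibling notes. The function name is hypothetical, note names are assumed to be unique, and treating a one-note span as a plain •CodeLink is my reading of the descriptions above.

```python
def span_link_types(note_names, first, last):
    """Map each note in note_names[first..last] to a link type for one code."""
    if not 0 <= first <= last < len(note_names):
        raise ValueError("span must lie inside the list of notes")
    if first == last:
        # The code opens and closes inside one note, so no spanning is needed.
        return {note_names[first]: "•CodeLink"}
    plan = {note_names[first]: "•StartCodes", note_names[last]: "•EndCodes"}
    for name in note_names[first + 1:last]:
        plan[name] = "•ContainedCodes"   # inside the span, neither end
    return plan


fragments = [
    "Fourscore and seven years ago",
    "our fathers",
    "brought forth",
    "on this continent",
    "a new nation",
]
# A code that runs from "our fathers" through "a new nation".
rheme_plan = span_link_types(fragments, 1, 4)
```

In this example "our fathers" gets the •StartCodes link, "a new nation" gets •EndCodes, and the two notes in between get •ContainedCodes.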


You may wish to generalize the •Ideational name, which is specific to a coding system derived from Systemic Functional Linguistics. If you do, it should probably be called something like: •PrototypedLink. If you do change the name of this link type, you also have to change the rules written in the note named TextPart. (The rules are in the note, not in the note's Rules attribute.)

Rules


All the rules that activate the links are stored in a single note: TextPart. Here's an explanation of each rule.

$Prototype=links.outbound.•Ideational.$Name;

This rule takes the name of the Code, and assigns it as the Prototype of the note being coded. It assumes that the codes to which •Ideational links are assigned are Prototypes.

$Codes=$Codes+links.outbound.•Ideational.$CodeName;
$Codes=$Codes+links.outbound.•CodeLink.$CodeName;

These rules find the machine-readable code fragment held by the Code and assign it to the set attribute $Codes held in the coded note. The first statement does this for the dominant coding priority; the second does it for all other codes.

$StartCodes=$StartCodes+links.outbound.•StartCodes.$CodeName;
$ContainedCodes=$ContainedCodes+links.outbound.•ContainedCodes.$CodeName;
$EndCodes=$EndCodes+links.outbound.•EndCodes.$CodeName;

These rules take the codes assigned through each of the typed links and assign them to the corresponding set attribute held by the note.

$AllCodes=$StartCodes+$Codes+$ContainedCodes+$EndCodes;

This is a convenience for the analyst. It combines all the different types of codes into a single set attribute.
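
For readers who find it easier to follow the rules as ordinary code, the following Python sketch models what they do to one text-fragment note. It is an illustration under assumptions, not a translation of Tinderbox internals: the Note class, its field names, and the guard that leaves the prototype alone when no •Ideational link exists are all my own.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    name: str                                   # models $Name
    code_name: str = ""                         # models $CodeName (code notes)
    prototype: str = ""                         # models $Prototype
    links: list = field(default_factory=list)   # outbound (link_type, Note) pairs
    attrs: dict = field(default_factory=dict)   # set attributes, e.g. "Codes"


def code_names(note, link_type):
    """CodeName of every code note reached by outbound links of one type."""
    return {target.code_name for kind, target in note.links if kind == link_type}


def run_rules(note):
    """Apply a model of the TextPart rules to one text-fragment note."""
    ideational = [t for kind, t in note.links if kind == "•Ideational"]
    if ideational:  # assumption: leave the prototype untouched when unlinked
        note.prototype = ideational[0].name
    keys = ("Codes", "StartCodes", "ContainedCodes", "EndCodes")
    sets = {k: set(note.attrs.get(k, set())) for k in keys}
    # Each rule adds to the existing set; nothing is ever removed here.
    sets["Codes"] |= code_names(note, "•Ideational") | code_names(note, "•CodeLink")
    for k in ("StartCodes", "ContainedCodes", "EndCodes"):
        sets[k] |= code_names(note, "•" + k)
    sets["AllCodes"] = (sets["StartCodes"] | sets["Codes"]
                        | sets["ContainedCodes"] | sets["EndCodes"])
    note.attrs.update(sets)


actor = Note("Actor", code_name="Nominal_Actor")
rheme = Note("Rheme", code_name="Rheme")
fragment = Note("our fathers",
                links=[("•Ideational", actor), ("•StartCodes", rheme)])
run_rules(fragment)
```

After the call, the fragment has taken the Actor code as its prototype, its Codes set holds Nominal_Actor, and its StartCodes set holds Rheme.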

Rule Assignment


It is essential that the rules encoded in TextPart are assigned to all the notes containing source text fragments. GettysburgCFL achieves this through the AssignTextPartRule agent. The agent's query is:

descendedFrom(Gettysburg Address)

Unless you're analyzing the Gettysburg Address, you will have to change the name of the note, and therefore the name of the text pattern in this query. If you have multiple documents, you will need to change the scope of the query to ensure that all notes containing text fragments are retrieved by the query.

The agent's action is:

$Rule=$Text(TextPart)

which simply means to take the text from the TextPart note and assign it to the Rule attribute for all the notes retrieved by the query.
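
The agent's behaviour can be modeled in a few lines of Python. This is a sketch under assumptions, not Tinderbox code: notes are plain dictionaries with a hypothetical "parent" key, and the query is assumed to match descendants of the named note but not the named note itself.

```python
def descended_from(note, ancestor_name):
    """True if any ancestor of note (not note itself) has the given name."""
    parent = note.get("parent")
    while parent is not None:
        if parent["name"] == ancestor_name:
            return True
        parent = parent.get("parent")
    return False


def run_agent(notes, ancestor_name, rule_text):
    """Copy rule_text into the Rule of every note the query retrieves."""
    matched = [n for n in notes if descended_from(n, ancestor_name)]
    for n in matched:
        n["Rule"] = rule_text
    return [n["name"] for n in matched]


root = {"name": "Gettysburg Address", "parent": None}
paragraph = {"name": "Paragraph 1", "parent": root}
clause = {"name": "our fathers", "parent": paragraph}
code = {"name": "Theme", "parent": {"name": "Codes", "parent": None}}

# Stand-in for the text of the TextPart note.
text_part = "$AllCodes=$StartCodes+$Codes+$ContainedCodes+$EndCodes;"
matched = run_agent([root, paragraph, clause, code],
                    "Gettysburg Address", text_part)
```

Only the notes beneath the source text receive the rule; the code note under Codes is left alone.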

Why didn't I just use prototype inheritance?

When I perform textual analysis using Tinderbox, I typically assign the decomposed text notes a Prototype named something like, "SourceText." I use it to hold attributes that I want all my source notes to share.

GettysburgCFL does not conform to that pattern, because I am concerned that the assignment and reassignment of prototypes may cause inconsistent inheritance across all the syntagm fragments. Should that occur, the mechanism is likely to break down.

Notionally, this shouldn't occur if all code-prototypes inherited from SourceText (much as all Smalltalk classes inherit from Object). But I hadn't conceived of that design in time to apply it to this document, and you might still run into problems, especially if you get adventurous trying to automate additional facets of the analytical mechanism. Also, I have no particular default Prototype set in this document, which would be essential if you wanted to use prototype inheritance.

Cleanup


When you code a note, then change your mind about the applicability of the coding, you potentially leave the old code recorded within the note's attributes, even though you've removed the links. To clean these up:

  1. Enable the Cleanup_RunOnceThenSwitchOff agent.
  2. Select File > Update now.
  3. Switch off the agent.

The agent completely clears out all code fragment fields in all source text notes. This allows the code fragment fields to be rebuilt from the current links only.
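
The reason stale codes linger is that each rule only ever adds to its set attribute. The Python sketch below models that accumulation and the effect of clearing before rebuilding. It is an illustration, not the agent's actual action; the function names and the exact list of attributes cleared are assumptions.

```python
CODE_ATTRIBUTES = ("Codes", "StartCodes", "ContainedCodes", "EndCodes", "AllCodes")


def apply_rule(existing, linked_code_names):
    """Model of $Codes=$Codes+links...: the set grows, it never shrinks."""
    return existing | set(linked_code_names)


def cleanup(attrs):
    """Model of the cleanup agent: empty every code fragment attribute."""
    for key in CODE_ATTRIBUTES:
        attrs[key] = set()


attrs = {key: set() for key in CODE_ATTRIBUTES}

links = ["Nominal_Actor"]                     # first coding decision
attrs["Codes"] = apply_rule(attrs["Codes"], links)

links = ["Nominal_Goal"]                      # analyst relinks the note
attrs["Codes"] = apply_rule(attrs["Codes"], links)
stale = set(attrs["Codes"])                   # still remembers the old code

cleanup(attrs)                                # run once, then switch off
attrs["Codes"] = apply_rule(attrs["Codes"], links)
```

Before the cleanup, the note carries both the old and the new code; afterwards, only the code that is still linked is rebuilt.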

Nakakoji template


This is the Nakakoji template I'm using to export the text.

^value(format($StartCodes, "", "<", ">", ""))^^value(format($Codes, "", "<", ">", ""))^^if(^children^)^^children(/TEMPLATES2/•LGOutput)^^else^ ^title^^endIf^^value(format($Codes, "", "</", ">", ""))^^value(format($EndCodes, "", "</", ">", ""))^

It creates text output along these lines:

<Clause_Independent> <Circumstance_Temporal><Theme> Fourscore and seven years ago</Circumstance_Temporal></Theme><Rheme><Nominal_Actor> our fathers</Nominal_Actor><Process_Relational_Existence> brought forth</Process_Relational_Existence><Circumstance_Locative> on this continent</Circumstance_Locative><Nominal_Goal> a new nation</Nominal_Goal></Rheme> </Clause_Independent>

The core of the Nakakoji template is this decision:

^if(^children^)^^children(/TEMPLATES2/•LGOutput)^^else^ ^title^^endIf^

It asks: "Are you a leaf node, or not? If you have children, I'll go and see what they want me to do. But if you're a leaf node, I'll output your text."

The wrapping code before the core decision is:

^value(format($StartCodes, "", "<", ">", ""))^^value(format($Codes, "", "<", ">", ""))^

It says: "Let me format all your StartCodes and Codes fragments as HTML-like tags." The matching segment at the back does the same thing, except that it issues the closing HTML-like tags.

Template Summary: The template says, "I will descend the Tinderbox outline containing the text. All the codes assigned at any level of rank are output into the Nakakoji, but only leaf nodes emit text."

For this reason, a text can be coded at varying levels of depth, but any one branch must be coded at the same level of depth to ensure no text is skipped.
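
The recursion can be mirrored in Python to show which notes contribute tags and which contribute text. This is a model of the template's logic, not export code: notes are plain dictionaries, the keys are hypothetical, lists stand in for set attributes so that the tag order is fixed, and any extra whitespace the real export adds around children is not reproduced.

```python
def render(note):
    """Emit opening tags, then children or leaf text, then closing tags."""
    codes = note.get("Codes", [])
    opening = "".join(f"<{c}>" for c in note.get("StartCodes", []))
    opening += "".join(f"<{c}>" for c in codes)
    children = note.get("children", [])
    if children:
        # Not a leaf: the text comes from the children, not from this note.
        body = "".join(render(child) for child in children)
    else:
        # Leaf: emit a space followed by the title, as the template does.
        body = " " + note["title"]
    closing = "".join(f"</{c}>" for c in codes)
    closing += "".join(f"</{c}>" for c in note.get("EndCodes", []))
    return opening + body + closing


clause = {
    "title": "Clause 1",
    "Codes": ["Clause_Independent"],
    "children": [
        {"title": "our fathers",
         "StartCodes": ["Rheme"], "Codes": ["Nominal_Actor"]},
        {"title": "brought forth",
         "Codes": ["Process_Relational_Existence"]},
        {"title": "a new nation",
         "Codes": ["Nominal_Goal"], "EndCodes": ["Rheme"]},
    ],
}
output = render(clause)
```

Note that the title of the parent note, "Clause 1", never appears in the output, which is why every branch has to be decomposed down to the notes that actually hold the text.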

 

Next article: Visualising Textual Analyses
