This URL:
http://trips.ihmc.us/parser/cgi/parse?input=this+is+an+example
will cause the TRIPS Parser to parse the string "this is an example" with its default options. It will send back an XML document with lots of different information (see "Output Format" below). Which parts of the output you use may depend on your application.
There are different instances of the TRIPS parser with different default settings. See the index for a full list. The one above, parse, is a generic one. Most have the same browser interface as parse, but step, drum, and cwmsreader are slightly different in that they are meant to take a paragraph as input instead of a single sentence, so they have a multiline <textarea> instead of a single-line <input type="text">. drum also omits most of the TextTagger options for simplicity.
For example, this URL:
http://trips.ihmc.us/parser/cgi/drum?input=The+oncogenes+are+KRAS%2C+PIK3CA%2C+and+BRAF.
will parse the string "The oncogenes are KRAS, PIK3CA, and BRAF." using settings from the DRUM system.
For longer inputs, you should use the POST request method instead of GET, so that the URL doesn't get too long.
The URL for the web API is the same as for the website: http://trips.ihmc.us/parser/cgi/parse (change the word after the last slash for parsers from other TRIPS systems, e.g. cabot). You can use either the GET or the POST HTTP request method; both work. The forms on the single-sentence parsing websites use GET, but those on the paragraph parsing websites use POST to allow longer input texts. The drum and cwmsreader versions of the site also provide extractions (terms, events, etc.), in addition to the other versions' output of words, tags, parse trees, and logical forms.
All parameters are optional. If you omit input, you'll get a response with no <utt>s in it.
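As a rough illustration of the GET and POST options described above, the following Python sketch builds (but does not send) both kinds of request using only the standard library. The helper name `build_request` is hypothetical, and the sketch assumes that a POST body should be form-encoded in the same way a browser would submit the paragraph-parsing form.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://trips.ihmc.us/parser/cgi/"


def build_request(text, parser="parse", method="GET", params=None):
    """Build (but do not send) a request for one of the TRIPS web parsers.

    `params` holds any other query parameters, e.g. {"component": "texttagger"}.
    """
    query = urlencode({"input": text, **(params or {})})
    url = BASE_URL + parser
    if method == "GET":
        return Request(url + "?" + query, method="GET")
    # Assumption: POST bodies are form-encoded, like a submitted HTML form.
    return Request(
        url,
        data=query.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


get_req = build_request("this is an example")
post_req = build_request(
    "The oncogenes are KRAS, PIK3CA, and BRAF.", parser="drum", method="POST"
)
print(get_req.full_url)
print(post_req.data.decode("utf-8"))
```

Passing either object to `urllib.request.urlopen` would actually send the request; the response body is the XML document described under "Output Format".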
input
[...] parse, or a paragraph for drum. Putting a whole paper here is discouraged; use run-pmcid instead.
tag-type
[...] <tags> element, and also be passed to the main TRIPS Parser module for further processing. The format of this option is presently beyond the scope of this document. parse, cabot, and drum have different default values for this parameter, so you can switch between them to switch to a different tag-type. Or you can use the tagger checkboxes in the browser interface to construct a tag-type to use.
input-tags
[...] :start/:end arguments) to output (from the input tagger), regardless of any matching substrings. See the TextTagger README for more information. For example, if the input text is "The dishwasher broke the dishwasher." and you want the first instance of "dishwasher" to have the type ONT::person, and the second ONT::appliance, you can enable the input tagger in the tag-type parameter and put this in the input-tags parameter:
(
  (sense :lex "dishwasher" :start 4 :end 14 :lftype (ONT::person))
  (sense :lex "dishwasher" :start 25 :end 35 :lftype (ONT::appliance))
)
input-terms
[...] :start/:end arguments) to output (from the terms_input tagger), for matching substrings of the input. See the TextTagger README for more information. For example, if you want "foo" to be tagged as a person's name, you can enable the terms_input tagger in the tag-type parameter and put this in the input-terms parameter:
(
  (sense :lex "foo" :lftype (ONT::person) :penn-pos (NNP))
  (pos :lex "foo" :penn-pos (NNP))
)
no-sense-words
[...]
senses-only-for-penn-poss
[...]
split-mode (paragraph parsing only)
split-sentences only splits on sentence breaks, while split-clauses (the default) additionally splits on commas and certain other punctuation if the resulting clauses are long enough. Note that the Parser may further split the utterances it gets from TextTagger into smaller fragments, regardless of this setting.
semantic-skeleton-scoring (step only)
[...] step web parser in a web browser, but if you're using this API programmatically you need to give this parameter explicitly if you want to use this feature (this is due to how checkboxes work in web forms).
number-parses-desired
[...] <alt-hyps>. The maximum allowed value is 10.
component
[...] parser, but you can also use texttagger, which will only run the TextTagger component and not the Parser or the extractor, and will return a <texttagger-output> XML document instead of <trips-parser-output> (see TextTagger Output Format below).
extsformat, tagsformat, treecontents, treeformat, lfformat
[...]
output-parts
[...] debug section, which has only one possible format). The sections are debug, exts, words, tags, lf, and tree. The formats are lisp, xml, table, rdf, dot, svg, amr and lingo. The allowed combinations of section and format are summarized in the table below:
lisp | xml | table | rdf | dot | svg | amr | lingo | |
|---|---|---|---|---|---|---|---|---|
debug | N/A | |||||||
exts | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||
words | ✓ | ✓ | ||||||
tags | ✓ | ✓ | ✓ | |||||
lf | ✓ | ✓ | ✓ | ✓ | ✓ | |||
tree | ✓ | ✓ | ✓ | ✓ | ✓ | |||
[...] output-parts=debug exts-xml words-lisp words-xml tags-lisp tags-xml lf-lisp lf-rdf tree-lisp tree-xml. Note that the overall structure of the output will be the same, with the selected parts either replacing or becoming siblings of the default parts in the same section. Also note that some systems may produce two versions of the extractions (exts): the raw and inferred extractions. These are not marked in the raw output, but they will always appear in that order.

The web interface sends its response as an XML document, with an XSL stylesheet that instructs your web browser how to display the information it contains. If you're not using a web browser, you don't have to apply the stylesheet, so you can use the XML directly. This section describes the format of that XML response document. An outline of a typical response would be:
<?xml?>
<?xml-stylesheet?>
<!DOCTYPE>
<trips-parser-output>
  <debug> ... </debug>
  <ekb>
    <input>...</input>
    <TERM>...</TERM>
    <EVENT>...</EVENT>
    ...
  </ekb>
  <utt>
    <words>
      <lisp>(...)</lisp>
      <word>...</word>...
    </words>
    <tags>
      <lisp>(...)</lisp>
      <word>...</word>
      <prefix>...</prefix>
      ...
      <prefer>...</prefer>
      ...
    </tags>
    <tree>
      <lisp>(...)</lisp>
      <UTT><S>...</S></UTT>
    </tree>
    <terms>
      <lisp>(...)</lisp>
      <rdf:RDF>
        <rdf:Description>...</rdf:Description>
        ...
      </rdf:RDF>
    </terms>
    <alt-hyps>
      <utt>...</utt>
      ...
    </alt-hyps>
  </utt>
  ...
</trips-parser-output>
Some of the XML elements used above are described below. The DTD has a more complete list of the elements and attributes that can be used.
<trips-parser-output>
The root element of the XML document is <trips-parser-output>. Its attributes are the query parameters used to generate the response, along with parser-build-date, containing the date and time that the system that generated the document was built. Its children are <debug> (containing information that might be useful for debugging the TRIPS Parser), <ekb> (containing extractions, only in DRUM version), and either a single <utt> or a <compound-communication-act> containing multiple <utt>s. The output of the paragraph parsers (drum and step) may contain multiple top-level <utt>s/<compound-communication-act>s. Each <utt> represents a fully-parsed sentence ("utterance") or fragment, and its contents are described below. The <ekb> applies to the input as a whole, so it's outside any of the <utt>s. Its contents are described later in this document.
In the event that a sentence or fragment completely fails to parse (e.g. because of an error), there will be an empty <failed-to-parse /> element instead of an <utt>.
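The three top-level cases described above (a single <utt>, a <compound-communication-act>, or <failed-to-parse />) can be handled with one loop over the root's children. The sketch below uses Python's standard `xml.etree.ElementTree` on a small hand-written sample, not real parser output, and it assumes that the <utt>s of a <compound-communication-act> are its direct children.

```python
import xml.etree.ElementTree as ET

# Hand-written sample for illustration only; real responses carry much more.
SAMPLE = """<trips-parser-output>
  <debug>...</debug>
  <utt><words><word>HELLO</word><word>PUNC-PERIOD</word></words></utt>
  <failed-to-parse />
  <compound-communication-act>
    <utt><words><word>GOODBYE</word></words></utt>
    <utt><words><word>PUNC-PERIOD</word></words></utt>
  </compound-communication-act>
</trips-parser-output>"""


def top_level_results(root):
    """Yield (status, [utt, ...]) for each top-level result, in document order."""
    for child in root:
        if child.tag == "utt":
            yield "parsed", [child]
        elif child.tag == "compound-communication-act":
            # Assumption: the <utt>s are direct children of this element.
            yield "parsed", child.findall("utt")
        elif child.tag == "failed-to-parse":
            yield "failed", []
        # <debug> and <ekb> are not utterances, so they are skipped.


results = [
    # "words/word" avoids picking up the unrelated <word>s under <tags>.
    (status, [[w.text for w in utt.findall("words/word")] for utt in utts])
    for status, utts in top_level_results(ET.fromstring(SAMPLE))
]
print(results)
```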
<utt>
Each <utt> has four children: <words>, <tags>, <tree>, and <terms>, and an optional fifth, <alt-hyps>. These are each described below. One feature the first four share is that they contain both a Lisp form and an XML form of the same information. The Lisp form is in the <lisp> child, and is the original Lisp S-expression output by TRIPS. The XML form is derived from it for ease of use (especially by the XSL stylesheet presenting the page in the web browser). This document mainly describes the XML form.
<words>
The <words> element contains the list of words (and other tokens), as processed by the main TRIPS Parser module. This means they use all-caps, split certain endings off, and replace certain punctuation characters. For example, "." becomes <word>PUNC-PERIOD</word>, and "don't" becomes <word>DO-</word><word>N^T</word>. Note that the <tags> element also contains <word> elements, but these are different and should be processed differently.
<tags>
The <tags> element contains the messages sent from the TextTagger preprocessing module to the main Parser module. There are three main types of message, <word>, <prefix> and <prefer>, but they share some arguments:
lex or text
[...] <word> uses lex (for "lexical item"), and <prefer> uses text, but they're really the same thing.
start and end
[...] <utt>). Note that end is generally moved to the next start of a tag, so that there are no gaps.
It's important to note that TextTagger's output is often ambiguous, because the parser hasn't decided yet which of several options to take.
<word>
This message causes the Parser to build constituents based on the information in the message and add them directly to the chart. If there isn't enough information in the message itself to build constituents for a particular sense, it causes the Parser to look up the described word(s) in the lexicon (which may include using WordFinder to look it/them up in WordNet).
At most one <word> message will be output for each start/end span. Different senses of the same word are represented as <sense-info> children. Each child has attributes (and possibly children) describing a set of senses. Each attribute is a comma-separated list of options, and selecting one option from a POS attribute and one from a sense attribute gives you a single sense. The following may be specified in a <sense-info>, but are not required:
penn-parts-of-speech
[...] VB, VBD, VBG, VBN, VBP, VBZ.
trips-parts-of-speech (deprecated)
[...] penn-parts-of-speech.
wn-sense-keys
[...] :head_word:head_id, are empty, so TextTagger leaves them out (including the colons).
ont-types
[...] ONT::referential-sem is removed when there are other (more specific) sense options.
alternate-spellings
[...] lex attribute will retain the original spelling.
score
[...]
The children of a <sense-info> element are for domain-specific information related to the sense. These don't affect parsing but are carried through to the Parser's output (all of them will be carried through for a given <sense-info>, not just one). The different kinds of domain-specific-info elements are described in the table below.
In order to build constituents directly, without looking up the words, both part of speech (penn-parts-of-speech or trips-parts-of-speech) and sense (wn-sense-keys or ont-types) information must be present.
There can be multiple <sense-info> children of the same <word> because we might want to allow only certain combinations of POS, sense, and domain-specific info. For example, if we want to express the fact that the word "crashes" can either be a plural noun with the sense key "crash%1:11:03" or a verb in 3rd person singular present tense with the sense key "crash%2:30:10", but we want to exclude the possibility of mismatching the sense and the POS, we could use two different <sense-info>s:
<word lex="crashes" start="0" end="7">
  <sense-info penn-parts-of-speech="NNS" wn-sense-keys="crash%1:11:03" />
  <sense-info penn-parts-of-speech="VBZ" wn-sense-keys="crash%2:30:10" />
</word>
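To make the "one POS option plus one sense option per <sense-info>" rule concrete, the following Python sketch enumerates the senses licensed by the "crashes" example. The function names are hypothetical, and the merged variant at the end is made up purely to show the mismatches that separate <sense-info> elements rule out.

```python
import xml.etree.ElementTree as ET
from itertools import product

WORD_XML = """<word lex="crashes" start="0" end="7">
  <sense-info penn-parts-of-speech="NNS" wn-sense-keys="crash%1:11:03" />
  <sense-info penn-parts-of-speech="VBZ" wn-sense-keys="crash%2:30:10" />
</word>"""

# Made-up contrast case: the same options merged into one <sense-info>.
MERGED_XML = """<word lex="crashes" start="0" end="7">
  <sense-info penn-parts-of-speech="NNS,VBZ"
              wn-sense-keys="crash%1:11:03,crash%2:30:10" />
</word>"""

POS_ATTRS = ("penn-parts-of-speech", "trips-parts-of-speech")
SENSE_ATTRS = ("wn-sense-keys", "ont-types")


def options(sense_info, attr_names):
    """Collect (attribute, option) pairs from comma-separated attributes."""
    found = []
    for name in attr_names:
        for opt in sense_info.get(name, "").split(","):
            if opt.strip():
                found.append((name, opt.strip()))
    return found


def enumerate_senses(word):
    """List every (POS, sense) pair allowed by a <word>'s <sense-info> children."""
    senses = []
    for info in word.findall("sense-info"):
        # Pairs are only formed within one <sense-info>, never across them.
        senses.extend(product(options(info, POS_ATTRS), options(info, SENSE_ATTRS)))
    return senses


separate = enumerate_senses(ET.fromstring(WORD_XML))
merged = enumerate_senses(ET.fromstring(MERGED_XML))
print(len(separate), len(merged))
```

A <sense-info> that lacks either kind of attribute contributes no pairs here, which corresponds to the case where the Parser has to look the word up in the lexicon instead.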
| element | argument (attribute or child element) | description |
|---|---|---|
| <term> | | Information looked up from ontologies specific to the Deep Reading for Understanding Mechanisms (DRUM) system. Many of them are OBO ontologies. |
| | id | Identifier for a concept from one of the ontologies. |
| | name | More or less human-readable name for the concept. |
| | score | The maximum of the scores from the matches (see below). |
| | <match score="..." input="..." matched="..." [via="..."] status="..." ... /> | Describes the way the input string matched the string from the ontology (which may differ in case and hyphenation). There may be more than one of these. The score is a number between 0 and 1, 1 being the best (but not a probability). input is the part of the input string, and matched is the string from the ontology that matched. status is the status of that string in relation to the id, as encoded in that ontology. Other attributes are mostly word counts for ways corresponding words in the input and matched strings match. These are used in computing the score. For terms where spelling correction or depluralization was used, via is the spelling of the term after such transformations, which only differs from matched in case and hyphenation. |
| | species | Species this protein is from, "Homo sapiens (Human)" or "Mus musculus (Mouse)" (more may be added in the future). |
| | dbxrefs | Comma-separated list of IDs of related concepts in other ontologies ("database cross-references"). |
| | <map to="..." through="..." /> | Ontology mapping our system used to arrive at one of the ont-types, to, through another concept in the same ontology, through, which in some sense subsumes the tagged concept in that ontology. |
| | ont-types | Comma-separated list of TRIPS ontology types. This might not match the same attribute in the containing <sense-info>, because we combine any two of ONT::gene, ONT::protein, and ONT::protein-family from all the <term>s into a single ONT::gene-protein sense in the <sense-info>. |
| <specialist> | | Information looked up from the SPECIALIST Lexicon. |
| | eui | Unique ID of the lexicon entry. |
| | cat | Syntactic category. |
| | citation-form | Uninflected form of the word, like you would find in the heading of a dictionary entry. |
| | <complement> | Contains a complementation pattern from the lexicon entry. All of them are listed, not just the one that pertains to this use of the word. |
| | <nominalization>, <nominalization-of>, <abbreviation>, <abbreviation-of>, <acronym>, <acronym-of> | These all contain nested <specialist> entries related to the main entry. |
| <mutation> | | Information derived by parsing certain kinds of protein mutation specifications. |
| | type | The type of mutation; one of substitution, deletion, or insertion. |
| | old | Amino acid that was deleted. |
| | new | Amino acid(s) that was/were inserted. |
| | aa-index | Amino acid index where the mutation took place. |
| | lower | Amino-acid-indexed site of the start of the mutation. |
| | upper | Ditto for the end. |
| <aa-site> | name, letter, index | Protein site, identified by the amino acid and its index. |
| <amino-acid> | name, letter | Normalized amino acid name and 1-letter code. |
| <mirna> | | Information derived by parsing certain miRNA (micro-RNA) names. |
| | type | The type of miRNA name, as determined by the case of the "mir" part: mature, precursor-or-primary, or gene. |
| | number | The part of the name after "mir-". |
| | species | The name of the species, expanded from the abbreviation before "mir", if present. |
| <pitch> | | Musical pitch. |
| | letter | The letter (A-G) identifying the basic pitch, independent of any key. |
| | scale-degree | The number (1-7) identifying the basic pitch relative to the key. |
| | semitones-above-natural | Encodes accidentals on the letter-based pitch. Flats subtract 1 and sharps add 1. Natural is 0. If this argument is missing, it means you should use context to determine the value (it's not necessarily 0). |
| | octave | The number (0-9) identifying the octave of the letter-based pitch. |
| <interval> | | Musical interval between two pitches. |
| | quality | The interval quality; one of diminished, minor, perfect, major, or augmented. |
| | scale-degree-span | The number of scale degrees within the interval, e.g. 3̂ to 5̂ is a third interval and has scale-degree-span="3" (note: not 5 - 3 = 2). |
| <chord> | | Musical chord. |
| | quality | The chord quality; one of the interval qualities listed above, or half-diminished (which actually specifies the qualities of two of the intervals in a seventh chord). |
| | inversion | The number of the inversion of the chord, e.g. ii6 is a ii chord in first inversion, so this argument would be 1. |
| | <root> | Contains the <pitch> that the chord is built up from, before inversion, e.g. the root of a ii6 chord is 2̂. |
| | <bass> | Contains the lowest <pitch> in the chord, after inversion, e.g. the bass of a "G/B bass" chord is B. |
| | <intervals-above-bass> | Contains <interval>s between the bass pitch and the other pitches in the chord. Used for roman numeral/scale degree based chords. Omitted in the common case of a major triad. |
| | <intervals-above-root> | Contains <interval>s between the root pitch and the other pitches in the chord. Used for letter-based chords. Omitted in the common case of a major triad. |
| <progression> | | Musical chord progression. |
| | <members> | Contains the <chord>s in the progression. |
| <pitch-sequence> | | Sequence of musical pitches. It's unspecified whether this is melody or harmony. |
| | <members> | Contains the <pitch>es in the sequence. |
| <place> | | Generalized place in the world. |
| | id | Identifier for a place, prefixed with the resource it's defined in. |
| | <match ... /> | See the same element under <term> above. |
| <capital> | | Capital of a country. |
| | name | The name of the city. |
| | country | The (ISO-3166-1 alpha-2) code for the country this city is the capital of. |
| | <match ... /> | See the same element under <term> above. |
| <country> | | Country. |
| | name | The official name of the country. |
| | code | The (ISO-3166-1 alpha-2) code for the country. |
| | <match ... /> | See the same element under <term> above. |
| <demonym> | | Word describing the people of a country (or group of countries). |
| | name | The demonym. |
| | countries | The (ISO-3166-1 alpha-2) codes for the countries this demonym applies to. |
| | <match ... /> | See the same element under <term> above. |
| <region> | | Region of the world containing some countries. |
| | name | The name of the region. |
| | countries | The (ISO-3166-1 alpha-2) codes for the countries contained in this region. |
| | <match ... /> | See the same element under <term> above. |
| <subregion> | | Smaller region of the world containing some countries. |
| | name | The name of the subregion. |
| | countries | The (ISO-3166-1 alpha-2) codes for the countries contained in this subregion. |
| | <match ... /> | See the same element under <term> above. |
| <units> | | Units of measure expression. |
| | units | Normalized units expression involving * for multiplication and ^ for (possibly negated) exponents, with units spelled out and sorted alphabetically. E.g. "km/s^2" becomes "kilometer*second^-2". |
| | dimensions | Similarly-normalized dimensions expression, e.g. "length*time^-2". Dimensions are also sorted alphabetically; they don't necessarily align with the units. Dimensionless units have dimension 1, while units of unknown dimension have dimension unknown. |
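Because the units and dimensions attributes of <units> are already normalized, they are easy to take apart on the consumer side. The sketch below is one possible way to do that in Python; the function name is hypothetical, and it assumes that exponents are always integers, which the description above does not state explicitly.

```python
def parse_normalized(expr):
    """Split an expression like 'kilometer*second^-2' into {name: exponent}.

    Returns {} for the dimensionless value "1" and None for "unknown"
    (both only occur in the dimensions attribute).
    """
    if expr == "unknown":
        return None
    if expr == "1":
        return {}
    factors = {}
    for factor in expr.split("*"):
        name, _, exponent = factor.partition("^")
        # Assumption: a missing exponent means 1, and exponents are integers.
        factors[name] = int(exponent) if exponent else 1
    return factors


units = parse_normalized("kilometer*second^-2")
dimensions = parse_normalized("length*time^-2")
print(units, dimensions)
```

Since units and dimensions are each sorted alphabetically on their own, the two dictionaries should be treated as independent; position in one says nothing about position in the other.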
<prefix>
<prefix> messages are exactly like <word> messages except that they indicate the tagged string is only a prefix, connected to the following word. This information is useful because the end is always moved up to the next start of a tag, so it's not otherwise obvious whether there is whitespace between the two tags/"words". But note that <prefix> is only used when that beginning part of a word is semantically a prefix, like "hyper-" or "mono-". There are other situations where you can get multiple <word>s with no whitespace between them, for example CamelCase words, and words with endings like "n't".
<prefer>
This message causes the Parser to prefer certain kinds of constituents, without causing it to actually build any based on the contents of the message. These are used for things we don't have sense information for, but don't want to look up in the lexicon either (i.e. phrases, not words or multi-word lexical items). These attributes may be specified, and at least one is required:
penn-cats
[...]
trips-cats (deprecated)
[...] penn-cats.
<tree>
The <tree> element contains the parse tree chosen as the best option by the parser. XML elements here map directly to nonterminals/syntactic categories. The leaves are just the words, as in the <words> element described above.
<terms>
The <terms> element contains the logical form (as a list of LF "terms"). The logical form is described separately in the LF Documentation (pdf), using the Lisp format; only the mapping to XML is described here.
The XML format uses RDF to represent the LF as a graph. Each <rdf:Description> corresponds to an LF term, and its rdf:ID is the term's "variable". Other parts of the LF term become RDF properties in either the role namespace (for arguments/slots like semantic roles), or the LF namespace (for everything else). The atomic ontology type becomes LF:type, and the word that is sometimes paired with it becomes LF:word. What the LF documentation calls the "term constructor" becomes LF:indicator (there is some disagreement in the TRIPS codebase over what to call this thing). Most terms output by the web parser also have :start and :end character offsets (roughly corresponding to the span of the input text containing the phrase that the term's word is the head of). These are both in the LF namespace. Everything else is in the role namespace, though some of them aren't exactly semantic roles. So, generally speaking, a Lisp-format term like this:
(foo bar (:* ONT::baz W::glarch) :fred barney :wilma betty :start i :end j)
turns into an RDF resource description like this:
<rdf:Description rdf:ID="bar">
  <LF:indicator>foo</LF:indicator>
  <LF:type>baz</LF:type>
  <LF:word>glarch</LF:word>
  <role:fred rdf:resource="#barney" />
  <role:wilma>betty</role:wilma>
  <LF:start>i</LF:start>
  <LF:end>j</LF:end>
</rdf:Description>
The difference between barney and betty in this example is that barney is the variable from another LF term, while betty is a literal value. Some arguments of LF terms take lists of variables, which aren't directly representable in the LF graph. These are split into separate graph edges for each variable in the list. The :TMA argument is also split into separate edges for each pair it contains. Here is a table showing all the ways these arguments are split:
| Lisp argument | RDF properties |
|---|---|
| :MEMBERS | role:MEMBER, role:MEMBER, ... |
| :MODS | role:MOD, role:MOD, ... |
| :AND | role:AND-ELEMENT, role:AND-ELEMENT, ... |
| :OR | role:OR-ELEMENT, role:OR-ELEMENT, ... |
| :SEQUENCE | role:SEQUENCE, role:SEQUENCE1, ... |
| :ACTS | role:ACT, role:ACT1, ... |
| :TMA ((TENSE ...) (PERF ...) (NEGATION ...) ...) | role:TENSE, role:PERF, role:NEGATION, ... |
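The mapping above can be read back into a simple graph structure. The following Python sketch does this for the foo/bar example, extended with two role:MOD edges to show a split :MODS list. The rdf namespace URI is the standard one, but the LF and role namespace URIs used here are placeholders (assumptions); the real ones should be taken from the xmlns declarations in an actual response.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Placeholder URIs (assumptions), not the ones the web parser actually uses.
LF_NS = "http://example.org/LF#"
ROLE_NS = "http://example.org/role#"

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:LF="{LF_NS}" xmlns:role="{ROLE_NS}">
  <rdf:Description rdf:ID="bar">
    <LF:indicator>foo</LF:indicator>
    <LF:type>baz</LF:type>
    <LF:word>glarch</LF:word>
    <role:fred rdf:resource="#barney" />
    <role:wilma>betty</role:wilma>
    <role:MOD rdf:resource="#m1" />
    <role:MOD rdf:resource="#m2" />
    <LF:start>i</LF:start>
    <LF:end>j</LF:end>
  </rdf:Description>
</rdf:RDF>"""


def load_terms(rdf_root):
    """Map each term variable to its LF properties, graph edges, and literals."""
    terms = {}
    for desc in rdf_root.findall(f"{{{RDF_NS}}}Description"):
        term = {"lf": {}, "edges": [], "literals": []}
        for prop in desc:
            # ElementTree tags look like "{namespace-uri}local-name".
            ns, _, name = prop.tag[1:].partition("}")
            target = prop.get(f"{{{RDF_NS}}}resource")
            if ns == LF_NS:
                term["lf"][name] = prop.text
            elif target is not None:
                # rdf:resource="#x" points at the term whose rdf:ID is x.
                term["edges"].append((name, target.lstrip("#")))
            else:
                term["literals"].append((name, prop.text))
        terms[desc.get(f"{{{RDF_NS}}}ID")] = term
    return terms


terms = load_terms(ET.fromstring(SAMPLE))
print(terms["bar"])
```

Edges are kept in a list rather than a dictionary because split arguments such as role:MOD repeat the same property name, and a dictionary keyed by name would keep only the last one.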
<alt-hyps>
The <alt-hyps> element contains <utt>s (or <compound-communication-act>s) representing alternative parsing hypotheses, in order from most to least preferred (the top choice is the parent element). These alternatives will not have the <tags> element (since the TextTagger output will be the same for all hypotheses), and the <tree> may be NIL, but the <words> and <terms> will be present.
The <alt-hyps> element will only appear as the child of a top-level <utt> or <compound-communication-act> element, not more deeply nested elements.
<ekb>
Note: this section may be incomplete or out of date; the ekb.dtd is a better reference.
The <ekb> element contains the extraction knowledge base derived from the complete input. It currently has these kinds of child elements:
<input>
<EVENT>
<TERM>
<MODALITY>
<EPI>
<CC>
[...] <input> element, and zero or more elements of the other types. These elements are outlined below in more detail (parts in square brackets are optional, and "..." means either "and so on", or "same format as before"):
<input [type="text or article"]>
  [<paragraphs>
    <paragraph file="filename" id="paragraph ID">
      paragraph text
    </paragraph>
    ...
  </paragraphs>]
  [<sentences>
    <sentence id="utterance number" pid="paragraph ID">
      sentence text
    </sentence>
    ...
  </sentences>]
</input>
<TERM id="this term ID" [refid="another term ID"] [dbid="external database reference ID"] start="start character offset" end="end" paragraph="paragraph ID" uttnum="utterance number" lisp="lisp form" rule="extraction rule ID">
  <type>TRIPS ontology type</type>
  [<drum-terms>
    <drum-term dbid name="concept name" match-score="score between 0 and 1" matched-name="string from the ontology">
      <types> <type ...> ... </types>
      <xrefs> <xref dbid /> ... </xrefs>
      <species>species name</species>
      <members type="TRIPS ontology type"> <member id dbid type /> ... </members>
    </drum-term>
    ...
  </drum-terms>]
  [<mods>
    <mod or frequency or degree start end>
      <type ...> <value ...> <text ...>
    </mod or frequency or degree>
    ...
  </mods>]
  [<features>
    [<active>value of the "active" feature</active>]
    [<location id="ID of another term" />]
    [<mutation id="ID of another term" />]
    [<mutation>TRUE or FALSE</mutation>]
    [<site id="ID of another term"> <type ...> <text ...> </site>]
    [<site>... see sites under mutation below ...</site>]
    [<cell-line id="ID of another term" start end> <type ...> <text ...> </cell-line>]
    [<inevent id="ID of an event"> [<type ...> <text ...>] </inevent>]
    [<ptm type="TRIPS ontology type" event="ID of an event" />]
    [<bound-to id="ID of another term" event="ID of an event" />]
  </features>]
  [<not-features ...same as features, without active or inevent... >]
  [<name>name</name>]
  [<coref type="ONT::PRO or ONT::PRO-SET" [id="ID of another term"] />]
  [<equals id="ID of another term" provenance="rule" />]
  [<assoc-with id="ID of another term" />]
  [<members><member type="ONT::PROTEIN" dbid="external database reference ID" />...</members>]
  [<aggregate operator="AND or OR"><member id="id of another term" />...</aggregate>]
  [<components><component id="id of another term" />...</components>]
  [<mutation>ONT::TRUE</mutation>]
  [<mutation>
    <type>DELETION or SUBSTITUTION or INSERTION</type>
    [<pos-from>
      <site>
        [<name>amino acid name</name>]
        [<code>amino acid code</code>]
        [<pos>amino acid index</pos>]
      </site>
    </pos-from>]
    [<pos-to><site ...></pos-to>]
    [<pos ...>]
    [<aa-from><site ...></aa-from>]
    [<aa-to><site ...></aa-to>]
    [<insert>
      <aa>
        [<name>amino acid name</name>]
        [<code>amino acid code</code>]
      </aa>
      ...
    </insert>]
  </mutation>]
  <text [normalization="normalized version of text"]>text of the term as it appears in the input</text>
</TERM>
<EVENT same attributes as TERM, no dbid>
  <type ...>
  [<drum-terms ...>]
  [<negation>+ or -</negation>]
  [<polarity>ONT::POSITIVE or ONT::NEGATIVE</polarity>]
  [<force>ONT::TRUE or FALSE</force>]
  [<modality>ONT::word</modality>]
  [<epistemic-modality id="id of an EPI element" />]
  [<mods ...>]
  [<aggregate ...>]
  [<features> [<inevent> ...] </features>]
  <predicate id start end> <type ...> <text ...> </predicate>
  [<arg1 id role="role name" start end> <type ...> <text ...> </arg1>]
  [<arg2 ...>]
  ...
  [<site id start end> <type ...> <text ...> </site>]
  [<location id mod start end> <type ...> <text ...> </location>]
  [<from-location id start end> <type ...> <text ...> </from-location>]
  [<to-location id start end> <type ...> <text ...> </to-location>]
  [<cell-line id start end> <type ...> <text ...> </cell-line>]
  <text ...>
</EVENT>
<MODALITY same attributes as EVENT>
  <type ...>
  [<negation ...>]
  [<polarity ...>]
  [<epistemic-modality ...>]
  [<arg1 ...>]
  ...
  <text ...>
</MODALITY>
<EPI same attributes as EVENT>
  <type ...>
  [<negation ...>]
  [<polarity ...>]
  [<force ...>]
  [<modality ...>]
  [<arg1 ...>]
  ...
  <text ...>
</EPI>
<CC same attributes as EVENT>
  <type ...>
  [<negation ...>]
  [<polarity ...>]
  [<force ...>]
  [<modality ...>]
  [<epistemic-modality ...>]
  [<arg ...>]
  ...
  <text ...>
</CC>
Note that the uttnum and paragraph attributes are not particularly meaningful in this context. In particular, uttnum does not number <utt> elements. The id attribute may or may not correspond to LF term IDs from the <terms> element described earlier.
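As an example of consuming the <ekb>, the Python sketch below lists each <EVENT> with its type, predicate text, and arguments keyed by role. The sample EKB is hand-written to follow the outline above; its IDs, ontology types, role names, and rule value are invented for illustration. The sketch also assumes that an argument's id refers to a <TERM> in the same <ekb>, and falls back to the argument's own <text> when it does not.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; all IDs, types, roles, and rule names are invented.
SAMPLE = """<ekb>
  <input type="text" />
  <TERM id="T1" start="0" end="4" uttnum="1" rule="hypothetical-rule">
    <type>ONT::GENE-PROTEIN</type>
    <text>KRAS</text>
  </TERM>
  <TERM id="T2" start="15" end="19" uttnum="1" rule="hypothetical-rule">
    <type>ONT::GENE-PROTEIN</type>
    <text>BRAF</text>
  </TERM>
  <EVENT id="E1" start="0" end="19" uttnum="1" rule="hypothetical-rule">
    <type>ONT::ACTIVATE</type>
    <predicate id="P1" start="5" end="14">
      <type>ONT::ACTIVATE</type>
      <text>activates</text>
    </predicate>
    <arg1 id="T1" role=":AGENT" start="0" end="4">
      <type>ONT::GENE-PROTEIN</type>
      <text>KRAS</text>
    </arg1>
    <arg2 id="T2" role=":AFFECTED" start="15" end="19">
      <type>ONT::GENE-PROTEIN</type>
      <text>BRAF</text>
    </arg2>
    <text>KRAS activates BRAF</text>
  </EVENT>
</ekb>"""


def summarize_events(ekb):
    """Return (event type, predicate text, {role: argument text}) per <EVENT>."""
    term_text = {t.get("id"): t.findtext("text") for t in ekb.findall("TERM")}
    events = []
    for event in ekb.findall("EVENT"):
        args = {}
        for child in event:
            if child.tag.startswith("arg"):
                # Prefer the referenced TERM's text; else use the inline <text>.
                fallback = child.findtext("text")
                args[child.get("role")] = term_text.get(child.get("id"), fallback)
        events.append((event.findtext("type"), event.findtext("predicate/text"), args))
    return events


events = summarize_events(ET.fromstring(SAMPLE))
print(events)
```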
When component=texttagger, only TextTagger runs, so there is less information in the response, and its top-level structure is slightly different:
<?xml?>
<?xml-stylesheet?>
<!DOCTYPE>
<texttagger-output>
  <debug> ... </debug>
  <utterance>
    <tags>
      <lisp>(...)</lisp>
      <word>...</word>
      <prefix>...</prefix>
      ...
      <prefer>...</prefer>
      ...
    </tags>
  </utterance>
  ...
</texttagger-output>
The new elements are described below.
<texttagger-output>
This is the same as <trips-parser-output>, except it omits the attributes that are only relevant for components other than TextTagger (e.g. lfformat).
<utterance>
This is like <utt>, except that it represents TextTagger's utterance segmentation instead of the Parser's, and its only child is <tags> (everything under <tags> is the same as in the Parser output format). Instead of a <words> child, it has a text attribute containing the part of the input text this utterance covers (<words> comes from the Parser, and represents more of a commitment to a particular word segmentation than TextTagger generally makes).
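A response with component=texttagger can be walked in much the same way as the Parser output. The Python sketch below pairs each <utterance>'s text attribute with the kinds of tag message found under its <tags>. The sample document is invented for illustration (including its offsets and the penn-cats value), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample; real TextTagger output has more attributes and children.
SAMPLE = """<texttagger-output>
  <debug>...</debug>
  <utterance text="hyperactive kids">
    <tags>
      <lisp>(...)</lisp>
      <prefix lex="hyper" start="0" end="5" />
      <word lex="active" start="5" end="12" />
      <word lex="kids" start="12" end="16" />
      <prefer text="hyperactive kids" start="0" end="16" penn-cats="NP" />
    </tags>
  </utterance>
</texttagger-output>"""


def tagger_summary(root):
    """Return (utterance text, [(message kind, lex or text), ...]) per utterance."""
    summary = []
    for utterance in root.findall("utterance"):
        tags = utterance.find("tags")
        messages = [
            # <word> and <prefix> use lex; <prefer> uses text.
            (msg.tag, msg.get("lex") or msg.get("text"))
            for msg in (tags if tags is not None else [])
            if msg.tag != "lisp"  # skip the Lisp form of the same information
        ]
        summary.append((utterance.get("text"), messages))
    return summary


summary = tagger_summary(ET.fromstring(SAMPLE))
print(summary)
```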