Overview

In this workshop we will talk about tagging your texts and checking the tags. To check the tags in our texts, we will use two tools developed by the CROW team. This is a step-by-step overview of the workshop:

      1. Introduction to tagging

      2. Precision and recall

      3. Tool to check precision

      4. Tool to check precision and recall

      5. Precision and recall in Excel

      6. Practice


1 Introduction to tagging

To annotate, or tag, a corpus means to add information to each individual token. This information can be grammatical, semantic, pragmatic, etc. These tags (or labels) can be used to compare the frequencies of particular linguistic features across texts or groups of texts. In addition, adding part-of-speech information to our texts allows us to differentiate between words that have the same spelling (e.g. claim as a noun and claim as a verb).

In this workshop, we will be talking about part-of-speech annotation, or the first level of this tree-diagram.

There are different software packages you can use to tag your text (TreeTagger, Stanford Parser, CLAWS, etc.). Each of them has a different tagset and will give you a different output. A tagset is the set of labels a tagger can assign; for instance, this is the tagset for CLAWS. The figure below compares the output of three taggers (Stanford Parser, CLAWS and the Biber Tagger) for the same text. What differences can you notice between the outputs?

1.1 The Biber Tagger output

In this workshop, we will be checking the tags produced by the Biber Tagger. The Biber Tagger outputs the text vertically, with one word per line followed by its tag. The word and the tag are separated by a space, and every tag has seven tag fields. The tag is followed by another space and the original CLAWS tag.

women N+CM+PLUR++++ NN2

Where women is the word in the text, N+CM+PLUR++++ is the Biber tag, + is used to separate tag fields, and NN2 is the original CLAWS tag.
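To make the line format concrete, here is a minimal Python sketch that splits one line of output into its three parts. The names TaggedToken and parse_biber_line are my own, not part of the Biber Tagger, and the sketch assumes every line has exactly a word, a Biber tag and a CLAWS tag separated by spaces.

```python
from typing import NamedTuple


class TaggedToken(NamedTuple):
    word: str
    fields: list[str]  # the seven tag fields; an unused field is an empty string
    claws_tag: str


def parse_biber_line(line: str) -> TaggedToken:
    """Split one output line into word, tag fields, and CLAWS tag.

    Assumes the line has exactly three space-separated parts.
    """
    word, biber_tag, claws_tag = line.split()
    return TaggedToken(word, biber_tag.split("+"), claws_tag)


token = parse_biber_line("women N+CM+PLUR++++ NN2")
print(token.word)       # women
print(token.fields)     # ['N', 'CM', 'PLUR', '', '', '', '']
print(token.claws_tag)  # NN2
```

Splitting the tag on + gives seven items because six + signs separate seven fields.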

If you would like to understand better the difference between the Biber Tagger and other taggers, I recommend Gray (2019).

1.2 Biber Tagger tagset

Going back to my previous example, women N+CM+PLUR++++ NN2, we know that N+CM+PLUR++++ is the tag, but now we need to know what each tag field means. Don’t forget that a tag field is the label between + signs, like this:

The fact that the Biber Tagger has seven tag fields means that we get up to seven pieces of information in one label. For instance, N+CM+PLUR++++ means that women is a noun (N), that it is a common noun (CM), and that it is in its plural form (PLUR). The table below shows the complete tagset.

legend CLAWS7.Tag Biber.Tag category
dash Y+DSH+++ punctuation
comma , Y+COM+++ punctuation
semicolon ; Y+SCOL+CLP++ punctuation
colon : Y+COL+CLP++ punctuation
exclamation mark ! Y+EXCL+CLP++ punctuation
question mark ? Y+QUE+CLP++ punctuation
single quote mark Y+APO+++ punctuation
double quote mark " Y+QUO+++ punctuation
left parenthesis ( Y+PAR+LEFT++ punctuation
right parenthesis ) Y+PAR+RIGHT++ punctuation
third person impersonal possessive determiner APPGE D+IMPR++3+GE++ determiners
singular first person possessive determiner APPGE D+PER+SING+1+GE++ determiners
second person possessive determiner APPGE D+PER++2+GE++ determiners
plural first person possessive determiner APPGE D+PER+PLUR+1+GE++ determiners
plural third person personal possessive determiner APPGE D+PER+PLUR+3+GE++ determiners
singular third person personal possessive determiner APPGE D+PER+SING+3+GE++ determiners
article AT D+AT+++++ determiners
definite article AT D+AT++DEF+++ determiners
singular definite article AT D+AT+SING+DEF+++ determiners
singular indefinite article AT1 D+AT+SING+INDEF+ determiners
multi-word infinitive marker BCL TO+++++MULTI+
coordinating conjunction CC C+C++++ coordinators
coordinating conjunction + clausal connector CC C+C+CLS+++ coordinators
coordinating conjunction + phrasal connector CC C+C+PHRS+++ coordinators
adversative coordinating conjunction CCB C+C+++ADVS+ conjunction
conditional subordinating conjunction CS C+S+CND++ subordinators
concessive subordinating conjunction CS C+S+CON++ subordinators
subordinating conjunction CS C+S+++ subordinators
WH subordinating conjunction CSW C+S+WH++++ subordinators
pronominal post-determiner DA D+DA+PRO++++ determiners
singular post-determiner DA1 D+DA+SING++++ determiners
plural post determiner DA2 D+DA+PLUR++++ determiners
comparative after-determiner DAR D+DA+ER++++ determiners
superlative after-determiner DAT D+DA+EST+++ determiners
pre-quantifier DB D+DB+++++ determiners
pre-quantifier / double conjunction DB2 D+DB+DOUB++ determiners
singular determiner DD D++SING++++ determiners
plural determiner DD D++PLUR+++
plural determiner with pronominal function DD D++PLUR++PRO++
singular determiner with pronominal function DD D++SING++PRO++ determiners
singular demonstrative determiner DD1 D+DEM+SING++++ determiners
singular demonstrative pronoun DD1 P+DEM+SING++ pronouns
plural demonstrative determiner DD2 D+DEM+SING++++ determiners
plural demonstrative pronoun DD2 P+DEM+PLUR++ pronouns
WH-determiner DDQ D++WH++++ WH-words
WH-determiner as clause head DDQ D++WH+CLS+++ WH-words
WH-determiner as question word DDQ D++WH+QUE+++ WH-words
genitive WH-relativizer DDQGE D+REL+WH++++ WH-words
genitive WH-determiner DDQGE D++WH+GE+++ WH-words
genitive WH-determiner as a question word DDQGE D++WH+GE+QUE++ WH-words
genitive WH-determiner as extraposed clause head DDQGE D++WH+CLS+GE++EXT WH-words
WH-ever determiner DDQV D++WH+++EVER+ WH-words
WH-ever determiner as clause head DDQV D++WH+CLS++EVER+ WH-words
WH-ever determiner as question word DDQV D++WH+QUE++EVER++ WH-words
existential there EX P+EX+++++ pronouns
formula symbols FO FO++++ punctuation
unclassified word FU FU++++ unclassified
ain’t as an auxiliary verb FU VA+AI++++BE+ verbs
ain’t as a main verb FU VL+AI++++BE+ verbs
foreign word FW FW++++ foreign words
genitive marker GE G++++ genitive
multi-word preposition II/IO/IW I+++++MULTI+ prepositions
preposition II/IO/IW I++++ prepositions
stranded preposition II/IO/IW I+STR+++ prepositions
attributive adjective JJ J++ATRB++++ adjectives
predicative adjective JJ J++PRED+++ adjectives
attributive-only adjective JJ J++ATRB+++ONLY+ adjectives
comparative attributive adjective JJR J++ATRB+ER++ adjectives
comparative predicative adjective JJR J++PRED+ER++ adjectives
superlative attributive adjective JJT J++ATRB+EST++ adjectives
superlative predicative adjective JJT J++PRED+EST++ adjectives
cardinal number MC M+C+++ numbers
singular cardinal number greater than one MC M+C+SING+MORE++ numbers
year MC M+C+++ numbers
singular cardinal number equal to one MC1 M+C+SING+ONE++ numbers
plural cardinal number equal to one MC2 M+C+PLUR+ONE++ numbers
plural cardinal number not equal to one MC2 M+C+PLUR+MORE++ numbers
hyphenated number MCMC M+C+++HYPH++ numbers
ordinal number MD M+D+++ numbers
fraction MF M+F+++ numbers
singular noun of direction ND1 N++SING++DIR++ nouns
common noun neutral for number NN N+CM+NEUT++++ nouns
singular common noun NN1 N+CM+SING++ nouns
plural common noun NN2 N+CM+PLUR++ nouns
post-nominal titular noun NNA N+PR+++TIT+A nouns
pre-nominal titular noun NNB N+PR+++TIT+B nouns
plural locative noun NNL2 N++PLUR++L+ nouns
numeral noun neutral for number NNO N++++O++ numbers
plural numeral noun NNO2 N++PLUR++O+ numbers
unit of measurement neutral for number NNU N++++U++ nouns
singular unit of measurement NNU1 N++SING++U+ nouns
plural unit of measurement NNU2 N++PLUR++U+ nouns
proper noun neutral for number NP N+PR+++ nouns
singular proper noun NP1 N+PR+SING++ nouns
plural proper noun NP2 N+PR+PLUR++ nouns
singular day of the week NPD1 N+PR+SING++DAY++ nouns
plural day of the week NPD2 N+PR+PLUR++DAY++ nouns
singular month of the year NPM1 N+PR+SING++M++ nouns
plural month of the year NPM2 N+PR+PLUR++M+ nouns
neutral number indefinite pronoun PN P++NEUT+INDEF++++ pronouns
indefinite pronoun PN1 P+++INDEF+++ pronouns
WH-pronoun as question word PNQS P+WH+QUE++ WH-words
WH-ever pronoun PNQV P+WH++EVER+ WH-words
indefinite reflexive pronoun PNX1 P+++INDEF+X++ pronouns
possessive personal pronoun PPGE P+PER+++POS pronouns
third person impersonal pronoun PPH1 P+IM++3+ pronouns
singular third person personal object pronoun PPHO1 P+PER+SING+3+O pronouns
plural third person personal object pronoun PPHO2 P+PER+PLUR+3+O pronouns
singular third person personal subject pronoun PPHS1 P+PER+SING+3+S pronouns
plural third person personal subject pronoun PPHS2 P+PER+PLUR+3+S pronouns
singular first person object pronoun PPIO1 P+PER+SING+1+O pronouns
plural first person object pronoun PPIO2 P+PER+PLUR+1+O pronouns
singular first person subject pronoun PPIS1 P+PER+SING+1+S pronouns
plural first person subject pronoun PPIS2 P+PER+PLUR+1+S pronouns
singular reflexive first person pronoun PPX1 P+PER+SING+1+X pronouns
singular second person reflexive pronoun PPX1 P+PER+SING+2+X pronouns
singular reflexive third person personal pronoun PPX1 P+PER+SING+3+X pronouns
plural reflexive first person pronoun PPX2 P+PER+PLUR+1+X pronouns
plural reflexive third person personal pronoun PPX2 P+PER+PLUR+3+X pronouns
second person pronoun PPY P+PER++2++ pronouns
adverb after nominal head RA R+A+++ adverbs
adverb introducing appositional constructions REX R+++++APP+ adverbs
degree adverb RG R+++++DEG+ adverbs
wh-degree adverb RGQ R+G+WH++ WH-words
comparative degree adverb RGR R+++ER++DEG+ adverbs
comparative adverb (between aux. verb and main verb) RGR R+++ER+SPLT++ adverbs
superlative degree adverb RGT R+++EST++DEG+ adverbs
place marker adverb RL R+++++PL+ adverbs
locative adverb RL R+L+++++ adverbs
adverbial particle RP R+PART+++ adverbs
adverb RR R++++++ adverbs
WH-Adverb RRQ R++WH++ WH-words
wh-adverb as clause head RRQ R++WH+CLS+ WH-words
wh-adverb as question word RRQ R++WH+QUE+ WH-words
wh-adverb as extraposed clause head RRQ R+WH++CLS+++EXT WH-words
WH-ever adverb RRQV R++WH+++EVER+ WH-words
comparative adverb RRR R+++ER+++ adverbs
superlative adverb RRT R+++EST+++ adverbs
quasi-nominal adverb of time RT R+T+++ adverbs
infinitive marker TO TO++++ to
to as extraposed clause head TO TO++++++EXT to
interjection/filler UH UH++++ interjections
base form of be as auxiliary verb VB0 VA+BF++++BE+ verbs
be as main verb VB0 VL+0+++ verbs
being as a main verb VBG VL+ING+++++ verbs
being as an auxiliary verb VBG VA+ING++++BE+ verbs
being as a present progressive postnominal modifier VBG VL+ING++REL+NOUN++ verbs
be as an infinitive main verb VBI VL+INF++++BE+ verbs
contracted am as auxiliary verb VBM VA+AM+++CONT+BE+ verbs
contracted am as main verb VBM VL+AM+++CONT++ verbs
am as main verb VBM VL+M+++ verbs
been as an auxiliary verb VBN VA+EN++BE+++ verbs
been as a main verb VBN VL+EN+++++ verbs
contracted are as auxiliary verb VBR VA++++CONT+BE+ verbs
are as main verb VBR VL+Z+++ verbs
contracted are as main verb VBR VL+Z+++CONT verbs
is as an auxiliary verb VBZ VA+Z++++BE+ verbs
is as an auxiliary verb (contracted) VBZ VA+Z+++CONT+BE+ verbs
base form of do as auxiliary verb VDD VA+++++DO+ verbs
did as a main verb VDD VL+ED++++DO+ verbs
base form of do as main verb VDD0 VL+BF++++DO+ verbs
doing as a present progressive verb VDG VL+ING+PRG++DO+ verbs
doing as a present progressive postnominal modifier VDG VL+ING++REL++DO+ verbs
do as an infinitive main verb VDI VL+INF+++DO+ verbs
does as a main verb VDZ VL+EN++++DO+ verbs
does as an auxiliary verb VDZ VA+Z++++DO+ verbs
base form of have as main verb VH0 VL+BF++++HV+ verbs
contracted have as main verb VH0 VL+0+++CONT verbs
had as a past simple main verb VHD VL+ED++++HV+ verbs
having as a main verb VHG VL+ING++++HV+ verbs
have as an infinitive main verb VHI VL+ING++++HV+ verbs
had as a past participle main verb VHN VL+EN++++HV+ verbs
has as a main verb VHZ VL+Z++++HV+ verbs
necessity modal VM VM+++++NEC+ modals
necessity modal (contracted) VM VM++++CONT+NEC+ modals
possibility modal VM VM+++++POS+ modals
possibility modal (contracted) VM VM++++CONT+POS+ modals
prediction modal VM VM+++++PRDN+ modals
prediction modal (contracted) VM VM++++CONT+PRDN+ modals
multi-word periphrastic necessity modal VMK VM+++++NEC+MULTI modals
uninflected present tense and imperative base form of verb VV0 VL+BF+++++ verbs
present progressive verb VVG VL+ING+PROG++++ verbs
present progressive postnominal modifier VVG VL+ING++REL+NOUN++ verbs
present participle verb (indeterminate function) VVG VL+ING+++++ verbs
infinitive as main verb VVI VL+INF+++++ verbs
infinitive of be as auxiliary verb VVI VA+INF++++BE+ verbs
past participle verb VVN VL+EN+++++ verbs
perfect aspect past participle verb VVN VL+EN+PRF++++ verbs
agentless passive verb VVN VL+EN+PSV+AGLS++ verbs
by passive verb VVN VL+EN+PSV+BY++ verbs
third person verb VVZ VL++Z++++ verbs
not XX XX++++ adverbs
contracted not XX XX++++CONT++ adverbs
punctuation Y Y++++ punctuation
ellipsis YLIP Y+ELIP+++ punctuation
full-stop/period YSTP Y+PER+++ punctuation
singular letter of the alphabet ZZ1 ZZ++SING++ nouns
plural letter of the alphabet ZZ2 ZZ++PLUR++ nouns
percent sign Y+PCT+++ punctuation
multi-word coordinating conjunction C+C++++MULTI+ coordinators
multi-word subordinating conjunction C+S++++MULTI+ subordinators
causative subordinating conjunction C+S+CAUS++ subordinators
double conjunction C++DOUB++++ determiners
preposition from a prepositional verb I+PPVB+++ prepositions
singular past participle noun N+EN+SING++++ nouns
singular nominalization N++SING+NOM++ nouns
plural nominalization N++PLUR+NOM+ nouns
singular titular noun N+PR+SING+TIT++ nouns
singular present participle noun N+ING+SING++++ nouns
multi-word adverb R++++++MULTI adverbs
negative adverb R+++++NEG+ adverbs
adverb (between aux. verb and main verb) R++++SPLT++ adverbs
adverbial particle (between aux. verb and main verb) R+PART+++SPLT++ adverbs
adverb conjunct R+CJ+++ adverbs
conjunctive adverb (between aux. verb and main verb) R+CJ+++SPLT++ adverbs
discourse particle R++DSPT++ adverbs
discourse particle (between aux. verb and main verb) R++DSPT+SPLT+ adverbs
phrasal verb adverb R+PART+PHRV++ adverbs
place marker adverb (between aux. verb and main verb) R++++SPLT+PL+ adverbs
time marker adverb R+TM+++ adverbs
time marker adverb (between aux. verb and main verb) R+TM++SPLT+ adverbs
comparative time marker adverb R+TM+R++ adverbs
comparative time marker adverb (between aux. verb and main verb) R+TM+R+SPLT+ adverbs
place marker adverbial particle R+PART++++PL+ adverbs
place marker adverbial particle (between aux. verb and main verb) R+PART+++SPLT+PL+ adverbs
that as complementizer in adjective complement clause C+S+COMP+THT+ADJ+ that
that as complementizer in noun complement clause C+S+COMP+THT+NOUN+ that
that as relativizer in relative clause C+S+REL+++ that
that as relativizer in object-gap relative clause C+S+REL+OBJ++ that
that as relativizer in subject-gap relative clause C+S+REL+SUB++ that
that as complementizer in verb complement clause C+S+COMP+THT+VERB+ that
had as an auxiliary verb VA+ED++++HV+
WH-relativizer P+WH+REL+OBJ++ WH-words
WH-relativizer with object gap and prepositional fronting (pied piping) P+WH+REL+OBJ+PIED+ WH-words
WH-relativizer with subject gap P+WH+REL+SUB++ WH-words
WH-pronoun P++WH+++ WH-words
conditional subordinating conjunction as extraposed clause head C+S+CND+CLS+++EXT subordinators
WH subordinating conjunction as extraposed clause head C+S+WH+CLS+++EXT
WH-determiner as extraposed clause head D++WH+CLS+++EXT WH-words
plural indefinite article D+AT+PLUR+INDEF+++ determiners
plural indefinite article with pronominal function D+AT+PLUR+INDEF+PRO++
WH-determiner as extraposed clause head D+WH++CLS+EX WH-words
past participle adjective J+EN+++++ adjectives
present participle adjective J+ING+++++ adjectives
number M++++++ numbers
date M+T+++++ numbers
pre-modifying noun N+++++PMOD++ nouns
plural past participle noun N+EN+PLUR++++ nouns
plural present participle noun N+ING+PLUR++++
WH-pronoun as extraposed clause head P++WH+CLS+++EXT WH-words
dummy it P+IM++3+EXT pronouns
singular reflexive third person impersonal pronoun P+IMPR+SING+3+X
multi-word adverb (between aux. verb and main verb) R++++SPLT++MULTI adverbs
WH Adverb as subordinator R+S+WH+CLS+++ subordinators
that as an extraposed clause head THT+++CLS+++EXT that
are as an auxiliary verb VA+++++BE+
am as an auxiliary verb VA+AM++++BE+
be as an auxiliary verb VA+BF++++BE+
contracted have as an auxiliary verb VA+BF+++CONT+HV+
was or were as an auxiliary verb VA+ED++++BE+
contracted had as an auxiliary verb VA+ED+++CONT+HV+
done as an auxiliary verb VA+EN++++DO+
infinitive of do as an auxiliary verb VA+INF++++DO+
infinitive of have as an auxiliary verb VA+INF++++HV+
doing as an auxiliary verb VA+ING++++DO+
having as an auxiliary verb VA+ING++++HV+
has as an auxiliary verb VA+Z++++HV+
contracted has as an auxiliary verb VA+Z+++CONT+HV
base form of copular verb VL+++COP+++ verbs
past participle verb as adverbial clause VL+EN++CLS+ADVL++
past participle of copular verb VL+EN++COP+++
perfect aspect copular verb VL+EN+PERF+COP+++
infinitive of copular verb VL+INF++COP+++
doing as main verb VL+ING++++DO+
present participle verb in adverbial clause VL+ING++CLS+ADVL++
extraposed ing participle clause VL+ING++COMP+++EXT
present participle as object of prepositional phrase VL+ING++COMP+PREP++
present participle verb complement clause VL+ING++COMP+VERB++
present participle copular verb VL+ING++COP+++
present progressive copular verb VL+ING+PROG+COP+++
third person copular verb VL+Z++COP+++
present-tense semi-modal VM++++++MULTI modals
no tense semi-modal VM++NO++++MULTI modals
third-person semi-modal VM++Z+++MULTI
past-tense semi-modal VM+ED+++++MULTI modals
left bracket Y+BRAC+LEFT++ punctuation
right bracket Y+BRAC+RIGHT++ punctuation
dollar sign Y+DOL+++ punctuation
hashtag? Y+HASH+++++

You can download the table here.

2 Precision and Recall

After we have tagged our texts, it is important to check the precision and recall of our tags to make sure that we are counting what we think we are counting. In the following text, for example, how many noun (NN) tags do we have, and how many of them are not accurate?

Word Tag Lemma
This DT this
study NN study
uses NN use
F-MRI NP MRI
( ( (
functional NN functional
magnetic NN magnetic
resonance NN resonance
imaging NN imaging
) ) )
to TO to
investigate VV investigate
the DT the
brain NN brain
activity NN activity
in IN in
a DT a
set NN set
of IN of
cortical NN cortical
areas NNS area
in IN in
the DT the
task NN task
of IN of
main JJ main
idea NN idea
identification NN identification
, , ,
when WRB when
the DT the
topic NN topic
sentence NN sentence
was VBD be
presented VVN present
in IN in
first JJ first
versus CC versus
in IN in
last JJ last
position NN position
in IN in
a DT a
three-sentence NN three-sentence
paragraph NN paragraph
. SENT .
The DT the
participants NNS participant
were VBD be
eight CD eight
right-handed JJ right-handed
undergraduate JJ undergraduate
students NNS student
from IN from
Carnegie NP Carnegie
Mellon NP Mellon
University NP University
, , ,
six CD six
male NN male
and CC and
2 CD @card@
female NN female
, , ,
all DT all
native JJ native
speakers NNS speaker
of IN of
English NP English
. SENT .
Each DT each
participant NN participant
read VVD read
twelve NN twelve
paragraphs NNS paragraph
, , ,
six CD six
in IN in
which WDT which
the DT the
topic NN topic
sentence NN sentence
was VBD be
paragraph NN paragraph
initial JJ initial
and CC and
six CD six
in IN in
which WDT which
it PP it
was VBD be
paragraph NN paragraph
final JJ final
, , ,
and CC and
each DT each
paragraph NN paragraph
was VBD be
presented VVN present
word NN word
by IN by
word NN word
in IN in
the DT the
center NN center
of IN of
a DT a
screen NN screen
, , ,
inside IN inside
the DT the
scanner NN scanner
. SENT .
The DT the
major JJ major
finding NN finding
of IN of
the DT the
current JJ current
study NN study
is VBZ be
the DT the
differential JJ differential
response NN response
observed VVN observe
in IN in
the DT the
left JJ left
and CC and
right JJ right
hemispheres NNS hemisphere
as RB as
to TO to
the DT the
location NN location
of IN of
the DT the
topic NN topic
sentence NN sentence
within IN within
the DT the
paragraph NN paragraph
. SENT .
The DT the
left JJ left
temporal JJ temporal
region NN region
showed VVD show
greater JJR great
activation NN activation
when WRB when
the DT the
topic NN topic
sentence NN sentence
was VBD be
in IN in
final JJ final
position NN position
than IN than
in IN in
initial JJ initial
position NN position
. SENT .
The DT the
right JJ right
temporal JJ temporal
region NN region
, , ,
on IN on
the DT the
other JJ other
hand NN hand
, , ,
was VBD be
affected VVN affect
only RB only
by IN by
sentence NN sentence
type NN type
, , ,
showing VVG show
a DT a
greater JJR great
response NN response
to TO to
topic NN topic
sentences NNS sentence
than IN than
support NN support
sentences NNS sentence
, , ,
regardless RB regardless
of IN of
their PP$ their
location NN location
within IN within
the DT the
paragraph NN paragraph
. SENT .
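If you prefer not to count by hand, a few lines of Python can do the counting part of this exercise. This is only a sketch: it uses the first four rows of the text above and assumes each row has exactly three space-separated columns (word, tag, lemma). Deciding which of the NN tags are accurate is still a job for a human checker.

```python
from collections import Counter

# First four rows of the tagged text above, in "Word Tag Lemma" order.
sample = """\
This DT this
study NN study
uses NN use
F-MRI NP MRI
"""

rows = [line.split() for line in sample.splitlines() if line.strip()]

# Count how often each tag occurs, then list the words tagged NN.
tag_counts = Counter(tag for _word, tag, _lemma in rows)
nn_words = [word for word, tag, _lemma in rows if tag == "NN"]

print(tag_counts["NN"])  # 2
print(nn_words)          # ['study', 'uses']
```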

2.1 Calculating precision and recall

Precision is the percentage of the tokens the tagger labeled with a given tag that really belong to that category. Recall is the percentage of the tokens that really belong to a category that the tagger labeled with that tag. Confusing? I find the following image helpful when trying to understand the idea of precision and recall.

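We calculate precision and recall using the following formulas: precision is the number of true positives divided by the number of true positives plus false positives, and recall is the number of true positives divided by the number of true positives plus false negatives.

For example, if we search for NN, here is how we would count true/false positives and true/false negatives.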

Original.Tag Correct.Tag Count
NN RB False Positive
NN NN True Positive
RB NN False Negative
NN NN True Positive
RB RB True Negative
RB RB True Negative
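The counting in the table above can be sketched in Python as follows. The function name classify is my own, and the six pairs are the rows of the table, with NN as the tag we are searching for.

```python
def classify(original: str, correct: str, target: str) -> str:
    """Label one token relative to the tag we are searching for."""
    if original == target and correct == target:
        return "True Positive"
    if original == target:
        return "False Positive"   # tagger said NN, but it was something else
    if correct == target:
        return "False Negative"   # tagger missed a real NN
    return "True Negative"


# (original tag, correct tag) pairs from the table above.
pairs = [("NN", "RB"), ("NN", "NN"), ("RB", "NN"),
         ("NN", "NN"), ("RB", "RB"), ("RB", "RB")]

labels = [classify(original, correct, "NN") for original, correct in pairs]
tp = labels.count("True Positive")
fp = labels.count("False Positive")
fn = labels.count("False Negative")

precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn)                             # 2 1 1
print(round(precision, 3), round(recall, 3))  # 0.667 0.667
```

True negatives are counted in the table, but they do not enter either calculation.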

3 Tool to check precision

The New Biber Tag Checking Tool helps us search for one tag at a time. When we find an error in a tag and fix it, the tool gives us the precision for that specific tag.

  1. Decide on a tag to search for. The default search is currently set to PMOD, but you can change it to any tag in the tagset.

  2. Input a tagged text

  3. Check if the highlighted words are correctly tagged

All the words that are highlighted contain the tag you input in your search.

  4. Fix the tags that are wrong

In this example, I deleted the PMOD.

  5. Check precision

Once you start making changes in the tags, the precision will appear at the top of your screen.

  6. Save the tag checked file

If you want to continue searching for the same tag, just input another file. If you want to search for another tag, click on Clear Tool.

In this workshop I am showing the tool online, but you can also download the HTML file from GitHub and run it offline.

Bonus: if you are taking INF502 or want to practice your Git skills, you can clone the repo to your computer, link here
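To show the idea behind the steps above, here is a rough Python sketch of searching for a tag and computing its precision. It is not the tool's actual code. The tagged lines and the set of corrected lines are made up for illustration, and I am assuming that a word is highlighted whenever its Biber tag contains the search string.

```python
def find_matches(lines: list[str], search: str) -> list[int]:
    """Return the indexes of lines whose Biber tag contains the search string."""
    matches = []
    for i, line in enumerate(lines):
        parts = line.split()
        if len(parts) >= 2 and search in parts[1]:
            matches.append(i)
    return matches


# Hypothetical tagged lines: word, Biber tag, CLAWS tag.
lines = [
    "language N+CM+SING+++PMOD+ NN1",
    "teachers N+CM+PLUR++++ NN2",
    "report VL+BF+++++ VV0",
    "classroom N+CM+SING+++PMOD+ NN1",
    "activities N+CM+PLUR++++ NN2",
    "daily N+CM+SING+++PMOD+ NN1",
]

highlighted = find_matches(lines, "PMOD")
corrected = {5}  # hypothetical: the checker removed PMOD from line 5 only

# Precision = highlighted tags left unchanged / all highlighted tags.
precision = (len(highlighted) - len(corrected)) / len(highlighted)
print(highlighted)          # [0, 3, 5]
print(round(precision, 3))  # 0.667
```

Because this kind of search only looks at words that already carry the tag, it can tell us about precision but not about recall.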

4 Tool to check precision and recall

The Biber Tag Checking Tool looks the same as the previous tool. The difference is that you can check the tag of every token in a text, so you can check for both precision and recall.

  1. Input a tagged text

  2. Click on the first word in the text

  3. Use the << and >> menus to move between tokens

  4. Use the dropdown menus to fix incorrect tags

  5. Check the new tag

When you make changes to a tag, the new Biber tag will appear.

  6. Track your progress

You can check the percentage of the text that you have tag checked by looking at the token # at the top of the page.

  7. Save your tag checked file

  8. Precision and Recall

To calculate precision and recall after tag checking, you can open your file in Excel and compare the column of original tags with the column of edited tags. It will look like this:

Token Original_Biber_Tags Original_Claws_Tag New_Biber_Tags Comments
The DET+AT+SING+DEF+++ AT ++++++
publication N+CM+SING+NOM+++ NN1 P+COM+PLUR+INDEF++B+MULTI1
of I++++++ IO ++++++
The DET+AT+SING+DEF+++ AT ++++++
Curve N+PR+SING++++ NN1 ++++++
concept N+CM+SING++++ NN1 ++++++
of I++++++ IO ++++++
intelligence N+CM+SING+NOM+++ NN1 ++++++
has VL+Z+++++ VHZ DET+CM+++++
been VL+EN+PRF+COP+++ VBN ++++++
the DET+AT+SING+DEF+++ AT ++++++
pariah N+CM+SING++++ NN1 ++++++
expressing VL+ING++COMP+PREP++ VVG ++++++
a D+AT+SING+INDEF+++ AT1 ++++++
person N+CM+SING++++ NN1 DET+COL+RIGHT++++
’s GE++++++ GE ++++++
intelligence N+CM+SING+NOM+++ NN1 ++++++
intelligence ++++++ ++++++
Especially R++++++ RR ++++++
right’ J++ATRB++++ JJ ++++++
( Y+PAR+LEFT++++ ( ++++++
Miele N+PR+SING++++ NP1 EX+DOL+PPVB++++
1995 M+CARD+SING+MORE+++ MC ++++++
) Y+PAR+RIGHT++++ ) ++++++

This is just an example: the tags are not correct, and I cut part of the text.
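The same comparison can be sketched in Python instead of Excel. The rows below are copied from the example above. I am assuming that a new tag of ++++++ means the checker left the original tag unchanged, and I keep the rows in memory rather than guessing the format of the saved file.

```python
UNCHANGED = "++++++"  # assumption: an empty new tag means no edit was made

# (token, original Biber tag, new Biber tag), copied from the example above.
rows = [
    ("The", "DET+AT+SING+DEF+++", "++++++"),
    ("publication", "N+CM+SING+NOM+++", "P+COM+PLUR+INDEF++B+MULTI1"),
    ("of", "I++++++", "++++++"),
    ("has", "VL+Z+++++", "DET+CM+++++"),
    ("been", "VL+EN+PRF+COP+++", "++++++"),
]

# Keep only the rows where the checker entered a new tag.
changed = [(token, old, new) for token, old, new in rows if new != UNCHANGED]
for token, old, new in changed:
    print(f"{token}: {old} -> {new}")

print(len(changed), "of", len(rows), "tags were edited")  # 2 of 5 tags were edited
```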

5 Precision and Recall in Excel

Using the table below, we can calculate precision and recall by marking each relevant row as a true positive, a false positive or a false negative.

Original_Biber_Tags New_Biber_Tags Comments
DET+AT+SING+DEF+++
N+CM+SING+NOM+++ P+COM+PLUR+INDEF++B+MULTI1 False Positive
I++++++ ++++++
DET+AT+SING+DEF+++ ++++++
N+PR+SING+++PMOD+ JJ+COM+++++ False Positive
N+PR+SING++++ JJ+COM+++++ False Positive
I++++++ ++++++
M+CARD+SING+MORE+++ ++++++
Y+COM+++++ ++++++
DET+AT+SING+DEF+++ ++++++
N+CM+PLUR++++ JJ+COM+++++ False Positive
P+IM++3+++ ++++++
VL+ED+++++ ++++++
C+CRD+++++ ++++++
N+CM+PLUR++++ ++++++ True Positive
P+IM++3+++ ++++++
VL+ED+++++ ++++++
VL+BF+++++ ++++++
VL+EN+PRF++++ ++++++
D+AT+SING+INDEF+++ ++++++
J++ATRB++++ N+CM+++++ False Negative
N+CM+SING++++ ++++++ True Positive
I++++++ ++++++
DET+AT+SING+DEF+++ ++++++
N+CM+PLUR+++TIME+ ++++++ True Positive
I++++++ ++++++
D+IMPR++3+V++ ++++++
N+CM+SING+NOM+++ ++++++ True Positive
Y+PAR+LEFT++++ ++++++
N+PR+SING++++ ++++++ True Positive
M+CARD+SING+MORE+++ ++++++
Y+PAR+RIGHT++++ ++++++
++++++ ++++++
D++SING++++ ++++++
VL+Z+++++ ++++++
R++++++ ++++++
I++++++ ++++++
D+IMPR++3+V++ ++++++
J+EN+ATRB++++ N+CM+++++ False Negative
J++ATRB++++ ++++++
N+CM+PLUR+NOM+++ JJ+COM+++++ False Positive
Y+COM+++++ ++++++
VL+ED+++++ ++++++
N+CM+PLUR+NOM+++ ++++++ True Positive
I++++++ ++++++
J++ATRB++++ ++++++
N+CM+SING++++ ++++++
Y+COM+++++ ++++++
C+CRD+++++ ++++++

If I am interested in the precision and recall of the tag for nouns, I have the following results:

Precision N Recall N
True positives 6 True positives 6
False positives 5 False negatives 2
All nouns identified 11 Real N of nouns 8
Precision True positives / All nouns identified Recall True positives / Real N of nouns
Precision 0.545454545 Recall 0.75
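As a check on these numbers, here is a small Python sketch that tallies the Comments column from the table at the start of this section. The list contains only the rows that have a comment, in order, and the comparison ignores capitalization in case the column mixes "False Positive" and "False positive".

```python
from collections import Counter

# Comments column from the table above (rows without a comment are skipped).
comments = [
    "False Positive", "False Positive", "False Positive", "False Positive",
    "True Positive", "False Negative", "True Positive", "True Positive",
    "True Positive", "True Positive", "False Negative", "False Positive",
    "True Positive",
]

counts = Counter(comment.lower() for comment in comments)
tp = counts["true positive"]
fp = counts["false positive"]
fn = counts["false negative"]

precision = tp / (tp + fp)  # true positives / all nouns identified
recall = tp / (tp + fn)     # true positives / real number of nouns

print(tp, fp, fn)                   # 6 5 2
print(round(precision, 3), recall)  # 0.545 0.75
```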

6 Practice

  1. Select five tags from the list below. You can also choose a tag that you know will be important for your project
Tag Key
C+SRD+THT+COMP+VERB Verb complement clause
CND+ Conditionals
WH+CLS Wh-Clause
I+STR+++ Stranded preposition
ATRB Attributive Adjectives
C+SRD+THT+REL+ That relative clauses
WH+ Wh-Word
EX+ Existential There
VM+ Modal verbs
N+ Nouns

Group 1 - Precision

  1. Check the precision of these features for the first 200 words of 5 texts.

Group 2 - Precision and Recall

  1. Check the precision and recall of these features for the first 200 words of 5 texts.

Report the results here

Tip:

The tool does not work if you add extra + signs. If you are looking for nouns, search for just N+ and not N+++++.
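Here is a tiny Python illustration of why the longer search string finds nothing. I am assuming the search checks whether the tag contains the string you typed, which matches the description of the highlighted words in section 3 but may not be exactly how the tool is implemented.

```python
tag = "N+CM+PLUR++++"  # the tag for "women" from section 1.1

short_match = "N+" in tag      # the short search string is part of the tag
long_match = "N+++++" in tag   # no run of five + signs directly follows N

print(short_match)  # True
print(long_match)   # False
```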

If you find issues with the tool or the tagset while doing your homework, document them here.