Overview
In this workshop we will talk about tagging your texts and checking your tags. To check the tags of our texts we will use two tools developed by the CROW team. This is a step-by-step overview of the workshop:
1. Introduction to tagging
2. Precision and recall
3. Tool to check precision
4. Tool to check precision and recall
5. Practice
To annotate, or tag, a corpus means to add information to each individual token. This information can be grammatical, semantic, pragmatic, etc. These tags (or labels) can be used to compare the frequencies of a given linguistic feature across texts or groups of texts. In addition, adding part-of-speech information to our texts allows us to differentiate between words that have the same spelling (e.g., claim as a noun vs. claim as a verb).
In this workshop, we will be talking about part-of-speech annotation, or the first level of this tree-diagram.
There are different software packages you can use to tag your text (TreeTagger, Stanford Parser, CLAWS, etc.). Each of these taggers has a different tagset and will give you a different output. A tagset is the set of labels a tagger can identify; for instance, this is the tagset for CLAWS. The figure below compares the output of three taggers (Stanford Parser, CLAWS, and the Biber Tagger) for the same text. What differences can you notice between the outputs?
In this workshop, we will be checking the tags produced by the Biber Tagger. The Biber Tagger outputs the text with one word per line, each word followed by its tag. The word and the tag are separated by a space, and every tag has seven tag fields separated by + signs, followed by another space and the original CLAWS tag. For example:

women N+CM+PLUR++++ NN2

Here, women is the word in the text, + is used to separate the tag fields, and NN2 is the original CLAWS tag.
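As a sketch of how a line in this format could be split apart programmatically (the function name here is my own, not part of the CROW tools), assuming the word, Biber tag, and CLAWS tag are space-separated as described above:

```python
def parse_biber_line(line):
    """Split one line of Biber Tagger output into its three parts:
    the word, the seven +-separated Biber tag fields, and the CLAWS tag."""
    word, biber_tag, claws_tag = line.split()
    fields = biber_tag.split("+")  # empty strings mark unused fields
    return word, fields, claws_tag

word, fields, claws = parse_biber_line("women N+CM+PLUR++++ NN2")
print(word)    # women
print(fields)  # ['N', 'CM', 'PLUR', '', '', '', '']
print(claws)   # NN2
```

Note that splitting on + keeps the empty fields, so every tag comes back as exactly seven fields.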
If you would like to better understand the differences between the Biber Tagger and other taggers, I recommend Gray (2019).
After we tag our texts, it is important to check the precision and recall of our tags to make sure that we are counting what we think we are counting. In the following text, for example, how many noun (NN) tags do we have? And how many of them are not accurate?
Word | Tag | Lemma |
---|---|---|
This | DT | this |
study | NN | study |
uses | NN | use |
F-MRI | NP | MRI |
( | ( | ( |
functional | NN | functional |
magnetic | NN | magnetic |
resonance | NN | resonance |
imaging | NN | imaging |
) | ) | ) |
to | TO | to |
investigate | VV | investigate |
the | DT | the |
brain | NN | brain |
activity | NN | activity |
in | IN | in |
a | DT | a |
set | NN | set |
of | IN | of |
cortical | NN | cortical |
areas | NNS | area |
in | IN | in |
the | DT | the |
task | NN | task |
of | IN | of |
main | JJ | main |
idea | NN | idea |
identification | NN | identification |
, | , | , |
when | WRB | when |
the | DT | the |
topic | NN | topic |
sentence | NN | sentence |
was | VBD | be |
presented | VVN | present |
in | IN | in |
first | JJ | first |
versus | CC | versus |
in | IN | in |
last | JJ | last |
position | NN | position |
in | IN | in |
a | DT | a |
three-sentence | NN | three-sentence |
paragraph | NN | paragraph |
. | SENT | . |
The | DT | the |
participants | NNS | participant |
were | VBD | be |
eight | CD | eight |
right-handed | JJ | right-handed |
undergraduate | JJ | undergraduate |
students | NNS | student |
from | IN | from |
Carnegie | NP | Carnegie |
Mellon | NP | Mellon |
University | NP | University |
, | , | , |
six | CD | six |
male | NN | male |
and | CC | and |
2 | CD | @card@ |
female | NN | female |
, | , | , |
all | DT | all |
native | JJ | native |
speakers | NNS | speaker |
of | IN | of |
English | NP | English |
. | SENT | . |
Each | DT | each |
participant | NN | participant |
read | VVD | read |
twelve | NN | twelve |
paragraphs | NNS | paragraph |
, | , | , |
six | CD | six |
in | IN | in |
which | WDT | which |
the | DT | the |
topic | NN | topic |
sentence | NN | sentence |
was | VBD | be |
paragraph | NN | paragraph |
initial | JJ | initial |
and | CC | and |
six | CD | six |
in | IN | in |
which | WDT | which |
it | PP | it |
was | VBD | be |
paragraph | NN | paragraph |
final | JJ | final |
, | , | , |
and | CC | and |
each | DT | each |
paragraph | NN | paragraph |
was | VBD | be |
presented | VVN | present |
word | NN | word |
by | IN | by |
word | NN | word |
in | IN | in |
the | DT | the |
center | NN | center |
of | IN | of |
a | DT | a |
screen | NN | screen |
, | , | , |
inside | IN | inside |
the | DT | the |
scanner | NN | scanner |
. | SENT | . |
The | DT | the |
major | JJ | major |
finding | NN | finding |
of | IN | of |
the | DT | the |
current | JJ | current |
study | NN | study |
is | VBZ | be |
the | DT | the |
differential | JJ | differential |
response | NN | response |
observed | VVN | observe |
in | IN | in |
the | DT | the |
left | JJ | left |
and | CC | and |
right | JJ | right |
hemispheres | NNS | hemisphere |
as | RB | as |
to | TO | to |
the | DT | the |
location | NN | location |
of | IN | of |
the | DT | the |
topic | NN | topic |
sentence | NN | sentence |
within | IN | within |
the | DT | the |
paragraph | NN | paragraph |
. | SENT | . |
The | DT | the |
left | JJ | left |
temporal | JJ | temporal |
region | NN | region |
showed | VVD | show |
greater | JJR | great |
activation | NN | activation |
when | WRB | when |
the | DT | the |
topic | NN | topic |
sentence | NN | sentence |
was | VBD | be |
in | IN | in |
final | JJ | final |
position | NN | position |
than | IN | than |
in | IN | in |
initial | JJ | initial |
position | NN | position |
. | SENT | . |
The | DT | the |
right | JJ | right |
temporal | JJ | temporal |
region | NN | region |
, | , | , |
on | IN | on |
the | DT | the |
other | JJ | other |
hand | NN | hand |
, | , | , |
was | VBD | be |
affected | VVN | affect |
only | RB | only |
by | IN | by |
sentence | NN | sentence |
type | NN | type |
, | , | , |
showing | VVG | show |
a | DT | a |
greater | JJR | great |
response | NN | response |
to | TO | to |
topic | NN | topic |
sentences | NNS | sentence |
than | IN | than |
support | NN | support |
sentences | NNS | sentence |
, | , | , |
regardless | RB | regardless |
of | IN | of |
their | PP$ | their |
location | NN | location |
within | IN | within |
the | DT | the |
paragraph | NN | paragraph |
. | SENT | . |
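To answer the counting half of that question, a minimal Python sketch can tally the tags in a "word | tag | lemma" table like the one above (here using just the first few rows as a sample; `count_tag` is my own illustrative helper, not one of the workshop tools):

```python
# Sample rows copied from the top of the tagged table above.
rows = """This | DT | this
study | NN | study
uses | NN | use
F-MRI | NP | MRI
functional | NN | functional
magnetic | NN | magnetic"""

def count_tag(table, target="NN"):
    """Count how many tokens in a pipe-separated word|tag|lemma table carry the target tag."""
    count = 0
    for line in table.splitlines():
        word, tag, lemma = [part.strip() for part in line.split("|")]
        if tag == target:
            count += 1
    return count

print(count_tag(rows))  # 4
```

Counting tells us how many NN tags the tagger produced; deciding how many are accurate (e.g., whether "uses" is really a noun here) still takes a human reader.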
Precision is the percentage of the tokens we tagged with a label that actually carry that label (of everything tagged NN, how much is really a noun?). Recall is the percentage of the tokens that actually carry a label that we managed to tag with it (of all the real nouns, how many did we tag NN?). Confusing? I find the following image helpful when trying to understand the idea of precision and recall.
We calculate precision and recall with the following formulas: Precision = True Positives / (True Positives + False Positives), and Recall = True Positives / (True Positives + False Negatives).
For example, if we search for NN, here is how we would count true/false positives and true/false negatives.
Original.Tag | Correct.Tag | Count |
---|---|---|
NN | RB | False Positive |
NN | NN | True Positive |
RB | NN | False Negative |
NN | NN | True Positive |
RB | RB | True Negative |
RB | RB | True Negative |
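The classification in the table above can be sketched as a small Python function that compares each original tag against its corrected tag for a target tag (the function and variable names are my own, for illustration only):

```python
from collections import Counter

def classify(original, correct, target="NN"):
    """Classify one (original tag, corrected tag) pair with respect to a target tag."""
    if original == target and correct == target:
        return "True Positive"   # tagged NN, and it really is NN
    if original == target:
        return "False Positive"  # tagged NN, but it is something else
    if correct == target:
        return "False Negative"  # is NN, but the tagger missed it
    return "True Negative"       # not NN, and not tagged NN

# The six rows of the example table above.
pairs = [("NN", "RB"), ("NN", "NN"), ("RB", "NN"),
         ("NN", "NN"), ("RB", "RB"), ("RB", "RB")]
counts = Counter(classify(original, correct) for original, correct in pairs)
print(counts)
```

Running this on the example rows gives 2 true positives, 1 false positive, 1 false negative, and 2 true negatives, matching the table.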
The New Biber Tag Checking Tool simply helps us search for one tag at a time. When we find errors in the tags and fix them, it gives us the precision for that specific tag.
The search is set to PMOD right now, but you can change it to any tag in the tagset. All the words that are highlighted contain the tag you input in your search.
In this example, I deleted the PMOD.
Once you start making changes in the tags, this will appear on the top of your screen.
If you want to continue searching for the same tag, just input another file; if you want to search for another tag, click on Clear Tool.
In this workshop I am showing the tool online, but you can also download the HTML file from GitHub and run it offline.
Bonus: if you are taking INF502 or want to practice your Git skills, you can clone the repo to your computer (link here).
The Biber Tag Checking Tool looks the same as the previous tool, but the difference is that you can check the tag of every token in a text; this way, you can check both precision and recall.
Use the << and >> menus to move to the next token. When you make changes to a tag, the new Biber tag will appear.
You can check the percentage of the text that you have tag checked by looking at the token # at the top of the page.
To calculate precision and recall after tag checking, you can open your file with Excel and compare the column with the original tags to the column with the edited ones. It will look like this:
Token | Original_Biber_Tags | Original_Claws_Tag | New_Biber_Tags | Comments |
---|---|---|---|---|
The | DET+AT+SING+DEF+++ | AT | ++++++ | |
publication | N+CM+SING+NOM+++ | NN1 | P+COM+PLUR+INDEF++B+MULTI1 | |
of | I++++++ | IO | ++++++ | |
The | DET+AT+SING+DEF+++ | AT | ++++++ | |
Curve | N+PR+SING++++ | NN1 | ++++++ | |
concept | N+CM+SING++++ | NN1 | ++++++ | |
of | I++++++ | IO | ++++++ | |
intelligence | N+CM+SING+NOM+++ | NN1 | ++++++ | |
has | VL+Z+++++ | VHZ | DET+CM+++++ | |
been | VL+EN+PRF+COP+++ | VBN | ++++++ | |
the | DET+AT+SING+DEF+++ | AT | ++++++ | |
pariah | N+CM+SING++++ | NN1 | ++++++ | |
expressing | VL+ING++COMP+PREP++ | VVG | ++++++ | |
a | D+AT+SING+INDEF+++ | AT1 | ++++++ | |
person | N+CM+SING++++ | NN1 | DET+COL+RIGHT++++ | |
’s | GE++++++ | GE | ++++++ | |
intelligence | N+CM+SING+NOM+++ | NN1 | ++++++ | |
intelligence | ++++++ | ++++++ | ||
Especially | R++++++ | RR | ++++++ | |
right’ | J++ATRB++++ | JJ | ++++++ | |
( | Y+PAR+LEFT++++ | ( | ++++++ | |
Miele | N+PR+SING++++ | NP1 | EX+DOL+PPVB++++ | |
1995 | M+CARD+SING+MORE+++ | MC | ++++++ | |
) | Y+PAR+RIGHT++++ | ) | ++++++ |
This is just an example: the tags are not correct, and I cut part of the text.
Looking at the results in the table below, we can calculate precision and recall by marking which tokens are true positives, false positives, and false negatives.
Original_Biber_Tags | New_Biber_Tags | Comments |
---|---|---|
DET+AT+SING+DEF+++ | ||
N+CM+SING+NOM+++ | P+COM+PLUR+INDEF++B+MULTI1 | False Positive |
I++++++ | ++++++ | |
DET+AT+SING+DEF+++ | ++++++ | |
N+PR+SING+++PMOD+ | JJ+COM+++++ | False Positive |
N+PR+SING++++ | JJ+COM+++++ | False Positive |
I++++++ | ++++++ | |
M+CARD+SING+MORE+++ | ++++++ | |
Y+COM+++++ | ++++++ | |
DET+AT+SING+DEF+++ | ++++++ | |
N+CM+PLUR++++ | JJ+COM+++++ | False Positive |
P+IM++3+++ | ++++++ | |
VL+ED+++++ | ++++++ | |
C+CRD+++++ | ++++++ | |
N+CM+PLUR++++ | ++++++ | True Positive |
P+IM++3+++ | ++++++ | |
VL+ED+++++ | ++++++ | |
VL+BF+++++ | ++++++ | |
VL+EN+PRF++++ | ++++++ | |
D+AT+SING+INDEF+++ | ++++++ | |
J++ATRB++++ | N+CM+++++ | False Negative |
N+CM+SING++++ | ++++++ | True Positive |
I++++++ | ++++++ | |
DET+AT+SING+DEF+++ | ++++++ | |
N+CM+PLUR+++TIME+ | ++++++ | True Positive |
I++++++ | ++++++ | |
D+IMPR++3+V++ | ++++++ | |
N+CM+SING+NOM+++ | ++++++ | True Positive |
Y+PAR+LEFT++++ | ++++++ | |
N+PR+SING++++ | ++++++ | True Positive |
M+CARD+SING+MORE+++ | ++++++ | |
Y+PAR+RIGHT++++ | ++++++ | |
++++++ | ++++++ | |
D++SING++++ | ++++++ | |
VL+Z+++++ | ++++++ | |
R++++++ | ++++++ | |
I++++++ | ++++++ | |
D+IMPR++3+V++ | ++++++ | |
J+EN+ATRB++++ | N+CM+++++ | False Negative |
J++ATRB++++ | ++++++ | |
N+CM+PLUR+NOM+++ | JJ+COM+++++ | False Positive |
Y+COM+++++ | ++++++ | |
VL+ED+++++ | ++++++ | |
N+CM+PLUR+NOM+++ | ++++++ | True Positive |
I++++++ | ++++++ | |
J++ATRB++++ | ++++++ | |
N+CM+SING++++ | ++++++ | |
Y+COM+++++ | ++++++ | |
C+CRD+++++ | ++++++ |
If I am interested in the precision and recall of the tag for nouns, I have the following results:
Precision | N | Recall | N |
---|---|---|---|
True positives | 6 | True Positive | 6 |
False positives | 5 | False Negative | 2 |
All nouns identified | 11 | Real N of nouns | 8 |
Precision | True Positives/All nouns id | Recall | True Positives/Real N of nouns |
Precision | 0.545454545 | Recall | 0.75 |
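The arithmetic in the table above can be reproduced with a few lines of Python (the variable names are my own; only the counts come from the table):

```python
# Noun counts taken from the marked-up table above.
true_positives = 6
false_positives = 5
false_negatives = 2

# Precision denominator: everything the tagger called a noun.
all_nouns_identified = true_positives + false_positives   # 11
# Recall denominator: everything that really is a noun.
real_number_of_nouns = true_positives + false_negatives   # 8

precision = true_positives / all_nouns_identified
recall = true_positives / real_number_of_nouns
print(round(precision, 4))  # 0.5455
print(round(recall, 2))     # 0.75
```

So roughly 55% of the tokens tagged as nouns really are nouns (precision), while 75% of the real nouns were found by the tagger (recall).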
Tag | Key |
---|---|
C+SRD+THT+COMP+VERB | Verb complement clause |
CND+ | Conditionals |
WH+CLS | Wh-Clause |
I+STR+++ | Stranded preposition |
ATRB | Attributive adjectives |
C+SRD+THT+REL+ | That relative clauses |
WH+ | Wh-Word |
EX+ | Existential There |
VM+ | Modal verbs |
N+ | Nouns |
Group 1 - Precision
Group 2 - Precision and Recall
Report the results here
Tip: the tool does not work if you add extra + signs; therefore, if you are looking for nouns, search for just N+ and not N+++++.
If you find issues with the tool or the tagset while doing your homework, document them here.