Author: Larissa Goulart, Northern Arizona University
Tested with Antconc 3.5.8 (Windows)


Before we start

This is a step-by-step guide on how to use Antconc to conduct your own corpus study. Antconc is freely available software developed by Laurence Anthony. To follow the workshop, you will need to download:

     1. Antconc

Download Antconc from the AntLab Website

     2. Practice files

For this workshop we will use ten files from the LILE Corpus (Sarmento, Scortegagna & Goulart, 2011). This corpus contains abstracts of theses and dissertations written in the fields of Literature and Applied Linguistics. You will also use twenty files from the BAWE corpus. You can download the files we will use here.

     To prepare your files

If you are following this tutorial with your own corpus, you might need to convert and fix the encoding of your files. Here are two suggestions on how to do this:

     3a. Corpus Text Processor from CROW

If you need to convert and standardize your texts, you can use the Corpus Text Processor. You can read more about it here.

     3b. AntLab Tools

If you need to convert files from Word or PDF to .txt, download AntFileConverter. To make sure your files are in the correct encoding, download EncodeAnt.

For both programs, Antconc and the Corpus Text Processor, remember to download the correct version for your computer’s operating system.


Overview

Corpus analysis is a form of text analysis, which allows you to compare the use of specific linguistic features between texts (or collections of texts). In this workshop, we will:

      1. Prepare your files

      2. Use the Antconc tool


1. Preparing the files

In this section, I will explain how to use both the Corpus Text Processor from CROW and the AntLab Tools. You only need to do this if you are using your own corpus and your files are saved as PDF or Word documents.

1.1 Corpus Text Processor

If you are using CROW’s Corpus Text Processor to prepare your files, I recommend that, before running the tool, you create three folders on your computer that mirror the menus in the Corpus Text Processor: 01_converted, 02_encoded, and 03_standardized. It should look like this.

And this is what you will see when you open the Corpus Text Processor

To convert the files, we should follow the order of the menus in the Corpus Text Processor. You can also watch the video demonstration on how to run it.
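If you would rather create the three folders with a script than by hand, here is a minimal Python sketch. The folder names come from this section; the function name make_stage_folders and the base directory you pass in are my own assumptions and are not part of the Corpus Text Processor.

```python
from pathlib import Path

# Folder names taken from the tutorial; one per menu in the tool.
STAGE_FOLDERS = ["01_converted", "02_encoded", "03_standardized"]


def make_stage_folders(base_dir):
    """Create one folder per processing stage inside base_dir (hypothetical helper)."""
    base = Path(base_dir)
    created = []
    for name in STAGE_FOLDERS:
        folder = base / name
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling make_stage_folders("my_corpus") would create the three folders inside a folder called my_corpus, which is only a placeholder name.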

1.2 AntLab Tools

As an alternative, you can use AntFileConverter and EncodeAnt to convert and encode your files.

Converting

  • Open AntFileConverter
  • Select the directory where your files are.

File> Open Dir..

  • Press Start

Encoding

  • Open EncodeAnt

  • Select the directory where your files are.

File> Open File(s)/Open Dir..

  • Click on Convert to UTF-8

  • Press Start
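If you are comfortable with Python, the same re-encoding step can be sketched in a few lines. This is not how EncodeAnt works internally; it is only an illustration. It assumes that you already know the original encoding of your files (cp1252 is used here as a placeholder), and the function name convert_to_utf8 is hypothetical.

```python
from pathlib import Path


def convert_to_utf8(src_dir, dst_dir, source_encoding="cp1252"):
    """Re-save every .txt file in src_dir as UTF-8 inside dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(src.glob("*.txt")):
        # source_encoding is an assumption: a wrong guess either raises
        # UnicodeDecodeError or silently produces wrong characters.
        text = path.read_text(encoding=source_encoding)
        out = dst / path.name
        out.write_text(text, encoding="utf-8")
        converted.append(out)
    return converted
```

Writing the results to a separate folder keeps your original files untouched, so you can start over if the guessed encoding turns out to be wrong.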

2. Using the Antconc tool

In this section, I will explain how to load your corpus into Antconc and how to use the following Antconc tools:

  • Concordance
  • Concordance Plot
  • File View
  • Cluster/n-grams
  • Collocates
  • Word List
  • Keyword List

2.1 Antconc’s interface

Antconc is an executable file, so you don’t need to install anything. Save the Antconc .exe on your Desktop, or anywhere else you will easily find it.

This is what Antconc will look like when you open it.

2.2 Loading Corpora

To load your texts into Antconc, go to File -> Open Dir.. and then navigate to the folder where you saved the Practice Files.

Remember to add only the files that are in the directory called LILE_Corpus.

Look at the Corpus File window, and make sure you have the right number of files.
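One way to double-check the number of files outside Antconc is a short Python sketch like the one below. It assumes your files are UTF-8 .txt files in a single folder; load_corpus is a hypothetical helper name.

```python
from pathlib import Path


def load_corpus(corpus_dir, encoding="utf-8"):
    """Return {file name: text} for every .txt file in corpus_dir."""
    corpus = {}
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        # Only .txt files are read; Word or PDF files are ignored.
        corpus[path.name] = path.read_text(encoding=encoding)
    return corpus
```

For the practice files, len(load_corpus("LILE_Corpus")) should match the number of files listed in Antconc, provided that you run the script from the folder that contains LILE_Corpus.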

2.3 Concordance

We will use the concordance tool to view Key Words in Context (KWIC). In the search box, we will type a word and click Start. The concordance window will then show you every occurrence of that word.
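To make the idea of Key Words in Context concrete, here is a minimal Python sketch of a KWIC search. It is not Antconc's code: the function name kwic, the width parameter (counted in characters) and the sample sentence are all my own assumptions.

```python
import re


def kwic(text, word, width=30, case_sensitive=False):
    """Return (left context, hit, right context) for each occurrence of word."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # \b keeps the search to whole words, so "the" does not match "that".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append((left, match.group(), right))
    return lines


sample = "The study shows that the results of the survey differ."
for left, hit, right in kwic(sample, "the", width=15):
    # Right-align the left context so that the hits line up in a column.
    print(f"{left:>15} | {hit} | {right}")
```

Lining up the search word in a central column is what makes patterns to its left and right easy to spot.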

If we search for the, this is what the output will look like:

Now, I can use Kwic Sort to sort the results by the words to the right or to the left of my search word. In my example, I am sorting by the first word to the left, then the second word to the right, and then the third word to the right. Remember to click Sort.
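The sorting logic can be sketched in Python as a sort on several keys. The sketch below assumes a very simple tokenizer (lowercase runs of letters) and uses the same sort order as my example: first word to the left, then second and third words to the right. The function name sorted_concordance and its parameters are hypothetical.

```python
import re


def sorted_concordance(text, word, left_pos=1, right_positions=(2, 3), span=3):
    """Return (left, hit, right) rows sorted by 1L, then 2R, then 3R by default."""
    tokens = re.findall(r"[a-z]+", text.lower())

    def token_at(index):
        # Positions that fall outside the text sort as empty strings.
        return tokens[index] if 0 <= index < len(tokens) else ""

    rows = []
    for i, token in enumerate(tokens):
        if token != word.lower():
            continue
        key = (token_at(i - left_pos),) + tuple(token_at(i + r) for r in right_positions)
        left = " ".join(tokens[max(0, i - span):i])
        right = " ".join(tokens[i + 1:i + 1 + span])
        rows.append((key, left, token, right))
    # Tuples compare element by element, which gives the multi-level sort.
    rows.sort(key=lambda row: row[0])
    return [row[1:] for row in rows]
```

Because the sort key is a tuple, the second and third levels only matter when the earlier levels are tied, which is also how a multi-level sort in a spreadsheet behaves.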

Right now, my search is not case sensitive, but you can click on the Case button to make your search case sensitive. You could then search for The and the separately.

Let’s try using Wildcards in our search now. If you go to Global Settings -> Wildcards you can see all the options available.

If I search for use+, I will get all instances of used, uses and use.

If I search for stud*, I will get all occurrences of words that start with stud.

I could also use |, which stands for OR. For example, I can search for is|be|are|was|were|been|being to find all the forms of the verb to be in the corpus.
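If it helps to see what these wildcards mean, here is a small Python sketch that translates them into regular expressions. It covers only the three wildcards used above and assumes, based on the examples, that * stands for zero or more characters and + for zero or one character. Check Global Settings -> Wildcards for the definitions Antconc actually uses; wildcard_to_regex is a hypothetical name.

```python
import re


def wildcard_to_regex(query):
    """Translate a query using *, + and | into a whole-word regex."""
    alternatives = []
    for part in query.split("|"):          # | separates alternatives (OR)
        pieces = []
        for ch in part:
            if ch == "*":
                pieces.append(r"\w*")      # assumed: zero or more characters
            elif ch == "+":
                pieces.append(r"\w?")      # assumed: zero or one character
            else:
                pieces.append(re.escape(ch))
        alternatives.append("".join(pieces))
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


text = "She used it, he uses it, we use it. Students study in a studio."
print(wildcard_to_regex("use+").findall(text))
print(wildcard_to_regex("stud*").findall(text))
```

Under these assumptions, use+ does not match useful or users, because + allows at most one extra character.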

Now, let’s say you want to do a more complex search, such as finding all the words ending in ed. Then, you need to use regex in your search.

First, remember to click the regex button in the concordance window. Then, you can use regex to search only words that end in ed. In my example, I used \b[a-z]+?ed\b.
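You can try the same pattern outside Antconc to see what it matches. The sketch below runs it with Python's re module on a sentence I made up; the re.IGNORECASE flag is my own addition so that capitalized words are also found.

```python
import re

# Same pattern as in the example above: a word boundary, one or more
# letters (as few as possible), then "ed", then another word boundary.
ED_WORDS = re.compile(r"\b[a-z]+?ed\b", re.IGNORECASE)

sample = "The students analyzed the data and reported mixed results in a red folder."
print(ED_WORDS.findall(sample))
```

Note that red is matched along with analyzed, reported and mixed: the pattern looks at spelling only, so you still need to read the concordance lines to separate past forms from other words ending in ed.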

If you don’t know regex, you can use this reference guide or this cheatsheet. You can also use regex searches to search parts-of-speech in a tagged corpus.

Other things you can change in the concordance window:
- Change the window size: you can use this to show more characters in your results.
- Change the number of results: if you increase the number in show every Nth word, Antconc will show only every Nth line of the results and skip the lines in between.

2.4 Concordance Plot

The concordance plot window will show you where in the file your search word occurs.
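The idea behind the plot can be sketched in Python by turning each hit into a position between 0 (start of the file) and 1 (end of the file). This is only an illustration of the concept with a text-based plot; the function names and the width parameter are my own assumptions.

```python
import re


def hit_positions(text, word):
    """Return each hit's position as a fraction (0.0 to 1.0) of the text length."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    length = max(len(text), 1)  # avoid dividing by zero for an empty file
    return [match.start() / length for match in pattern.finditer(text)]


def plot_line(text, word, width=40):
    """Draw a one-line text plot: '|' where a hit falls, '.' elsewhere."""
    cells = ["."] * width
    for pos in hit_positions(text, word):
        cells[min(int(pos * width), width - 1)] = "|"
    return "".join(cells)
```

A file where all the bars cluster at the start would suggest that the word is used mainly in the opening of the text.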

2.5 File View

This window will show you a full view of the text, so you can use this to see the results in context.

2.6 Cluster/Ngrams

This tool allows you to find the words that occur frequently together. The first thing you have to do in this window is to check the N-grams box in the control panel.

In this window, you can change the N-gram size from 2 to 10. You can also choose to print several N-gram sizes in the same output. If you want to establish a minimum frequency and range, you can change these settings in the control panel by entering a number in Min. Freq. and Min. Range. The results can be sorted by frequency, range, probability and word.
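To see what frequency and range mean for N-grams, here is a minimal Python sketch. Frequency is the total number of times an N-gram occurs, and range is the number of files it occurs in. The tokenizer (lowercase letters and apostrophes) and the function name ngrams are my own assumptions, so the counts may not match Antconc's exactly.

```python
import re
from collections import Counter


def ngrams(corpus, n=2, min_freq=1, min_range=1):
    """corpus is {file name: text}. Return [(ngram, freq, range)] sorted by frequency."""
    freq = Counter()
    file_range = Counter()
    for text in corpus.values():
        tokens = re.findall(r"[a-z']+", text.lower())
        grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        file_range.update(set(grams))  # each file counts once per n-gram
    rows = [(gram, count, file_range[gram]) for gram, count in freq.items()
            if count >= min_freq and file_range[gram] >= min_range]
    # Most frequent first; ties are broken alphabetically.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


corpus = {"a.txt": "this paper aims to show this paper", "b.txt": "this paper reports"}
print(ngrams(corpus, n=2, min_freq=2, min_range=2))
```

Setting a minimum range is a simple way to filter out N-grams that are frequent only because one file repeats them.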

2.7 Collocates

This tool allows you to see the words that frequently occur in the company of your search word. In the example, the word paper was used as the search word and, in the collocate window, we see the words that usually occur with paper within 5 words to the left or 5 words to the right.

You can change the search window to one word to the right or one word to the left using the Window Span in the control panel. You can also choose to sort the results by frequency, stats, word or word end. As in the concordance window, you can use regex in the search for collocates.
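The window span can be illustrated with a short Python sketch that counts every word within a given number of words to the left and right of the search word. It counts raw co-occurrence frequency only and does not reproduce the stats measure that Antconc reports; the function name collocates and the tokenizer are my own assumptions.

```python
import re
from collections import Counter


def collocates(text, node, left=5, right=5):
    """Count the words found within the window span around each hit of node."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Up to `left` words before the hit and `right` words after it.
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts.most_common()


print(collocates("this paper argues that the paper is short", "paper", left=1, right=1))
```

With a span of one word on each side, only the immediate neighbours are counted; a wider span such as 5L to 5R picks up looser associations as well.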

2.8 Word List

Word list shows you all the words in your corpus sorted by frequency.
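A frequency-sorted word list is easy to sketch in Python with a counter. The tokenizer below (lowercase runs of letters) is an assumption of mine, so numbers, hyphenated words and contractions may be counted differently from Antconc.

```python
import re
from collections import Counter


def word_list(text):
    """Return (rank, frequency, word) rows, most frequent word first."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Sort by descending frequency, then alphabetically for ties.
    ranked = sorted(Counter(tokens).items(), key=lambda item: (-item[1], item[0]))
    return [(rank, freq, word) for rank, (word, freq) in enumerate(ranked, start=1)]


for row in word_list("The cat and the dog and the bird"):
    print(row)
```

As in most word lists, function words such as the and and come out on top, which is one reason the Keyword List tool is useful.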

2.9 Keyword List

This tool allows you to compare the words in two corpora. In order to use it you will have to input a reference corpus. Remember that your reference corpus will determine your keywords.
To input your reference corpus, select Tool Preferences, then go to Category and select Keyword List. Then, you can enter the reference corpus in Add Directory.

In this window, you can choose either Chi-square or Log-likelihood as the method to calculate keyness. You can also set your p threshold. After you have selected your preferred method to calculate keyness and you have entered the directory with the reference corpus, click Load.

Then click Apply and start.

To understand keyness better, read Laurence Anthony’s readme file.
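As a rough illustration of what a log-likelihood keyness score compares, here is a Python sketch of a commonly used two-corpus log-likelihood formula. It compares how often a word occurs in your corpus with how often you would expect it to occur if both corpora used it at the same rate. Whether Antconc computes exactly this is an assumption on my part, and the function and parameter names are hypothetical.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of one word across a target and a reference corpus."""
    total = freq_target + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    ll = 0.0
    # A zero frequency contributes nothing (0 * ln 0 is treated as 0).
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


# A word found 30 times in 1,000 words versus 10 times in 1,000 words.
print(round(log_likelihood(30, 1000, 10, 1000), 2))
```

A score above 3.84 corresponds to p < 0.05 with one degree of freedom. The score is 0 when the word has the same relative frequency in both corpora, which is why the choice of reference corpus determines your keywords.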

2.10 Saving your output

You can save any of your results as .txt files by going to File -> Save Output and then navigating to where you want to save your file.

Here is what your results file will look like for Clusters.


Practice

  • Load the Practice Texts from BAWE.
  • Search for the word state
  • Can you tell which occurrences are nouns and which occurrences are verbs?
  • Sort your results by one word to the left.
  • Does this help you identify which occurrences are nouns and which are verbs?
  • Try searching for state* and state+. How are the outputs different?
  • Using regex, search for all the words ending in ing, and then search for words that start with s and end with e.
  • What are the collocates of -ed words?