Author: Larissa Goulart, Northern Arizona University (lg845@nau.edu)
Tested with Antconc 3.5.8 (Windows)
This is a step-by-step guide on how to use Antconc to conduct your own corpus study. Antconc is a freely available program developed by Laurence Anthony. To follow the workshop, you will need to download:
     1. Antconc
Download Antconc from the AntLab Website
     2. Practice files
For this workshop, we will use ten files from the LILE Corpus (Sarmento, Scortegagna & Goulart, 2011). This corpus contains abstracts of theses and dissertations written in the fields of Literature and Applied Linguistics. You will also use twenty files from the BAWE corpus. You can download the files we will use here.
     To prepare your files
If you are following this tutorial with your own corpus, you might need to convert and fix the encoding of your files. Here are two suggestions on how to do this:
     3a. Corpus Text Processor from CROW
If you need to convert and standardize your texts, you can use the Corpus Text Processor. You can read more about it here.
     3b. AntLab Tools
If you need to convert files from Word or PDF to .txt, download the AntConverter, and to make sure your files are in the correct encoding, download EncodeAnt
For both programs, Antconc and the Corpus Text Processor, remember to download the correct version for your computer’s operating system.
Corpus analysis is a form of text analysis, which allows you to compare the use of specific linguistic features between texts (or collections of texts). In this workshop, we will:
      1. Prepare your files
      2. Use the Antconc tool
In this section, I will explain how to use both the Corpus Text Processor from CROW and the AntLab Tools. You only need to do this if you are using your own corpus and your files are saved as PDF or Word documents.
If you are using CROW’s Corpus Text Processor to prepare your files, I recommend that you create three folders on your computer that mirror the menus in the Corpus Text Processor (01_converted, 02_encoded, and 03_standardized) before running the tool. It should look like this.
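If you would rather create these folders with a script than by hand, a minimal Python sketch could look like the one below. Only the three subfolder names come from this tutorial; the function name `make_stage_folders` and the parent folder you pass to it are hypothetical.

```python
from pathlib import Path

# Folder names from the tutorial; they mirror the Corpus Text Processor menus.
STAGE_FOLDERS = ["01_converted", "02_encoded", "03_standardized"]


def make_stage_folders(parent):
    """Create one subfolder per processing stage inside `parent`."""
    parent = Path(parent)
    created = []
    for name in STAGE_FOLDERS:
        folder = parent / name
        # exist_ok=True means re-running the script does not raise an error.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

For example, `make_stage_folders("my_corpus")` would create the three folders inside a (hypothetical) folder called `my_corpus` in the current working directory.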
And this is what you will see when you open the Corpus Text Processor:
To convert the files, we should follow the order of the menus in the Corpus Text Processor. You can also watch the video demonstration on how to run it.
As an alternative, you can use AntConverter and EncodeAnt to convert and encode your files.
File> Open Dir..
File> Open File(s)/Open Dir..
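If you prefer to script the encoding step, the sketch below re-saves plain-text files as UTF-8 in Python. This is not how EncodeAnt works internally. It assumes that you already know the encoding your files were saved in (`cp1252` is used here purely as an illustrative default), and the function name is hypothetical.

```python
from pathlib import Path


def reencode_to_utf8(src_dir, dst_dir, source_encoding="cp1252"):
    """Copy every .txt file from src_dir to dst_dir, re-saved as UTF-8.

    `source_encoding` is an assumption: set it to the encoding your
    files were actually saved in.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(src_dir.glob("*.txt")):
        text = src.read_text(encoding=source_encoding)
        dst = dst_dir / src.name
        dst.write_text(text, encoding="utf-8")
        written.append(dst)
    return written
```

Writing to a separate output folder keeps the original files untouched, so you can redo the step if the assumed encoding turns out to be wrong.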
In this section, I will explain how to load your corpus on Antconc and how to use the following Antconc tools:
Antconc is an executable file, so you don’t need to install anything. Save the Antconc .exe on your Desktop, or somewhere you will easily find it.
This is what Antconc will look like when you open it.
To load your texts into Antconc, go to File -> Open Dir.. and then navigate to the folder where you saved the Practice Files.
Remember to add only the files that are in the directory called LILE_Corpus.
Look at the Corpus File window, and make sure you have the right number of files.
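If you want to double-check the number of files outside Antconc, a short Python sketch like this one counts the text files in a folder. It assumes the practice files are saved with a .txt extension, and the function name is hypothetical.

```python
from pathlib import Path


def count_text_files(corpus_dir):
    """Return the number of .txt files directly inside corpus_dir."""
    return sum(1 for p in Path(corpus_dir).glob("*.txt") if p.is_file())
```

Calling `count_text_files("LILE_Corpus")` from the folder that contains the practice files should give the same number that you see in Antconc.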
We will use the concordance tool to view Key Words in Context (KWIC). In the search box, type a word and click Start. The concordance window will then show you every occurrence of that word.
If we search for the, this is what the output will look like:
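To make the idea of a KWIC display concrete, here is a minimal Python sketch that lists every occurrence of a word with some context on each side. It only illustrates the concept and is not Antconc's implementation; the function name, the `width` parameter, and the sample sentence are my own assumptions.

```python
import re


def kwic(text, word, width=30, case_sensitive=False):
    """Return (left, hit, right) tuples for every occurrence of `word`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # \b keeps the search to whole words, so "the" does not match "other".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append((left, m.group(), right))
    return lines


sample = "The study compares the abstracts. The results show the trend."
for left, hit, right in kwic(sample, "the", width=15):
    # Right-align the left context so the search word lines up in a column.
    print(f"{left:>15} {hit} {right}")
```

With `case_sensitive=True`, searching for `The` and `the` in the sample sentence returns two hits each instead of four combined.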
Now, I can use the Kwic Sort options to sort the results by the words to the right or to the left of my search word. In my example, I am sorting by the first word to the left, then the second word to the right, and then the third word to the right. Remember to click Sort.
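The sorting logic can be sketched in Python as a sort key with three levels: first word to the left (1L), second word to the right (2R), and third word to the right (3R). The concordance lines below are invented for illustration, and the way each line is stored (lists of words on each side) is an assumption, not Antconc's internal format.

```python
def sort_key(line):
    """Sort key for a (left_words, hit, right_words) line: 1L, then 2R, then 3R."""
    left, _hit, right = line

    def pick(words, i):
        # Return an empty string when the context is too short.
        return words[i].lower() if 0 <= i < len(words) else ""

    return (pick(left, len(left) - 1), pick(right, 1), pick(right, 2))


# Invented example lines for the search word "the".
lines = [
    (["results", "of"], "the", ["present", "study", "show"]),
    (["abstracts", "in"], "the", ["field", "of", "literature"]),
    (["written", "in"], "the", ["field", "of", "applied"]),
]

for left, hit, right in sorted(lines, key=sort_key):
    print(" ".join(left), hit, " ".join(right))
```

The two lines with "in" to the left come first, and they are ordered by their third word to the right ("applied" before "literature") because their second word to the right is the same.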
Right now, my search is not case sensitive, but you can click on the Case button to make your search case sensitive. You could then search for The or the separately.
Let’s try using wildcards in our search now. If you go to Global Settings -> Wildcards, you can see all the options available.
If I search use+, I will get all instances of used, uses and use.
If I search stud*, I will get all occurrences of words that start with stud.
I could also use |, which stands for OR. For example, I can search is|be|are|was|were|been|being to find these forms of the verb to be in the corpus.
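One way to think about these wildcards is as shorthand for regular expressions. The Python sketch below translates the three wildcards used above into a regex. The meanings are assumptions inferred from the examples in this section (`+` as zero or one character, `*` as zero or more characters, `|` as OR); check Global Settings -> Wildcards for the definitions in your version. The function name and sample sentences are hypothetical.

```python
import re


def wildcard_to_regex(query):
    """Translate a simple wildcard query into a compiled regular expression.

    Assumed meanings: * = zero or more characters, + = zero or one
    character, | = OR between alternatives.
    """
    alternatives = []
    for alt in query.split("|"):
        pieces = []
        for ch in alt:
            if ch == "*":
                pieces.append(r"\w*")
            elif ch == "+":
                pieces.append(r"\w?")
            else:
                pieces.append(re.escape(ch))
        alternatives.append("".join(pieces))
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


text = "We use corpora. She uses and used them while studying; students study."
print(wildcard_to_regex("use+").findall(text))   # ['use', 'uses', 'used']
print(wildcard_to_regex("stud*").findall(text))  # ['studying', 'students', 'study']
print(wildcard_to_regex("is|be|are").findall("They are here and it is fine to be."))
```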
Now, let’s say you want to do more complex searches, such as all the words ending in ed. Then, you need to use regex in your search.
First, remember to click the Regex button in the concordance window. Then, you can use regex to search only for words that end in ed. In my example, I used \b[a-z]+?ed\b.
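If you want to try this pattern outside Antconc, the short Python sketch below applies the same regex to an invented sentence. Note that the pattern matches any lowercase word ending in ed, not only past-tense verbs, and that Python's `re` module may differ in small details from the regex engine Antconc uses.

```python
import re

# \b       word boundary
# [a-z]+?  one or more lowercase letters (as few as possible)
# ed\b     the letters "ed" at the end of the word
pattern = re.compile(r"\b[a-z]+?ed\b")

text = "We analyzed and compared what we need. Red is left out."
print(pattern.findall(text))  # ['analyzed', 'compared', 'need']
```

The word "need" is matched even though it is not a past-tense form, while "Red" is skipped because the pattern only allows lowercase letters.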
If you don’t know regex, you can use this reference guide or this cheatsheet. You can also use regex searches to search for parts of speech in a tagged corpus.
Other things you can change in the concordance window:
- Change the window size: you can use this to show more characters in your results.
- Change the number of results: if you increase the number in show every Nth word, Antconc will show only every Nth result and skip the lines in between.
The concordance plot window will show you where in the file your search word occurs.
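The idea behind the plot can be sketched in a few lines of Python: each hit is converted to a relative position in the file (0.0 is the start, 1.0 is the end) and drawn as a mark on a bar. This is only an illustration of the concept; the function names and the text-based bar are my own assumptions.

```python
import re


def hit_positions(text, word):
    """Relative position (0.0 = start, 1.0 = end) of each hit in the text."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [m.start() / len(text) for m in pattern.finditer(text)]


def plot_bar(text, word, width=40):
    """Draw a one-line bar in which each '|' marks roughly where a hit occurs."""
    cells = ["-"] * width
    for pos in hit_positions(text, word):
        cells[min(int(pos * width), width - 1)] = "|"
    return "".join(cells)


sample = "paper " + "x " * 10 + "paper"
print(plot_bar(sample, "paper"))
```

In the sample, the search word occurs at the very beginning and near the end, so the bar shows one mark at each end.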
This window will show you a full view of the text, so you can use this to see the results in context.
This tool allows you to find the words that frequently occur together. The first thing you have to do in this window is to check the N-grams box in the control panel.
In this window, you can change the N-gram size from 2 to 10. You can also choose varied N-gram sizes to be printed in the same output. If you want to establish a minimum frequency and range, you can change this setting in the control panel by entering a number in Min. Freq. and Min. Range. The results can be sorted by frequency, range, probability, and word.
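To show what frequency and range mean for N-grams, here is a minimal Python sketch. Frequency is how many times an N-gram occurs overall, and range is the number of files in which it occurs. The tokenizer is deliberately simple (lowercase a-z only), and the function names and parameters are my own assumptions, not Antconc's.

```python
import re
from collections import Counter


def tokenize(text):
    """Very simple tokenizer: lowercase sequences of the letters a-z."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_table(texts, n=2, min_freq=1, min_range=1):
    """Return [(ngram, frequency, range)] sorted by frequency, then alphabetically."""
    freq, rng = Counter(), Counter()
    for text in texts:
        tokens = tokenize(text)
        grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        rng.update(set(grams))  # each file counts once per n-gram
    rows = [(g, f, rng[g]) for g, f in freq.items()
            if f >= min_freq and rng[g] >= min_range]
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Three invented "files".
texts = ["in this paper we argue", "in this paper we show", "this paper is short"]
for row in ngram_table(texts, n=2, min_freq=2, min_range=2):
    print(row)
```

With a minimum frequency and range of 2, only the bigrams that appear in at least two of the three invented files are kept.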
This tool allows you to see the words that frequently occur in the company of your search word. In the example, the word paper was used as the search word and, in the collocate window, we see the words that usually occur with paper within 5 words to the left or 5 words to the right.
You can change the search window to one word to the right or one word to the left using the Window Span in the control panel. You can also choose to sort the results by frequency, stat, word, or word end. As in the concordance window, you can use regex in the search for collocates.
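The window span can be illustrated with a short Python sketch that counts every word found within a given number of words to the left and right of the search word. It reports raw co-occurrence counts only, not the statistic that Antconc calculates, and the function name, parameters, and sample text are assumptions made for illustration.

```python
import re
from collections import Counter


def collocates(text, node, left=5, right=5):
    """Count the words that occur within the window around each hit of `node`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            counts.update(window)
    return counts


text = "This paper examines writing. In this paper, we compare abstracts."
print(collocates(text, "paper", left=1, right=1).most_common())
```

With a span of one word on each side, "this" is counted twice because it precedes both occurrences of "paper".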
Word list shows you all the words in your corpus sorted by frequency.
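A word list is essentially a frequency count over all the files. A minimal Python sketch, assuming a very simple tokenizer (lowercase a-z only) and a hypothetical function name, could look like this:

```python
import re
from collections import Counter


def word_list(texts):
    """Return (word, frequency) pairs sorted by frequency, then alphabetically."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(word_list(["the paper and the study", "the study"]))
# [('the', 3), ('study', 2), ('and', 1), ('paper', 1)]
```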
This tool allows you to compare the words in two corpora. In order to use it you will have to input a reference corpus. Remember that your reference corpus will determine your keywords.
To input your reference corpus, select Tool Preferences, then go to Category and select Keyword List. Then, you can enter the reference corpus in Add Directory.
In this window, you can choose either Chi-square or Log-likelihood as the method to calculate keyness. You can also set your p threshold. After you have selected your preferred method to calculate keyness and you have entered the directory with the reference corpus, click Load. Then click Apply and Start.
To understand keyness better, read Laurence Anthony’s readme file.
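To give a rough sense of what a log-likelihood keyness score measures, the Python sketch below uses a commonly used two-term formulation that compares a word's observed frequency in each corpus with the frequency we would expect if the word were equally common in both. This is an assumption about the general technique; Antconc's exact calculation may differ, so treat the readme as the authority. The function name and the numbers in the example are invented.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-term log-likelihood keyness score for a single word.

    freq_*: how often the word occurs in each corpus
    size_*: total number of words in each corpus
    """
    total = size_target + size_ref
    combined = freq_target + freq_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Invented counts: 20 hits in a 1,000-word target corpus versus
# 5 hits in a 1,000-word reference corpus.
print(round(log_likelihood(20, 1000, 5, 1000), 2))  # 9.64
```

A score of 0 means the word is used at the same rate in both corpora. For one degree of freedom, a score above 3.84 corresponds to p < 0.05, which is the kind of threshold you set in the Keyword List preferences.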
You can save any of your results in .txt files by going to File -> Save Output and then navigating to where you want to save your file.
Here is what your results file will look like for Clusters: