All About Readability
By Cheryl Stephens
Why Are We Looking at Readability Tests?
The use of readability tests in the plain language process is
a controversial topic. Now that readability scores are easy to
obtain by using computerized grammar and style checking software
programs, there is new pressure to adopt them. While some people
use readability tests to help them make their writing plainer,
other people are fervently opposed to their use.
For example, ten years ago the International Reading
Association and the U.S. National Council of Teachers of English
were advising members against uncritical use of readability tests
to assess educational materials. At about the same time, two
government reports in England validated the accuracy and
reliability of the tests. Some of the disputes about readability
tests arise because people use them for different purposes, and
for purposes different from those that lay behind the development
of the tests.
We want to look at the original reasons for development of
readability tests, the historical development of the tests and
the purposes to which they are now put. From there, we can
discuss how they ought to be used in the plain language process.
What is Readability?
Readability describes the ease with which a document can be
read. Readability tests, which are mathematical formulas, were
designed to assess the suitability of books for students at
particular grade levels or ages.
The tests were intended to help educators, librarians and
publishers make decisions about purchase and sale of books. They
were also meant to save time, because before the formulas were
used those decisions were made on the recommendations of educators
and librarians who read the books. These people were taking books
already written and figuring out the appropriate reading levels.
Webster's defines "readable" as:
Fit to be read
Agreeable and attractive in style
Obviously, readability formulas cannot measure features like
interest and enjoyment. Also, when we ask whether a text is
understood by its reader, we are questioning its
"comprehensibility". Readability formulas cannot
measure how comprehensible a text is. And they cannot measure
whether a text is suitable for particular readers' needs.
A Brief Historical Overview
The First Formulas
Readability formulas were first developed in the 1920s in the
United States. From the earliest efforts to today, readability
tests have been designed as mathematical equations which
correlate measurable elements of writing - such as the number of
personal pronouns in the text, the average number of syllables in
words, or the number of words in sentences - with reading
difficulty.
Factors like these are usually described as
"semantic" if they concern the word used and
"syntactic" if they concern the length or structure of
sentences. Both semantic and syntactic elements are surface-level
features of the text, and do not take into account the nature
of the topic or the characteristics of the readers.
Designers of one early formula began with 289 elements of
content, style of expression and presentation, format and
organization and reduced them down to the 5 style factors which
could be counted most reliably and would be most relevant to the
needs of adults with limited reading skills. Four of the five were:
Number of personal pronouns
Average number of words in a sentence
Percentage of different words
Number of prepositional phrases
How and Why Were They Developed
The very first readability study was a response to demands by
junior high school science teachers to provide them with books
which let them teach scientific facts and methods rather than get
bogged down in teaching the science vocabulary necessary to
understand the texts. The earliest investigations of readability
were conducted by asking students, librarians, and teachers what
seemed to make texts readable.
The publication in 1921 of The Teacher's Word
Book by Thorndike provided a means for measuring
the difficulty of words and permitted the development of
mathematical formulas. Thorndike tabulated words according to the
frequency of their use in general literature. Later other word
lists and reading lessons were adopted to measure word
difficulty. It was assumed that words that were encountered
frequently by readers were less difficult to understand than
words that appeared rarely. Familiarity breeds
understanding. There is some soundness to this. There are today
more than 490,000 words in the English language and another
300,000 technical terms. It is unlikely that an individual will
use more than 60,000 and the average person probably encounters
between 5,000 and 10,000 words in a lifetime.
Readability Formulas Today
How Do They Work?
Readability formulas measure certain features of text which
can be subjected to mathematical calculations. Not all features
that promote readability can be measured mathematically. And
these mathematical equations cannot measure comprehension
directly. Readers can be questioned or tested on material they
have read, and the material can be tested with formulas. The
readers' success in understanding the material, as measured on an
exam, can be correlated to the readability score of the text
itself. This is one method used to validate the formulas.
Other features of a document are just as important as word
and sentence length in determining reading ease. Other aspects
of language, sentence structure, and organization of ideas are
significant to comprehension. Physical aspects of the
document are also important: type styles, layout, design,
use of graphics and so on.
Other features of clear writing are:
Use of language that is simple, direct and economical
Omission of needless words
Use of sentence structures that are evident and uncomplicated
Organization and structure of material in an orderly way
So readability formulas are considered to be predictions of
reading ease but not the only method for determining readability.
And they do not help us evaluate how well the reader will
understand the ideas in the text.
What Factors Do They Measure?
Today readability formulas are usually based on one semantic
factor (the difficulty of words) and one syntactic factor (the
difficulty of sentences). Studies have confirmed that the
inclusion of other factors in the formula contributes more work
than it improves the results. Put another way, counting more
things does not make the formula any more predictive of reading
ease but takes a lot more effort.
Words are either measured against a frequency list or are
measured according to their length in characters or syllables.
Sentences are measured for their average length in characters or
words.
Graphs, Charts and Computer Functions
Readability tests can be performed manually by counting and
doing a mathematical calculation, or by referring to a chart or
graph. They can also be performed by computer. Most
grammar or editing software today can perform several readability
tests.
The Fog Index is computed this way:
- The total number of words is divided by the total number of
sentences to equal the average number of words per sentence.
- The number of words with three or more syllables is divided by
the total number of words to give the percentage of difficult
words.
- Add these two figures (1 and 2) and multiply the total by
0.4. This figure is the Fog Index in years of education.
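The steps above can be sketched in Python. Note that the vowel-group syllable counter here is a rough heuristic of my own (not part of Gunning's definition), so results will only approximate careful hand counts.

```python
import re

def fog_index(text):
    """Gunning Fog Index: 0.4 * (avg words per sentence + % complex words)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    def syllables(word):
        # Rough heuristic: count groups of consecutive vowels.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    complex_words = [w for w in words if syllables(w) >= 3]
    avg_sentence_length = len(words) / len(sentences)
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_length + pct_complex)
```

For example, `fog_index("The cat sat on the mat.")` gives 2.4: six words, one sentence, no complex words, so 0.4 × (6 + 0).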
The Flesch Scale
The Flesch Reading Ease Scale is the most widely used formula
outside of educational circles. It is the easiest formula to use,
and it makes adjustments for the higher end of the scale. It
measures reading from 100 (for easy to read) to 0 (for very
difficult to read). A zero score indicates text has more than 37
words on the average in each sentence and the average word is
more than 2 syllables. Flesch has identified a "65" as
the Plain English Score. In response to demand, Flesch also
provided an interpretation table to convert the scale to
estimated reading grade and estimated school grade completed.
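The Flesch computation itself is simple. This sketch uses the standard published constants and takes the word, sentence, and syllable counts as inputs rather than attempting to count syllables automatically:

```python
def flesch_reading_ease(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease: roughly 100 is very easy, 0 is very difficult."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

This is consistent with the zero-score description above: 37 words per sentence at two syllables per word gives 206.835 − 37.555 − 169.2 ≈ 0.08, effectively zero.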
In 1963 Fry published his readability graph, which was easier
to use than manual computations. The graph was revised in 1977 and
then became the most widely used formula. A hand-held calculator
was developed to do the Fry test, and it is now incorporated in
computer software.
Also in 1963, the first computerized readability formula was
developed and many others have been devised since. Some computer
formulas are based on characters per word and characters per
sentence while others measure syllables. The difference between
computerized measures today depends on the developers' decisions
about how to measure sentences or words. For example, some
programs treat a period, colon, or semi-colon as the sign of the
end of a "sentence". This is in keeping with some
research which concludes that the sentence is not the unit for
measure. Rather the "sousphrase" which we might
consider to be a clause represents the unit of thought for
measure because it is the cognitive decoding unit.
Today most grammar software programs provide more than one
readability measure as well as comparisons to well-known writing.
In addition to word, sentence and paragraph statistics, Grammatik
IV gives the Flesch Readability Scale, Gunning's Fog Index in
years of education, and the Flesch-Kincaid Reading Grade Level.
In addition to a qualitative assessment of the writing,
Stylewriter, a plain-English editorial program, provides word and
sentence statistics, an index percentage of the passive verbs
used, and a count of words in various categories: complex,
jargon, abstract, legal, tautologies and so on.
What is Cloze Procedure?
The "cloze" procedure for testing your writing is
often treated as a readability test because a formula exists for
translating the data from "cloze tests" into numerical
results. The name "Cloze" comes from the word
"closure". In this procedure, words are deleted from
the text and readers are asked to fill in the blanks. By
constructing the meaning from the available words and completing
the text, the reader achieves "closure".
In 1953 the "cloze procedure" was developed and
later, after 1965, formulas were developed for its use. It became
a popular method for measuring the suitability of text for a
particular audience. It was popular because its scoring was
objective; it was easy to use and analyze; it used the text
itself for analysis; and it yielded high correlations to other
readability measures.
The cloze technique does not predict whether the material is
comprehensible; it is an actual try-out of the material. It tells
you whether a particular audience group can comprehend the
writing well enough to complete the cloze test.
Cloze procedure consists of deleting words in a text and
asking the reader to fill in the appropriate or a similar word.
Usually every fifth word is deleted. Cloze is thought to offer a
better index of comprehensibility than the statistical formulas.
The ability to identify the missing word or to insert a
satisfactory substitute for the original word indicates that the
reader comprehends the content of the text.
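A minimal sketch of the procedure as described: blank every fifth word, then score exact-match restorations. Real cloze scoring sometimes also accepts "satisfactory substitutes", which requires human judgment; the exact-match rule below is the simple objective variant.

```python
def make_cloze(text, gap=5):
    """Blank every `gap`-th word; return the test text and the answer key."""
    words = text.split()
    answers = {}
    for i in range(gap - 1, len(words), gap):
        answers[i] = words[i]
        words[i] = "_____"
    return " ".join(words), answers

def score_cloze(answers, responses):
    """Fraction of blanks restored with the exact original word."""
    hits = sum(1 for i, word in answers.items()
               if responses.get(i, "").lower() == word.lower())
    return hits / len(answers)
```

For a ten-word passage, `make_cloze` blanks the fifth and tenth words and returns them as the answer key, so a test-taker's responses can be scored against it.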
Cloze testing has been called a "rubber yardstick"
because cloze scores reflect both the difficulty of the text and
the readers' abilities or resources. Like any readability test,
the problem arises over what is considered a successful
completion of the text: inserting 50% of missing words, 75% or
100%. Today educators recognize that cloze procedures are more
suitable to assess readers' abilities than to measure the
readability of text. Critics have pointed out that cloze can
operate on the basis of measuring redundancy -- that in some
texts it measures the number of redundant words rather than
comprehension.
In particular, critics suggest that cloze is inappropriate for
measuring text or readers' abilities in languages other than
their native language. The results of cloze testing reflect the
reader's basic intuition about the structure and vocabulary
of the target language -- an intuition the non-native reader lacks.
Cloze testing is widely used now to assess the abilities of
readers, but is usually combined with other tests measuring
grammar skills and writing ability. One educator comments:
"The underlying assumption in cloze testing is
that a close relationship exists between reading comprehension
and writing skill. The test measures the student's ability to
select appropriate words if occasional gaps occur in a passage,
based on their ability to infer meaning from context and cultural
experience. The word cloze is related to the concept of closure,
the human tendency to complete a partly finished pattern, to pick
out key words and rely on language repetition in English
discourse. The theory originated in Gestalt psychology and
assumes that in figuring out the missing word, the mind goes
through a process of sampling, predicting, testing, and
confirming the appropriate word choice. The argument is that this
process involves both recognition skills (required in discrete
formal testing) and the production of a significant content
(required in written passages). In theory at least, the cloze
test is an integrated rather than a formal test, but the
advantage is that it can be marked efficiently and
objectively." ("Assessment Report, Communications
Discipline", by Roslyn Dixon, Communications Assessment
Coordinator, Douglas College, June 1, 1989)
One critic discussed cloze in the context of its use in
languages other than English:
"There is controversy regarding the use of cloze
procedure in determining the readability of written materials.
This controversy is based on the fact that cloze is a subjective
evaluation that mirrors the language ability and background of
information of the person taking the test. Also, some researchers
feel that multiple cloze passages should be developed from each
piece of material for the results to be valid. For example, a
test deleting every fifth word should be prepared in five
versions, omitting a different word each time. Though these views
are shared by other countries, for want of a better technique,
cloze procedure is widely used." (Annette T. Rabin,
"Determining Difficulty Levels of Text Written in Languages
Other than English", in Zakaluk and Samuels, eds.)
Should You Use Readability Formulas?
Some say that readability formulas measure word length or
frequency and sentence length. In using the formulas we accept
that these features affect readability in a significant way.
Yet it can be argued that long sentences and difficult words
are merely signals that the text is not written for ease of
understanding. Some say difficult text often contains difficult
words because it discusses abstract ideas while easy text uses
common words because it discusses concrete experiences. Choosing
smaller words and shorter sentences may not be as much help as
reconstructing the sentences and using familiar vocabulary.
The Delegates Assembly of the International Reading
Association resolved against using grade-level scores in
1981. And the (U.S.) National Council of Teachers of English
advised against uncritical use of readability formulas in
assessing text for school use. After 1981, the College Entrance
Examination Board decided not to use grade-level measures to
ascertain reading abilities of college applicants.
In recent years, researchers have emphasized that readability
tests can only measure the surface characteristics of text.
Qualitative factors like vocabulary difficulty, composition,
sentence structure, concreteness and abstractness, obscurity and
incoherence cannot be measured mathematically. They have pointed
out that material which receives a low-grade level score may be
incomprehensible to the target audience. As an example, they
suggest that you consider what happens if you scramble the words
in a sentence or, on a larger scale, randomly rearrange the
sentences in a whole text. The readability score could be low,
but comprehension would be lacking.
Example: Fall Humpty had Dumpty great a.
Things They Can Do
- Their primary advantage is they can serve as an early warning
system to let the writer know that the writing is too dense. They
can give a quick, on-the-spot assessment. They have been
described as "screening devices" to eliminate dense
drafts and give rise to revisions or substitutions.
- In some organizational settings, readability tests are
considered useful to show measurable improvement in written
documents. They provide a quantifiable measure of improvement or
progress.
Things They Can't Tell You and Why
How complex the ideas are
Whether or not the content is in a logical order
Whether the vocabulary is appropriate for the audience
Whether there is a gender, class or cultural bias
Whether the design is attractive and helps or hinders the reader
Whether the material appears in a form and type style that is
easy or hard to read
Because readability formulas are based on measuring words
and sentences, they cannot take into account the variety of
resources available to different readers. Reader resources are
word recognition skills, interest in the subject, and prior
knowledge of the topic. The formulas cannot measure the
circumstances in which the reader will be using the text or form
- both the psychological and the physical situations. The formulas
cannot adjust for the needs of people who are reading the text
in a second or additional language.
Studies have shown that readability, interest and prior
knowledge in the reader are equally important factors in
comprehension and retention of information. The ease of reading
that the reader experiences is also directly influenced by the
writer's use of physical, syntactic, semantic and contextual
cues which cannot be measured by these tests. Such cues include
the use of personal pronouns, the lay-out and design of the text,
the typography (use of highlighting and italics, etc), the use of
signal words (now, then, but, later) and so on.
Readability tests cannot tell you whether the information in
the text is written in a way to interest the reader, nor can they
tell you whether the reader has sufficient background information
to appreciate the new information provided in the text.
How to Use Readability Tests
Researchers have been critical of using readability tests on
readers of an additional language. They point out that these
tests cannot take into account that we mentally process our first
language differently than we do additional languages we have
acquired. Therefore a reader does not approach the text with the
same intuition for the language that exists among native
users. This is important when using cloze testing on text
intended for people reading in an additional language. It is also
significant when designing the testing groups for cloze tests or
try-outs of the material. A population which meets the same
criteria for first language must be used to accurately assess the
readability of material written in a second or additional
language.
Keep the readability formula out of the writing process
Follow other guidelines to writing. If you like to work with
guidelines in checklists, use the Document Design Centre's
Guidelines, the CBA/CBA Guidelines, the CLIC Red Alert Editing
System or Fry's Writeability Checklist.
Use the Formulas for Feedback Only:
Apply the formula
Remember that the readability test is only a screen and offers
only a prediction. Remember that the score is only a prediction
that the text is suitable for a particular reading grade.
Remember that the formulas do not take into account other
features which contribute to comprehension, so they may
underestimate or overestimate the suitability of the text.
Bear in mind that at higher grade levels the scores are not
reliable because background and content knowledge become more
significant than style variables.
Consider again the purpose of the text. Material which is
intended for training readers can be more challenging to their
resources than material whose purpose is to inform or entertain.
As well, higher motivation in the readers may keep them reading
challenging material which they might otherwise abandon.
Pick a formula that works best for you and for the task at
hand. Choose one that is easy to use. It should contain two
variables, whether words and sentences or characters per sentence
and characters per word. For significant projects, use more than
one test and expect slightly different grade level scores.
Test a large sample of the text or the whole text if using a
computer program. By hand, test at least 3 sections of 100 words
to arrive at an average score. Be cautious of averaging if there
are great differences between sections of the text.
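The hand-sampling advice above can be sketched as a helper that takes any scoring function (Fog, Flesch, or another) and averages it over evenly spaced 100-word sections. The even-spacing choice is my assumption; the text does not say where in the document to take the samples.

```python
def average_score(text, score_fn, samples=3, sample_len=100):
    """Average `score_fn` over `samples` evenly spaced sections of
    `sample_len` words each; short texts are scored whole."""
    words = text.split()
    if len(words) <= sample_len * samples:
        return score_fn(text)
    step = (len(words) - sample_len) // (samples - 1)
    sections = (" ".join(words[i * step : i * step + sample_len])
                for i in range(samples))
    scores = [score_fn(s) for s in sections]
    return sum(scores) / len(scores)
```

Sampling several sections guards against judging a whole document by one unusually dense or unusually simple passage, which is exactly the caution the paragraph above raises.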
Combine the Use of Formula with Other Methods of Testing
There are other methods for assessing text for suitability for
readers. You can devise a document audit instrument which takes
into account other characteristics that formulas cannot predict.
Prepare a questionnaire to use when reviewing the document,
seeking out features known to make reading easier.
Or use experts. In education it is common to use teachers and
librarians to review material and assign an appropriate grade
level for the use of the text. In other fields, find experts who
will know the needs and characteristics of your audience and get
their expert opinions.
Or use "protocol-aided revisions" as a method. These
are "try-outs" on individuals or small groups who match
your audience's key characteristics. Formal testing with
focus-groups is often beyond the budget and capabilities of those
preparing materials. But informal, or casual, testing of
materials with readers is very effective even on a small scale.
Readability formulas are not guides to writing well. The notion
of "writing to formula" has been condemned by formula
designers from the beginning. They call it "cheating"
and compare it to holding a match under a thermometer to warm a
room. Klare has said that formulas can play a useful screening
role in the prediction of readability, where only index variables
in language are needed. But formulas cannot be used in the
production of readable writing, because index variables are
insufficient for the purpose. For producing readable writing more
variables must be considered in both the text and the reader.
(Klare, "A Second Look at the Validity of Readability
Formulas", Journal of Reading Behavior,
1976, 8, 129-152, and present reference)
Readability: Its Past, Present, &
Future, Beverly L. Zakaluk and S. Jay Samuels,
editors, published by the International Reading Association,
Newark, Delaware, 1988
Small Claims Court Materials: Can They Be Read?
Can They Be Understood? by Richard Darville and
Marilyn Hiebert Canadian Law Information Council, CLIC Papers on
PLEI, no. 7, 1985
© 2000 Cheryl Stephens. All rights reserved.