Critical analysis of an international proficiency test – TOEFL

I.     Introduction

1.1.   Background information

The rapid expansion of technology and advances in research have provided new opportunities in language acquisition. Nevertheless, complexity and imperfection have continuously posed challenges for experts in the area of language testing. The early language testing theories of Lado (1961) and Carroll (1961) emphasised two questions for test developers: (1) what language ability is being assessed; and (2) how it is being assessed. More recent literature has specifically stated that the quality of language testing is maintained by addressing practicality, validity and reliability issues (Brown, 2000: 385-387; Hughes, 1989: 22, 29).

1.2.   Structure of discussion

This paper reviews literature on international language proficiency testing in order to identify relevant issues, and then critiques the speaking component of the latest, internet-based Test of English as a Foreign Language (TOEFL iBT®) by examining its advantages and disadvantages.

II.  Critical analysis of an international proficiency test

 2.1.   Relevant literature review

 2.1.1.      Current and relevant discussion

International proficiency tests measure the language ability of non-native English speakers, regardless of any training (Hughes, 1989: 9). Rapid turnover permits cost-effective and practical determination of the suitability of prospective applicants for tertiary education, employment and visas. However, the high-stakes repercussions for examinees and the existing imperfections within language testing provide strong evidence of the need for continuous evaluation of validity and reliability issues (Shavelson et al., 2002: 5-6; Uysal, 2010).

2.1.2.      Relevant issues

Test design, standardised administration and security measures, topic authenticity and comparability, bias, markers’ reliability and objectivity, well-defined scoring methods, and form-content issues are all relevant to a test’s quality (Brown, 2000; ETS, 2011; Hall, 2010; Hughes, 1989; Uysal, 2010).

Davies et al. (2003) have indicated that social, economic and political issues, together with unfair differences between test-takers, make all international tests biased, since the tests do not represent the Englishes actually in use and are “agent[s] of cultural, social, political, educational, and ideological agendas that shape the lives of individual participants, teachers and learners” (Shohamy, 1997: 3).

Furthermore, validity issues arise because inner-circle norms overlook the sociolinguistic reality of candidates’ language use (Elder & Harding, 2008; Lowenberg, 2002). Brown (2000: 394) has argued that while large-scale tests are practical and reliable, they have fundamental difficulty in finding effective ways to connect with the communicative abilities of examinees. As speech consists of a complex matrix of underlying skills, validly identifying fundamental conversational characteristics in a restrictive, closed testing system that uses single-score marking is a significant challenge (Valdman, 1988: 125, cited in Brown, 2000: 396).

2.2.   Critical analysis 

2.2.1.      Brief account of an international proficiency test

TOEFL is a rigorously researched American test, developed and run by the New Jersey-based Educational Testing Service (ETS). First administered in 1964, it is now accepted by more than 6000 institutions in over 130 countries. The latest internet-based version (TOEFL iBT®) complements the paper-based format and includes mandatory speaking and writing sections, along with integrated tasks that require examinees to use more than one skill simultaneously. TOEFL iBT® directly assesses the four macro skills while indirectly assessing note-taking and grammar skills.

The TOEFL iBT® speaking section is twenty minutes long and mainly measures speaking ability. It includes two independent tasks on familiar topics and four integrated tasks in which more than one skill is used. Examinees are expected to summarise, compare, convey information, explain ideas and defend opinions clearly, coherently and accurately from multiple sources in a spontaneous manner. Reading and listening passages of up to 100 and 120 words, respectively, are given, and after 30 seconds examinees are expected to respond verbally for up to 60 seconds. Each task is marked on a scale of 0 to 4 against four criteria: general description (intelligibility, task fulfilment and coherence); delivery (fluency, clarity, intonation, stress, pronunciation); language use (vocabulary and grammar); and topic development (relevance, relationship and progression of ideas). Responses are recorded and centrally marked (Davies et al., 2003: 580; ETS, 2004; ETS, 2011: 1).

2.2.2.      Key points for analysis

Test design, administration and marking system, topic authenticity, comparability and fairness, and markers’ objectivity are analysed for the speaking section, which tests pronunciation, grammar, vocabulary, fluency, intonation, and clear, coherent and cohesive expression (Brown, 2000: 268-271; Hughes, 1989: 111-112).

2.2.3.      Advantages

The speaking section contains clear and unambiguous instructions that identify and capture the underlying skills and knowledge. Authentic and relevant academic content and a significant reduction in the focus on grammar minimise both North American bias and misuse of the test for non-academic purposes. This is further supported by Sawaki et al. (2009, cited in Anderson, 2009: 623), who claim that TOEFL iBT® is “better aligned to the variety of language use tasks that examinees are expected to encounter in every day academic life”.

ETS (2011: 3) argues that validity and reliability are maintained via detailed test specifications; standardised administration and security; monitoring of score reliability and generalisation; appropriate task design; marker retraining; and a well-defined, centrally marked, holistic and rubric-based scoring system that closely reflects the theory of communicative competence, in which more than one skill is assessed simultaneously across a wide range of goals and contexts (Jamieson et al., 2008: 57).

Furthermore, rubrics are used to support score comparability across the format by comparing markers’ scores for the reading and listening sections with those for the speaking and writing sections, thus increasing marker reliability.

The use of number-based identification for examinees further increases objectivity and reliability (Hughes, 1989: 35, 42). ETS (2008: 5) claims the speaking section maintains a relatively high reliability coefficient of 0.88, with a standard error of measurement of 1.62.
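
To put the two reported figures in context, the short sketch below assumes the standard classical test theory relation between the standard error of measurement (SEM), the score standard deviation (SD) and the reliability coefficient: SEM = SD × √(1 − reliability). Only the values 0.88 and 1.62 come from the source cited above; how ETS derived them is not stated here, so the implied standard deviation, the example observed score of 20 and the function names are illustrative assumptions rather than part of any ETS procedure.

```python
import math
from typing import Tuple


def implied_sd(sem: float, reliability: float) -> float:
    """Score SD implied by the assumed relation SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in the range [0, 1)")
    return sem / math.sqrt(1.0 - reliability)


def score_band(observed: float, sem: float, z: float = 1.0) -> Tuple[float, float]:
    """Band of +/- z SEMs around an observed score."""
    return (observed - z * sem, observed + z * sem)


# Values reported for the speaking section (ETS, 2008: 5).
RELIABILITY = 0.88
SEM = 1.62

sd = implied_sd(SEM, RELIABILITY)
# Hypothetical observed score, chosen only for illustration.
low, high = score_band(20.0, SEM)

print(f"Implied score SD: {sd:.2f}")
print(f"One-SEM band around 20: {low:.2f} to {high:.2f}")
```

Under these assumptions the implied standard deviation is about 4.68 score points, and a hypothetical observed score of 20 carries a one-SEM band of roughly 18.38 to 21.62, which gives a sense of how much an individual result might vary on retesting.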

2.2.4.      Disadvantages

The speaking section tests speaking only under monologic conditions, and only native English speakers are used in the recorded dialogues (Anderson, 2009: 622). Competence in effective, interactive communication in lectures, class presentations and debates with native speakers of English is not only necessary and valid to assess; testing it would also reduce bias, and it therefore merits inclusion.

The absence of a face-to-face interview not only reduces authenticity; according to Brown (2000: 395), it also means that oral proficiency is not tested in the best way. Although holistic scoring is reliable, it is not well suited to measuring specific skills or identifying strengths and weaknesses (ETS, 2008: 12).

As proficiency tests sample a candidate’s ability at one particular time, results do not guarantee the applicant’s true competency in any particular context, which threatens the external validity of the tests.

III.             Conclusion

 3.1. Key points covered

We have identified and critically analysed relevant literature on the international, internet-based TOEFL proficiency test (TOEFL iBT®) by investigating the advantages and disadvantages of the speaking section and the validity and reliability issues related to test design, administration and marking, content authenticity, topic comparability, and markers’ objectivity.

3.2. Implications of the analysis

As debates continue in the area of language testing, this analysis has emphasised the need for a continuous programme of research. Any forthcoming demands for objectivity and validity must be thoughtfully balanced against authentic, contextualised speech performance. This analysis has also indicated that holistic and rubric-based scoring could be used in the next generation of proficiency tests, which should focus more strongly on whole performance and include the pragmatic (sociolinguistic, functional), strategic, and interpersonal/affective components of language ability, as well as conversation analysis.

IV. References

Anderson, C. (2009). Test of English as a Foreign Language: Internet Based Test (TOEFL iBT). Language Testing, 28(4), 621-631.

Brown, H. D. (2000). Teaching by principles: An interactive approach to language pedagogy. White Plains, NY: Pearson Education.

Carroll, J. B. (1961). Fundamental considerations in testing English language proficiency of foreign students. In H. B. Allen & R. N. Campbell (Eds.), Teaching English as a second language (2nd ed., pp. 313–321). New York: McGraw-Hill.

Davies, A., Hamp-Lyons, L., & Kemp, C. (2003). Whose norms? International proficiency tests in English. World Englishes, 22(4), 571-584.

Elder, C., & Harding, L. (2008). Language testing and English as an international language. Australian Review of Applied Linguistics, 21(3).

Hall, G. (2010). International English language testing: A critical response. ELT Journal, 64(3).

Hughes, A. (1989). Testing for language teachers. Cambridge: Cambridge University Press.

ETS. (2004). iBT/Next Generation TOEFL Test Independent Speaking Rubrics (Scoring Standards). Retrieved September 10, 2011, from

ETS. (2008). TOEFL iBT at a Glance. Retrieved September 15, 2011, from

ETS. (2011). Reliability and comparability of TOEFL iBT scores. TOEFL iBT research, series 1 Vol 3. Retrieved September 05, 2011, from

Jamieson, J. M., Eignor, D., Grabe, W., & Kunnan, A. J. (2008). Frameworks for a new TOEFL. In Chapelle et al. (Eds.), Building a validity argument for the Test of English as a Foreign Language (pp. 55–95). New York: Routledge.

Lado, R. (1961). Language testing. New York: McGraw-Hill.

Lowenberg, P. (2002). Assessing English proficiency in the Expanding Circle. World Englishes, 21 (3), 431-435.

Sawaki, Y., Stricker, L. J., & Oranje, A. H. (2009). Factor structure of the TOEFL Internet-based test. Language Testing, 26(1), 5–30.

Shohamy, E. (1997). Critical language testing and beyond. Paper delivered at the American Association of Applied Linguistics, Orlando, FL, March.

Shavelson, R. J., Eisner, E. W., & Olkin, L. (2002). In memory of Lee J. Cronbach (1916-2001). Educational Measurement: Issues and Practice, 21(2), 171-177.

Uysal, H. (2010). A critical review of the IELTS writing test. ELT Journal, 64(3).
