The Uses of Benchmark Tests to Improve Student Learning
Hardin Daniel and Betsy Wheeler
ThinkLink Learning/Discovery Education
© 2006 Hardin Daniel and Betsy Wheeler
The Uses of Benchmark Tests to Improve Student Learning
Michael K. Smith and Jacqueline Shrago
ThinkLink Learning/Discovery Education
The use of formative assessment to improve student learning has gained support in recent
years from meta-analyses of research studies showing its effectiveness (Black & Wiliam,
1998; Black, Harrison, Lee, Marshall, & Wiliam, 2003) and theoretical discussions
articulating its principles (Sadler, 1989).
However, there are ongoing disputes about
the definition or definitions of formative assessments.
Recently, the Council of Chief
State School Officers formed a special interest group to sort out these various constructs.
Benchmark tests can be considered an aspect of formative assessments.
Shrago and
Smith (2006; see also Smith, 2006) presented evidence that online benchmark tests
dramatically improved student learning. Herman and Baker (2005) outlined six criteria
for “Making Benchmark Testing Work.”
According to Herman and Baker, these six criteria should apply to both
commercial and school-produced benchmark tests.
ThinkLink Learning/Discovery Education has pioneered the use of high-quality
benchmark testing to help schools improve student learning as defined by state content
standards and as measured by state assessments.
This article describes how ThinkLink
benchmark tests address each of the six criteria outlined by Herman and Baker.
Furthermore, this article outlines how high quality, commercially developed benchmark
tests can help schools improve student learning and address accountability issues raised
by such laws as the No Child Left Behind Act.
Alignment
Herman and Baker suggest, “Unless benchmark tests reflect state standards and
assessments, their results tell us little about whether students are making adequate
progress toward achieving the standards and performing well on the assessment.”
They
also caution against benchmark tests that “mimic the content and format of annual state
tests” without also addressing state standards.
ThinkLink Learning/Discovery Education benchmark tests are aligned with state
standards.
Benchmark test development begins by examining the curriculum standards in
each state, particularly the larger objectives articulated in these standards.
Then the
standards that are most frequently assessed are also examined.
The resulting benchmark
tests are a synthesis of the important state standards that are listed in curriculum standards
and measured on the state’s assessment program.
For benchmark tests to be helpful to teachers, they must provide useful, timely
information that can be utilized in a reasonable amount of time.
Benchmark tests must
not interfere with instruction.
At the same time, benchmark tests must truly assess the
most crucial skills.
All ThinkLink/Discovery benchmark tests are designed to be
administered within a single class period.
Results are provided immediately for online
users and within two weeks for students who take paper-based versions of these tests.
ThinkLink/Discovery benchmark tests are completely aligned with state standards and
assessments in the following states:
Alabama, Florida, Illinois, Indiana, Kentucky,
Michigan, New York, North Carolina, Ohio, Tennessee, Virginia, and Wisconsin.
ThinkLink/Discovery also produces benchmark assessments that blend state criterion-
referenced standards and norm-referenced standards in the following states: Arkansas,
Colorado, Mississippi, Missouri, New Mexico, Oklahoma, South Carolina, and West
Virginia.
One goal of ThinkLink/Discovery is the development of high quality benchmark tests
aligned to the state standards in all states.
Mapping Content
Herman and Baker suggest that “The alternative to aligning benchmark tests with the
specific content and format of state assessments is to align them with priority content and
performance expectations implicit in state standards.”
This has not been necessary for
the 12 states in which ThinkLink/Discovery tests are fully aligned.
It is, however, precisely
the approach that has been used for the ‘blended tests’ available in the other 8 states
described above.
Conversely, Herman and Baker suggest that if schools did develop their own benchmark
tests, these tests could map “standards in terms of specific content knowledge that
students need to acquire” such as the “four cognitive levels suggested by Norman Webb
(1997).”
These four levels include recall, conceptual understanding, problem solving,
and extended and strategic thinking.
While this approach can be useful for determining
long-term curriculum changes, teachers generally want their results compared against
current state-level standards so that they can teach to those standards adequately.
Since the inception of ThinkLink Learning in 2000, ThinkLink/Discovery benchmark
tests have addressed levels of cognitive complexity such as those articulated by Bloom
(Bloom et al., 1956; Bloom et al., 1971; Anderson et al., 2001).
Furthermore, levels of
understanding and depths of knowledge articulated by specific state standards have also
been addressed (see, for instance, information on Kentucky benchmark tests).
During the
2006-2007 school year, ThinkLink/Discovery won the contract to develop the benchmark
testing program for the Milwaukee City Schools grades 3-9 in the subject areas of reading
and mathematics.
All of these assessments are aligned with the principles articulated by
Webb.
Focusing on Big Ideas
Herman and Baker note, “By incorporating the key principles that underlie state or
district standards into benchmark assessments, educators have a reasonable strategy for
addressing the breadth of these standards.”
To incorporate this suggestion into
benchmark testing, the definition of “key principles” or “big ideas” must be addressed.
ThinkLink/Discovery benchmark tests have always focused on the key principles
articulated in state standards and measured by a state’s assessment program.
The
benchmark tests are designed to provide a snapshot of student performance on these key
principles.
When the “key principles” have been articulated by a specific school district,
ThinkLink/Discovery has helped that district develop benchmark tests to address its
specific priorities (see information on Jefferson County Public Schools in Kentucky, for
instance).
More recently, these “key principles” for mathematics and science have been articulated
by national organizations and research groups, in publications such as the National Council
of Teachers of Mathematics’ Curriculum Focal Points for Prekindergarten through Grade 8
Mathematics (NCTM, 2006) and the National Research Council’s Taking Science to School:
Learning and Teaching Science in Grades K-8 (Duschl, Schweingruber, & Shouse, 2006).
Herman and Baker also suggest, “Despite the ease of scoring multiple-choice items,
benchmark tests should employ many different formats to enable students to reveal the
depth of their understanding.”
Currently, ThinkLink/Discovery benchmark tests are
strictly multiple-choice.
This is primarily because schools and districts have limited funds to
pay for scoring non-multiple-choice items. On the other hand, practice
items for these benchmark tests have always included open-ended items with sample
scoring rubrics and student examples when appropriate.
The practice pool allows
teachers to help students deepen their understanding of key concepts through practice in a
variety of formats.
Diagnostic Value
Herman and Baker note, “A test has diagnostic value to the extent that it provides useful
feedback for instructional planning for individuals and groups. A test with high
diagnostic value will tell us not only whether students are performing well but also why
students are performing at certain levels and what to do about it.”
ThinkLink/Discovery benchmark tests provide detailed feedback on student performance
in a format that is easily used by teachers, parents, and even students. Further, these
benchmark tests are released for teacher and student use after administration. Unlike most
standardized tests, these benchmark test questions can be examined by teachers and
students. Therefore, the diagnostic value is increased as students and teachers discuss
correct and incorrect responses.
Feedback reports provide a variety of diagnostic information.
Diagnostic information is
summarized for every state objective and subskill at the class and student level. Teachers
use the test items and the reports to provide immediate feedback to students and
determine the specific concept within a subskill that students are not mastering.
The
unique advantage of the ThinkLink/Discovery approach is that parents, students, and
teachers become colleagues in analyzing test items.
Self-assessment and peer-assessment
methods are used to understand why students selected certain correct and incorrect
responses.
Feedback analysis and utilization are demonstrated and supported through
such services as professional development sessions, support telephone calls, and manuals,
and collectively as classroom teachers, parents, and administrators examine the reports.
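The class-level and student-level summaries by subskill described above can be sketched as follows. This is only a minimal illustration, not the ThinkLink/Discovery reporting system: the function name, the data layout, and the sample subskill labels are all hypothetical.

```python
from collections import defaultdict


def percent_correct_by_subskill(responses, item_subskill):
    """Summarize percent correct per subskill for each student and for the class.

    responses: {student: {item_id: True/False}} (hypothetical layout)
    item_subskill: {item_id: subskill name} (hypothetical layout)
    """
    per_student = {}
    class_counts = defaultdict(lambda: [0, 0])  # subskill -> [correct, attempted]
    for student, answers in responses.items():
        counts = defaultdict(lambda: [0, 0])
        for item_id, is_correct in answers.items():
            subskill = item_subskill[item_id]
            # Update both the student's tally and the class tally.
            for tally in (counts[subskill], class_counts[subskill]):
                tally[0] += int(is_correct)
                tally[1] += 1
        per_student[student] = {s: 100.0 * c / n for s, (c, n) in counts.items()}
    per_class = {s: 100.0 * c / n for s, (c, n) in class_counts.items()}
    return per_student, per_class


# Made-up example data: two students, three items, two subskills.
sample_items = {"q1": "fractions", "q2": "fractions", "q3": "geometry"}
sample_responses = {
    "student_a": {"q1": True, "q2": False, "q3": True},
    "student_b": {"q1": True, "q2": True, "q3": False},
}
by_student, by_class = percent_correct_by_subskill(sample_responses, sample_items)
```

A teacher-facing report would then show, for each subskill, the class percentage next to each student's percentage, which is the kind of comparison that points to the specific concept students are not mastering.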
Finally, professional development activities constitute an integral part of the use of these
benchmark tests. No professional development is needed to administer the tests.
Teachers know how to do this.
However, once the data is available (immediately for
online testing or within a few days for paper administration), teachers often collaborate
with a ThinkLink/Discovery professional to devise classroom teaching strategies to
overcome the prominent gaps.
This professional development is typically organized
within a school during a class period, with one grade group team meeting together.
This
structure allows flexibility of embedding the professional development into the learning
year and also fosters ongoing learning communities within the school. Professionals work
together to examine what works and what doesn’t work.
Such sharing continues after the
professional development day in meetings with an instructional leader.
Evaluations
support the effectiveness of these professional development activities.
Fairness
As Herman and Baker suggest, “Fair benchmark tests provide an accurate assessment of
diverse subgroups.”
All ThinkLink/Discovery benchmark test items undergo standard
fairness reviews to eliminate bias related to gender, race, and other categories.
Teachers
are encouraged to apply the accommodations used in high-stakes testing to formative
assessments, including time allowances and individual testing.
Large print options are provided at no cost for
any student. If requested, ThinkLink/Discovery produces Braille tests and audio
recordings.
These accommodations are appropriate, but some schools find the costs to be
burdensome.
Schools that have invested in software such as Read&Write Gold can
routinely convert a ThinkLink/Discovery test into a format that addresses most testing
accommodations.
Technical Quality
Herman and Baker note, “Tests with high technical quality provide accurate and reliable
information about student performance.”
From its beginning, ThinkLink/Discovery has
developed benchmark tests that address the Standards for Educational and Psychological
Testing (1999).
All benchmark tests have the following technical characteristics:
Content Validity: Subject matter experts align all tests with state standards.
Reliability: All tests have reliabilities of .85 or greater as measured by Cronbach’s alpha.
Criterion Validity: Correlations of scores on benchmark tests with scores on state tests provide evidence that these tests predict performance on key indices reported on state assessments.
Predictive Validity: Benchmark tests also predict proficiency levels reported on state tests.
Equated Tests: Scores on benchmark tests are equated across time periods.
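For readers unfamiliar with the reliability statistic named above, the following is a minimal sketch of how Cronbach's alpha is computed from item-level scores. The function name and the tiny data set are hypothetical and serve only to show the arithmetic; they are not ThinkLink/Discovery data or code.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (one row per student, one column per item)."""
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    # Variance of each item across students.
    item_variances = [pvariance([row[i] for row in scores]) for i in range(n_items)]
    # Variance of each student's total score.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up responses: 4 students, 3 items scored 1 (correct) or 0 (incorrect).
sample_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(sample_scores)
```

Values closer to 1 indicate that the items rank students more consistently; the .85 threshold cited above is a value of this statistic computed over a full test and a real student sample.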
ThinkLink/Discovery is working with the Milwaukee City School System to develop
benchmark tests for grades 3-9 in reading and mathematics. These benchmark tests will
be scored using the Rasch measurement model (Wright and Stone, 1979) and will be
horizontally and vertically equated across all grades.
This project is the first of its kind to
use state-of-the-art item response theory measurement techniques to develop equated and
linked benchmark tests across multiple grades.
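As a brief illustration of the Rasch measurement model mentioned above, the sketch below shows its core formula for right/wrong items: the probability of a correct answer depends only on the difference between a student's ability and an item's difficulty, both expressed on the same scale. The function names and sample difficulty values are hypothetical, and the sketch does not represent the scoring or equating procedure used in the Milwaukee project.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))


def expected_raw_score(ability, difficulties):
    """Expected number of items answered correctly on a test with these difficulties."""
    return sum(rasch_probability(ability, b) for b in difficulties)


# Made-up item difficulties on the logit scale.
sample_difficulties = [-1.0, 0.0, 1.0]
p_matched = rasch_probability(0.0, 0.0)  # ability equals difficulty
score = expected_raw_score(0.0, sample_difficulties)
```

Because abilities and difficulties share one scale, items from different test forms or grades can in principle be placed on a common scale, which is the property that equating and linking rely on.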
Utility
Herman and Baker define utility as “the extent to which intended users find the test
results meaningful and are able to use them to improve teaching and learning.”
Furthermore, “To make benchmark tests useful, schools must put the results in intended
users’ hands quickly and train them to interpret the information correctly.”
ThinkLink/Discovery benchmark tests provide timely reports. Results of online
benchmark tests are available immediately.
ThinkLink/Discovery scores and reports
results within two weeks for paper-based benchmark assessments.
Professional development activities train teachers to interpret the information from
benchmark tests.
(See comments above on typical professional development
incorporated into the ThinkLink/Discovery system.)
Feasibility
Herman and Baker note, “Benchmark testing should be worth the time and money that
schools invest in it.” ThinkLink/Discovery has helped schools by developing benchmark
tests that can be used by schools without the additional costs of development.
Finally, Herman and Baker suggest that to make benchmark testing worthwhile,
“educators ultimately need to look at the results . . . . Are they actually improving student
learning?”
ThinkLink/Discovery routinely implements research studies that demonstrate
how the use of these benchmark tests improves student performance on state assessments
and enables schools to meet AYP criteria.
Final Comments
The use of formative assessments is being widely researched in many educational
settings.
The uses of benchmark tests by ThinkLink/Discovery offer one method for
schools to quickly and efficiently administer benchmarks that are high-quality
assessments of student learning.
These assessments provide valuable information for
teacher, parent, self, and peer feedback to facilitate student learning.
References
Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., Raths, J., and Wittrock, M. C. (Eds.) (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s Taxonomy of Educational Objectives. Abridged Edition. New York: Longman.
Black, P., and Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy, and Practice, 5(1), 7-73.
Black, P., Harrison, C., Lee, C., Marshall, B., and Wiliam, D. (2003). Assessment for learning: Putting it into practice. Maidenhead, UK: Open University Press.
Bloom, B. S. (Ed.), Engelhart, M. D., Furst, E. J., Hill, W. H., and Krathwohl, D. R. (1956). Taxonomy of educational objectives: Handbook I: Cognitive domain. New York: David McKay.
Bloom, B. S., Hastings, J. T., and Madaus, G. F. (1971). Handbook on formative and summative evaluation of student learning. New York: McGraw-Hill.
Duschl, R. A., Schweingruber, H. A., and Shouse, A. W. (Eds.) (2006). Taking science to school: Learning and teaching science in grades K-8. National Research Council. Washington, D.C.: National Academies Press.
Herman, J. L., and Baker, E. L. (2005). Making benchmark testing work. Educational Leadership, 63(3), 48-54.
National Council of Teachers of Mathematics. (2006). Curriculum focal points for prekindergarten through grade 8 mathematics: A quest for coherence. Reston, VA: NCTM.
Sadler, D. R. (1989). Formative assessment and the design of instructional systems. Instructional Science, 18, 119-144.
Shrago, J. B., and Smith, M. K. (2006). Online assessments in the K-12 classroom: A formative assessment model for improving student performance on standardized tests. In S. L. Howell and M. Hricko (Eds.), Online assessment and measurement: Case studies from higher education, K-12, and corporate (pp. 181-194). Hershey, PA: Information Science Publishing.
Smith, M. K. (2006). How can a large scale formative assessment be research-based and valid? Paper presented at the CCSSO conference, San Francisco, CA.
Standards for educational and psychological testing. (1999). Washington, D.C.: American Educational Research Association.
Webb, N. L. (1997). Criteria for alignment of expectations and assessments in mathematics and science education. Madison, WI: University of Wisconsin, National Institute for Science Education.
Wright, B. D., and Stone, M. H. (1979). Best test design. Chicago: MESA Press.
The following ThinkLink research articles provide more detailed information on the
technical aspects of these benchmark tests:
What is Predictive Assessment in Tennessee?
What is Predictive Assessment in Kentucky?
What is Predictive Assessment in Alabama?
What is Predictive Assessment in Florida?
What is Predictive Assessment in Illinois?
Improving Learning in Birmingham: A Controlled Group Comparison
Research with Multi-State Examples
Research example: IL
Research example: TN
What Research is Available on ThinkLink Learning?
How Can a Large Scale Formative Assessment Be Research-Based and Valid?
Peer Review:
ThinkLink’s tests and results have been incorporated and analyzed in the following
peer-reviewed work:
Dr. Elizabeth Vaughn-Neely, Associate Professor, Dept. of Leadership & Counselor
Education, Ole Miss University
eiv@olemiss.edu
662-915-5771
Dr. Marjorie Reed, Associate Professor, Dept. of Psychology, Oregon State University
Peer-reviewed findings based on their joint research and work with schools using
ThinkLink Learning were presented at:
Society for Research in Child Development, Atlanta, GA, April 2005
Kappa Delta Pi Conference, Orlando, FL, November 2005
Two dissertations for Ed.S studies have been published:
Dr. Juanita Johnson, Union University
johnsonj4@k12tn.net
Dr. Monica Eversole, Richmond KY
meversol@madison.k12.ky.us