Achievement Effect-Size Benchmarks 321
APPENDIX A: STANDARD DEVIATIONS OF SCALED SCORES
Table A1 shows that, for each test, standard deviations are stable across grades
K–12. The effect size patterns reported in this paper are therefore determined
almost entirely by differences among grades in mean scaled scores. In other
words, it is the variation across grades K–12 in the growth of measured student
achievement, rather than differences in standard deviations across grades, that
produces the reported pattern of grade-to-grade effect sizes.
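This implication can be made concrete with a sketch of the computation. Assuming a grade-to-grade gain effect size is formed by dividing the difference in adjacent-grade mean scaled scores by a (pooled) standard deviation — the notation below is illustrative, not taken from the original —

```latex
d_{g,\,g+1} \;=\; \frac{\bar{X}_{g+1} - \bar{X}_{g}}{SD_{\text{pooled}}}
\;\approx\; \frac{\bar{X}_{g+1} - \bar{X}_{g}}{SD}
\qquad \text{when } SD_{g} \approx SD_{g+1} \approx SD,
```

then with a roughly constant denominator across grades, \(d_{g,\,g+1}\) is proportional to the mean scaled-score gain, so the grade-to-grade pattern of effect sizes simply mirrors the pattern of mean growth.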