In Defense of Standardized Testing | Stats + Stories Episode 224 / by Stats Stories

Howard Wainer is a statistician and research scientist with a specialization in the use of graphical methods for data analysis and communication, robust statistical methodology, and the development and application of generalizations of item response theory. After serving on the faculty of the University of Chicago, a period at the Bureau of Social Science Research during the Carter Administration, and 21 years as Principal Research Scientist in the Research Statistics Group at Educational Testing Service, he is now Distinguished Research Scientist at the National Board of Medical Examiners. He has authored more than 20 books, John’s favorite of which is Truth or Truthiness: Distinguishing Fact from Fiction by Learning to Think like a Data Scientist.

Episode Description

The utility of standardized testing is under debate in the US with opponents of their use in K-12 suggesting educators are now being forced to teach to tests. In higher education, there's been a push to abandon the use of standardized tests in admissions processes. But if we throw out standardized tests completely, are we throwing away a tool that still has some value? That's a question framing this episode of Stats and Stories with guest Howard Wainer.

Full Transcript

Rosemary Pennington
The utility of standardized testing is under debate in the US, with opponents of their use in K through 12 suggesting educators are now being forced to teach to tests. In higher education, there's been a push to abandon the use of standardized tests in admissions processes. But if we throw out standardized tests completely, are we throwing away a tool that still has some value? That's a question framing this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me as usual is regular panelist John Bailer, Chair of Miami's Statistics Department. Our guest today is Howard Wainer. Wainer is a statistician and research scientist with a specialization in the use of graphical methods for data analysis and communication, robust statistical methodology, and the development and application of generalizations of item response theory. After serving on the faculty of the University of Chicago, a period at the Bureau of Social Science Research during the Carter Administration, and 21 years as Principal Research Scientist in the Research Statistics Group at Educational Testing Service, he's now Distinguished Research Scientist at the National Board of Medical Examiners. Wainer has authored more than 20 books, the latest of which is Truth or Truthiness: Distinguishing Fact from Fiction by Learning to Think like a Data Scientist. He also authored an essay in Chance which questioned whether getting rid of standardized tests is a good idea. Thank you so much for joining us today.

Howard Wainer
Well, thank you. Thank you for having me.

Rosemary Pennington
I, you know, to get started with this conversation, I wonder what made you feel like you had to write this essay about standardized tests.

Howard Wainer
Oh, aside from having spent 40 years of my career working in standardized testing, it's like when somebody says untrue and rude things about a close friend: you feel that it's important to stand up for them. And so I did, and I've started to do that. It's a mystery to me why large testing organizations like ETS and ACT don't have a fair number of their staff members working away diligently writing op-ed pieces and responding to the various kinds of concerns that are being expressed in the public media.

John Bailer
So before we dive into that, Howard, before we dive into those concerns, if I could just get you to help deconstruct a little bit of the context where we are. So we're talking about standardized tests. And if you could just give a, you know, a couple-sentence summary of what the purposes of tests would be in the uses that you've worked on, and then what "standardized" means in this context.

Howard Wainer
Testing goes back a long way, a very long way, certainly to the Han Dynasty in China. So we're talking about thousands of years. And the key idea of a test is that a small sample of behavior, taken under controlled circumstances, can predict future behavior of different sorts under uncontrolled circumstances. And so when we give a test that takes two or three hours in which a person sits with a number two pencil, we don't want to know how they're going to do with number two pencils in the future, we want to know how they're going to do at university or in licensing tests. We want to know how they're going to fly a plane or how they're going to work as a physician. And obviously, the closer the test is to the actual thing, the smaller the intellectual leap. But quite often you don't know what the thing is going to be, so you want something that's sort of general. That's what a test is. It's a small sample of behavior taken under controlled circumstances. And controlled circumstances are very important. That's where we get to the other part of your question, a standardized test. And the whole idea is that everyone who's taking that particular test is subjected to it in precisely the same way, or in as similar a way as you can possibly manage, so that you know what you're talking about. If some people take it with a book open in front of them and others take it without one, if some people take it in an hour and others take it in five hours, if some can bring their mother with them, you know, that's not standardized. And so you want to be able to control as much as you possibly can.
I think a little bit about the principal thing that concerned me in all of the public discussions about why they're not going to use them. They stopped giving college entrance tests during COVID because you couldn't have a bunch of people gather in a gymnasium all at once, and so they had to do something else. And there are various opponents of testing; there are always opponents. One of my colleagues says that testing is a lot like a proctologist. You know, nobody really likes them, but you gotta have them. And nobody likes tests, you know. But during the time of COVID, they had to make do without them, and they tried all sorts of things. And then various organizations, like the University of California, decided that they're going to try not having them anymore, and they're going to substitute something else. And that brings me to Henny Youngman. Henny Youngman is famous for his one-line quips, and one was, someone said, "How's your wife?" And he said, "Compared to what?" And so now, when you start talking about tests and you say, "Okay, we're going to do without tests," okay, what are we going to do instead? Because you have to do something whenever you have to allocate resources in a limited way. And limited resources means we have openings for 1,000 students and we have 10,000 who are applying. How do you pick? And before there were tests, like, the college entrance exams began at the beginning of the 20th century, and they worked their way in a little bit at a time. Back then, it turned out that there was a guy named Henry Chauncey, who was working in the admissions office at Harvard. And he was a big fan of the admissions testing that Columbia had really pioneered. And so he went to the president of Harvard, Lowell, and said to him, "We should do this, because we have more people applying to Harvard than we have space, and this allows us to make better decisions." And Lowell turned him down flat.
Because he felt that if you used test scores, it wouldn't exclude enough Jews. And he preferred having quotas. And President Lowell had similar opinions about lots of other things, and in fact, just recently, and by just recently I mean within the last six months, Lowell House at Harvard has been renamed. It only took 100 years, but they finally got rid of him. But that was the whole point. What was going on back then was the old boy network. Both of you are too young to remember this, but when I applied to college, one of the things you put on the application was a photograph of yourself. Guess what that was used for? Hmm. And then subsequently they got rid of it. And the way of choosing people who got in was that you would call your friends and ask if they had anybody good. And it resulted in, you know, really dreadful kinds of segregation. And by segregation, I don't mean just racial segregation, but segregation of all sorts. However, as soon as you have a test, the game has changed. Now, that doesn't mean the test is completely flawless. It doesn't mean that at all. But what you can do is, because you're doing the same thing over and over again, you can control it. You can try doing it one way and doing it another, and see if you can fix it, and then make a change. And there are gazillions of examples of flaws in the SAT that were corrected, and they don't do them anymore. I know the opponents of testing often point out a vocabulary item that had the word "chukker" in it. A chukker is a period in a polo match. And you can imagine that there are a lot of people who don't know that. And, you know, "chukker" hasn't appeared on an SAT in probably 70 years, because they found that and said, "That's stupid, we're not going to do that." And then there were, you know, lots of other things that were done both to make them fairer and to make them less offensive.
So at one point, I think the term "Eskimo" was considered to be derogatory. And so they substituted "Inuit," which is the term, and I remember the first time they made that change. The phrase that was used in the test, it was a reading comprehension thing, was "the Eskimos, or Inuits, as they like to be called." They disappeared quickly enough. But the point is, if you have a standardized test, you can constantly improve things.

Rosemary Pennington
Howard, you raise this issue that admissions to college used to be, I was gonna say kind of discriminatory, but I will say a lot discriminatory.

Howard Wainer
Discriminatory, in what sense, please.

Rosemary Pennington
Well, just, you know, what you said, like, you had to include a picture, right? So people could sort of sift through and figure out, like, "Oh, we don't want that kind of person here." And I think this argument about standardized tests helping make those admission processes more objective is persuasive. But I do feel like there are criticisms from people in education who are concerned that standardized tests can sometimes reinforce inequalities that have nothing to do with intelligence and everything to do with where a student is located. And I wonder if you have thoughts about whether there's validity to those criticisms, or whether there are ways that standardized tests can take that kind of thing into account when they're being created, because you just talked about how these change over time.

Howard Wainer
Yeah. All right. Well, listen, there's two parts to the answer. The first part is you're absolutely right. There are biases that can happen. But let's think about what really happens. When someone takes the SAT, they walk into a big room, completely naked, with just the number two pencil. Mom isn't behind them. The tutor isn't behind them. Their performance is based strictly on how much they have absorbed. Now, yes, if you come from back in the hills of West Virginia, in Appalachia, or an inner city, you're not going to be as well prepared as someone who's gone to prep school and all sorts of other things. That's certainly true. And there's no way around that. Because, you know, it's like saying, "Well, some people are taller than others, how are we gonna make basketball fair?" You know, that's just the way it is. But it is fairer. And the thing that has not been discussed very much, and I think it's really important, is the power of tests, in particular because they're cheap. They're cheap to administer. You know, you're talking about a two- or three-hour test that handles everything you've learned for the first 12 years of your education. They're cheap, and they're easy. And it's a spectacularly good way of being able to find jewels hidden away in places where you never would have found them. So sure, it's easy to go to Lawrenceville Prep and find some terrific students. But what about the South Side of Chicago? You know, you see this tall Black girl with, you know, overwhelming SAT scores, and they accepted her at Princeton. Her name was Michelle Robinson. And her brother went to Princeton ahead of her. Again, spectacular. You know, Michelle Robinson, or Mellody Hobson, you know, a Black woman from Virginia who went to Princeton also. They found her. Spectacular. She obviously is a very, very smart person. But they might not have been able to find them, and they found them.
And the Merit Scholarship has known this for a gazillion years, because they give the PSAT, which is dirt cheap, you take it for free, and they give it to millions. And out of those millions, they find the 1,500 highest scores and give them scholarships, because they're able to go into all corners of the earth. Is it perfect? No, but it's way better than anything else. Again, compared to what? What are you going to do otherwise?

John Bailer
So one of the things that you mentioned is the definition that you gave earlier of a test as a sample of behaviors under controlled circumstances that will predict behavior under uncontrolled conditions. So, you know, when you think about the standardized tests, and when you talk about these entrance exams for colleges or other types of exams, or PSATs, as you just mentioned, what are the behaviors that these are typically calibrated to predict?

Howard Wainer
Let me shift your question, but it is basically that. You know, the SAT grew from a test that was developed in 1918 in Vineland, New Jersey, which became known as Army Alpha. The Army was trying to make better decisions in terms of manpower utilization. And the work on the Stanford-Binet test was very influential. And so they developed this test called the Army Alpha. Actually it had two forms: the Army Alpha, which was a written test, and the Army Beta, which was nonverbal, for people who were illiterate. And it was to be able to make better judgments about all the various enlistees and what kinds of training they would undergo. And it was so successful that it was then copied by the College Board. So the College Board tests grew from that, which grew from an IQ test. So what we're measuring, to a large extent, is intelligence. Now let me shift to a different kind of testing. For 15 years, I spent licensing physicians, and the test that licenses physicians in the United States is called the USMLE, the United States Medical Licensing Exam. And everyone who practices medicine in this country, whether US-trained or foreign-trained, has to pass this, and it is the test from hell. Three eight-hour sections. It goes on and on and on. And so let me get to your question, which was, how do we know that what's being tested there is going to produce a good doctor? Because that's what we care about. We want a good doctor. Well, you do the best you can in the topics. So you've got topics like anatomy and physiology and psychology and, again, three eight-hour tests. So this is a big deal, including one that's clinical, where you have what are called standardized patients. They're actors. An actor goes in and, you know, we're hiring you, and you're going to be the SP, the standardized patient, and you're going to represent urinary tract infection.
So you sit in the room, and a candidate comes in, and you describe all of your symptoms, and they poke around, and then they have to do something. I mean, this is a serious thing. However, in the end, the only thing we can be sure of is that we're hoping to pick out who are going to be smart doctors. We don't know if they're going to be good doctors. We just hope that there's a strong relationship between being a smart doctor and being a good doctor. And we're pretty sure, but there's no evidence necessarily that a stupid doctor is not going to be a good doctor. So we have to have some support for that. And there have been attempts to try to determine whether being a smart doctor is a good predictor of who's going to be a good doctor, and how good that is. But you can certainly imagine other characteristics of a person that are not measured by the test. There's a famous case about all the various things that went into getting somebody admitted to medical school: their high undergraduate grades and their MCAT scores and things like that. And there was this one guy who was a professor at a medical school, an old guy. And he felt that all the various criteria that were being used didn't really get at three important variables, in particular maturity, commitment to medicine, and neuroticism. And so he was going to personally interview all of the finalists for admission and evaluate them on those three different variables. And I won't go through the whole story about how this was discovered. But it turned out, if you were a man and married, you were considered mature. If you were a woman and married, you weren't committed to medicine. If you were a man and divorced, you were considered committed to medicine, and if you were a woman and divorced, you were considered neurotic. And so the particular medical school he was dealing with tended to vastly, vastly over-admit men over women.
And they only discovered this when the state allocated extra money for more students, and they had a lot of women on the waitlist. And they were very concerned about what they were going to do, because they weren't good enough to get in initially, and they followed them. And of course, they blew the top off the class. Now, that's an unfair, standardized test, and it sounds good. It's one of these ideas that only makes sense if you say it fast. Physicists call it the Doppler effect: those are ideas that, coming fast, only make sense to dummies.

Rosemary Pennington
You're listening to Stats and Stories. And today we're talking about standardized tests with Howard Wainer.

John Bailer
So, you know, Howard, I have enjoyed your work, you know, and actually anybody that calls a book Truth or Truthiness is all right by me as a starting place. That was a great title. You've written recently an art book with a colleague on the history of data visualization, and you've been writing, you said, for 32 years a column, Visual Revelations. I was just curious, how did you get started in data vis, and in your interest in exploring the history of visualization?

Howard Wainer
Well, it started when I was a student. The history of statistics in the world really peaked at around 1961 or so. And up until that point, statistics was filled with formalisms, and the names Fisher and Neyman and Pearson dominated. And it was a branch of mathematics. But in 1962, John Tukey, at Princeton, changed the rules of the game, and he wrote this wonderful paper on the future of data analysis, where he said that the goal of looking at data is insight, not equations. And the greatest value is when you find something that you weren't expecting. And so I showed up at Princeton in 1965, and Tukey's work was just sort of starting to have its influence. And it just made so much sense to me. Also, I had some colleagues in the math department who were so much better than I was that I knew I wasn't going to be able to compete with them. And so this looked like a way of doing it. Tukey, of course, was, in some sense, a terrible model, because you couldn't possibly model yourself after him. Because he knew everything. I mean, he was smarter than anyone you could ever imagine. And he knew everything. There's this wonderful story about him, because he used to get annoyed at people who would ask him ridiculous questions to see whether or not he knew the answer, because he always did. And so the story went that if you want to find out how to milk an elephant, don't ask John, just go in and start talking about elephants. And eventually, it'll get around to how to milk them. And he'll tell you.

John Bailer
Okay, so it was this, the emergence of EDA and some of the insights that came out of that.

Howard Wainer
And in fact, John McCarthy at Yale and I at Chicago taught the first two courses in EDA outside of Princeton and Harvard, I guess. Mosteller was doing it at Harvard. And it didn't meet expectations. My course didn't meet with great success, because the students that I had didn't understand this stuff. They wanted to know about chi-square and analysis of variance and regression, and, "What's this stuff about, you know, all the stuff you're doing?" So it met with enormous resistance. Fred and John had a symposium at AAAS, and they asked John McCarthy and me to go and talk about our experiences in teaching EDA, and that led to graphics and things like that. My first graphics paper was part of that.

John Bailer
So that's exploratory data analysis, Rosemary. Sorry, sorry. Thank you. That was an inside baseball reference that I realized after I had shared it.

Rosemary Pennington
I haven't heard that term since I took a statistics class in graduate school.

Howard Wainer
So you heard it in that course? Yes. Oh, all right. Yeah, that's good.

Rosemary Pennington
But I did, you know, I finished my graduate work in 2007. So I think data visualizations were just becoming a big thing in my program. I wonder what advice you would have, Howard, for people who are interested in exploring, you know, data analysis and data vis as a career, because I would imagine that you've really seen an evolution in the field given your experience?

Howard Wainer
Well, you know, the obvious advice is buy as many of my books as you can.

Rosemary Pennington
Maybe we should start saying that about this podcast as well, John. That's great.

Howard Wainer
But it doesn't matter what subject you're in, if you're interested in any subject in which you're going to make claims. And if you're going to make a claim, it means you need evidence to support that claim, whether that's in a court of law, or in a scientific circumstance, or anything else. Now, what's evidence? You know, there's been a long history of that, certainly starting with Aristotle, but it really got rolling somewhere around Hume, where evidence was considered to be experiential, as opposed to, you know, the Socratic idea of sitting back and figuring it out rationally. So evidence is data, but it's not any data. Evidence is data related to a claim. And so data, by themselves, you know, I know your shoe size is data. But that doesn't mean anything unless there's some claim for which the shoe size is supportive of or antithetical to whatever the claim is. So any field that anybody's going into, if they're interested in making causal inferences, and making claims, and trying to gather evidence for an audience of people for whom evidence matters. Now, we live in a world now where there's a lot of people for whom evidence doesn't matter. Witness all the people who don't want to have a COVID shot, or don't want to have standardized testing.

Rosemary Pennington
Well, that's all the time we have for this episode of Stats and Stories. Howard, thank you so much for joining us today.

Howard Wainer
Oh, it's a great pleasure to meet you both.

John Bailer
Well, thanks, Howard. It was great. Thank you for joining us.

Rosemary Pennington
Stats and Stories is a partnership between Miami University’s Departments of Statistics, and Media, Journalism and Film, and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places you can find podcasts. If you’d like to share your thoughts on the program send your email to statsandstories@miamioh.edu or check us out at statsandstories.net, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.