Back to School Statistics | Stats + Stories Episode 107 / by Stats Stories


Libby Pier is the Research Manager at Education Analytics, overseeing and executing EA's diverse educational research portfolio, encompassing social-emotional learning, predictive analytics, academic growth measures, human capital analytics, and program evaluation.

Nichole Webster is a research analyst at Education Analytics. She examines the item properties and performance of Social and Emotional Learning surveys and estimates teacher and school performance metrics in R. She’s part of ongoing research that examines how Item Response Theory models estimate error. She studied Mathematics and Applied Economics at the University of Wisconsin.

Full Transcript

Rosemary Pennington: School districts across the United States are working to understand how to best meet the educational needs of their students as well as the instructional needs of their teachers. Increasingly, districts are turning to data to help them do that. The data of education and educational policy is the focus of this episode of Stats & Stories, where we explore the statistics behind the stories and the stories behind the statistics. I’m Rosemary Pennington. Stats & Stories is a production of Miami University’s Departments of Statistics, and Media, Journalism and Film, as well as the American Statistical Association. Joining me in the studio are regular panelists John Bailer, Chair of Miami’s Statistics Department, and Richard Campbell, former and founding Chair of Media, Journalism and Film. We have two guests joining us today, both from the education research non-profit Education Analytics. Nichole Webster is a Research Analyst with the organization and Libby Pier is the Research Manager. Libby and Nichole, thank you so much for being here today.

Nichole Webster: We’re happy to be here.

Libby Pier: Thanks for having us.

Pennington: Just to get things started off can you explain a little bit about what your organization is and what it does?

Pier: Sure. Education Analytics is an education non-profit, like you said. We conduct rigorous research and evaluate schools and teachers for different competencies, and cycles for students to achieve it in the Spring.

John Bailer: I was interested to read some of your work that said that there’s been requests or expectations that there’s indicators of school quality or student success, other than student cognitive ability. And that was something I wasn’t aware of. Can you talk a little bit about the traditional sense of what’s evaluated in terms of cognitive ability, and some of these other measures that you’re looking at?

Webster: Definitely. Education Analytics grew out of a value-added research center at the University of Wisconsin-Madison. It was primarily focused on looking at cognitive outcomes like math and reading test scores. We’re growing and expanding into the field of social and emotional learning, and social and emotional learning is the development of non-cognitive skills that are also important for classroom and career success.

Pier: I’ll also add on that this is mirroring a trend that you see all across the country at the district and the state level. Since the passage of the Every Student Succeeds Act, which is essentially the legislation that replaced No Child Left Behind, there is something called colloquially a fifth indicator of student success, which is the idea that when schools or districts are evaluating school quality, they need to have a measure that doesn’t just rely on academic test scores. A lot of states have chosen to use things like their chronic absenteeism or student attendance rates to try and capture some of these non-academic measures. And some of the work that we’re doing at Education Analytics in conjunction with a group of districts in California is to really see if we can expand the different measures that might be a good fit for assessing how students are doing in these non-cognitive realms, above and beyond looking at things like attendance or suspension rates.

Richard Campbell: Very good. So, I started out my teaching career as a high school English teacher in the Milwaukee Public School system, Nichole, so I taught ninth grade American Literature for five years in classes of 35. I had five classes of 35 and their reading scores ranged from third grade level to twelfth grade level in one class.

Bailer: Oh my.

Campbell: So, we never did anything about this sort of social/emotional learning, and this was basically an 80% African American school, and they were very poor. So my question is, if I was doing this today, how would your research help me in these classes?

Webster: That’s a great question. We’ve shown that students’ social/emotional skills contribute to their academic success. So, when a student believes that they can master the hardest concepts in their classes, they’re better able to achieve academic and career success. We also look at item neutrality, and one thing that’s different about assessing the social/emotional survey is that the survey is administered to a wide span of grade levels, fourth through twelfth grade. And it’s atypical to ask the same question of all those age levels when you have such a wide span of ability. So, it’s really interesting and fascinating to compare how a survey would perform differently than an academic test.

Campbell: So, one kind of follow up is- I know from my high school teaching experience how resistant teachers are to this kind of data especially because they don’t understand it. What are the challenges in explaining this to people that are in the trenches? To everyday teachers and how can they actually use this data to help in these kinds of situations? But first, how do you explain this?

Pier: Yeah, that’s a great question. I think generally, as Nichole was mentioning, all of the work and research that we do really is about use and making sure that the analytics that we provide can inform decision making at all levels of the educational ecosystem. And it’s always a challenge to make sure that we’re really closely aligned to what teachers need and what principals need and what superintendents need. And social/emotional learning data is no different, as you mentioned. I’ll tell you a little bit about the context of where we’re working. So, we work with the CORE Districts in California, which is a consortium of eight of the largest urban school districts in the state, including Los Angeles, Oakland, San Francisco and many others. The CORE Districts also have what’s called a data collaborative that encompasses nearly 100 districts across the state, serving about 2 million students, which is about a third of all students in California. So, this is a really massive scale that we’re talking about, and one of the really exciting things about the large-scale nature of the work is that we’re able to do really interesting statistical research. We have tons of statistical power to answer all kinds of interesting research questions that are really outside the scope of your typical researcher in a university setting who, for instance, might be responsible for recruiting their own participants to take a survey measure. One of the challenges to that large-scale nature, though, is that we’re not “boots on the ground” in classrooms with teachers, working with them to help translate the data, or working to help align their pedagogical practices to what they’re seeing in the data. But the CORE Districts do have a really fantastic infrastructure set up where they are able to empower their local teams to really use these measures in a way that aligns best to their district’s needs.
And so, the goal is to make a measure that’s general enough and flexible enough that if a district really wants to work on improving their ninth graders’ growth mindset, for instance, or they really want to focus on improving school culture and climate for their fifth and sixth graders, these measures would allow them to do that. And we don’t necessarily take a strong stance on exactly what they should do once they have the data in their hands.

Bailer: Okay, thank you. One of the things I’d like to do is just take a step back and explore some of the components of what socio-emotional learning is. You’ve listed four things there as constructs associated with it: growth mindset, self-efficacy, self-management, and social awareness. Could you describe what each of those things are?

Pier: Sure. Growth mindset is the belief that one’s abilities can grow with effort. This was popularized by Carol Dweck in her book Mindset. Self-efficacy is the belief that one can master the hardest topics in their classes and meet the learning objectives set up by their teacher. Self-management is how well one participates in the classroom and how polite they are, how prepared they are to achieve academically. And finally, social awareness is a measure of students’ empathy and how often they give compliments, and it also asks about their listening ability and how much they respect their peers.

Pennington: So how do you measure these things?

Webster: Good question. We have a survey of about 25 items. What’s really difficult is that there are only 4 items that we can ask to measure growth mindset and self-efficacy. So, we want to be careful that the items aren’t all asking the same thing. Items like that would increase the reliability of the construct, but would only really be asking about one facet. And we want to make sure that our constructs are distinct and measured as completely as possible.
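
[Editor's illustration, not part of the conversation: the trade-off described above, where near-duplicate items raise reliability while covering only one facet, can be sketched with Cronbach's alpha on simulated data. The data, the two-facet setup, and the names `cronbach_alpha`, `facet_a`, and `facet_b` are assumptions made for this example; this is not Education Analytics' survey or method.]

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_vars / total_var)


rng = np.random.default_rng(0)
n = 2000

# Two hypothetical facets of one construct, correlated at about 0.5.
facet_a = rng.normal(size=n)
facet_b = 0.5 * facet_a + rng.normal(scale=np.sqrt(0.75), size=n)


def noisy_item(trait: np.ndarray) -> np.ndarray:
    """One simulated survey item: the underlying trait plus response noise."""
    return trait + rng.normal(scale=1.0, size=n)


# Four items that all ask about the same facet.
redundant = np.column_stack([noisy_item(facet_a) for _ in range(4)])

# Four items that split their coverage across both facets.
broad = np.column_stack(
    [noisy_item(facet_a), noisy_item(facet_a),
     noisy_item(facet_b), noisy_item(facet_b)]
)

alpha_redundant = cronbach_alpha(redundant)
alpha_broad = cronbach_alpha(broad)
print(f"alpha, single-facet items: {alpha_redundant:.2f}")
print(f"alpha, two-facet items:    {alpha_broad:.2f}")
```

[In this simulation the single-facet scale comes out with the higher alpha even though it covers less of the construct, which is the tension the four-item limit creates.]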

Bailer: So, which of these do you think is the hardest to measure and why do you think that?

Webster: So, we’re working with a self-report survey, and I think the self-efficacy and growth mindset items are really suited for self-report because they are so internal. But self-management and social awareness are more difficult. They rely on a lot of self-awareness to answer accurately. There’s also a social desirability bias, so it might be better for a self-management or social awareness survey to be given to teachers, and then they would report on the abilities of their students. Because a student saying that they gave a compliment, or they played with students on the playground, might not show as clearly how they adapt to social cues.

Bailer: Do you calibrate those two measures, the self-management and social awareness? Have you ever calibrated the scale you’re using with teacher ratings of such attributes?

Pier: We ourselves haven’t done that. We work with a network of researchers across the country as part of the CORE District Research Initiative. So there has been some work to align teacher reports with the student self-reports, since the CORE Districts, early on when they piloted these survey measures, also had staff surveys and parent surveys. And they found generally pretty strong correlations, but as you can imagine there’s certainly differences. And Nichole really hit the nail on the head when saying that you really want to think about these as students’ beliefs about their SEL, rather than a quote unquote true assessment of SEL. There’s lots of other measures out there too that are direct observation measures, where a teacher might rate a student’s behavior directly, or performance-based measures, where a student might engage in a simulation or a game, as well as other item formats beyond a Likert-scale item asking students to strongly agree or strongly disagree on a one to five scale. So, some of those items are choice items or situational judgment items. We have lots to say about the technical underpinnings of some of those things, but student self-report is really something that you can do quickly and inexpensively and at a large scale. Some of those other formats, even if they might give you additional information, can be expensive or more time consuming or harder to do for a large number of students.

Pennington: You’re listening to Stats & Stories and today we’re talking data and education policy with Libby Pier and Nichole Webster of Education Analytics.

Campbell: So, I have a question about- I know the database here is large and you’re doing some things that haven’t been done before, but do you have any early examples of how your work has helped schools make better decisions? Which is I think what you’re set up to try to do.

Pier: Yes, so there is a paper that’s been put out by Policy Analysis for California Education, which is a policy research center out of Stanford University that serves as the research arm of the CORE Districts. And they have a paper where they summarize some of the school level and district level practices that have emerged as a result of measuring social-emotional learning with this survey. I think one of the most concrete examples that they talk about in that paper is really messaging the importance of these skills. So, when you put measures of social-emotional learning and school culture and climate side by side with math test scores and English test scores, as the CORE Districts do in their online dashboard that we build and house at Education Analytics, that really messages something important to the districts and to the people that work in the districts: that this is something that matters just as much as test scores, which for a really long time were the only things that factored into measuring or assessing school quality. And similarly, making sure that these measures communicate the importance the districts are placing on measuring and improving non-cognitive abilities. I think we’ve seen that trickle down into a lot of important practices at specific schools or in specific districts, that “hey, this is something that matters and even if it’s hard to measure, we’re going to try, and we’re going to try to see what’s possible”.

Bailer: So, you just mentioned the value-added work that comes with studying these social and emotional factors, but when you look at a student’s progress, how do you tease out what’s the teacher and the school system versus what’s the home life, what’s the effect in terms of progress? I know that’s a hard thing… sorry about that.

Pier: So, the value-added research is trying to go beyond looking at a student’s attainment and how well they scored on post test and DeBeers score. By taking into account their prior achievement and other demographic factors, we can assert that we’ve controlled for everything except the impact of a school or teacher and some idiosyncratic student effects. So, then we’re able to disentangle from the [inaudible] what part is an idiosyncratic student effect and what part is the school and the teacher after controlling for demographics.
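
[Editor's illustration, not part of the conversation: a minimal sketch of the general value-added idea described above, using simulated data. It regresses post-test scores on prior achievement and one demographic indicator, then averages each teacher's residuals. The variable names, effect sizes, and the residual-averaging shortcut are assumptions for this example; the models Education Analytics actually fits are not described in the episode and are likely more elaborate.]

```python
import numpy as np

rng = np.random.default_rng(1)

n_teachers, class_size = 20, 50
teacher = np.repeat(np.arange(n_teachers), class_size)
n = teacher.size

# Hypothetical "true" teacher effects that the analysis tries to recover.
true_effect = rng.normal(scale=0.3, size=n_teachers)

prior = rng.normal(size=n)                        # prior-year achievement
disadv = rng.integers(0, 2, size=n).astype(float) # demographic indicator
noise = rng.normal(scale=0.5, size=n)             # idiosyncratic student effects

post = 0.7 * prior - 0.2 * disadv + true_effect[teacher] + noise

# Step 1: control for prior achievement and demographics with least squares.
X = np.column_stack([np.ones(n), prior, disadv])
beta, *_ = np.linalg.lstsq(X, post, rcond=None)
resid = post - X @ beta

# Step 2: a teacher's average residual is the value-added estimate; averaging
# over a class washes out most of the idiosyncratic student noise.
value_added = np.array([resid[teacher == t].mean() for t in range(n_teachers)])

corr = np.corrcoef(value_added, true_effect)[0, 1]
print(f"correlation between estimated and simulated effects: {corr:.2f}")
```

[Students here are assigned to teachers at random, which is what lets the simple two-step version work; with non-random sorting of students, the controls matter far more.]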

Campbell: Give me an example of an idiosyncratic student effect.

Pier: Sure, like what you said about a student’s home life, or just if a student had a bad day, they could score poorly on the test, or if they didn’t sleep well, these are a whole bunch of different student components.

Bailer: So, when you’re looking at these models do you apply them to students that weren’t used in the set to develop the models to see how well they continue to work for groups that weren’t part of building them? Do you do out-of-sample prediction or apply to other districts?

Pier: In terms of a value-added framework, we don’t really do out-of-sample prediction. We use the students that were linked to or attributed to a particular teacher or school to measure the impact of that teacher or that school in that academic year. We do have some research that we do around out-of-sample prediction, again in collaboration with the CORE Districts, where we’re trying to predict students’ college and career readiness based on a whole suite of variables, including their test scores, their GPA, the rigor of the courses that they take, their SEL. For those models we’re trying to make predictions for students in grades three through twelve, but we don’t have a longitudinal data set that covers grades three through twelve and college and career outcomes, so in that case what we do is use a chain-linking approach to try and predict ninth grade variables based on eighth grade levels, eighth grade based on seventh, and so on and so forth. And for those models as well, we’re calibrating on a particular cohort and then making predictions for later cohorts, but we’ve seen in that research the importance of continually updating the calibrations because things change in school districts from year to year.
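
[Editor's illustration, not part of the conversation: one plausible reading of the chain-linking idea, sketched on simulated data. Each link between adjacent grades is fitted on its own cohort, and the links are then composed to carry a seventh-grade score forward to a later outcome. The single-variable linear links, the coefficients, and the names `simulate_link` and `chain_predict` are assumptions for this example, not the models described in the episode.]

```python
import numpy as np

rng = np.random.default_rng(2)


def simulate_link(slope: float, intercept: float, n: int = 5000):
    """Simulate one cohort observed at two adjacent stages."""
    earlier = rng.normal(size=n)
    later = intercept + slope * earlier + rng.normal(scale=0.3, size=n)
    return earlier, later


# Hypothetical (slope, intercept) pairs for the links
# grade 7 -> grade 8, grade 8 -> grade 9, grade 9 -> readiness outcome.
true_links = [(0.9, 0.1), (0.8, 0.0), (0.6, 0.2)]

# Fit each link separately, since no single cohort spans every stage.
fitted_links = []
for slope, intercept in true_links:
    earlier, later = simulate_link(slope, intercept)
    b, a = np.polyfit(earlier, later, deg=1)  # slope first, then intercept
    fitted_links.append((b, a))


def chain_predict(score: float, links) -> float:
    """Push a score through each (slope, intercept) link in order."""
    for b, a in links:
        score = a + b * score
    return score


predicted = chain_predict(1.0, fitted_links)
expected = chain_predict(1.0, true_links)
print(f"chained prediction: {predicted:.3f} (simulated truth: {expected:.3f})")
```

[Because every link is estimated on a different cohort, any drift between cohorts compounds along the chain, which fits the point above about continually updating the calibrations.]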

Bailer: Can you give an example of something that you’ve learned that completely surprised you or a school district in analyses you’ve done?

Webster: Oh. Maybe the growth mindset performance that we’ve done.

Pier: Sure. Okay, the growth mindset construct- I guess we are working on a research paper right now about rejecting a fixed mindset and how that’s different from adopting a growth mindset. The original items that we asked in the CORE survey were all negatively worded, about rejecting the fixed mindset. So “I’m not capable of achieving in a class that I’m not naturally good at”. And then we reworded them so that they’re less confusing for students, and then we can better measure whether a student is adopting a growth mindset. So instead it now says, “I can achieve things in the classroom even if I’m not naturally good at them”.

Bailer: So, you just gave an example of rewording a question so when you’re posing these questions, where’s your input? Who writes the questions on this and how much goes into that?

Webster: It’s really important to consider the content as well as the quantitative performance of an item. The original items were written by Carol Dweck, and when we assessed the performance of the rejecting-a-fixed-mindset items, we noticed that students in younger grades were having trouble understanding the negative wording. So, after recognizing this, because our survey is given to third through twelfth graders and Carol Dweck and Neil Farrington don’t recommend that the growth mindset items are assessed at very young grades, where they might have trouble understanding the growth mindset concept, we chose to reword them so that they’d be better applied to younger grades.

Pier: More broadly, the CORE Districts went through a process of working with content experts like Neil Farrington and others to pull survey items from measures that have been used in research and shown to be valid and reliable, and worked with the stakeholders at the districts for them to be able to weigh in on what constructs were important for them to measure and specifically what items they really wanted to prioritize. So, it was a good example of bringing the research base to bear with practitioners to assess the things that they decided mattered most for their districts.

Bailer: You know, one thing you’ve alluded to, but we haven’t really explored, is the idea of what factors might impact this socio-emotional learning. And there are the factors that are maybe characteristics of the student, but also there may be some cultural characteristics of schools. So, can you give a quick fly-over of some of the student characteristics that seemed to differ with respect to some of these SEL constructs and then also how school culture can help with the growth of these?

Pier: The CORE survey is actually two components: a socio-emotional learning component and a school culture and climate survey. The school culture and climate survey looks at whether students feel safe in their schools and how clear the rules are and whether they feel like they belong in their schools. And these two components, though we often don’t look at them together, are very intertwined. When a student feels comfortable in their school and feels like they can participate in their class, they have better academic outcomes and better non-cognitive traits. There are student factors that could contribute to both their social and emotional learning and their academic outcomes, such as how well they respond to teachers’ expectations and some of the cultural norms around demographic or other homeless and foster composition of the school, or whether they’re learning English or are already fluent. So, these student characteristics can also be aggregated to the school level, and you can look at the student-teacher ratio or the school average levels of different demographic characteristics to assess how a school is performing in their social and academic measures.

Bailer: So, as a follow-up, if you looked at a student from a socio-economically disadvantaged or economically disadvantaged background, would their trajectory in terms of self-efficacy or growth mindset differ from the trajectory of a student that you might find in a more economically advantaged situation?

Webster: Very true. We’ve seen that in one of our papers, but the gaps between economically disadvantaged and economically affluent students decline in high school. We’ve also seen similar trends with different racial groups. One particularly interesting, but not necessarily surprising, finding with self-efficacy is that students’ self-efficacy responses tend to decline in middle school, in this very awkward time of life, and then tend to pick back up again towards the end of high school. What’s even more fascinating is that both girls’ and boys’ self-efficacy declines in middle school, but girls have a sharper decline. On the survey, some districts assess self-efficacy from a global perspective, how well you can master the hardest topics in all of your classes, and then there’s also subject level competencies. So how well I can master topics in my Science courses or my Math courses, or my English courses. So, looking at the breakdown between girls and boys globally and also in those subject areas is very fascinating.

Campbell: One of the criticisms of standardized testing has been- and I think this is particularly at the high school level, is the criticism that teachers teach to the test, and do you find- how does that factor into your research and what you’re trying to do especially with studying the social and emotional learning elements of all of this- of education?

Pier: Yeah, it’s a really important question. I think we’re in a different era of accountability than we were ten years ago. The CORE Districts in particular often use this phrase that I love, which is “using data as a flashlight, not a hammer”. And it’s really meant to illuminate best practices, facilitate schools and districts being able to learn from each other, and also shed light on which schools and which districts might need some more support and more resources to help them improve their students’ outcomes. We’re always concerned about teaching to the test or gaming any measure that would be used for any kind of accountability, and so we’re skeptical at this point in our research about whether or not these sorts of surveys should be integrated into an accountability system. Right now, they’re just in the CORE Districts, like I mentioned, and their dashboards, so it’s information that’s side by side with other variables and data points that they care about, but there’s no stakes tied to improving your students’ social and emotional learning in a way that a state might have stakes tied to improving test scores. And we think that’s really important, and I’m really optimistic to see, as we continue down a broader definition of what it means for students to be successful, how we can help schools know if they’re doing well without them feeling fear that they have to get their students’ scores up to a certain level “or else”, because that’s where you really start to risk seeing some of the wonky implementations of these kinds of assessments.

Bailer: So just a quick question, you’re tracking socio-emotional learning up through twelfth grade, what happens after twelfth grade? What do we see into the University life, and beyond, as people go through their lifetime? Is this maxing out? Or is this something that continues to change?

Pier: So, a really important question. I think that we probably are a little bit less up to speed on what happens after K-12, since that tends to be the area we focus on most. But I know that there is a lot of really interesting work out there in assessing these kinds of skills for students in college and, you know, at community colleges as well. We do know that these skills continue to develop over time, and in the education space there’s a growing emphasis on teachers’ social and emotional well-being as well. The idea is that if we want students to have a strong growth mindset and be able to regulate their emotions in appropriate ways, we also need to have strong adults around them who can model those skills and teach those skills. So, they’re malleable at all parts of life. It’s not only in early childhood, for instance, where these things can be taught. I think most adults can probably reflect on times at work when they’ve maybe not regulated their emotions so well, or times when they’ve doubted themselves, so there’s still certainly a lot that can be done after twelfth grade.

Campbell: Did you hear that John?

Bailer: There’s hope for me Richard!

Pennington: That’s all the time we have for this episode of Stats & Stories. Libby and Nichole, thanks so much for being here today.

Webster: Thanks to all of you.

Pennington: Stats & Stories is a partnership between Miami University’s Departments of Statistics and Media, Journalism, and Film and the American Statistical Association. You can follow us on Twitter, Apple podcasts, or other places where you can find podcasts. If you’d like to share your thoughts on the program send your email to or check us out at, and be sure to listen for future editions of Stats & Stories, where we discuss the statistics behind the stories and the stories behind the statistics.