Mark Glickman, a Fellow of the American Statistical Association, is Senior Lecturer on Statistics at Harvard University, and Senior Statistician at the Center for Healthcare Organization and Implementation Research, a VA Center of Innovation. He is well-known for his work in games and sports, having created the Glicko and Glicko-2 rating systems that are widely used in online gaming. Mark co-organizes the biennial New England Symposium on Statistics in Sports, has been Editor-in-Chief of the Journal of Quantitative Analysis in Sports, and has been the chair of the US Chess Ratings Committee since 1992. More recently, Mark has embarked on projects in music analytics. His work on authorship attribution of Lennon-McCartney songs has received widespread media coverage.
Full Transcript
(Background music plays)
Rosemary Pennington: There are a lot of things Beatles fans can argue about: which album is the best, for instance, or which Beatle was the better songwriter. Songwriting, in fact, has been the site of some heated debates over the years. Early on, John Lennon and Paul McCartney agreed to share writing credits on the songs they wrote for the band. Since the group split up, fans have been trying to figure out which songs belong to which songwriter, leading to sometimes impassioned arguments in bars around the world. Recently, researchers converted the songs into data in an attempt to figure out which songs were written mostly by Lennon and which were written mostly by McCartney. One of those researchers is the guest on today's episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me in the studio is regular panelist John Bailer, Chair of Miami's Statistics Department. Richard Campbell is away today. Our guest is Mark Glickman, a senior lecturer on statistics at Harvard University. He's also senior statistician at the Center for Healthcare Organization and Implementation Research. Mark, thank you so much for being here!
Mark Glickman: Well, thank you very much for having me.
Pennington: My big opening question is, how does a statistician get involved in figuring out authorship of Beatles songs?
Glickman: Well, that is the big question. I guess the starting point is, you know, it doesn't hurt to be a Beatles fan!
Glickman: Right? I actually became interested in the Beatles when I was about eleven years old, which was well after they broke up. I was really pretty fascinated with them, and my interest in them has pretty much continued for my entire life, though maybe not quite as passionately as back then.
Glickman: And then, along the way, about four years ago, I met my co-author Jason, who turns out to also be a Beatles fan, and we were talking about some Beatles-related things. The topic came up because he had done some work on Beatles-related topics, and we were discussing the idea of being able to predict the authorship of one particular song, In My Life. He made a big point of saying, do you know that this particular song is in a big dispute over which of Lennon or McCartney wrote it?
Glickman: He had some ideas, based on his mathematical background, about how to identify whether McCartney or Lennon wrote it. But the statistical instinct in me kicked in, and I had some thoughts about how we might address this problem, so we ended up forming this working relationship. And that's what led to this work.
John Bailer: So why that song in particular?
Glickman: Well, that song is kind of a special song; it's listed as the 23rd best song of all time by Rolling Stone magazine. So it has a certain cachet to begin with, and that drew us to doing this work. But that's not the only Lennon-McCartney song in dispute. There are a bunch of others, but this is the one that was the main motivation for our work.
Bailer: Were there particular songs where it was clear who was the lead author of them?
Glickman: Yeah. There's a special way that you can generally tell which of the two songwriters wrote a song, and that's who sang it.
Glickman: Unfortunately, that doesn't always work. Part of the problem is that there's a whole bunch of songs that were written by Lennon or McCartney and handed off to George Harrison, the Beatles' lead guitarist, to sing, and there are a couple of songs where, even in that realm, it's not really known whether McCartney or Lennon wrote the song. For example, there's this very early Beatles song, Do You Want to Know a Secret.
Pennington: Um hmm.
Glickman: George Harrison sang it on the recording, but either McCartney or Lennon actually wrote it, and it's not known which of the two. I suppose the strange thing is that over the years McCartney, and Lennon until his death in 1980, were asked about who wrote which songs, and most of the time they would agree on who wrote what.
Glickman: But there are a handful of songs that they just have conflicting memories about. As an academic, it's kind of mind-boggling that you wouldn't be able to remember whether you wrote a song, because it's almost like not remembering whether you've written an article.
Glickman: Or, if you're a co-author on an article, not remembering whether you wrote it or what your role was. That's the analogy in this songwriting context.
Pennington: So I was reading an article about this that described what you did. As I said in the intro, it's kind of turning the songs into data. So how exactly are you taking In My Life and turning it into data that you can then statistically analyze?
Glickman: Right. That's a really good question, and there are a ton of choices one can make in converting the musical features of a song into some data object that you can analyze. What we ended up deciding to do was to think of the music as if it were a document of text.
Pennington: Um hmm.
Glickman: So in a document of text, you basically have a whole bunch of words in sequence, and a very typical analysis of text involves counting the number of times certain words occur, or the number of times pairs of words, or three words, occur in sequence. When you count the number of times those particular features of a text occur, you're essentially coming up with a kind of fingerprint for the document: the frequencies of all these different words are a particular way of characterizing the entire document. So, depending on which version of the analysis we did, we captured about one hundred forty-nine or one hundred fifty-nine different musical features within these songs, which come from taking the entire song, stretching it out, and labeling parts of it as if they were individual words. To be very concrete, we took, for example, the sung melody of the song, wrote down the individual notes in sequence, and simply counted the frequency of individual notes occurring within the song. That would be one representation of a song: counting the number of times the tonic note, the note that names the key of the song, occurred, counting the number of times the subdominant note occurs, and so on for all the notes in the scale. But then we would also count the number of times there were note transitions, like going from the root note to the third or from the root note to the fifth, so each of those transitions would be considered its own bin that we're counting over.
So what you end up getting out of this whole process is counts of individual notes and counts of pairs of notes. Then we did the same thing with chord sequences, because the chords are another characteristic that's inherent to describing a song. So we counted the number of times certain chords appeared and the number of times chord pairs, that is, chord transitions, occurred. That accounts for four different representations.
Glickman: There is a fifth one, which is called the contour of the song. It basically describes the shape of the melody: whether it generally goes up, whether it generally goes down, or whether it modulates a little bit. There are about two different categories that we count over for these contours. What we get in the very end is what amounts to one hundred fifty-nine different categories, where we count the number of times each of those categories appears in a song. So you could think of it as an individual song being translated into one hundred fifty-nine numbers, where each number counts the occurrences of one feature.
Glickman: One hundred and fifty nine different features.
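The counting scheme described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the researchers' code: the scale-degree encoding of the melody, the Roman-numeral chord symbols, the function names, and the crude "rising / falling / level" contour rule are all assumptions made for the example. It shows how five count-based representations (notes, note pairs, chords, chord pairs, contours) collapse into one dictionary of counts, and how a fixed feature order turns every song into a same-length vector.

```python
from collections import Counter


def unigram_counts(tokens):
    """Count how often each individual token (a note or a chord) occurs."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Count how often each adjacent pair of tokens (a transition) occurs."""
    return Counter(zip(tokens, tokens[1:]))


def contour_label(pitches):
    """Hypothetical three-way contour category for one phrase of MIDI pitches."""
    if pitches[-1] > pitches[0]:
        return "rising"
    if pitches[-1] < pitches[0]:
        return "falling"
    return "level"


def song_features(melody, chords, phrases):
    """Combine the five count-based representations into one Counter.

    melody:  scale degrees of the sung melody, e.g. [1, 3, 5]
    chords:  Roman-numeral chord symbols, e.g. ["I", "vi", "IV"]
    phrases: lists of MIDI pitches, one list per melodic phrase
    """
    features = Counter()
    for note, n in unigram_counts(melody).items():
        features[("note", note)] += n
    for pair, n in bigram_counts(melody).items():
        features[("note_pair", pair)] += n
    for chord, n in unigram_counts(chords).items():
        features[("chord", chord)] += n
    for pair, n in bigram_counts(chords).items():
        features[("chord_pair", pair)] += n
    for phrase in phrases:
        features[("contour", contour_label(phrase))] += 1
    return features


def to_vector(features, vocabulary):
    """Fix a shared feature order so every song becomes a same-length count vector."""
    # Counter returns 0 for missing keys without inserting them.
    return [features[key] for key in vocabulary]


# A made-up toy "song", not transcribed from any Beatles recording.
song = song_features([1, 3, 5, 3, 1], ["I", "vi", "IV", "V", "I"], [[60, 64, 67], [67, 64, 60]])
vocabulary = [("note", 1), ("chord_pair", ("I", "vi")), ("contour", "rising")]
print(to_vector(song, vocabulary))  # [2, 1, 1]
```

In the study the vocabulary has one hundred forty-nine or one hundred fifty-nine entries; here it has three, just to show the mechanics.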
Bailer: OK. So when you looked at these, were there particular features that really stood out as differentiating between Lennon and McCartney?
Glickman: Yeah, that's one of the things that pops out of our modeling. We're able to see which individual features really light up when we fit the model. The modeling is based on the songs of known authorship: for every song of known authorship we have the author, and then we have these one hundred fifty-nine different categories that we're counting over. I can give a couple of examples of categories.
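The interview does not spell out which model the researchers fit, so the sketch below swaps in a standard stand-in: a multinomial naive Bayes classifier with Laplace smoothing, a common baseline for bag-of-words count data. It is not the authors' method. It shows the two ideas Glickman mentions, training only on songs of known authorship and then asking which features "light up", measured here (as an assumption) by the difference in smoothed log-rates between two authors. The feature names and the tiny dataset are made up, and the authors are deliberately labeled "A" and "B".

```python
import math
from collections import Counter


def fit_naive_bayes(songs, authors, alpha=1.0):
    """Fit multinomial naive Bayes on songs of known authorship.

    songs:   list of Counters mapping feature -> count
    authors: author label for each song
    alpha:   Laplace pseudo-count added to every feature
    """
    vocabulary = set()
    for song in songs:
        vocabulary.update(song)
    counts = {author: Counter() for author in set(authors)}
    for song, author in zip(songs, authors):
        counts[author].update(song)
    n_songs = Counter(authors)
    log_prior = {a: math.log(n_songs[a] / len(authors)) for a in counts}
    log_rate = {}
    for a, c in counts.items():
        denom = sum(c.values()) + alpha * len(vocabulary)
        log_rate[a] = {f: math.log((c[f] + alpha) / denom) for f in vocabulary}
    return log_prior, log_rate


def predict_proba(model, song):
    """Posterior probability of each author for one song's feature counts."""
    log_prior, log_rate = model
    scores = {
        a: log_prior[a] + sum(n * log_rate[a][f] for f, n in song.items() if f in log_rate[a])
        for a in log_prior
    }
    top = max(scores.values())  # subtract the max before exponentiating, for stability
    weights = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def distinctive_features(model, a, b, k=3):
    """Features whose smoothed rate is most elevated for author a relative to b."""
    _, log_rate = model
    diffs = {f: log_rate[a][f] - log_rate[b][f] for f in log_rate[a]}
    return sorted(diffs, key=diffs.get, reverse=True)[:k]


# Made-up training data for two hypothetical songwriters.
songs = [
    Counter({"I->vi": 3, "step": 5}),
    Counter({"I->vi": 2, "step": 4}),
    Counter({"big_leap": 2, "step": 5}),
    Counter({"big_leap": 3, "step": 3}),
]
authors = ["A", "A", "B", "B"]
model = fit_naive_bayes(songs, authors)
print(predict_proba(model, Counter({"I->vi": 2, "step": 3})))
print(distinctive_features(model, "A", "B", k=1))  # ['I->vi']
```

Naive Bayes treats features as independent given the author, which real musical features are not; a more careful model would be needed for real attribution, which is part of why the researchers emphasize model checking.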
Pennington: Yeah, that’d be great.
Glickman: So one that's very interesting: one of the categories we capture is note transitions that span more than an octave while still staying on what's called the diatonic scale. The diatonic scale is essentially the set of notes most often used in the key of a song. But the main focus here really is on the melody jumping, a very large transition between notes that are being sung.
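One way to make the "jump of more than an octave on the diatonic scale" feature concrete is sketched below. It is a hypothetical definition, not necessarily the one used in the study: pitches are assumed to be MIDI note numbers (60 is middle C), the key is assumed to be major, and "more than an octave" is taken to mean more than 12 semitones between consecutive sung notes, with both notes belonging to the key's major scale.

```python
# Semitone offsets of the major (diatonic) scale above its tonic.
MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}


def is_diatonic(pitch, tonic_pc):
    """True if a MIDI pitch lies on the major scale of tonic_pc (0 = C, ..., 11 = B)."""
    return (pitch - tonic_pc) % 12 in MAJOR_SCALE


def count_large_diatonic_leaps(pitches, tonic_pc):
    """Count consecutive-note transitions wider than an octave (12 semitones)
    in which both notes stay on the diatonic scale."""
    count = 0
    for a, b in zip(pitches, pitches[1:]):
        if abs(b - a) > 12 and is_diatonic(a, tonic_pc) and is_diatonic(b, tonic_pc):
            count += 1
    return count


# An octave plus a major third is 12 + 4 = 16 semitones, e.g. C4 (60) up to E5 (76).
print(count_large_diatonic_leaps([60, 76, 74, 72], tonic_pc=0))  # 1
```

An exact octave (12 semitones) is not counted under this definition, and a leap landing on a chromatic note such as C sharp in C major is excluded by the diatonic check.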
Bailer: So can you give an example of a song or two that has that?
Glickman: Yeah absolutely. So one example might be the song, the early Beatles song, Love Me Do.
Glickman: So there's one point in the song - so I'm going to embarrass myself by singing a little bit.
Bailer: Oh, you're a good man, you're a good man.
Glickman: Brave might be better than good!
(Music plays in the background)
Glickman: So there's one point in the song where the singer goes, "Someone to love… somebody new…" So that jump from "someone to love," that note, and then the jump to "whaa…": that's over an octave.
Pennington: Yeah, yeah.
Glickman: So that's one example, and it turns out to be a Paul McCartney song. There's one other song among the seventy Lennon-McCartney songs we examined from the time period we looked at that also has a large jump, and that's Eleanor Rigby.
Bailer: OK.
Glickman: So in Eleanor Rigby, during the chorus, it goes, "All the lonely people, where do they all come from?" That jump on "where do" is an octave and a third, so that's huge, and that's also a Paul McCartney song.
Bailer: OK.
Glickman: So Paul McCartney was the one who ended up taking quite a few liberties with big shifts in the melody, big jumps, and Lennon tended not to do that.
(Background music plays)
Pennington: You're listening to Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington, with Miami University statistics department chair John Bailer. Our guest today is Mark Glickman, senior lecturer on statistics at Harvard University and the author of a study looking at data that can help us understand which Beatle wrote which song. Before we broke, Mark, you were talking about the time period and the number of songs. Seventy songs: that sounds like the number of songs in dispute over authorship. Can you talk about the time period you chose for this study and why?
Glickman: Sure. Just to be clear, the seventy songs are ones where we do know the author.
Pennington: Oh, OK.
Glickman: Yeah, because those are the ones we build our model on.
Pennington: Oh, so you're using those to build the model. OK.
Glickman: Right, and then the ones where we don't know, I think there are seven? I'm not going to get the number exactly right. The reason I'm not getting it right is that we also applied our approach to songs that were known to be collaborative, because we wanted to see how the model worked on those as well.
Pennington: Oh, that's interesting.
Glickman: But to answer your original question, we focused on Lennon-McCartney songs that were written or recorded during the period 1962 to 1966. That included the albums from Please Please Me, the first Beatles album, up through Revolver, but not beyond. Starting with Sgt. Pepper's Lonely Hearts Club Band, the Beatles were exclusively producing music in the studio and not performing in front of large live audiences. So they were focused on their music, and the music tended to get much more complex.
Pennington: So it's harder to crack the puzzle.
Glickman: Right. The idea here is that if we're going to come up with some kind of musical fingerprint for Lennon and for McCartney, we want to do it in a realm where we can view their style as not changing all that appreciably.
Pennington: Yeah, yeah.
Glickman: Their style definitely changed over the period from 1962 to 1966, but the problem is that once you go beyond 1966 and their non-studio-focused work, there are a lot of experimental ideas and very unusual harmonic motifs poured into the music. So that's why we stuck with that shorter time period.
Bailer: Yeah, that was going to be my next question. It seems like it would be really interesting to think about the evolution of songs by a particular songwriter over a career. Was there even some observable, gradual shift between 1962 and 1966?
Glickman: We haven't looked at it, so we haven't analyzed that quantitatively, but I think most Beatles fans would acknowledge that there are pretty noticeable changes. In the early years the songs tended to have much more of a blues or rock'n'roll motif, whereas the songs closer to 1965 and 1966 tended to be much more musically interesting, generally getting away from the standard eight-bar blues progressions that were characteristic of their music early on. But we did not analyze that quantitatively. I will say, following up on what you're suggesting, John, I think you're right on the mark. Once we are satisfied that this is all working, there are several future projects we want to explore. One is to study time-varying musical signatures. So rather than assuming a fixed signature over a five- or six-year period, like what we're doing right now with the Beatles, we'd assume the signature might vary stochastically over time. Then we'd be able to pick up where the changes are: are certain chord changes, or combinations of chord changes, becoming more frequent in a songwriter's style as they progress in their career, or are other features changing over time? That would be another thing to detect. The other idea is that we could take these musical signatures and, rather than just analyzing Lennon-McCartney songs, analyze a much larger corpus of songwriting, so that we get fingerprints for all these different musicians and can even try to figure out influence.
So if two musicians have pretty similar styles, particularly one toward the end of their career and another at the beginning, and the years line up, then you could probably start making some inferences about whether one musician influenced another.
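Glickman presents fingerprint comparison only as a future idea and does not say what similarity measure would be used. As a hypothetical illustration, cosine similarity between two feature-count dictionaries is one simple choice: it ignores overall song count (only the proportions matter) and ranges from 0 (no shared features) to 1 (identical proportions). The musicians "X", "Y", "Z" and all feature counts below are made up, and a high similarity on its own would not establish influence.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two feature-count dictionaries (0 to 1 for counts)."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0) * q.get(k, 0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty fingerprint is treated as dissimilar to everything
    return dot / (norm_p * norm_q)


# Hypothetical fingerprints: late-career X versus early-career Y and Z.
late_x = {"I->vi": 4, "big_leap": 1}
early_y = {"I->vi": 8, "big_leap": 2}
early_z = {"big_leap": 5, "chromatic_step": 5}
print(round(cosine_similarity(late_x, early_y), 3))  # 1.0
print(round(cosine_similarity(late_x, early_z), 3))
```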
Pennington: It could be really interesting, because the sixties were such an interesting musical decade. You start with what feels like, not simple, but a bit more straightforward kinds of compositions, and by the end of the sixties you're getting into all this crazy psychedelic stuff. It'd be interesting to see whether your model could also look at evolution more broadly, across styles or artists.
Glickman: Yeah, absolutely, and I'll be the first to say that the features we're examining right now are probably not enough to do that kind of characterization. Like you said, once you get into the late sixties and early seventies, the music gets into rock groups like Yes or King Crimson, music that starts sounding like modern classical music, very intricate, very eternal. So you'd probably need a different dictionary from the one we've been using with the Beatles.
Bailer: Yeah, it's interesting that you mention that. I was going to ask you about the generalizability of these constituent parts you're measuring to other performers. Do you think this would still apply to other musicians in the 1962 to 1966 window, or would you need some other characteristics to be considered?
Glickman: My sense is that for the early to mid-sixties this would probably be pretty good. Again, this is purely based on my musical knowledge, which I don't claim is super deep, but the music of the early sixties was generally riffing off eight-bar blues, with some departures, and wasn't very musically rich. By the time you get into the mid-sixties, most groups are trying some pretty interesting musical ideas, but they're not yet so experimental, with maybe a couple of exceptions, like Pink Floyd, that the music can't be described using fairly simple musical features derived from melody lines and harmonic structure, like we're doing for the Beatles.
Pennington: This is a work in progress, and it has already received a fair amount of attention; there have been a couple of stories about it. What has it been like to be in the midst of this, juggling the work itself while also figuring out how to present it in the media, when it's about a band that is perhaps one of the most beloved cultural icons of the last century?
Glickman: It's been interesting. One of the nice things about this project is that it's generally easy to explain to non-quantitative people, to the extent that we've been having these conversations with the media. It's pretty straightforward to say, look, here are some of the musical features that highlight the difference between McCartney and Lennon in our analysis, and then point to those features and give example songs. In some ways that resonates, I think, with people who aren't really quantitative. So in that sense it's been kind of a fun process to parade this work around. On the other hand, we're taking this work seriously. We're doing our due diligence to make sure that our model is being fit properly, which involves a lot of checking: a lot of out-of-sample prediction to make sure the predictions are well calibrated. Right now we're in the process of finalizing these details, and at this point we haven't made our work public, because, as my colleague is quick to point out based on his past experience with this kind of work, you really have one chance to get it right.
Glickman: We don't want to put something out in public and then realize that we didn't dot our i's and cross our t's, and get a lot of pushback. So we're being pretty careful with this project before we release the technical report, which we'll be submitting to a peer-reviewed journal.
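Glickman mentions out-of-sample prediction to check that predictions are well calibrated, without naming the diagnostics. Two generic checks are sketched below, assuming one already has held-out predicted probabilities (for instance from leave-one-out fits on the known-authorship songs) and the true 0/1 outcomes: the Brier score, and a binned calibration table that compares average predicted probability with the observed frequency in each bin. The probabilities here are made up, and the bin count is an arbitrary choice.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_table(probs, outcomes, n_bins=5):
    """Group held-out predictions into equal-width probability bins and report
    (mean predicted probability, observed frequency, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            observed = sum(y for _, y in members) / len(members)
            table.append((round(mean_p, 3), round(observed, 3), len(members)))
    return table


# Made-up held-out probabilities that a song is by one particular songwriter.
probs = [0.95, 0.85, 0.15, 0.05]
outcomes = [1, 1, 0, 0]
print(round(brier_score(probs, outcomes), 4))  # 0.0125
print(calibration_table(probs, outcomes))  # [(0.1, 0.0, 2), (0.9, 1.0, 2)]
```

For a well-calibrated model, the first two numbers in each row should be close; with only seventy known songs, bins will be sparse, so a real check would need care in choosing the number of bins.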
Bailer: Yeah, I think that's really smart, especially given the interest this has already generated. When you go live with it, it's going to get even more.
Glickman: I hope so. I mean…
Bailer: Fair enough, fair enough.
Glickman: Hey, I want my life back.
Bailer: Hey, you picked this project man!
Glickman: Yeah, that's true. I know, I deserve it. We get what we deserve.
Bailer: Hey I know…
Glickman: No, it's actually quite a bit of fun. Once we've ironed out all these details, I'd certainly love to have the opportunity to give some technical talks about it while toting my guitar around, so I can actually give some demonstrations. I've done that a couple of times so far, but being able to explain a bit more of the details while still making it a fun talk, supplemented with some live music, might not be so bad.
Bailer: That's cool! You mentioned earlier that one of the things characteristic of McCartney's songs was the big jumps. I'm wondering, do you have a similar characteristic that was a telltale sign of a Lennon piece?
Glickman: Yeah, there is. It's a little more subtle, so it's probably not as obviously noticeable. This might provide a little more context for what I'm about to tell you: John Lennon came from a fairly broken home and didn't have a very supportive upbringing, whereas Paul McCartney in some ways had slightly the opposite experience. His mother died early, but he had a very supportive home; his father was a musician, and he had a very positive environment. The way that sometimes gets manifested is that John Lennon tended to be a little more withholding, taking fewer liberties in the way he wrote music, and he would generally rely on slightly more standard musical ideas than Paul McCartney did. So with Paul McCartney, these large melodic jumps are sort of an artifact of his willingness to take those kinds of liberties, whereas John Lennon tended to stay in slightly more confined regions of notes when he was singing, and he also tended to rely on more standard kinds of musical ideas. One of the things that came out of this model is a particular chord transition that John Lennon used much more often than Paul McCartney did, and this may or may not be meaningful. I'll just say it. What John Lennon tended to do was use the chord transition that goes from the tonic chord of a song, which is like the root chord, to the minor six. That might sound like musical gibberish, but going from the tonic chord to the minor six is one of the most natural chord changes in all of pop music.
It's essentially going from the root of the major key to what's called the relative minor. It's a very natural transition, and John Lennon tended to use it much more often than Paul McCartney did. Again, that's consistent with the idea that John Lennon used more standard musical ideas. I can give a quick example…
Pennington: Is there… yeah. Are there a couple of songs that you think are examples of that?
Glickman: Yeah, let me think. So let's see… he does it in the song Help, and in It's Only Love, which is a little more obscure, right at the very beginning of the song, if you call it up really quickly.
Glickman: Those two chords you just heard, that transition from the chord that sounds happy to the chord that sounds slightly sad…
Glickman: That's going from that tonic to the minor six.
Glickman: So that change John Lennon uses all the time and Paul McCartney uses very infrequently.
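The arithmetic behind the "tonic to minor six" move is simple: the root of the relative minor (the vi chord) sits nine semitones above the major tonic, or equivalently three below, so in C major the vi chord is A minor and in G major it is E minor. The sketch below counts I to vi moves in a sequence of chord symbols. It is a hypothetical helper, not the researchers' code: it only parses plain triads written like "C", "Am", or "F#m", and the chord progressions are made up rather than transcribed from Help or It's Only Love.

```python
NOTE_TO_PC = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10, "B": 11,
}


def relative_minor_root(major_key):
    """Pitch class of the relative minor: a major sixth (9 semitones) above the tonic."""
    return (NOTE_TO_PC[major_key] + 9) % 12


def parse_chord(name):
    """Split a plain triad symbol like 'Am' or 'F#' into (pitch class, is_minor)."""
    minor = name.endswith("m")
    root = name[:-1] if minor else name
    return NOTE_TO_PC[root], minor


def count_tonic_to_relative_minor(chords, key):
    """Count I -> vi moves (e.g. C -> Am in C major) in a chord sequence."""
    tonic = NOTE_TO_PC[key]
    six = relative_minor_root(key)
    count = 0
    for a, b in zip(chords, chords[1:]):
        root_a, minor_a = parse_chord(a)
        root_b, minor_b = parse_chord(b)
        if root_a == tonic and not minor_a and root_b == six and minor_b:
            count += 1
    return count


print(count_tonic_to_relative_minor(["C", "Am", "F", "G", "C", "Am"], "C"))  # 2
```

Counting this one transition per song gives exactly the kind of chord-pair feature described earlier in the interview; a model could then compare its rate across songwriters.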
Pennington: So Mark, one of the questions we often ask our guests is: if students or others are interested in getting involved in the type of analysis you're describing, what are some of the things they should study or do to be able to play in the same space?
Glickman: Right. The techniques we're using to analyze our data are very closely related to the ones that tend to be used in text analysis. So the relevant topics would be things like topic models and models for bags of words. I promised you that I would mention what a bag of words is. A bag of words simply means that I take a document of words, grab them all together, distribute them into different bags, and count the number of times they appear in each individual bag.
Pennington: Oh, yeah, yeah…
Glickman: …a bag-of-words model. There's a whole area in machine learning devoted to bag-of-words models, and that would be the place to start: learning about the techniques used to analyze texts and word frequencies in text.
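For readers who want to start where Glickman suggests, here is a minimal bag-of-words sketch for plain text. Word order is thrown away and each distinct word gets its own "bag" of counts; stacking the counts for several documents over a shared vocabulary gives a document-term matrix, the text analogue of the song-by-feature counts described earlier. Splitting on whitespace and lowercasing is a simplification assumed here; real tokenizers also handle punctuation.

```python
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count each word; word order is discarded."""
    return Counter(text.lower().split())


def document_term_matrix(documents):
    """Build one shared, sorted vocabulary and one row of word counts per document."""
    bags = [bag_of_words(doc) for doc in documents]
    vocabulary = sorted(set().union(*bags))
    rows = [[bag[word] for word in vocabulary] for bag in bags]
    return vocabulary, rows


vocabulary, rows = document_term_matrix(["love me do", "please please me"])
print(vocabulary)  # ['do', 'love', 'me', 'please']
print(rows)        # [[1, 1, 1, 0], [0, 0, 1, 2]]
```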
Pennington: So Mark that's about all the time we have but I do have one last final question for you.
Pennington: Are you sick of the Beatles yet?
Glickman: Never. Never, never, and that's the beauty of the Beatles. The beauty of the Beatles is that you can just keep on listening to songs and hear something new all the time.
Pennington: Well thank you so much for being here today.
Glickman: My pleasure, thank you very much for having me.
Pennington: Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film and the American Statistical Association. You can follow us on Twitter, iTunes, or wherever you find podcasts. If you'd like to share your thoughts on the program, send your email to firstname.lastname@example.org. And be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.
(Background music plays)