Description

Ed Feng discusses his algorithm, The Power Rank, for forecasting the outcome of sporting events. As a Miller Fellow, he studied statistical physics. We explore his path from academia to Silicon Valley startup and sports analytics.

Transcript

Speaker 1:        Spectrum's next 

Speaker 2:        [inaudible].

Speaker 1:        [00:00:30] Welcome to Spectrum, the science and technology show on KALX Berkeley, a biweekly 30-minute program bringing you interviews featuring Bay Area scientists and technologists. 

Speaker 3:        Hi, I'm Rick Karnofsky. Brad Swift and I are the hosts of today's show. Today we're talking to Dr. Ed Feng. I met Ed when he worked at Sandia National Labs. Prior to this, Ed earned his PhD from Stanford and then became a Miller Fellow at UC Berkeley. His research [00:01:00] has focused on statistical mechanics and single-molecule experiments, but I've asked him on today because the Big Game is tomorrow. Ed left his job at Sandia to become a Silicon Valley entrepreneur. He runs thepowerrank.com, which algorithmically ranks sports teams and predicts the outcome of future games. He'll discuss sports analytics with us. Can you give us an overview of that, Ed? 

Speaker 4:        My background is in a field called statistical physics. So statistical physics is the study of large-scale properties of bulk matter based on [00:01:30] its small-scale units. So for instance, in a physical system that I would study, the large-scale property would be like the pressure of a gas or the temperature of a gas, and that gas would be made up of atoms or molecules. So those would be the units. I was here at Berkeley and we actually studied single-molecule experiments. And so these are these pretty amazing experiments where you take a laser and grab onto the end of a molecule. So for instance, [00:02:00] with a DNA molecule, you can grab onto both ends and you can stretch it, pull it, do whatever you want to it. And so these are some really amazing experiments going on in the physics department. 

Speaker 4:        And my work here was doing some theory and some math and some computer simulations to understand those experiments. At some point I got a little disgruntled with the academic life and started looking for some other jobs, and I had an interview lined up with Google. And so I thought I should do some homework. I researched their PageRank algorithm. So this is the [00:02:30] algorithm that made Google. Okay. So it essentially ranks all web pages. And the intuitive idea behind it is that a website should be ranked highly if other highly ranked websites point to it or link to it. Okay. So you're essentially using the link structure of the web to make a ranking of all websites. And it's really elegant because it considers the whole link structure of the web, not just kind of the local link structure around your particular website. 

Speaker 4:        And again, it's a perfect example of statistical physics. So here [00:03:00] the large-scale properties are the overall rankings of all the websites, and then the small-scale units are the individual websites. There are these amazing parallels between statistical physics, or what I'd been studying, and PageRank. So anyways, I got super excited about this, went to my interview and said, hey, you know, I'm really excited about working for Google because this is what I've been doing for the last eight years of my life, you know? And my interviewer was like, uh, okay. [00:03:30] So needless to say, I did not get the job, but it sparked some ideas about how I could use PageRank to do other things, maybe more important things like ranking sports teams and figuring out college football and things like that. So some ideas brewed in my head and I started doing it about a couple of years ago, so this was about 2008. So that's how this all got started. 

Speaker 4:        What kind of modifications have you had to make to the PageRank algorithm to actually make it work for sports? So it turns out that [00:04:00] PageRank actually doesn't work for sports. [inaudible] You can show that it does not have some properties that you would like. Uh, I was carpooling to the national lab over in Livermore with a guy named David Gleich, and I was telling him about this hobby that I had on the side: oh, I applied PageRank to sports. And most people are like, oh, cool. But he seemed completely disinterested. And the reason was because he had actually written his PhD thesis [00:04:30] on PageRank, so he was entirely unimpressed. You know, it's like, oh, of course you can apply it to sports. And then he was telling me about this paper that was showing that you can't actually apply it to sports. 

Speaker 4:        It doesn't have some properties that you need. And I can't really describe all the math on the air. Probably not feasible, but you need to modify the links in an appropriate way such that it works for two teams that are playing a game that ends up with a final score. And are those modifications the same regardless [00:05:00] of the sport or the league that the sport is in? Maybe I should go back a sec. So The Power Rank not only gives you a rank of all the teams, but it also gives you a number. So we call this a rating for each team, and that rating is in units of points. Okay. So the algorithm gives you a prediction in the sense that if you take the ratings of two teams and subtract them, you get a predicted point spread for a future game. 
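
The rating-to-spread step described above can be sketched in a few lines. This is only an illustration of the subtraction: the team names and rating values are made up, and nothing here reproduces how The Power Rank actually computes its ratings.

```python
# Hypothetical ratings in points (illustrative values, not real Power Rank output).
ratings = {"Team A": 12.5, "Team B": 4.0, "Team C": -3.5}


def ranking():
    """Teams ordered from highest rating to lowest."""
    return sorted(ratings, key=ratings.get, reverse=True)


def predicted_spread(team, opponent):
    """Predicted margin for `team` over `opponent`: the difference of ratings."""
    return ratings[team] - ratings[opponent]


print(ranking())
print(predicted_spread("Team A", "Team B"))  # positive means Team A is favored
```

A negative result simply means the first team is the predicted underdog by that many points.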

Speaker 4:        So one of the modifications that you have to make is you have to make a link, you know, like a one [00:05:30] zero kind of link, make sense in terms of points. Okay. And in order to do that you need to put some nonlinear behavior in there. The paper is actually super interesting because it says, oh well, PageRank doesn't work for ranking sports teams. And then it goes on to make the conclusion that some of the algorithms that go into determining a national champion in football are good, which is about the only place that you'll ever hear any type of praise for these algorithms, 'cause most people can't stand them, because they're these mythical computer algorithms that [00:06:00] no one knows much about and they help determine the national championship. And no one that roots for the third-place team at the end of the season is really happy with that type of situation. 

Speaker 4:        But these people are saying, hey, well, these algorithms are actually good. They work better than PageRank. And so David was complaining, like, well, you know, don't just say PageRank is wrong. Like, fix it, you know, fix it and make it work for sports. And I was like, well, that's actually what I did with The Power Rank. So there are these modifications [00:06:30] that, you know, take a one-zero link structure and make it work for points. And that works across all sports because there's a point structure for all sports, right? A goal is a goal. Beyond that, uh, there are some other things that you can do to make it much more sports-specific. One of the problems with kind of promoting any ranking system in college football, um, which is kind of my primary focus, is that, oh, well, it's just like these silly rankings that determine the national championship. 

Speaker 4:        So right now it's a listing of teams and [00:07:00] it looks like every other one of the 50 to a hundred people that have tried to rank college football teams with their ideas in math and physics or whatever they're doing, stats. You know, I'm a fan and I actually care about these rankings and I want to go a little bit beyond. So I'm looking at how to separate the offense and the defense. So right now you'll hear a lot about, oh well, Oregon has the nation's top scoring offense. Okay, well, does it matter that they played a bunch of crummy defenses last year? And a lot of people, you know, [00:07:30] people talk about strength of schedule 'cause it matters. So they talk about it when, like, oh well, you know, Auburn went undefeated last year and that's good because they played a very hard schedule, right? And so they talk about it then, and then they quote these stats that are completely independent of strength of schedule. So, for instance, Oregon I think scored 43-point-whatever points on offense last year, but it doesn't account for who they played and whether Pac-10 defenses were bad. And so I'm trying to use the algorithm to [00:08:00] separate out the offenses and defenses and really kind of give insight to each of the units. 

Speaker 2:        [inaudible] You're listening to Spectrum on KALX Berkeley. We're talking with sports analytics expert Ed Feng about thepowerrank.com. [inaudible] 

Speaker 4:        [00:08:30] Kind of the end goal for a lot of things I'm doing with college football is to not just break down offense and defense, but break down passing, rushing, and then special teams. And then that's where I think it gets really interesting to your offensive coordinator. Where do you get your data? Oh, I get my data from Yahoo. So it's all publicly available. Everything I [00:09:00] have is publicly available. Um, at some point that might end, but, oh, I'm still at a point where I have all the data I need and I need to analyze it. 

Speaker 4:        How successful is your algorithm? Well, pretty good. So it's a work in progress. You know, I don't want to go on the air with these very smart people that listen to the show and tell them about the 34 bowl games last year where we beat the line at 55% [00:09:30] because you guys will know that that's a really small sample size, or the 60% at which we beat the line last year, uh, or two years ago, with the college football bowl games. So I've looked at the NFL for the last five years with the kind of rudimentary version of the algorithm that I have right now, without any of the offense and defense modifications. It's beating the line at about 52%, so that's not enough to make money: the house has a take, and so you have to win at about 52.4% in order to make money. 
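
The 52.4% threshold mentioned above can be reproduced with a short calculation. This sketch assumes the common point-spread pricing of risking 110 to win 100; that pricing is an assumption, since the conversation only says that the house has a take.

```python
def break_even_rate(risk=110.0, win=100.0):
    """Win rate at which expected profit is zero when risking `risk` to win `win`."""
    return risk / (risk + win)


def expected_profit(win_rate, risk=110.0, win=100.0):
    """Average profit per bet at a given win rate (same assumed pricing)."""
    return win_rate * win - (1.0 - win_rate) * risk


print(round(break_even_rate() * 100, 1))  # 52.4
print(expected_profit(0.52))              # slightly negative: 52% loses money
print(expected_profit(0.55))              # positive: 55% makes money
```

Under this pricing, a 52% win rate sits just below break-even, which matches the remark that it is not quite enough to make money.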

Speaker 4:        [00:10:00] But you also have to consider that this algorithm does not account for injuries, does not account for coaches that sit their players the last week of the season, um, any one of a number of things. The rate gets higher in the playoffs. I think it was about 55%. Again, small sample size, so I'm not going to tell you guys that that's anything significant. Another result we have: we looked at the last four years of the NBA. It wasn't as good in predicting the outcome of results against the line. Uh, so it was about 51%. [00:10:30] Why do you think that was? So I think it's because of how you foul at the end of games. Um, I need to look into this more, um, because we actually saw some stellar results in the NCAA tournament. Um, so actually, for all neutral-site postseason college basketball games last year, we beat it at 59%, which is good. 

Speaker 4:        And then you can get super excited about that. But then you look at over 2000 NBA games and it's not quite doing as well. So [00:11:00] at the end of a basketball game, you foul because you're behind; you need to get the ball back in order to score. And that behavior is detrimental to your final expected score. Okay. So usually a team scores about a point every time they get the ball. When you foul, you're giving them two shots, and usually a team hits about 75% of their foul shots. And so you're essentially giving a point and a half every time you foul. So if you foul three times at the end of the game, you're making a 2.12 point swing. That can certainly affect the outcome against the line. [00:11:30] But in the NBA, uh, we did predict winners at a rate of 70%, and that in and of itself doesn't tell you much because, you know, the rate at which you can predict winners is very related to how competitive the league is.
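
The free-throw arithmetic above can be checked with a few lines. This is a sketch that uses only the round numbers quoted in the conversation (about one point per normal possession, two shots at 75%); the function names are illustrative. With exactly these inputs, three fouls cost about 1.5 points relative to normal possessions, so the size of the swing depends on the inputs assumed.

```python
POINTS_PER_POSSESSION = 1.0   # "about a point every time they get the ball"
FREE_THROWS_PER_FOUL = 2
FREE_THROW_PCT = 0.75         # "about 75% of their foul shots"


def expected_free_throw_points():
    """Expected points conceded on one intentional foul."""
    return FREE_THROWS_PER_FOUL * FREE_THROW_PCT


def extra_points_conceded(fouls):
    """Expected points given up beyond what normal possessions would yield."""
    return fouls * (expected_free_throw_points() - POINTS_PER_POSSESSION)


print(expected_free_throw_points())  # 1.5 points per foul
print(extra_points_conceded(3))      # 1.5 extra points over three fouls
```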

Speaker 4:        Okay. So if the league is not very competitive, you're going to be able to pick, predict a higher rate of winners because, you know, Cal's going to beat Presbyterian. But in the NBA, that's not so much the case. So we actually looked at the line for that same set of games and it was predicting it at, um, [00:12:00] 70.8%. So just using the final scores in the NBA over the last four years, we're within a percentage point of what the line predicts for the actual game winners. So about 70%, and that's within the error of what you would expect. So the two results are within the same error. In the NBA, you have a big advantage because you have a lot of games. You do not have that luxury in college football. And so a lot of my work in the future will be figuring out other things to [00:12:30] make it work better with college football. 
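
One rough way to see what "within the error" means is the standard error of a proportion. The sketch below assumes 2000 games, taken from the "over 2000 NBA games" remark (the exact count is not given), and treats each game as an independent trial, which is a simplification.

```python
import math


def standard_error(rate, n_games):
    """Standard error of an observed success rate over n independent games."""
    return math.sqrt(rate * (1.0 - rate) / n_games)


n_nba = 2000                      # assumption: "over 2000 NBA games"
algorithm_rate, line_rate = 0.70, 0.708

se_nba = standard_error(algorithm_rate, n_nba)
print(round(se_nba, 4))                           # about 0.0102, roughly one point
print(abs(line_rate - algorithm_rate) < se_nba)   # True: the gap is inside one SE

# For comparison, 34 bowl games at 55% carry a far larger uncertainty.
print(round(standard_error(0.55, 34), 3))         # about 0.085
```

The same formula shows why a 55% record over 34 games is hard to distinguish from a coin flip, while a fraction of a percentage point starts to matter over thousands of games.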

Speaker 4:        Have you bet? Have you placed bets based on your algorithm? I have. Um, so I was in Vegas for a wedding last April and, uh, put some bets down on the NBA playoffs, uh, did pretty well. I came out ahead, which I think is a lot more than most people can, uh, say when they go to Vegas. And have you ever thought about trying to make a simulation? Yeah, absolutely. [00:13:00] So the idea of doing a simulation of baseball is very old. I think the first paper was in the '70s. And, you know, these are the types of things that I was trained to do, right? So you write a stochastic simulation or you don't finish your PhD. Right. Um, so in the statistical physics world, it's just kind of what you do. You know, it's kind of like sawing wood if you're a carpenter. And it absolutely has applications in baseball. [00:13:30] So I kind of would have thought that it would be everywhere in the baseball world. It's not. And the reason, you know, I was reading a bunch of baseball stuff and you always see comments like, oh, well, the Atlanta Braves might lose Brian McCann for a couple of weeks and it's going to decrease their run production. And the relationship might not be linear. 

Speaker 4:        And I think, well yeah, sure, you're probably right, [00:14:00] but you should just run a stochastic simulation. You should be able to figure that out, at least at a coarse-grained level. It's certainly not out there in the media world. The idea is out there, but none of the baseball analytics sites do it, or at least they don't discuss it very widely. At least not that I know of. And interestingly, only four out of 30 Major League Baseball teams can do it, which is what I find really interesting because, [00:14:30] you know, these guys are pretty advanced. They have these huge databases. Actually the Pittsburgh Pirates have an enormous database, one of the most sophisticated databases in all of Major League Baseball. But there's a lot of old tools and, you know, these guys are busy and they've never developed these kinds of tools, and there are opportunities, I think. You know, I mean, I've actually talked to people in organizations that would like to have that. 
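
As a rough illustration of the kind of stochastic simulation being described, the sketch below estimates how a lineup's run production changes when one hitter is replaced. It is a deliberately crude toy model, not anything used by The Power Rank or by a baseball team: every plate appearance is either an out or a single, each single moves every runner up one base, and the on-base rates are invented for the example.

```python
import random


def simulate_game(lineup, rng, innings=9):
    """Runs scored in one game under the toy model.

    `lineup` is a list of on-base probabilities, one per batter.
    """
    runs = 0
    batter = 0
    for _ in range(innings):
        outs = 0
        runners = 0  # runners on base; bases fill in order, so 0 to 3
        while outs < 3:
            if rng.random() < lineup[batter]:
                if runners == 3:
                    runs += 1       # runner on third scores, bases stay loaded
                else:
                    runners += 1
            else:
                outs += 1
            batter = (batter + 1) % len(lineup)
    return runs


def mean_runs(lineup, n_games=20000, seed=0):
    """Average runs per game over many simulated games."""
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(n_games)) / n_games


healthy = [0.32] * 8 + [0.40]   # hypothetical lineup with one strong hitter
injured = [0.32] * 8 + [0.25]   # same lineup with a weaker replacement

print(mean_runs(healthy), mean_runs(injured))
```

Because scoring in this model requires stringing several hits together in one inning, average runs respond nonlinearly to a batter's on-base rate, which is the kind of effect a simulation can expose and a linear rule of thumb cannot.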

Speaker 4:        It's certainly something that you could see on The Power Rank maybe next season. And then just let me, uh, actually conclude. So this whole idea of [00:15:00] a random process is actually at the core of both PageRank and The Power Rank. Okay. The backbone of that is the mathematical idea of a Markov process. So it's essentially random. That's the idea that PageRank is based off of. It's a distribution that comes out of this type of random simulation. And so they always motivate it by, you know, the intuitive picture of [inaudible] PageRank as the random surfer. So you have a surfer, he goes to a website, he randomly clicks to one of the other sites, and he keeps doing this, [00:15:30] and the amount of time he spends on any site is directly related to, directly proportional to, the rank of that website. 
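
The random-surfer picture above corresponds to the stationary distribution of a Markov chain, which can be computed by repeated averaging (power iteration). The following is a minimal textbook-style sketch on a made-up four-page web; the damping value of 0.85 and the handling of pages without links are common conventions assumed here, not details from the conversation.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Long-run share of time a random surfer spends on each page.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # With probability (1 - damping) the surfer jumps to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no links sends the surfer to a random page.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link structure: "c" is linked to by every other page.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Page "a" ends up ranked second even though only one page links to it, because that one page is the highly ranked "c". This is the sense in which the whole link structure matters, not just the local one.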

Speaker 4:        And so the same idea applies to my algorithm. Essentially you have a fan, I like to call him Fickle Freddy. [inaudible] He starts out being a Phillies fan and then he's like, eh, I don't like these guys anymore because they keep losing. And so then he picks one of the teams that they lost to, makes a random choice between all those teams, and jumps, [00:16:00] and then keeps doing this randomly. And so intuitively you can think of the algorithm in terms of this fickle fan that keeps making these jumps between teams, and he's more likely to jump to your team if you've beaten that other team that he's already on. And this goes on forever. And the more that Fickle Freddy is a fan of your team, the higher your rank. And so that's kind of the intuitive idea behind what's going on here. 
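
The Fickle Freddy picture can be simulated directly as a random walk over game results. This sketch covers only the win-loss intuition described above; it does not include the points-based, nonlinear modifications that the actual Power Rank is said to use. The game results are invented, and the restart rule (Freddy occasionally picks a team at random, and always does so when his team is undefeated) is an assumption added so the walk never gets stuck.

```python
import random
from collections import Counter


def fickle_fan_shares(games, steps=200_000, restart=0.15, seed=0):
    """Fraction of time a fickle fan spends supporting each team.

    `games` is a list of (winner, loser) pairs.
    """
    rng = random.Random(seed)
    teams = sorted({team for game in games for team in game})
    lost_to = {team: [] for team in teams}
    for winner, loser in games:
        lost_to[loser].append(winner)

    current = rng.choice(teams)
    visits = Counter()
    for _ in range(steps):
        visits[current] += 1
        conquerors = lost_to[current]
        if not conquerors or rng.random() < restart:
            current = rng.choice(teams)        # assumed restart rule
        else:
            current = rng.choice(conquerors)   # jump to a team that beat this one
    return {team: visits[team] / steps for team in teams}


# Hypothetical round robin: A beat everyone, B beat C and D, C beat D.
games = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
shares = fickle_fan_shares(games)
print(sorted(shares, key=shares.get, reverse=True))
```

Teams that beat many opponents collect more of the fan's time, which is the random-walk counterpart of a higher rank.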

Speaker 2:        [inaudible]

Speaker 3:        [00:16:30] You're listening to Spectrum on KALX. We're talking with sports analytics expert Ed Feng about his site, thepowerrank.com, and what he predicts tomorrow for the Big Game. 

Speaker 2:        [inaudible].

Speaker 4:        [00:17:00] So one of my colleagues, his name is Steve Abel, he's working on his postdoc, and we did our PhDs in the same group, and he always says that [inaudible] or statistical physics applies to everything. And it's essentially that, you know, we're studying the large-scale bulk properties of things that are made up of individual units. You know, one day he told me, he's like, hey, this whole Power Rank thing, you're actually proving that, like, stat mech applies to everything. One of the [00:17:30] uh, interesting facts about this show is that it's prerecorded and edited, so while it will air immediately before the Big Game, it's still quite a ways out. But do you want to make any comments? I absolutely want to make some comments. I think it's super interesting. So let's start last night. So Cal went up to Oregon. Now, a lot of my methods said that Oregon was the better team, and that Cal would lose by about 19. 

Speaker 4:        And the line said that Oregon would win by 24. So [00:18:00] it's quite a discrepancy. There's a lot of factors there. Cal went up there and, you know, we kinda knew that their defense had dropped off a little bit, but Oregon really took advantage of that. They had a couple of really long runs that they didn't dream of last year when they came down here to Strawberry Canyon. [inaudible] was in the game. They were up one at halftime. And then the wheels kind of fell off in the second half and they ended up losing by 28, so Oregon ended up covering. That's part of this business: you're wrong a lot. But I thought it was interesting because these are the types of things I think about, because Cal's defense really played badly. I mean, there were about maybe [00:18:30] three runs of 50 yards or more that went for touchdowns. 

Speaker 4:        The quarterback, Zach Maynard, did not have his greatest game. You know, he threw a touchdown pass in the first half but wasn't very effective in the second half at all. And he actually got pulled towards the end of the game. So now, getting to the Big Game, my rankings have Oregon and Stanford about the same right now. A little bit different, Stanford slightly ahead. But again, you know, there are huge fluctuations. Okay. So I think the rankings probably would have held for the last season too. So it's maybe about a 20-point gap, but Stanford won the game pretty soundly last year, [00:19:00] and that happens with turnovers, and you just never know what's going to happen. You know, Cal's defense is much better than they showed last night at Oregon. Maynard's probably a much better quarterback than he showed last night. And, you know, Cal definitely has some weapons at receiver that are very dangerous. It's interesting that sports fans tend to forget very quickly. So a lot of Stanford fans are very excited about their team, as they rightfully should be. And they kind of forget that two years ago Cal came down to Stanford, quarterback sensation Andrew [00:19:30] Luck had the worst game of his career, and Cal won the game. So there's a lot of variance in sports. Anything can happen. Stanford will most likely win, but you never know. 

Speaker 4:        You know, the best background to understand sports turned out to be studying statistical physics. I mean it was great. It was perfect. So Ed, thanks for joining us. Thank you guys for having me. It was a pleasure. 

Speaker 2:        [inaudible] [00:20:00] That was Ed Feng. You can visit his website at www.thepowerrank.com. 

Speaker 3:        A regular feature of Spectrum is to present a calendar of the science and technology related events happening in the Bay Area over the next few weeks. Brad Swift joins me for this. I'm happy to announce that the Chabot Space and Science Center at 10000 Skyline Boulevard in Oakland is joining the list [00:20:30] of museums that are offering adult nights. Chabot will host such an outing the third Friday of every month. This month, that is today, November 18th, from 7:00 to 11:00 PM. The event is 18 and over and costs $15. This is a bit cheaper than general admission. This includes access to exhibits, special activities, workshops, open labs, discussion forums, a planetarium show, film screenings, and telescope viewing when weather permits. The theme for this month is Curiosity: a cure for [00:21:00] boredom or NASA's next-generation rover searching for water and signs of life on Mars. Chabot astronomer Ben Burress will host a discussion on the rover, and UC Berkeley's Dan Werthimer will discuss the search for ETI. There'll be a fix-it clinic for your broken something-or-others, a telescope makers' workshop, and a lab on surviving in space. Visit www.chabotspace.org/nightschool.htm for more information. 

Speaker 5:        The next Science at Cal lecture [00:21:30] will be given at 11:00 AM tomorrow, November 19th, in the Genetics and Plant Biology Building, room 100. The talk will be given by Dr. Genevieve Graves and is entitled From Gas into Galaxies: Just Add Gravity. Come learn about the origins and the ultimate fate of this island universe we call the Milky Way. Dr. Genevieve Graves studies the formation histories of galaxies: how they form stars, collide and merge together to make [00:22:00] bigger galaxies, and eventually shut off star formation. Dr. Graves is a Bay Area local, having graduated from Albany High School in 1997. After undergraduate work at Harvard and Cambridge, she returned to California to do a PhD in astrophysics at UC Santa Cruz. She is now a fellow at UC Berkeley's Miller Institute for Basic Research in Science. 

Speaker 3:        Internet Archive founder Brewster Kahle is giving a talk to the Long Now Foundation on universal access to [00:22:30] all knowledge. He will discuss the next steps for the archive, such as whether we can make a distributed web of books that supports lending and vending and how our machines might learn by reading these materials. Modern research into machine language translation and optical character recognition will be discussed, as well as how we plan to preserve petabytes of changing data. This talk is on Wednesday, November 30th, from 7:30 to 9:00 PM at the Cowell Theater in San Francisco's Fort Mason Center. Admission [00:23:00] is $8.71. For tickets, please visit [inaudible].org/seminars. The Exploratorium's After Dark falls on the first Thursday of every month from 6:00 to 10:00 PM for people 18 and over. It is $15, or $12 for students, seniors, and persons with disabilities, and is free for Exploratorium members. The theme for December's After Dark is Glow. In addition to the hands-on science exhibits available at the Exploratorium, there'll be information on bioluminescent marine life, how to make glow sticks, [00:23:30] black-lit fluorescent cocktails for purchase, and more. There will be illuminated sculptures, including local favorites Playa Flies by Michael Brown and Cubatron by Mark Lottor. Visit www.exploratorium.edu/afterdark for more info. 

Speaker 2:        [inaudible]

Speaker 5:        You are listening to Spectrum on KALX 

Speaker 2:        Berkeley [inaudible] 

Speaker 3:        [00:24:00] With some current science news headlines, here's Brad Swift. 

Speaker 5:        A Russian Soyuz capsule, Soyuz TMA-22, carrying an American and two Russians, blasted off successfully Monday, November 14th, from Kazakhstan on a mission to the International Space Station. It is the first flight of a NASA astronaut in the post-space-shuttle era [00:24:30] and is a welcome success for the Russian space program. A series of Russian launch system failures over the past four months has delayed the International Space Station staff rotation, raising the possibility of no crew on the space station. Last week, a Russian Mars probe failed to leave Earth's orbit. It is expected to burn up in the atmosphere by November 26 unless it can be reactivated. In August, an unmanned Progress cargo ship [00:25:00] bound for the International Space Station crashed. The rocket that failed was the same kind used by the Soyuz. The NASA program to contract future International Space Station cargo and crew transport from two US companies, SpaceX and Orbital, has ironically been delayed by the reliability issues surrounding the Soyuz system. SpaceX is awaiting permission from the International Space Station partners to validate their system [00:25:30] by launching its Falcon 9 rocket and docking the Dragon capsule with the International Space Station. The success of the recent Soyuz mission might accelerate the decision on the SpaceX launch date. 

Speaker 3:        Science News summarized an article by Edward Pace-Schott of the Massachusetts General Hospital and fellow sleep researchers that appeared in the November 8th issue of Current Biology. The team claimed they have, for the first time, used functional magnetic resonance imaging to capture the brain activity of [00:26:00] lucid dreamers. Lucid dreamers are able to control their dreams while in deep sleep. They asked dreamers to squeeze first their left hand and then the right hand. One of their six volunteers was able to do this, and the fMRI revealed increased activity in their sensorimotor cortex when they were directed to squeeze their hands. Similar brain regions showed activity whether the hand squeezing was performed while awake, imagined while awake, or directed in a dream. 

Speaker 5:        Coral reefs worldwide are increasingly disturbed by environmental [00:26:30] events that are causing their decline, yet some coral reefs recover. Science Daily reports that UC researchers have discovered that the health of coral reefs around the South Pacific island of Moorea in French Polynesia may be due to protection by parrotfish and surgeonfish that eat algae, along with the protection of reefs that shelter juvenile fish. The findings are published in the August issue of the journal PLoS ONE. Coral [00:27:00] reefs that suffer large losses of live coral often become overgrown with algae and never return to a state where the reefs are again largely covered by live coral. In contrast, the reefs surrounding Moorea experienced large losses of live coral in the past, most recently in the early 1980s, and have returned each time to a system dominated by healthy live corals. The new research found that fringing reefs, the reefs that grow against the island, act as a nursery ground for baby fishes.

Speaker 5:        [00:27:30] Most notably herbivores, [inaudible] fishes. With more food available in the form of algae, the survivorship of the baby parrotfishes and surgeonfishes increased, providing individuals to help control the algae on all the nearby reefs. In effect, the large numbers of parrotfishes and surgeonfishes are acting like thousands of fishy lawn mowers, keeping the algae cropped down to levels low enough that there is still space for new baby corals to settle into the reef [00:28:00] and begin to grow. A major reason the reefs in the Caribbean do not recover after serious disturbances is because these reefs lack healthy populations of parrotfishes and surgeonfishes due to the effects of overfishing. Without these species to help crop the algae down, these reefs quickly become overgrown with algae, a situation that makes it very hard for corals to reestablish themselves. The new research suggests that marine protected areas need to include the [00:28:30] fringing reefs that serve as nursery grounds. Without these nursery grounds, populations of parrotfishes and surgeonfishes can't respond to increasing amounts of algae on the reefs by outputting more baby herbivores. 

Speaker 1:        [inaudible]

Speaker 2:        [inaudible]

Speaker 1:        The music played during the show is written and performed by David lost time from his album titled Folk [00:29:00] and Acoustic. 

Speaker 2:        [inaudible].

Speaker 1:        You're listening to Spectrum. We are happy to hear from listeners. If you have comments about the show, please send them to us via email. Our email address is [inaudible] spectrum.kalx@yahoo.com. Join us in two weeks at this same time. 

Speaker 6:        [00:29:30] [inaudible] [inaudible].


Hosted on Acast. See acast.com/privacy for more information.