Monday, May 19, 2014

Washington Post Sports Watch: Why is "Fancy Stats" Misusing Stats?

"No one cares about the hockey team. Not even people in the Washington area." That was the tagline on a post entitled "The Washington Wizards Have Buzz, But Do The Capitals?" on the Washington Post's new "Fancy Stats" blog last week. That's a pretty strong statement of questionable veracity, so I wondered what "statistics" the blog's author, Neil Greenberg, had to prove his assertion. As it turned out, he really didn't have any.

Greenberg used Google Trends (which tells you the relative popularity of search terms over a period of time) to demonstrate that interest in the Wizards had increased over their recent playoff run, and that interest in the Nats and Redskins had fluctuated over the past four weeks based on the news surrounding the teams. And then he said, "The team that you should worry about is the Washington Capitals. The hockey team is paid attention to the least in the Washington area and doesn't even register on the scale in the nation's capital in the past 30 days." Greenberg is correct on one point here--there hasn't been much interest in the Caps over the past 30 days. That shouldn't be surprising, though--the 30 days that Greenberg uses for his Google Trends search (April 12-May 12) included exactly one Caps game, the last regular season game of the year, played four days after the team had been eliminated from playoff contention. Of course the Wizards had eight times as many searches as the Caps in this time period--they advanced to the second round of the playoffs, while the Caps weren't playing any games. (For the record, Greenberg claims interest in the Caps should have been higher during this period because Alex Ovechkin won the Richard goal-scoring trophy and the team fired its coach and GM. Well, Ovechkin had pretty much clinched that trophy by mid-to-late March. And is firing a coach and GM really more interesting and exciting to fans than playoff games?)

As for the hockey team being "paid attention to the least in the Washington area," if you take any time other than the last month, that assertion is flat out wrong, based on the same Google Trends statistics Greenberg cites in his post. Check out this graph of the last year, in which the Caps are either ahead of or even with the Wizards virtually the entire year (other than a couple of weeks last February, when the NHL shut down for the Olympics). Check out this graph since 2004, in which the Caps are ahead of the Wizards in Google searches pretty much since 2008 (and weren't that much below them in 2006 and 2007, when the Caps were terrible and the Wizards were a playoff team). And if you want to see the danger of depending on one month of Google Trends to make assertions about the popularity of teams, check out May of 2013 in this graph, with which one could argue that the Caps are more popular than the Redskins in Washington, DC. Of course that's ridiculous--but using Greenberg's statistical analysis, it would apparently be enough evidence to declare the Caps the most popular team in town.

Unfortunately, using statistics without proper context isn't a one-time occurrence on the Post's "Fancy Stats" blog. A couple weeks ago, Greenberg had a post entitled, "Kentucky Derby Is Bigger Than Super Bowl -- At Least in Gambling." The post noted that the amount of money bet on the Derby throughout the country is larger than the dollar amount legally bet on the Super Bowl in the United States. Well, of course--what Greenberg leaves out is that you can bet on the Kentucky Derby at racetracks throughout the country, off-track betting parlors in some states and even online. But the only way to legally bet on the Super Bowl is to be at a Nevada casino. If you could walk into a racetrack and bet on the Super Bowl, isn't it obvious that bets on the Super Bowl would blow the Derby away?

There are a number of other problems with the blog besides misleading stats, though--such as missing the full story. Take this post last month revealing how Stephen Strasburg's changeup is unhittable. That's true, according to the stats Greenberg cites. But the chart within the post also notes that batters, at the time, were hitting .345 against his four-seam fastball and slugging .700 against his slider. If the seeming goal of the "Fancy Stats" blog is to look for the truth through stats and data, isn't that something that deserves at least a mention in the story--and, really, shouldn't it be the focus of the story along with the unhittable changeup?

Then there are the weekly fantasy baseball posts, not by Greenberg but by Collin Hager. One of the best posts on Fancy Stats in its first few weeks was one by Greenberg debunking the Thor hockey rating system, a system which claimed to find that neither Sidney Crosby nor Alex Ovechkin was among the best 150 players in the NHL (while Troy Brouwer was seventh!). And yet every week, the blog carries Hager's posts using his HVaC fantasy baseball ranking system, which include a link to an explanation of Hager's system but no link to his rankings. After googling a little, I found his rankings, and they're a little odd--Troy Tulowitzki, who is hitting .400 with 12 home runs, is ranked as the 13th best shortstop in MLB and the 85th player in baseball. Why? Because he doesn't steal enough bases, and fantasy players need their shortstop to be strong in that category. Really, that's the reason. That doesn't mean the rankings are useless, but shouldn't the reader know about this kind of oddity in his system without googling? And is a system ranking the guy who has been the best player in baseball this year so low really an effective system? (Asdrubal Cabrera is ranked higher than Tulowitzki in this system. I have Cabrera on my fantasy team, and he'd have to have 200 stolen bases this season to be worth anything close to what Tulowitzki is worth.)

Perhaps the most disturbing thing about the Fancy Stats blog, though, is that its primary author doesn't seem to care that his statistics are misleading or taken out of context. When a commenter (not me) asked below the Google Trends post about the Wizards and Caps whether the numbers would be different if the Caps were in the playoffs this year, Greenberg actually told him to figure it out for himself:
"Sure, you can pull up any years in Google Trends. It is available to public. Let me know what you find."
The author of the "Fancy Stats" blog may not have any interest in whether his claims are actually supported by the statistics he's using. But shouldn't there be an editor at the Post who can make sure that the "Fancy Stats" blog is using statistics correctly, so readers aren't being misled?