Robot Byline: Software Sportscasters and Other Stories

Several months ago I was reading a fun piece by Kontra on the subject of sports previews written by software. Prolific Internet commenter Walt French (you'll find him writing detailed comments on several sites: Asymco, Monday Note, the late AllThingsD) felt that the potential was being overstated:

It would also get dreadfully dull after a fan checks in a couple of days in a row. Or flits over to the college games’ summaries to see how his college team and its arch-rival(s) were doing.

I certainly think there’s a place for form-letter sports reporting, but it’ll have to be at least an order of magnitude better than what you posit, before it’s anything besides a joke.

I had written up a fairly lengthy response as a comment, but then my browser froze, and after force quitting and relaunching, my text was gone. Lesson learned: compose in a proper text editor, and treat any comment of more than three paragraphs as a good candidate for a blog post.

A few weeks ago I came across another piece on sports and language, this time about the disappointing limitations of television commentators for the majority of US professional sports, with particular emphasis on basketball and football. It only strengthened the point I was originally going to make.

Sports fans, in the US at least, are incredibly tolerant of cliche and repetition. They have been conditioned by terrible television coverage and bland commentators, not to mention the cesspit of "insight" that is sports radio. It doesn't take much to meet their expectations.

At the same time, there is already a laboratory for software commentators, one that we see improving every year and that I predict will eventually bust out of the lab and into our live broadcasts. Where is this lab? Video games. From Madden to FIFA to NBA 2K, developers have spent thousands of man-hours and millions of dollars recording bits of dialog from marquee commentators and figuring out how to string them together into naturalistic presentations. My only argument is that they haven't (yet) gone far enough.

Forget the simplistic template. All sports commentary seeks to summarize, and optionally emphasize or contextualize, an event or series of events that happened in the past (in the case of a live game, the very immediate past). The heart of the matter is generating deep statistics and feeding them to the narrative synthesis engine.

Player X passed the ball.

Player X passed the ball to Player Y.

Player X passed the ball to Player Y, who shot and scored.

Player Y lost his man off a pin-down from Teammate Z, coming open at the elbow to receive a perfectly timed pass from Player X and hit the game-winning jumper.

This relatively simple sequence of basketball events involves multiple simultaneous actions around the court, culminating in a play whose significance has to be understood before it can be put in context. For the 2013/2014 NBA season, STATS LLC operated SportVU cameras in every NBA arena, capable of tracking all 10 players on the court and the ball at once, 25 times per second. The data coming out of these cameras is already challenging established perceptions in analytics.

For our purposes here, however, the cameras provide the possibility, along with image classification algorithms, of generating a fine-grained chronological event record that we can feed into our commentary engine. By analyzing sequences of player events backwards from points of discontinuity (ball out of bounds, basket scored, timeout, etc.), the software can synthesize a narrative: Player Y took a shot… because Player X passed the ball to him… because Player Z set the pick that got him open.
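To make the backwards walk concrete, here is a minimal sketch in Python. Everything in it is my own invention for illustration: the Event record, the event kinds, the phrase table and the function names are assumptions, not anything STATS or a game studio actually ships. It starts at a point of discontinuity, steps back through the event log until it hits the previous discontinuity, and joins the events it found into a "because" chain.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical event kinds that interrupt play (assumed names).
DISCONTINUITIES = {"basket_scored", "out_of_bounds", "timeout"}

@dataclass
class Event:
    kind: str                     # e.g. "screen", "pass", "shot"
    actor: str                    # who performed the action
    target: Optional[str] = None  # who it was directed at, if anyone

# One hand-written clause per event kind (illustrative only).
PHRASES = {
    "shot": "{actor} took a shot",
    "pass": "{actor} passed the ball to {target}",
    "screen": "{actor} set the pick that got {target} open",
}

def chain_before(events: List[Event], stop_index: int) -> List[Event]:
    """Return the events leading up to events[stop_index], newest first,
    stopping at the previous discontinuity."""
    chain = []
    for event in reversed(events[:stop_index]):
        if event.kind in DISCONTINUITIES:
            break
        chain.append(event)
    return chain

def narrate(chain: List[Event]) -> str:
    """Join the chain into a single 'X... because Y' sentence fragment."""
    clauses = [
        PHRASES[e.kind].format(actor=e.actor, target=e.target)
        for e in chain
        if e.kind in PHRASES
    ]
    return "... because ".join(clauses)

# A made-up event log for one possession.
events = [
    Event("timeout", "Coach"),
    Event("screen", "Player Z", "Player Y"),
    Event("pass", "Player X", "Player Y"),
    Event("shot", "Player Y"),
    Event("basket_scored", "Player Y"),
]

# Index 4 is the made basket, our point of discontinuity.
print(narrate(chain_before(events, 4)))
```

A real engine would need far richer events (locations, timestamps, defenders) and some notion of which earlier events actually mattered, but the reverse traversal is the core of the idea.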

Given the data for the aggregate event to be described, plus metadata such as time, date, venue and participants (players and/or teams), a rough outline can be prepared. This may look quite similar to the template from Kontra's essay.

A second pass can then look at the generated outline—at this point still a runtime object graph, because of the ease of traversing and parsing as well as the ability to retain annotations—and compare it against other outlines: paragraphs in the same article, summaries in the same day's releases, prior summaries/articles referencing the same player/team, in each case looking to see how structurally similar the graphs are.
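One crude way to score how structurally similar two outlines are, sketched in Python: reduce each outline tree to the set of root-to-node paths through its node kinds, then take the Jaccard overlap of the two sets. The Node class and the kind labels are hypothetical, and Jaccard is simply a stand-in I picked for the sketch; a serious system would probably want something order-aware, such as a tree edit distance.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class Node:
    kind: str  # structural role, e.g. "recap", "play", "stat_aside" (assumed labels)
    children: List["Node"] = field(default_factory=list)

def shape(node: Node, path: Tuple[str, ...] = ()) -> Set[Tuple[str, ...]]:
    """Collect every root-to-node path of kinds in the outline."""
    here = path + (node.kind,)
    paths = {here}
    for child in node.children:
        paths |= shape(child, here)
    return paths

def similarity(a: Node, b: Node) -> float:
    """Jaccard overlap of the two outlines' path sets, from 0.0 to 1.0."""
    sa, sb = shape(a), shape(b)
    return len(sa & sb) / len(sa | sb)

# Two made-up outlines that differ only in their final aside.
today = Node("recap", [
    Node("lead"),
    Node("play", [Node("assist")]),
    Node("stat_aside"),
])
yesterday = Node("recap", [
    Node("lead"),
    Node("play", [Node("assist")]),
    Node("quote"),
])

print(similarity(today, yesterday))
```

Because this version uses sets, it ignores both the order of sections and how often a structure repeats, so two outlines with the same parts in a different order score as identical. That may or may not be what you want when hunting for articles that read alike.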

The goal will then be to soften similarities by deploying euphemisms, substitutes and asides: inserting nicknames for teams, arenas and popular players, digressive paragraphs and parentheticals providing statistics on recent trendlines in players' and teams' performance, etc. The distribution of how often to pull in various types of asides should be hand-tweaked for optimal results, so as to prevent readers from being inundated with references to players' marriages, children, affairs and divorces. Fans only care about such details for the very biggest superstars.
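A hand-tweaked distribution could be as simple as a weight table fed to a weighted random draw. The sketch below uses Python's standard random.choices; the aside categories, the weights and the superstar flag are all made-up values for illustration, exactly the sort of thing an editor would tune by hand.

```python
import random
from typing import List

# Hand-tuned relative weights per aside type (all values are assumptions).
ASIDE_WEIGHTS = {
    "nickname": 5.0,
    "recent_trend_stat": 3.0,
    "venue_note": 2.0,
    "personal_life": 0.2,
}

def pick_asides(count: int, is_superstar: bool, rng: random.Random) -> List[str]:
    """Draw `count` aside types, suppressing personal-life asides
    entirely for anyone who is not a superstar."""
    weights = dict(ASIDE_WEIGHTS)
    if not is_superstar:
        weights["personal_life"] = 0.0
    kinds = list(weights)
    return rng.choices(kinds, weights=[weights[k] for k in kinds], k=count)

# A seeded generator keeps the draw reproducible while tuning.
rng = random.Random(2014)
print(pick_asides(5, is_superstar=False, rng=rng))
```

Passing in a seeded generator rather than using the module-level functions makes it easier to replay a given article while adjusting the weights.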

Can robots write sports previews? Yes, and then some. It is only a matter of time until algorithmic synthesis of commentary, even in real time, supersedes all but the most talented of broadcasters (and many a fan will breathe a sigh of relief, freed from the inanity of a Troy Aikman or Joe Buck). They won't be deployed to the big games immediately; they'll start out churning out Automated Insights-style previews for the Yahoo! Sportses and CBSes of the world, then an affluent town might have one used for a local high school game… and then the floodgates will open.

The question will not be whether robots can write sports previews. It will first be whether we want them to, and then why we wouldn't.