" rel="attachment wp-att-4795

Learning to query: Data journalism 101

So you want to be a data journalist. A number-crunching sleuth who’s surfing the crest of the very same digital wave poised to sweep aside the low-lying journalistic towns and villages in its fateful path. Our guest blogger, Liv Buli, is turning numbers into stories at Next Big Sound and lays out a few things you should know before claiming the moniker for yourself.


First off, data journalism isn’t new. Not even close. The Guardian, the same U.K. paper that has made a fetish for data one of the centerpieces of its widely acclaimed online strategy, started turning numbers into stories in its very first issue way back in 1821. (It was a list of schools by student enrollment and spending, so not exactly 19th-century linkbait.) What has revolutionized the field in recent years, and the reason you hear the term every time journos get together to discuss “What’s next?”, is the amount of data now available and the speed with which it is being generated and delivered. At Next Big Sound, the music analytics company at which I blog, we are gathering an average of 175 million data points each day. Facebook users have generated roughly 300 times as much data as is contained in the Library of Congress. That’s a lot of scatter charts.

It shows, too. Some of the best, and best-read, stories to hit the press this past year were based on data findings, such as The New York Times’s damning series on corruption and cruelty in horse racing, not to mention the Wall Street Journal’s uncovering in recent years of two of the biggest scandals to hit the business world: the LIBOR rate-fixing scandal (currently rocking the finance world) and the backdating of tech company stock options (which unseated a slew of high-profile Silicon Valley execs). What’s striking about this kind of technical journalism is that the data weren’t just a starting point for asking questions; they provided the answers, too. WSJ’s statistical analyses in both the LIBOR and stock options cases showed that the likelihood people were obeying the rules was remote, providing the circumstantial evidence long (really long) before regulators and law enforcement obtained proof.

Data journalism can also come in different formats. The News Application team at the Chicago Tribune consists of a group of programmers embedded in the newsroom, assisting journalists in uncovering data and creating cool apps to make use of it.

Interviewing the data

OK, you’re interested. You want to know “Where do I sign up?” and “Is it OK that I flunked every math class I ever took?” Start with this: A basic requirement for the role is the ability to query a database in order to extract information, and that requires some basic coding skills. Querying is usually the second step in my reporting, after picking a topic for which the data seems promising. It’s akin to interviewing sources at the scene of a crime or compiling a list of experts to call for a science feature. I work mostly with music industry numbers, so if I’m looking at a database of artists and their followers on social networks, I may want to query only the artists that have a certain range of followers on Twitter.
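To make that kind of query concrete, here is a minimal sketch in Python using the built-in sqlite3 module. It is not how NBS’s database actually works: the table name, the column names and every number below are made up for illustration.

```python
import sqlite3

# Hypothetical schema and made-up numbers, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artists (name TEXT, twitter_followers INTEGER)")
conn.executemany(
    "INSERT INTO artists VALUES (?, ?)",
    [
        ("Artist A", 4_200),
        ("Artist B", 58_000),
        ("Artist C", 310_000),
        ("Artist D", 1_900_000),
    ],
)


def artists_in_range(conn, low, high):
    """Return (name, followers) rows with low <= followers <= high, largest first."""
    cursor = conn.execute(
        "SELECT name, twitter_followers FROM artists "
        "WHERE twitter_followers BETWEEN ? AND ? "
        "ORDER BY twitter_followers DESC",
        (low, high),
    )
    return cursor.fetchall()


# Ask only for artists with 50,000 to 500,000 Twitter followers.
mid_tier = artists_in_range(conn, 50_000, 500_000)
print(mid_tier)
```

The question marks are placeholders that the database fills in safely, so the same question can be asked again with a different range without rewriting the query.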

In the run-up to the MTV Video Music Awards, I wanted to see if our data would give any sort of indication as to who would be most likely to take home the Best New Artist award. Querying the Wikipedia, Facebook and Twitter data for the five nominees in the right time frame made it easy to compare the size of their fan bases and their current popularity in order to accurately predict who would take home the title. Which brings us to essential skill number two: a level of number comprehension that allows the journalist to understand what story the data is telling.
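A comparison like that can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the nominee names, the day numbers and the fan counts are all invented, and the real analysis drew on several networks rather than a single series per artist.

```python
# Made-up cumulative fan counts for hypothetical nominees: (day, total fans).
fan_counts = {
    "Nominee A": [(0, 100_000), (30, 130_000)],
    "Nominee B": [(0, 400_000), (30, 420_000)],
    "Nominee C": [(0, 50_000), (30, 75_000)],
}


def summarize(counts, start_day, end_day):
    """Return (name, total, new_fans, pct_growth) per nominee, most new fans first."""
    rows = []
    for name, series in counts.items():
        # Keep only the readings that fall inside the chosen time frame.
        in_window = sorted((d, n) for d, n in series if start_day <= d <= end_day)
        if len(in_window) < 2:
            continue  # growth needs at least two readings
        first, last = in_window[0][1], in_window[-1][1]
        gained = last - first
        pct = round(100 * gained / first, 1) if first else None
        rows.append((name, last, gained, pct))
    return sorted(rows, key=lambda row: row[2], reverse=True)


summary = summarize(fan_counts, 0, 30)
for name, total, gained, pct in summary:
    print(f"{name}: {total:,} fans, +{gained:,} in the window ({pct}%)")
```

In this invented data the nominee with the biggest fan base is growing the slowest, which is exactly the kind of distinction between size and momentum that the numbers have to be read for.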

Figure out what data to work with

So what equipment does the data journalist need? I use Excel at times, and there are some fancy stats programs such as SPSS out there. I’m lucky, though: I have at my fingertips a proprietary platform that allows me to easily graph the information in NBS’s database so I can see correlations and pull overview reports of relevant data. But all data journalists have to figure out what data they want to work with and find the right tools for analyzing that specific information set. Telling great stories then simply becomes a matter of figuring out the right questions to ask of the data and combining the answers with relevant reporting. And that part looks a lot like regular journalism: I stay up-to-date on the industry I am covering.

As for where to do data journalism, think outside the newsroom. Because NBS is a data company, not a major news outlet, it’s more of a challenge to get our content widely read, and I don’t have a bullpen of editors to bounce ideas off of. But there are some major advantages. Because we have historical data on hundreds of thousands of artists, I can often add insights to breaking news stories. I’m able to work with our in-house engineers to do analysis on major events such as festivals and award shows, to gauge which performers are growing the fastest in terms of views, plays and new fans. We also have our own data science team, with whom I am able to delve even deeper on certain topics, uncovering how social media metrics and radio spins correlate with album sales, or whether artists are purchasing fake Twitter followers.

About the Author: Liv Buli is the resident data journalist for music analytics company Next Big Sound. Buli is a graduate of New York University’s Arthur L. Carter Journalism Institute and her work has appeared in Newsweek Daily Beast, The New York Times Local East Village, Westchester Magazine and more.