Big Data
Alumni Find the Stories Behind the Numbers
Ponder, if you will, some seemingly imponderable questions.
How much carbon can a forest hold? How quickly will that forest grow over the next 100 years?
When it comes to voting, can you pinpoint which voters will support a particular cause? Can you identify enough of them to sway the outcome of an election?
If you like to play computer games, where can you find others with whom you will be evenly matched? Can you measure your influence in the world of online gaming?
Not so long ago, such questions would have been answered with a good deal of head-scratching and guesstimation. But today, Weinberg graduates are helping to solve those enigmas, bringing all their analytical skills to bear on the huge and growing amount of information now available.
On this scale, data isn’t just “data.” It’s “big data”: more data than could ever have been gathered, measured and analyzed in the pre-digital era. And it isn’t just numbers. It includes audio, video, images, text, social media, readings from climate and traffic sensors, meter readings, financial transactions, cell phone GPS signals, and more.
Mathematics Gets Social
It was in Weinberg’s Mathematical Methods in the Social Sciences (MMSS) program that Paul Spraycar ’01 first learned how to gather and make sense of such vast amounts of information. The unique program empowers students to help organizations harness big data and adapt to a fast-changing technological environment.
Students in the program study mathematics, statistics, and formal modeling in tandem with the social sciences. “It’s a really accelerated track that taught me game theory, advanced econometrics, statistical modeling, and how to pull a story from data and learn from it,” explains Zach Kahn ’09. “It’s basically economics on steroids.”
The program is selective: only 30 to 60 students per class are enrolled. All must complete a second major in the social sciences, such as psychology, political science, linguistics, economics, or even philosophy or Slavic studies. As a result, students expand their career options well beyond the hard sciences to a vast range of fields where their skills are highly valued.
For Spraycar, who spent his undergraduate summers leading backpacking trips in the Sierra Nevada, the program offered a rare opportunity to cultivate his mathematical skills and apply them to the hard sciences. “It enabled me to bring this set of quantitative skills to an environmental issue I care about,” he says.
Spraycar has done just that at a Berkeley, Calif.-based startup called ecoPartners. The six-person firm uses state-of-the-art data analysis to calculate how much carbon a forest can hold. Once that figure—the “carbon sequestration” value—is determined, those who want to mitigate climate change can pay landowners to protect their forests. Preservation becomes an economically viable alternative to razing the property for timber, agriculture, or development.
This calculation requires data. A lot of data. “Forests are complex systems,” explains Spraycar, “and the projects we do are many thousands of acres with hundreds of trees per acre. Everything from nutrient levels in the soil to the variety of tree species to fires and human intervention will affect how much carbon a forest can hold.”
ecoPartners takes all of these factors, and many more, into account to predict how the forest will grow over a period of 100 years. Then, ecoPartners compares this scenario, in which a landowner protects and maybe even increases the carbon in the forest, with “the common-practice scenario where they’re just cutting trees down for timber and are not concerned with the carbon.”
Seeing the Trees for the Forests
Quantifying both scenarios is still a very new task. Until now, it has required boots-on-the-ground legwork to gather information from the forests themselves, establishing homogeneous sample units and taking exhaustive measurements of each unit’s biomass. Today ecoPartners is working to make that process more precise and more efficient.
“We have about 10,000 of those little units per forest, and we’re doing a series of calculations to figure out what happens in that unit over 100 years,” Spraycar says. “So 10,000 units times 100 years is a million fairly complex calculations.”
But by managing its data and rearranging its calculations, ecoPartners has already cut the processing time for one model iteration from 18 hours to two and a half. With further advances in technology, the process could become even less costly over the next few years. Making that happen is Spraycar’s central challenge.
One possibility is to use an optical scanner that could capture images around a tree and then quickly build a three-dimensional model of the area. Aerial and satellite images could also yield accurate, large-scale models of forests, as well as frequently updated information about fires or human activity that would affect carbon storage in a region. Both methods would produce more precise results while greatly reducing the time, labor and cost of gathering the necessary information. And that could improve the prospects for slowing global warming.
“The cost of mitigating climate change is often cited as the reason not to act,” Spraycar observes. “So when you dramatically reduce the cost of helping landowners and indigenous communities to protect their forests, you lower the cost of addressing climate change. The ability to process data is a big part of that.”
Political Predictions
It’s also a big part of motivating the public, as Marshall Miller ’08 can attest. Miller is currently the director of analytics at Washington, D.C.-based Catalist. The company provides progressive organizations with the data they need to pinpoint the audiences most likely to be receptive to their causes.
Miller entered Northwestern intending to pursue science or mathematics, but he was also interested in politics and current events. “And then I realized that that’s exactly what the MMSS program is for: working within the social sciences, but drawing on mathematics and certain skills that are usually kept in the hard sciences,” he says.
At Catalist, Miller works with a massive national database of registered voters and unregistered adults. Starting with this data, he develops predictive models that can show, for example, prospects for voter turnout or the likelihood that a certain group will support a certain cause. Catalist’s clients, which include nonprofits, campaigns and advocacy groups, are then able to communicate with and mobilize groups valuable to their causes. The clients’ results are then added back into Catalist’s databases, and the predictive models grow even more detailed and accurate.
Ultimately, Catalist’s clients achieve their goals more efficiently. They can focus their resources on the individuals most likely to support them, while campaigns can find voters on the fence who, as an election nears, still need additional information or just an extra nudge in one direction.
“We’re going to learn so much about voter behavior because of how much is being recorded now,” Miller says.
Data’s Personable Side
Managing big data isn’t only about number-crunching, and the structure of the MMSS program reflects that. All MMSS students are required to develop a senior thesis that involves original research in social-science analysis.
For Kahn, that proved a golden opportunity to learn the critical art of presenting data to others. Kahn’s senior thesis required him to analyze statistics for the Los Angeles Police Department on the success of police academy recruits. The skills he gained were key to landing his first post-college job working with survey data at the Center for Effective Philanthropy.
“Not everyone can present data clearly, not even some people who actively work in data,” Kahn says. “It’s essential in today’s world to understand the data you are given and be able to glean insights from that data.”
Kahn’s first employer “loved” that he had worked with a professional agency, he says. “In that position, I was going to be presenting data to clients, and I had already done that during my thesis. That’s an opportunity I wouldn’t have had if I hadn’t been enrolled in MMSS.”
Kahn built on that opportunity, moving later to Groupon, where he was a strategic analyst in merchant services, and ultimately to OptionsHouse, where he is now a business intelligence analyst. He is constantly refining his approach to data in order to produce ever-more-reliable insights for stakeholders.
Not Just a Game
If you notice that the next video game you play seems unusually tailored to your preferences, you might have Harrison Shih ’10 to thank.
Shih, a senior product manager at GREE, Inc., is helping to build the global platform the company uses to customize online-gaming experiences for individual users. “Our ultimate mission is [to enable] players to find other players they want to connect with,” says Shih, who was a marketing manager at Google before joining GREE.
Game developers collect a wealth of information about their players, from how much they play to what kind of social influence they might have within the online-gaming community. All that data can be used to improve the experience for each player.
The major challenge comes after the data is gathered and sorted. “Data itself does not generally hold recommendations,” Shih says. “It’s the analysts who interpret the results and present the findings.”
But that data can easily be skewed—and analytics don’t always provide the answers.
“Especially in the fast-growing space of the consumer Web, there are a lot of areas that are unknown and that require leaps of faith or qualitative decisions to really succeed,” Shih observes. “Sometimes, a quick brainstorm among smart people is sufficient to validate or disprove seemingly sound data-based recommendations.”
Knowing when to take a different approach is an art as well as a science. And that is a skill, Shih says, that he learned through the multidisciplinary MMSS curriculum.
“I learned how to ask questions and to collaborate,” Shih recalls. “That came from working with the high caliber of students and professors in the program.”