Big data can lead to big breakthroughs in research

December 09, 2011

By James Temple, San Francisco Chronicle

UC Berkeley Professor Dennis Baldocchi has taken on the not-so-modest task of monitoring "the breathing of the biosphere."

He and a team of researchers oversee sensors at 500 sites around the world that measure things like wind, carbon dioxide, ozone and water vapor 10 times a second. They combine this massive amount of data with NASA satellite imagery to visually depict and analyze how climate change is altering the world.
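To give a rough sense of that scale, here is a back-of-the-envelope sketch of the raw reading count. The 500 sites and 10 readings per second come from the article. Everything else is an assumption made only for illustration: one reading per measured quantity at each tick, a 365-day year, four quantities per site, and 8 bytes per reading. None of these are figures from the project itself.

```python
# Rough estimate of raw sensor readings for the network described above.
# SITES and HZ come from the article; the other constants are hypothetical
# placeholders chosen only for illustration.
SITES = 500                 # sensor sites worldwide (from the article)
HZ = 10                     # readings per second (from the article)
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365         # assumption: ignore leap years
QUANTITIES = 4              # hypothetical: wind, CO2, ozone, water vapor
BYTES_PER_READING = 8       # hypothetical: one 64-bit float per reading


def readings_per_year(sites: int = SITES, hz: int = HZ) -> int:
    """Readings of a single measured quantity across all sites in one year."""
    return sites * hz * SECONDS_PER_DAY * DAYS_PER_YEAR


def bytes_per_year(quantities: int = QUANTITIES,
                   bytes_per_reading: int = BYTES_PER_READING) -> int:
    """Raw storage for all quantities, ignoring compression and metadata."""
    return readings_per_year() * quantities * bytes_per_reading


if __name__ == "__main__":
    print(f"{readings_per_year():,} readings per quantity per year")
    print(f"{bytes_per_year() / 1e12:.1f} TB of raw values per year")
```

The per-quantity count (157,680,000,000 readings a year) follows directly from the article's numbers. The byte total depends entirely on the placeholder assumptions and leaves out the satellite imagery altogether.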

If a year's worth of that data were collected and processed at once, it would take a regular PC an entire year to trudge through the task. But the researchers are making sense of this data through a partnership with Microsoft Research, tapping into its cloud of worldwide server farms to turn around the same work in as little as a day.

"We had the ideas, we knew what we wanted to do with it, but we were overwhelmed with the computational demands of the project," Baldocchi said. The ability to collect and analyze huge volumes of information, a concept known as big data, has "democratized computing and large-scale science."

In fact, big data has become a big deal across the disciplines of science, business, medicine and technology. A McKinsey Global Institute report in May dubbed it "the next frontier for innovation, competition and productivity." Gartner recently identified it as one of the top 10 strategic technologies for 2012.

Forces converge

Like many buzzwords in technology, big data isn't really new. Companies have been processing large amounts of information for a long time. But a convergence of forces has radically expanded the amount of data at our disposal, as well as our ability to extract treasures from those troves.

Increasing Internet use, ubiquitous smart phones, cheaper cameras and better sensors are doubling the world's information every two years, according to some studies.
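Doubling every two years is exponential growth, and it is easy to underestimate what it implies. The sketch below assumes the two-year doubling period holds smoothly and indefinitely, which is an idealization rather than a forecast. Under that assumption, data volume grows by a factor of the square root of 2 each year (about 41 percent) and by a factor of 32 over a decade.

```python
DOUBLING_YEARS = 2.0  # doubling period cited in the article


def growth_factor(years: float, doubling_years: float = DOUBLING_YEARS) -> float:
    """Multiplier on today's data volume after `years` of steady doubling."""
    return 2 ** (years / doubling_years)


def annual_growth_rate(doubling_years: float = DOUBLING_YEARS) -> float:
    """Equivalent compound growth per year, e.g. 0.414 means +41.4% a year."""
    return growth_factor(1.0, doubling_years) - 1.0


if __name__ == "__main__":
    print(f"annual growth: {annual_growth_rate():.1%}")
    print(f"after 10 years: x{growth_factor(10):.0f}")
```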

An IDC Digital Universe report sponsored by EMC said we'll collectively create or copy 1.8 zettabytes of data this year. That's equivalent to 1.8 trillion gigabytes - or the amount of data accumulated if every person in the United States tweeted three times per minute for nearly 27,000 years.
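The "1.8 trillion gigabytes" equivalence comes straight from the unit prefixes. This sketch assumes decimal SI units (1 GB = 10^9 bytes, 1 ZB = 10^21 bytes), under which one zettabyte is one trillion gigabytes. Exact fractions are used so the result is not affected by floating-point rounding.

```python
from fractions import Fraction

# Decimal (SI) byte units; binary units (GiB, ZiB) would give different numbers.
GIGABYTE = 10**9
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zb: Fraction) -> Fraction:
    """Convert a zettabyte amount to gigabytes using exact arithmetic."""
    return zb * ZETTABYTE / GIGABYTE


if __name__ == "__main__":
    print(zettabytes_to_gigabytes(Fraction(18, 10)))  # 1.8 ZB
```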

Meanwhile, the rise of cloud computing has granted far more people and organizations the ability to affordably tap into the supercomputing-level horsepower and data storage necessary to collect and process all those bits and bytes. In addition, increasingly sophisticated analysis software is allowing them to spot patterns that only become apparent on a mind-bogglingly massive scale.

"The scale and scope of changes that big data are bringing about are at an inflection point, set to expand greatly, as a series of technology trends accelerate," the McKinsey report stated. "We are on the cusp of a tremendous wave of innovation, productivity and growth."

Much of the business interest in big data involves the ability to automatically analyze online behavior to create ads, products or experiences that are most appealing to consumers, and thus most lucrative to companies. There's also great potential to more accurately predict market fluctuations or react faster to shifts in consumer sentiment or supply chain issues.

This all suggests businesses will become increasingly responsive to the wants and needs of their customers. They'll be able to point customers to perfectly suited music, movies, books and news, or even to content tailored to their unique interests.

But there are dangers, of course.

Data security

The sheer quantity of data involved raises a host of new questions about the sensitive subjects of online privacy and data security. We've already seen how an over-reliance on data and automation can exaggerate market downswings, as computer programs dump stocks in the face of price declines. And any application of data is only as good as the models and algorithms devised by imperfect humans.

But there's still much to be excited about. The biggest of the big data breakthroughs may lie in the scientific realm.

Big data is the fuel that powered many of the recent leaps forward in artificial intelligence, including IBM's "Jeopardy" champion Watson and Google's machine translation tools. It can also be used to spot emerging disease epidemics at an earlier stage, to study the links between lifestyle factors and disease, and to yield new insights into climate change.

For Baldocchi's team, the tools are providing greater understanding of the complicated interplay of environmental variables, and how they have affected things like vegetation patterns and water levels.

Yet we're still at a very early stage in realizing the power of big data, said Catharine van Ingen, partner architect at Microsoft Research, who collaborated on the climate project.

"We're just beginning to see the science that becomes possible with that kind of computation," she said. "It changes the way we think about things; it changes our ability to think about the world."
