Breaking Down Big Data
By Philip Brittan, Chief Technology Officer, Financial & Risk, Thomson Reuters
There is a lot of talk these days about “Big Data”, and a fair amount of both confusion and mystique about what it is and what it can mean. Big Data is essentially a set of practices and technologies that can help businesses on a daily basis. But to derive the most benefit from it, you have to understand it and be clear about exactly what you mean by it.
In reality, it is not that complex. What makes it confusing is that the term Big Data is used broadly and increasingly vaguely.
In my view, Big Data breaks down into two fundamental parts:
- Technology to efficiently process large amounts of data
- Analytics to extract useful information out of those large amounts of data
The total amount of digital information in the world is increasing at a staggering, exponential rate. Not that long ago, a gigabyte seemed like a lot of data. Now, some companies are dealing in exabytes, and a single exabyte is 1,000,000,000 gigabytes.
Dealing with very large amounts of data has traditionally been expensive (you need big computers) and slow (copying or analyzing large amounts of data can take a long time). The first big aspect of Big Data is therefore a set of technologies that make processing large amounts of data less expensive and less time-consuming. There are many technologies used for this today, but what they all have in common is the ability to handle data spread across thousands of servers, processing it and presenting it in a usable format at high speed.
These technologies are not only extremely fast and efficient at dealing with large amounts of data, but they can also be scaled relatively inexpensively over many low-cost commodity servers. Google has millions of servers powering its search capabilities and other services. When you use Google Search, you see all these technologies in action, as your search query runs against content that is spread over thousands or even a million servers to find exactly what you are looking for. These technologies, while seemingly quite simple, are extremely powerful and flexible.
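To make the idea of a query running against content spread over many servers more concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how Google or any other search engine is built: the three small "shards" stand in for servers, and the names `SHARDS`, `search_shard` and `search` are hypothetical. The query is sent to every shard in parallel, each shard scores its own documents, and the partial results are merged into one ranked list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each "shard" stands in for the documents held by one server.
SHARDS = [
    {"doc1": "big data needs cheap servers",
     "doc2": "weather affects commodity prices"},
    {"doc3": "search engines index big data",
     "doc4": "ships follow trade routes"},
    {"doc5": "machine learning finds patterns in data"},
]


def search_shard(shard, query):
    """Score each document in one shard by how many query terms it contains."""
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in shard.items():
        score = len(terms & set(text.lower().split()))
        if score > 0:
            hits.append((score, doc_id))
    return hits


def search(query, shards=SHARDS):
    """Send the query to every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, query), shards))
    merged = [hit for partial in partials for hit in partial]
    # Highest score first; ties are broken by document id for a stable order.
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in merged]
```

Calling `search("big data")` on this toy data returns `['doc1', 'doc3', 'doc5']`. The point of the design is that no single machine ever has to hold or scan all the content: adding more shards spreads the work, which is why such systems can scale over inexpensive commodity hardware.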
Once you have an efficient way to store and process large amounts of data, the next challenge is to make sense of it. It is impossible for humans to get their heads around such vast quantities of data, so we use statistical methods to find the information buried in it. These methods have been known for a fairly long time, but firms like Google, Amazon and Facebook, along with the Big Data movement overall, have brought them much broader interest.
These statistical methods, which include various kinds of pattern matching (such as factoring large groups into clusters) and machine learning, are what create the almost magical experiences we have today when we use Google Search, Amazon Recommendations, or Apple Siri. These methods are powerful, and are valuable even if the underlying data set is not massively large. In fact, we see them increasingly being used on smaller data sets. The term Big Data is still often used in these cases, but “Smart Data” might be more descriptive in this context.
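As an illustration of what factoring a large group into clusters can look like in practice, here is a minimal sketch of k-means, one common clustering method. It is offered as an example of the general idea rather than as the method any particular product uses. The function names are hypothetical, the points are assumed to be two-dimensional, and the starting centroids are picked with a simple deterministic "farthest point" rule, a simplification of the randomized initialization that production libraries usually apply.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def initial_centroids(points, k):
    """Start from the first point, then repeatedly add the point that is
    farthest from every centroid chosen so far (a simplifying assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, iterations=20):
    """Group 2-D points into k clusters with the basic k-means loop."""
    centroids = initial_centroids(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(x for x, _ in cluster) / len(cluster)
                mean_y = sum(y for _, y in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return centroids, clusters
```

Given six points that form two obvious groups, for example `[(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]` with `k=2`, the function puts the first three points in one cluster and the last three in the other. Nothing here requires a huge data set, which is the sense in which these methods are useful on smaller data as well.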
Another aspect of the analytical half of Big Data is the use of visualization. A picture paints a thousand words, and Big Data projects often employ creative visualization to help the human end user gain insight from a large data set. Visualization tools are going to become more and more important in helping humans make sense of Big Data. At Thomson Reuters, we are increasingly using sophisticated visualization tools to help our customers get actionable insight from data.
For example, we used innovative mapping technology to provide a clear picture of the commodities supply chain, plotting weather, news, and the positions and status of key production facilities as well as the routes of ships. Or, in response to growing interest in social media and its influence on the financial markets, we leveraged news sentiment technology to display Twitter and Stocktwits data in a charting application. In both instances, visualization tools help users identify trends and potential signals extremely quickly amongst vast amounts of unstructured data, giving them actionable insight that would otherwise be hard to find.
Technology and analytics are the foundation for understanding Big Data. As the amount of content, data and information that we are exposed to on a daily basis continues to grow exponentially, it is ever more important to be able to distinguish the signal from the noise: to make sense of all that data and turn it into information. Big Data will only continue to grow in importance as businesses find new ways to efficiently handle the increasingly large data sets we face every day.