When it Comes to Data, Time is Money

By Warren Breakstone, Chief Product Officer of Data Management Solutions, S&P Global Market Intelligence

The most sophisticated corporations, banks and investment firms are harnessing the power of data to drive their businesses. But what they are increasingly finding is that the costs associated with data analysis go far beyond the cost of the data itself, to the point where businesses could end up spending up to 10 times more on data integration than on the underlying content.

Therefore, purchasing premium-priced data is likely the least expensive part of decision analytics and the research process. This is as true for us as it is for any other major firm on Wall Street.

So what are forward-looking firms spending those additional resources on? Before a single insight can be derived from this data, a number of steps need to be taken. These steps include, but are not limited to:

  • Structuring the data
  • Cleaning and standardizing the data
  • Linking the data with other data sets
  • Generating aggregations or new data attributes
  • Storing it in a database
  • Making it available for others

Preparing the data for analysis or modeling is the final step before performing the actual analysis.
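
To make the steps listed above concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the sample feed, the column names, the COMPANY_IDS lookup and the shipment_totals table are hypothetical and do not represent any vendor's actual format or workflow.

```python
import csv
import io
import sqlite3

# Hypothetical raw vendor feed: inconsistent casing, stray whitespace, a missing value.
RAW_FEED = """company,port,shipments
 acme corp ,Houston,12
ACME CORP,Houston,8
Globex,Miami,
globex,Miami,5
"""

# Hypothetical internal reference data used for linking.
COMPANY_IDS = {"ACME CORP": "C001", "GLOBEX": "C002"}


def structure(raw_text):
    """Structure: parse raw text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(records):
    """Clean and standardize: normalize names, drop rows with no shipment count."""
    cleaned = []
    for row in records:
        count = row["shipments"].strip()
        if not count:
            continue  # skip rows with a missing value
        cleaned.append({
            "company": row["company"].strip().upper(),
            "port": row["port"].strip(),
            "shipments": int(count),
        })
    return cleaned


def link(records, company_ids):
    """Link: attach the internal company identifier to each record."""
    return [dict(r, company_id=company_ids[r["company"]]) for r in records]


def aggregate(records):
    """Aggregate: total shipments per company identifier."""
    totals = {}
    for r in records:
        totals[r["company_id"]] = totals.get(r["company_id"], 0) + r["shipments"]
    return totals


def store(totals, conn):
    """Store: write the aggregates to a table that others can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS shipment_totals "
        "(company_id TEXT PRIMARY KEY, total INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO shipment_totals VALUES (?, ?)", totals.items()
    )
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run every preparation step in order and return the aggregates."""
    totals = aggregate(link(clean(structure(raw_text)), COMPANY_IDS))
    store(totals, conn)
    return totals


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(RAW_FEED, connection)
    query = "SELECT company_id, total FROM shipment_totals ORDER BY company_id"
    for row in connection.execute(query):
        print(row)
```

Even in this toy version, none of the functions performs any analysis. All of the code exists only to get the data into a usable state, which is where the integration cost comes from.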

Alternative Data

More recently, with the proliferation of alternative data, client challenges have compounded. Alternative data, by definition, is nontraditional, and the value clients gain from it comes from supplementing their existing data and analysis with a new and more complete perspective. Its value often corresponds to its scarcity. To be worth pursuing, the value of the data must exceed the costs incurred in getting it into a usable format. Often, this hurdle isn’t cleared and the opportunity to consider interesting and potentially advantageous data is set aside.

A good example of alternative data in action comes from Panjiva[1], which tracks over 40% of the world’s merchandise trade by dollar value of shipments via cargo ships. The traditional use case for supply chain data is to help optimize a company’s own supply chain network, as well as to gain a better understanding of competitors.

However, an “alternative” use case for supply chain data is to evaluate and model the impact of disruption. For example, when a hurricane hits a port we can measure the impact on the flow of inventory and raw materials. Separately, we can also assess the relevance of relationships between clients and their suppliers, and how those relationships have evolved over time.
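
As a rough illustration of the disruption use case, the sketch below compares average shipment volume through a port before and after an event date. The weekly figures, the event date and the disruption_impact function are all hypothetical and invented for the example. A real model would also need to account for seasonality, rerouting to other ports and recovery time.

```python
from datetime import date
from statistics import mean

# Hypothetical weekly shipment counts through one port (illustrative values only).
WEEKLY_SHIPMENTS = [
    (date(2018, 8, 6), 100),
    (date(2018, 8, 13), 110),
    (date(2018, 8, 20), 90),
    (date(2018, 8, 27), 40),
    (date(2018, 9, 3), 60),
]


def disruption_impact(series, event_date):
    """Return the fractional change in mean volume after the event vs. before it."""
    before = [volume for day, volume in series if day < event_date]
    after = [volume for day, volume in series if day >= event_date]
    if not before or not after:
        raise ValueError("need observations on both sides of the event date")
    baseline = mean(before)
    return (mean(after) - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical event date falling between the third and fourth observations.
    change = disruption_impact(WEEKLY_SHIPMENTS, date(2018, 8, 25))
    print(f"Change in average weekly shipments: {change:.0%}")
```

A simple before-and-after comparison like this is only a starting point, but it shows how shipment records turn a qualitative event into a measurable change in flow.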

While alternative data today is all the rage, it is most useful in concert with traditional data. For example, combining and linking shipping data with traditional supply chain data and company-level financial data tells a more complete story. This is where the use of alt data really comes to life.

We expect that the label “alternative” will be dropped as expectations associated with these new sources of data increasingly converge with the “higher” expectations of traditional data. After all, alternative data is really just data that is being employed for a different use case than the one it was initially designed for.

Data Scientists

Data scientists, the people who help extract insights from data (known as quants in the investment management space), use this data to identify signal, build predictive analytics, and empower better business decision making. This is what they get paid to do.

You might consider statisticians the first “data scientists” that companies hired to analyze data. In other words, making sense of large amounts of data isn’t new, but the practice is more mainstream now than ever. Practitioners who use alternative data are now being hired at a rapid pace by data-rich corporations and financial institutions, which view data as a competitive differentiator and an enabler of better decisions.

According to a 2017 IBM study[2], data science jobs are the fastest-growing jobs, but also among the most difficult positions to fill with qualified candidates. These individuals work in Python and R, where their predecessors worked in Excel. They often have computer science degrees, where their predecessors may have had MBAs. Data scientists are expected not only to conduct meaningful analysis but also to communicate its impact effectively in business terms.

But the job of a data scientist is less glamorous than the headlines and paychecks indicate. Much of their time is spent on the data engineering side of this science[3], which entails wrangling the data and preparing it for analysis. If these were medical scientists, they would be spending their time prepping the lab, making sure test tubes are available, taking inventory, writing test scripts, and making sure the microscope is at the ready. This is not nearly as glamorous as the science itself, but it is critically important to the process.

The True Value

What our clients are telling us loudly and clearly is that whoever can free them to spend more of their time and resources on value-add analysis, predictive analytics, and the search for alpha will be their partner for the future.

Data providers are being asked to deliver data in a consistently structured manner and to provide options for ingesting that data directly into the clients’ own processes, systems, databases and machines. RESTful API solutions and bulk feed delivery with loader capabilities that integrate easily into modeling tools, Python and R are expected. No extra points will be awarded for having these capabilities. The data itself must be historically in-depth, time-stamped and linked to source documents. It will also need to be linked to other data that clients use from multiple vendors, along with their own internal data.

Delivering high-quality, comprehensive data remains the most essential part of the offering, but it will increasingly be considered just table stakes. The value increasingly lies in the work to make that data more useful, more actionable and timelier. In this model, time is money.

[1]Panjiva was acquired by S&P Global in early 2018.

[2]IBM: The Quant Crunch: How the Demand for Data Science Skills Is Disrupting the Job Market. May 2017. Retrieved October 2018 from: https://www.forbes.com/sites/louiscolumbus/2017/05/13/ibm-predicts-demand-for-data-scientists-will-soar-28-by-2020/#3e53f5567e3b

[3]University of Wisconsin: What Do Data Scientists Do? Retrieved October 2018 from: https://datasciencedegree.wisconsin.edu/data-science/what-do-data-scientists-do/
