08.28.2017
By Rob Daly

Machine Learning Compresses Data Costs

While exchanges continue to raise the price of their market data feeds, other financial data is in a race to the bottom regarding subscription prices.

Public filings with the U.S. Securities and Exchange Commission and other financial regulators are freely available, but users pay vendors to gather, clean, and format the data to make it usable, Rachel Carpenter, co-founder and CEO of Intrinio, told Markets Media.

Rachel Carpenter, Intrinio

“The reason data subscriptions cost so much is that a lot of the big vendors pull down public data and go through an arduous manual process to clean it up,” she explained.

Intrinio, a financial data startup and winner of the 2017 Markets Media Startup Competition, has automated the process via machine learning.

Most of the files that Intrinio gathers, such as 10-Ks, 10-Qs, and bank regulatory reports, are tagged using the eXtensible Business Reporting Language (XBRL) and, to a lesser extent, eXtensible Markup Language (XML).

Even though Intrinio is a proponent of the XBRL standard, Carpenter still finds much of the raw data to be messy.

“When you take something as complex as a financial statement, it is pretty impossible to tell all companies to file them in the same way,” she explained. “There are financial companies that make interest income, and then there are industrial companies that do not.”

Intrinio addresses the issue by running the data through millions of lines of code that identify, tag, and categorize the cleansed data into a standard tag set.
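To illustrate what mapping filings onto a standard tag set can look like, here is a minimal sketch. It is not Intrinio's method: it swaps the machine-learning step for a plain lookup table, and the `STANDARD_TAGS` mapping, the standard tag names, and the `standardize` function are all hypothetical names chosen for the example.

```python
# Hypothetical lookup from raw filing tags to a standard tag set.
# The raw tags and the standard names are illustrative assumptions,
# not Intrinio's actual tag set.
STANDARD_TAGS = {
    "Revenues": "total_revenue",
    "SalesRevenueNet": "total_revenue",
    "InterestAndDividendIncomeOperating": "interest_income",
    "NetIncomeLoss": "net_income",
}


def standardize(raw_facts):
    """Map (raw_tag, value) pairs to standard tags.

    Returns a dict of standardized values and a list of raw tags
    that could not be mapped.
    """
    standardized = {}
    unmapped = []
    for tag, value in raw_facts:
        standard = STANDARD_TAGS.get(tag)
        if standard is None:
            # Unknown tags are set aside, e.g. for review or a learned classifier.
            unmapped.append(tag)
        else:
            # Keep the first value seen for each standard tag.
            standardized.setdefault(standard, value)
    return standardized, unmapped


# A bank reports interest income; an industrial company reports sales instead.
bank = [("InterestAndDividendIncomeOperating", 1200), ("NetIncomeLoss", 300)]
industrial = [
    ("SalesRevenueNet", 5000),
    ("NetIncomeLoss", 450),
    ("CustomSegmentMetric", 7),
]

bank_result = standardize(bank)
industrial_result = standardize(industrial)
```

In this sketch, both companies end up with comparable `net_income` fields even though their income statements differ, while the company-specific tag is flagged rather than silently dropped.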

The vendor currently covers the US equities market and has expanded into non-US pricing data in the past six months. In the next six months, Intrinio is looking to expand into international fundamentals, according to Carpenter.

Eventually giving away data may sound like a counterintuitive business plan for Intrinio, but the vendor plans to develop a business model for financial data similar to what Amazon.com has with its Amazon Shops partners.

“When we started, we looked at what the larger vendors were doing and tried to do the opposite,” she said. “Most of it seemed wrong to us. Part of it is the bundling effect in which you are paying for everything regardless of the fact that you only might be using two types of data.”

To expand its ecosystem of possible partners, Intrinio has decided to stay out of the analytics space while actively courting the developer market. “We do not play in that space on purpose,” said Carpenter. “We want to provide data to developers and have them redistribute it when they build their apps.”

Ultimately, she would like to see Intrinio act as a clearinghouse for various small data and analytics offerings that will use Intrinio's tag set and API. “It’s hard for some of these niche providers to sell crypto-currency APIs or blogger ratings just off their websites,” she said.
