Tick By Tick
Specialized databases are starting to emerge for storing and analyzing time-series data.
Developing risk management, trading and quantitative analysis applications in the Big Data era presents unique technological challenges.
These include loading massive volumes of time series data from many markets and internal sources in real time, analyzing high-speed streaming data with very low latency and supporting rapid development and backtesting of quantitative models against years of historical data.
Traditional row-oriented relational database management systems are ideal for transaction-based applications such as order management, but are less well-suited to high-performance analytics, which requires the ability to read very large volumes of data with as few disk reads and writes as possible.
“When you have to store decades of market data, the volume of disk space needed expands geometrically,” said Neil McGovern, senior director of strategy and financial services at software provider Sybase. “In order to address the problem of increased performance for large dataset analytics, we’ve had to move beyond traditional relational database architecture.”
In a relational database, data is stored in tables and the tables comprising an application are typically related to each other. Such row-oriented databases are ideal for transaction-oriented applications, where each record contains numerous fields of data.
Column-oriented databases, on the other hand, store each field in its own contiguous block. That makes them useful when queries scan a few fields across large datasets containing a single type of data, such as tick data, where the data can be sorted and stored in an optimal fashion for searching.
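To make the row-versus-column distinction concrete, the sketch below holds the same three hypothetical ticks in both layouts and computes a volume-weighted average price (VWAP). It is a toy in-memory illustration, not how any of the products mentioned here actually store data; the record shape and field names are assumptions. The point is that the row version walks every full record, while the column version touches only the price and size arrays the query needs.

```python
from array import array

# Hypothetical tick records: (timestamp_ns, symbol, price, size)
ticks = [
    (1_000, "ABC", 10.00, 100),
    (2_000, "ABC", 10.02, 200),
    (3_000, "ABC", 10.01, 300),
]

# Row-oriented layout: each record keeps all of its fields together.
rows = list(ticks)

# Column-oriented layout: each field is stored contiguously in its own array.
columns = {
    "ts": array("q", (t[0] for t in ticks)),
    "symbol": [t[1] for t in ticks],
    "price": array("d", (t[2] for t in ticks)),
    "size": array("q", (t[3] for t in ticks)),
}

def vwap_rows(rows):
    # Every full record is visited, even though only price and size are used.
    notional = sum(r[2] * r[3] for r in rows)
    volume = sum(r[3] for r in rows)
    return notional / volume

def vwap_columns(cols):
    # Only the two columns the query needs are read.
    prices, sizes = cols["price"], cols["size"]
    notional = sum(p * s for p, s in zip(prices, sizes))
    volume = sum(sizes)
    return notional / volume

print(round(vwap_rows(rows), 4), round(vwap_columns(columns), 4))
```

On disk the same idea means a column store can skip the bytes for timestamps and symbols entirely when a query does not reference them.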
“Column-oriented databases typically provide much lower latency, by orders of magnitude, than traditional row-oriented databases,” said Hugh Heinsohn, vice-president of marketing at software company Panopticon.
The explosion of Big Data has affected all industries, but the capital markets have their own unique set of issues, such as the need to capture time-series data and merge it with real-time event processing systems.
“Tick data is voluminous and in order to perform useful analysis, the user must be able to play back a stream from storage in real-time, or often at speeds exceeding real-time,” said Heinsohn. “The low latency offered by databases like HANA, Kx, OneTick or Thomson Reuters Velocity Analytics make doing in-depth analysis of time series data vastly more efficient.”
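Heinsohn's point about playing back a stored stream at or above real-time speed can be sketched as a generator that preserves the original gaps between ticks, divided by a speed factor. This is a minimal illustration under assumed names (`replay`, the `(timestamp, symbol, price)` record shape), not the interface of HANA, Kx, OneTick or Velocity Analytics. The `sleep` function is injectable so the timing logic can be checked without actually waiting.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical record shape: (timestamp in seconds, symbol, price)
Tick = Tuple[float, str, float]

def replay(ticks: Iterable[Tick], speed: float = 1.0,
           sleep: Callable[[float], None] = time.sleep) -> Iterator[Tick]:
    """Yield stored ticks in order, pausing for the original inter-arrival
    gap divided by `speed` (speed > 1 replays faster than real time)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    prev_ts = None
    for tick in ticks:
        ts = tick[0]
        if prev_ts is not None:
            gap = max(0.0, ts - prev_ts) / speed
            if gap > 0:
                sleep(gap)
        prev_ts = ts
        yield tick

# Usage: replay three stored ticks at 10x real time, recording the pauses
# instead of sleeping.
history = [(0.0, "ABC", 10.00), (0.5, "ABC", 10.01), (2.0, "ABC", 10.03)]
pauses = []
replayed = list(replay(history, speed=10.0, sleep=pauses.append))
print(replayed, pauses)
```

Sleeping gap by gap lets small timing errors accumulate; a production replayer would more likely schedule each tick against a monotonic clock measured from the start of the replay.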
OneMarketData’s flagship OneTick database is an enterprise-wide system for tick data collection, management and complex-event processing.
“OneTick can capture streaming market data from any source, and clients receive ultra-low latency access to the latest tick data,” said Louis Lovas, director of solutions at data management provider OneMarketData. “It can collect every tick for all markets globally, regardless of asset class, data volume (including OPRA), peak data rates or type of data.”
Predictive analytics solutions are unlocking the power of Big Data.
“These systems leverage massively parallel processing architectures, in-memory processing and appliance technologies for predictive analytics on structured information, Hadoop appliances for unstructured information and purpose-built appliances for simulations,” said Matt Benati, vice-president of global marketing at software provider Attunity.
Traditional Big Data solutions are storage-centric processing models with relatively long computational or response cycles.
“This is because they were developed to handle non-latency sensitive data, often processing server log files where there is low value to receiving processing results in real-time and decisions are made days or weeks later,” said Richard Tibbetts, chief technology officer at software provider StreamBase Systems.
“These big data storage and analysis technologies have a place in capital markets, but we are seeing slow adoption because trading firms need so much more,” said Tibbetts.
Technologies that could have an impact include massively parallel processing databases, MapReduce frameworks such as Apache Hadoop and NoSQL databases.
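The MapReduce model behind frameworks such as Hadoop splits a job into a map step that emits key/value pairs, a shuffle that groups pairs by key, and a reduce step that aggregates each group. The sketch below applies that pattern to per-symbol VWAP over hypothetical tick partitions. It runs in a single Python process and only illustrates the programming model; it does not use Hadoop's actual Java API, and all function names are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (symbol, price, size)
Tick = Tuple[str, float, int]

def map_phase(ticks: Iterable[Tick]) -> Iterator[Tuple[str, Tuple[float, int]]]:
    # Emit one pair per tick: key = symbol, value = (notional, volume).
    for symbol, price, size in ticks:
        yield symbol, (price * size, size)

def shuffle(pairs: Iterable[Tuple[str, Tuple[float, int]]]) -> Dict[str, List[Tuple[float, int]]]:
    # Group values by key, as the framework does between map and reduce.
    groups: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(symbol: str, values: List[Tuple[float, int]]) -> Tuple[str, float]:
    # Aggregate one symbol's values into its VWAP.
    notional = sum(v[0] for v in values)
    volume = sum(v[1] for v in values)
    return symbol, notional / volume

# In a cluster each partition would be mapped on a different node;
# here they are simply processed one after another.
partitions = [
    [("ABC", 10.0, 100), ("XYZ", 50.0, 10)],
    [("ABC", 11.0, 300), ("XYZ", 51.0, 30)],
]
all_pairs = (pair for part in partitions for pair in map_phase(part))
result = dict(reduce_phase(k, vs) for k, vs in shuffle(all_pairs).items())
print(result)  # → {'ABC': 10.75, 'XYZ': 50.75}
```

Because each map call sees only its own partition and each reduce call sees only one key, both steps can be spread across many machines, which is what makes the model attractive for very large historical datasets.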
“The problem was that traditional systems to store this massive amount of small data [relational databases] were no longer adequate to store this information,” said Tom Leyden, director of alliances and marketing at Amplidata, which provides databases for massive storage requirements.
Amplidata’s AmpliStor system consists of storage nodes and controller appliances, connected over 10 Gigabit or 1 Gigabit Ethernet networks. The system can be scaled from a small configuration of a few storage nodes up to thousands of nodes, accessed by hundreds of controllers to serve large groups of concurrent users.
These technologies, however, are generally reserved for more complex implementations with extremely large data sets (petabytes of data).
“While there are certainly applications within capital markets, most firms are only just beginning to understand and utilize Big Data, and thus initial implementations are more likely to use the aforementioned tools,” said Matt Blakely, business intelligence technology practice lead at SWI, a software consultancy.
Many trading firms are quite advanced in their storage, management and use of historical data, using tick store products like Thomson Reuters Velocity Analytics to store data for later analysis.
“These products have a sophisticated understanding of market data already, which will be difficult for Hadoop et al. to match,” said Tibbetts at StreamBase.
At the same time, tick databases don’t extend naturally to unstructured data or massively parallel MapReduce jobs.
“Until some storage and historical analytics technology bridges the gap, the leading firms will use a combination of tick stores and Hadoop for historical data storage and analysis,” said Tibbetts.
Technological solutions, from parallel databases and column databases to in-memory databases, are important components of the larger picture.
“The key here is to understand how these tools are being used to bring data together from disparate sources like trade reports, risk reports and IM logs, to understand relationships with counterparties, calculate risk and more,” said Peter Duffy, chief technology officer at Sumerian, a provider of analytics software.
Data volumes can quickly overwhelm the capacity to consume them, and the resulting problems are manifold.
“Poorly designed or legacy systems can easily translate to spending inordinate amounts of time and resources processing data, outfitting new storage and scrambling to deploy new systems to meet the rising velocity,” said OneMarketData’s Lovas.