A Needle in a Haystack


Capital markets are looking to advanced technologies to make sense out of Big Data.

While data might be the lifeblood of organizations, too much of it can threaten their livelihood unless it’s controlled.

“Large amounts of data can be a challenge,” said Rodney Comegys, head of the Index Analysis and ETF Trading teams in Vanguard Equity Investment Group. “You have to be careful not to overwhelm the team with unneeded or unwanted information where they might miss something of importance because it is buried in a haystack of additional data.”

Capital markets firms are harnessing new technologies such as in-memory and NoSQL databases to make sense of high-volume, real-time event streams and to run analytics on them.

“These systems leverage massively parallel processing architectures, in-memory processing and appliance technologies for predictive analytics on structured information, Hadoop appliances for unstructured information and purpose-built appliances for simulations,” said Matt Benati, vice president of marketing at Attunity, a provider of real-time data integration software.

Technologies which could have an impact on Big Data in capital markets—such as massively parallel processing databases, MapReduce frameworks such as Apache Hadoop and NoSQL databases—are in the early stages of implementation.

MapReduce, introduced by Google in 2004, is a framework for processing huge datasets using a large number of computers.
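To make the idea concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, applied to totaling traded volume per symbol. It is an illustration only, not Google's or Hadoop's implementation: the trade records and the function names are hypothetical, and a real framework would spread each phase across many machines.

```python
from collections import defaultdict

# Hypothetical trade records: (symbol, shares traded). Illustrative data only.
TRADES = [("IBM", 100), ("MSFT", 250), ("IBM", 300), ("GOOG", 50), ("MSFT", 150)]


def map_phase(record):
    """Emit (key, value) pairs for one input record."""
    symbol, shares = record
    yield symbol, shares


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(TRADES))
```

Because each record is mapped independently and each key is reduced independently, both phases can in principle be split across a large number of computers, which is what lets the approach scale.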


“The capital markets industry needs to make sense out of huge amounts of data, and is starting to look at technologies which form the web,” said Emmanuel Carjat, managing director of TMX Atrium, a firm that connects trading platforms to institutions. “The industry is looking at how Google makes sense of billions of web pages.”


“These technologies are generally reserved for more complex implementations with extremely large data sets, involving petabytes of data,” said Matt Blakely, business intelligence technology practice lead at SWI, a software consulting company. “While there are certainly applications within capital markets, most firms are only just beginning to understand and utilize Big Data.”

LinkedIn uses a Hadoop cluster to power features such as “People you may know” and Twitter (which generates over one terabyte of tweets every single day) uses it for both storage and analytics, said Blakely.

“Using MapReduce, it’s possible to effectively store and analyze petabytes of data, something which is generally not possible with standard databases,” he said.

NoSQL databases are another method of handling Big Data. “NoSQL databases, as the name implies, do not use SQL as their primary query language,” said Blakely. “They are highly optimized for retrieve and append operations, but offer little functionality beyond key-value record storing.”

NoSQL is typically used when extreme performance and real-time data appending and retrieval are more important than consistent results and query flexibility, Blakely said.
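The trade-off Blakely describes can be pictured with a toy key-value store that supports only append and retrieve. This is a sketch under stated assumptions: the class `AppendOnlyStore` and its methods are hypothetical and do not correspond to the API of any particular NoSQL product.

```python
from collections import defaultdict


class AppendOnlyStore:
    """Toy in-memory key-value store; a hypothetical stand-in, not a real NoSQL API."""

    def __init__(self):
        self._records = defaultdict(list)

    def append(self, key, value):
        """Add a value to the end of the record list for this key."""
        self._records[key].append(value)

    def get(self, key):
        """Return a copy of every value stored under this key, oldest first."""
        return list(self._records.get(key, []))

    def latest(self, key):
        """Return the most recently appended value, or None if the key is unknown."""
        values = self._records.get(key)
        return values[-1] if values else None


if __name__ == "__main__":
    store = AppendOnlyStore()
    store.append("IBM", (1, 100.0))  # (timestamp, price), illustrative values
    store.append("IBM", (2, 100.5))
    print(store.get("IBM"))
    print(store.latest("IBM"))
```

Lookups by key are cheap, but there is no query language: a question such as "which symbols traded above a given price" would have to be answered by the application scanning the data itself.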

Open-source data storage systems such as Hadoop and Cassandra suit capital markets applications because they can process and store a high-volume, real-time event stream and trigger actions based on it, run analytics on historical data, and feed updated models directly into the application.

“A number of our customers are running projects to evaluate and test new tools such as Hadoop and Cassandra,” said Roji Oommen, senior director, business development for financial services at Savvis, a web hosting and service provider.

The Cassandra data model is designed for distributed data on a very large scale. A relational database, by contrast, stores data in tables, and the tables that make up an application are typically related to each other.

Cassandra is a column-oriented database, meaning that it stores its content by column rather than by row. This has advantages for heavy-duty number crunching apps that involve complex queries.

“Columnar databases are faster for processing time-series data than relational databases,” said Oommen. “Cassandra is an open-source columnar database, and firms are testing its applicability to tick data management.”
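The difference between storing by row and storing by column can be sketched in a few lines. The example below is illustrative only: the tick data is made up, the function names are hypothetical, and it shows the general layout idea rather than the on-disk format of Cassandra or any other product.

```python
# Hypothetical tick data held row by row: one record per tick.
ROWS = [
    {"time": 1, "symbol": "IBM", "price": 100.0, "size": 200},
    {"time": 2, "symbol": "IBM", "price": 101.0, "size": 100},
    {"time": 3, "symbol": "IBM", "price": 102.0, "size": 300},
    {"time": 4, "symbol": "IBM", "price": 103.0, "size": 400},
]


def to_columns(rows):
    """Pivot row records into one list per field (a column layout)."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def mean_price_rows(rows):
    """Row layout: every whole record is visited to read a single field."""
    return sum(row["price"] for row in rows) / len(rows)


def mean_price_columns(columns):
    """Column layout: only the contiguous price column is read."""
    prices = columns["price"]
    return sum(prices) / len(prices)


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(mean_price_rows(ROWS), mean_price_columns(columns))
```

Both functions return the same average, but the column version reads one field instead of every record, which is the kind of access pattern an aggregate over a long price series produces.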

Hadoop is an open-source framework that allows for distributed processing of large data sets across clusters of computers. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

“Hadoop is a distributed computing framework developed by Yahoo,” said Oommen. “Hadoop distributes data and workload to commodity servers and can scale arbitrarily large, up to exabytes.”

SAP, the business software company that acquired Sybase in 2010, has created its own Big Data solution, called High-Performance Analytic Appliance, or HANA.

Sybase is planning to incorporate design elements of HANA into its own in-memory analytics product, Sybase RAP.

“Analytics can run 1,000 times faster in memory,” said Neil McGovern, senior director of strategy and financial services at enterprise software maker Sybase. “In-memory makes the challenge of analyzing huge amounts of data easier.”

HANA is designed to capture massive amounts of transactional data in memory, and to provide flexible views of analytic information in seconds.

SAP, which also owns Business Objects, a leading business intelligence software company, created HANA to tackle the most data-intensive applications encountered by the customers of Business Objects. “Business intelligence is a key market segment for SAP,” said McGovern. “They needed to speed up analytics, and came up with HANA.”

Like HANA, RAP employs a columnar-type database that can outperform traditional databases for complex analytics, including time series and event-stream processing.
