Case Study: Multimodal Analysis of Disparate Datasets Using a Graph Database

The Institute for Systems Biology (ISB) studies the mechanisms of cancer cells to enable the development of highly targeted drugs. Its challenge is the sheer volume of information to be processed. Unstructured data from medical articles must be combined with structured data from genomic and proteomic databases as well as wet lab experiments. ISB used a graph database to combine these disparate data sources into a powerful tool for scientific discovery. The database enables researchers to formulate a theory, quickly test it against all available data, and interactively refine the query. The analysis combined two kinds of techniques: statistical methods that give an overview of known genomic and proteomic interactions, and sophisticated pattern-based queries that test specific hypotheses.
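To make the two analysis modes concrete, here is a minimal sketch in plain Python. It is not ISB's system or any particular graph database's API: the edge list, the entity names (GENE_A, PROTEIN_X, DRUG_Z), the relation names, and the helpers relation_counts and match are all hypothetical, chosen only to illustrate a statistical overview followed by a pattern-based query over edges merged from several sources.

```python
from collections import Counter

# Hypothetical edges: (subject, relation, object, source).
# Every name here is a placeholder, not real biological data.
EDGES = [
    ("GENE_A", "encodes", "PROTEIN_X", "genomic_db"),
    ("GENE_B", "encodes", "PROTEIN_Y", "genomic_db"),
    ("PROTEIN_X", "interacts_with", "PROTEIN_Y", "proteomic_db"),
    ("PROTEIN_Y", "mentioned_with", "DRUG_Z", "literature"),
    ("PROTEIN_X", "inhibited_by", "DRUG_Z", "wet_lab"),
]


def relation_counts(edges):
    """Statistical overview: how many edges of each relation type exist."""
    return Counter(rel for _, rel, _, _ in edges)


def match(edges, pattern):
    """Pattern query: find variable bindings satisfying every triple.

    Each pattern triple is (subject, relation, object); terms that start
    with '?' are variables, all other terms must match exactly.
    """
    bindings = [{}]
    for p_subj, p_rel, p_obj in pattern:
        next_bindings = []
        for binding in bindings:
            for subj, rel, obj, _source in edges:
                if rel != p_rel:
                    continue
                candidate = dict(binding)
                consistent = True
                for term, value in ((p_subj, subj), (p_obj, obj)):
                    if term.startswith("?"):
                        # Bind the variable, or check an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            consistent = False
                            break
                    elif term != value:
                        consistent = False
                        break
                if consistent:
                    next_bindings.append(candidate)
        bindings = next_bindings
    return bindings


overview = relation_counts(EDGES)

# Hypothesis: which gene encodes a protein that some drug inhibits?
hits = match(EDGES, [("?gene", "encodes", "?protein"),
                     ("?protein", "inhibited_by", "?drug")])
```

The overview step corresponds to getting a broad picture of the combined data, while the pattern step joins edges that came from different sources (here a genomic database and a wet lab result) to test one specific hypothesis; refining the hypothesis means editing the pattern and running it again.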

This talk will describe the construction of the graph dataset, best practices for multimodal analysis of the dataset, and the process that led to a major scientific discovery.

Amar Shan is Product Marketing Director at YarcData, a subsidiary of supercomputer leader Cray Inc. In this role, he works closely with customers and Cray engineers to translate customers’ scientific and engineering computing demands into innovative, high-performance systems. Previously, Shan was responsible for introducing the Cray XD1 supercomputer – a Linux/Opteron system designed specifically for high-performance computing (HPC) applications. Shan leverages his experience in network communications and complex system design to bring innovation to the HPC market. He was the keynote speaker at the 19th Annual International Parallel and Distributed Processing Symposium in 2005 and is frequently invited to speak at HPC events around the world. Shan holds a Master of Mathematics in Artificial Intelligence from the University of Waterloo and a Bachelor of Applied Science in Electrical Engineering and Computer Science from the University of British Columbia.