T19 - 19th International Conference on Information Fusion

T19 Integration of Information to Identify Objects in Big Data

Length: 3 hours (half day)

Intended Audience: This tutorial is intended for researchers and practitioners from a variety of areas, e.g., cancer research, health care, communication, business processes, and databases, who are interested in integrating information (several data sets of the same type, or data sets of distinct types) to filter noise, and in applying machine learning and statistical methods to identify objects of interest, e.g., the true mutations in the DNA of cancer patients.

Description: Big data has tremendous potential to transform businesses and research, but it raises significant challenges in pre-processing, in extracting useful information, and in integrating information to identify objects of interest. In this tutorial, I will present some statistical and machine learning methods for the fusion and analysis of big data in cancer research, e.g., DNA sequencing data, gene expression data (RNA-seq) from The Cancer Genome Atlas (TCGA), protein expression data, and clinical features of cancer patients. The tutorial aims to cover both useful statistical/data mining methods and cutting-edge directions.

Topics include the following: (1) integration of data sets to filter noise in the information, (2) sampling of big data to reduce the computational burden while retaining a certain level of prediction accuracy, (3) applying machine learning/statistics to identify true objects, e.g., true mutations in DNA sequencing data of cancer patients, and (4) integration of distinct types of information to identify objects, e.g., using DNA, RNA gene expression and protein data, and clinical features of cancer patients to find novel drug targets for cancers and identify prognostic markers for cancer patients.

Prerequisites: Basic knowledge of probability and statistics, data mining or databases will be helpful.

Presenter: Grace S. Shieh

Grace S. Shieh is a full research fellow/professor at the Institute of Statistical Science, Academia Sinica/National Taiwan University. She received her PhD in Statistics from the University of Wisconsin-Madison, taught at the University of Missouri-Columbia from 1990 to 1994, and joined ISS-AS in 1994; she branched into computational biology in 2000. Her research expertise includes integration of data (information), information quality, machine learning, directional statistics and association. She has worked on problems of integrating distinct types of information (data) to uncover novel drug targets and find prognostic markers for cancers, preprocessing in information fusion, and integrating several data sets (especially those from cutting-edge biotechnology such as next generation sequencing) to identify true mutations in DNA, among others. Her research has been funded by government agencies as well as IT companies such as Taiwan Semiconductor Manufacturing Company. She has published numerous papers and is an elected fellow of the International Statistical Institute. She has served as a committee member, session chair, organizer and workshop/tutorial lecturer for numerous international conferences. She is also an associate editor for Statistical Methodology, Frontiers in Statistical Methodology and Genetics, and STAT.
