| Line 7: | Line 7: | ||
| style="padding:2px;" | <h2 id="mp-tfa-h2" style="margin:3px; background:#fff7bd; font-family:inherit; font-size:120%; font-weight:bold; border:1px solid #f2ea7e; text-align:left; color:#000; padding:0.2em 0.4em;">T19 Integration of Information to Identify Objects in Big Data</h2> | | style="padding:2px;" | <h2 id="mp-tfa-h2" style="margin:3px; background:#fff7bd; font-family:inherit; font-size:120%; font-weight:bold; border:1px solid #f2ea7e; text-align:left; color:#000; padding:0.2em 0.4em;">T19 Integration of Information to Identify Objects in Big Data</h2> | ||
|- | |- | ||
| − | | style="color:#000;" | <div id="mp-tfa" style="padding:2px 5px"> | + | | style="color:#000;" | <div id="mp-tfa" style="padding:2px 5px">'''Length:''' 3 hours (half day) |
| − | + | ||
| − | '''Description:''' | + | '''Intended Audience:''' This tutorial is intended for both researchers and practitioners from a variety of areas, |
| − | + | e.g., cancer research, health care, communication, business processes, and | |
| − | '''Presenter:''' Grace S. Shieh | + | databases, who are interested in integration of information (including several data |
| + | sets of the same type or data sets of distinct types) to filter noise in information and in | ||
| + | applying machine learning and statistical methods to identify objects of interest, e.g., | ||
| + | the true mutations in DNA of cancer patients. | ||
| + | |||
| + | '''Description:''' Big data has tremendous potential to transform businesses and research but raises | ||
| + | significant challenges in pre-processing, extracting useful information, and | ||
| + | integrating information to identify objects of interest. In this tutorial, I will present | ||
| + | some statistical and machine learning methods for the fusion and analysis of big data in | ||
| + | cancer research, e.g., DNA sequencing data, gene expression data (RNA-seq) from | ||
| + | The Cancer Genome Atlas (TCGA), protein expression data, and clinical features of | ||
| + | cancer patients. This tutorial aims to cover both useful statistical/data mining | ||
| + | methods and cutting-edge directions. | ||
| + | |||
| + | Topics include the following: (1) integration of data sets to filter noise in the | ||
| + | information, (2) sampling of big data to reduce the computational burden while retaining | ||
| + | a certain level of prediction accuracy, (3) applying machine learning/statistics to identify true | ||
| + | objects, e.g., true mutations in DNA sequencing data of cancer patients, and (4) | ||
| + | integration of distinct types of information to identify objects, e.g., using DNA, | ||
| + | RNA gene expression and protein data, and clinical features of cancer patients to | ||
| + | find novel drug targets for cancers and identify prognostic markers for cancer patients. | ||
| + | |||
| + | '''Prerequisites:''' Basic knowledge of probability and statistics, data mining or | ||
| + | databases will be helpful. | ||
| + | |||
| + | '''Presenter:''' [mailto:gshieh@webmail.stat.sinica.edu.tw Grace S. Shieh] | ||
| + | |||
| + | '''Grace S. Shieh''' is a full research fellow/professor at the Institute of Statistical Science, | ||
| + | Academia Sinica/National Taiwan University. She received her PhD in Statistics | ||
| + | from the University of Wisconsin-Madison, taught at the University of Missouri-Columbia | ||
| + | from 1990 to 1994, and joined ISS-AS in 1994; she branched into computational biology | ||
| + | in 2000. Her research expertise includes integration of data (information), | ||
| + | information quality, machine learning, directional statistics and association. She has | ||
| + | worked on problems of integrating distinct types of information (data) to uncover | ||
| + | novel drug targets and find prognostic markers for cancers, preprocessing in | ||
| + | information fusion, and integrating several data sets (especially data from cutting-edge | ||
| + | biotechnologies such as next-generation sequencing) to identify true mutations in | ||
| + | DNA, among others. Her research has been funded by government agencies as well as IT | ||
| + | companies such as Taiwan Semiconductor Manufacturing Company. She has | ||
| + | published numerous papers and is an elected fellow of the International Statistical | ||
| + | Institute. She has served as a committee member, session chair, organizer and | ||
| + | workshop/tutorial lecturer for numerous international conferences. She is also an | ||
| + | associate editor for Statistical Methodology, Frontiers in Statistical Methodology | ||
| + | and Genetics and STAT. | ||
| + | <div align="right"> | ||
| + | [[Tutorials| Back to Tutorials]] | ||
| + | </div> | ||
</div> | </div> | ||
|- | |- | ||
Revision as of 13:21, 24 February 2016
|

