Title: Big Data Analytics in Science
Big data analytics is the process of examining large amounts of data of a variety of types (big data) to uncover hidden patterns, unknown correlations, and other useful information. Its revolutionary potential is now widely recognized. Data complexity, heterogeneity, scale, and timeliness make data analysis a clear bottleneck in many biomedical applications, due to the complexity of the patterns and the lack of scalability of the underlying algorithms. Advanced machine learning and data mining algorithms are being developed to address one or more of the challenges listed above. Typically, the complexity of potential patterns grows exponentially with the data complexity, and so does the size of the pattern space. To avoid an exhaustive search through the pattern space, machine learning and data mining algorithms usually employ a greedy approach to search for a local optimum in the solution space, or use a branch-and-bound approach to seek optimal solutions, and consequently are often implemented as iterative or recursive procedures. To improve efficiency, these algorithms often exploit the dependencies between potential patterns to maximize in-memory computation and/or leverage special hardware for acceleration. These strategies lead to strong data dependency, operation dependency, and hardware dependency, and sometimes to ad hoc solutions that cannot be generalized to a broader scope. In this talk, I will present some open challenges faced by data scientists in biomedical fields and the current approaches taken to tackle these challenges.
Wei Wang is a professor in the Department of Computer Science at the University of California, Los Angeles and the director of the Scalable Analytics Institute (ScAi). She received her PhD degree in Computer Science from the University of California, Los Angeles in 1999. She was a professor in Computer Science at the University of North Carolina at Chapel Hill from 2002 to 2012, and was a research staff member at the IBM T. J. Watson Research Center between 1999 and 2002. Dr. Wang's research interests include big data analytics, data mining, bioinformatics and computational biology, and databases. She has filed seven patents, and has published one monograph and more than one hundred seventy research papers in international journals and major peer-reviewed conference proceedings.
Dr. Wang received the IBM Invention Achievement Awards in 2000 and 2001. She was the recipient of an NSF Faculty Early Career Development (CAREER) Award in 2005. She was named a Microsoft Research New Faculty Fellow in 2005. She was honored with the 2007 Phillip and Ruth Hettleman Prize for Artistic and Scholarly Achievement at UNC. She was recognized with an IEEE ICDM Outstanding Service Award in 2012, an Okawa Foundation Research Award in 2013, and an ACM SIGKDD Service Award in 2016. Dr. Wang has been an associate editor of the IEEE Transactions on Knowledge and Data Engineering, IEEE Transactions on Big Data, ACM Transactions on Knowledge Discovery from Data, Journal of Knowledge and Information Systems, Data Mining and Knowledge Discovery, IEEE/ACM Transactions on Computational Biology and Bioinformatics, and International Journal of Knowledge Discovery in Bioinformatics. She serves on the organization and program committees of international conferences including ACM SIGMOD, ACM SIGKDD, ACM BCB, VLDB, ICDE, EDBT, ACM CIKM, IEEE ICDM, SIAM DM, SSDBM, RECOMB, and BIBM. She was elected to the Board of Directors of the ACM Special Interest Group on Bioinformatics, Computational Biology, and Biomedical Informatics (SIGBio) in 2015.