Inference with Sparse CCA

Many modern biomedical applications study the association between complex factors that can be high-dimensional by nature, even though the association itself can often be captured by low-dimensional structures. Studies involving multiple biological factors, such as genetic markers, gene expressions, and disease phenotypes, are a typical example. The traditional first step in these analyses is to assess the linear association between two sets of variables, i.e., canonical correlation analysis (CCA); in high dimensions this is done with its sparse version, sparse canonical correlation analysis (SCCA). While SCCA succeeds at enforcing low-dimensional structure (sparsity) through tools such as the ℓ1-penalty, it loses its amenability to classical inference, such as hypothesis tests and confidence intervals built by traditional methods. Yet proper inferential measures are essential for separating true signals from noise.
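To make the role of the ℓ1-penalty concrete, here is a minimal NumPy sketch of one common way to compute a single pair of sparse canonical directions: alternating soft-thresholded power iterations on the sample cross-covariance, in the spirit of the penalized matrix decomposition of Witten, Tibshirani and Hastie. It is not the estimator used in our papers or in de.bias.CCA. It assumes the within-set covariances of X and Y can be replaced by the identity (a common simplification), and the function names and penalty levels `lam_u`, `lam_v` are hypothetical, illustrative choices.

```python
import numpy as np

def soft_threshold(v, lam):
    """Elementwise soft-thresholding: sign(v) * max(|v| - lam, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def _unit(w, name):
    """Rescale w to unit Euclidean norm; fail loudly if the penalty zeroed it out."""
    norm = np.linalg.norm(w)
    if norm == 0.0:
        raise ValueError(f"penalty on {name} is too large: every loading is zero")
    return w / norm

def sparse_cca(X, Y, lam_u, lam_v, n_iter=200, tol=1e-10):
    """Leading pair of sparse canonical directions (u, v).

    Alternates l1-penalized (soft-thresholded) power iterations on the
    sample cross-covariance S_xy = X^T Y / n. ASSUMPTION: the within-set
    covariances of X and Y are treated as the identity.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)                  # center each column
    Yc = Y - Y.mean(axis=0)
    Sxy = Xc.T @ Yc / n                      # p x q cross-covariance
    v = np.linalg.svd(Sxy, full_matrices=False)[2][0]  # leading right singular vector
    u = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u_new = _unit(soft_threshold(Sxy @ v, lam_u), "u")
        v_new = _unit(soft_threshold(Sxy.T @ u_new, lam_v), "v")
        done = np.linalg.norm(u_new - u) < tol and np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if done:
            break
    return u, v

# Synthetic example: one latent factor z drives 5 coordinates of X and 3 of Y.
rng = np.random.default_rng(1)
n, p, q = 1000, 50, 40
z = rng.standard_normal(n)
a = np.zeros(p)
a[:5] = 1.0
b = np.zeros(q)
b[:3] = 1.0
X = np.outer(z, a) + rng.standard_normal((n, p))
Y = np.outer(z, b) + rng.standard_normal((n, q))
u, v = sparse_cca(X, Y, lam_u=0.5, lam_v=0.5)
print(np.flatnonzero(u), np.flatnonzero(v))
```

The soft-thresholding step is what produces exact zeros in the loadings; it is also why the resulting estimates are biased and do not come with classical confidence intervals.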


95% confidence intervals of SCCA loadings for some proteins in the Cytokine-cytokine receptor interaction pathway. This pathway is involved in cell growth, differentiation, and cancer progression. We studied the linear association between the genes and the proteins involved in this pathway. See our paper "On Statistical Inference with High Dimensional Sparse CCA" for more details on the data and the findings.



1. We use the popular debiasing approach to provide inferential guarantees for SCCA. See our paper "On Statistical Inference with High Dimensional Sparse CCA" for more information. Our methods are available in the R package de.bias.CCA.

2. We develop methodology for systematic variable selection. In the process, we find that variable selection transitions from computationally easy, to computationally hard (under some recently popularized conjectures), to information-theoretically impossible as the low-dimensional structure becomes more complex. See our paper "On Support Recovery with Sparse CCA: Information Theoretic and Computational Limits" for more information. Our methods are available in the R package Support.CCA.
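The debiasing idea behind item 1 is easiest to see in plain linear regression. The NumPy sketch below is the one-step debiased lasso, not the SCCA procedure from our paper or the de.bias.CCA package: it fits a lasso, adds the correction X^T(y − Xb)/n, and forms 95% Wald intervals. It assumes the design rows have identity covariance (so no precision-matrix estimate is needed) and a known noise level `sigma`; all function names are hypothetical. The last lines illustrate a naive selection rule in the spirit of item 2 (keep variables whose interval excludes zero), which is likewise not the procedure in Support.CCA.

```python
import numpy as np

def lasso_ista(X, y, lam, n_iter=500):
    """Lasso via ISTA: minimize ||y - X b||^2 / (2n) + lam * ||b||_1."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2       # 1 / (largest eigenvalue of X^T X / n)
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n            # gradient of the squared-error term
        w = b - step * grad
        b = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # l1 proximal step
    return b

def debiased_lasso_ci(X, y, lam, sigma, z=1.96):
    """One-step debiased lasso with Wald intervals.

    ASSUMPTIONS: rows of X have identity covariance, so the precision-matrix
    estimate M in b + M X^T (y - X b) / n is replaced by I; sigma is known.
    """
    n = X.shape[0]
    b = lasso_ista(X, y, lam)
    b_deb = b + X.T @ (y - X @ b) / n           # bias correction with M = I
    # Noise part of b_deb is X^T eps / n; its j-th sd is sigma * sqrt(sum_i X_ij^2) / n.
    se = sigma * np.sqrt(np.sum(X ** 2, axis=0)) / n
    return b_deb, b_deb - z * se, b_deb + z * se

# Synthetic example: 3 active coefficients out of 100, noise level sigma = 1.
rng = np.random.default_rng(0)
n, p, s = 1000, 100, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 1.0
y = X @ beta + rng.standard_normal(n)
lam = 2 * np.sqrt(np.log(p) / n)
est, lo, hi = debiased_lasso_ci(X, y, lam, sigma=1.0)

# Naive selection rule: keep variables whose 95% interval excludes zero.
selected = np.flatnonzero((lo > 0) | (hi < 0))
print(selected)
```

Replacing the identity by an estimate of the precision matrix (for example via nodewise lasso regressions) is what makes the correction work for general designs; it is left out here to keep the sketch short.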