From 48e5710c689f2b57565f675d6c48a97cf5401773 Mon Sep 17 00:00:00 2001
From: Eric Coissac
Date: Mon, 2 Sep 2019 15:01:58 +0200
Subject: [PATCH] First version with a discussion

 manuscript/main.Rnw |  18 +++
 manuscript/main.pdf | Bin 462808 -> 0 bytes
 manuscript/main.tex |  23 +++++++
 3 files changed, 10 insertions(+), 31 deletions(-)
diff --git a/manuscript/main.Rnw b/manuscript/main.Rnw
index 8e5789f..d4f5216 100755
--- a/manuscript/main.Rnw
+++ b/manuscript/main.Rnw
@@ -1263,24 +1263,12 @@ print(tab,
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Discussion}
-\begin{itemize}
-\item As for the other previously published coefficients, the method efficiently corrects the Procrustean correlation coefficient for high-dimensional data.
-\item Note that the correction also works for linear models fitted on small sample sizes.
-\item Because its squared value represents a share of variation, this coefficient has the advantage over the other previously corrected coefficients of being usable in an analysis of variance of data tables.
-\item The correction is less efficient for the estimation of partial coefficients. However, partial coefficients that are theoretically zero are better predicted by our estimator.
-\end{itemize}
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
+Correcting the over-adjustment effect on metrics that assess relationships between high-dimensional datasets has been a constant effort over the past decade, and $\irls$ can be considered a continuation of this extension of the toolbox available to biologists for analyzing their omics data. The effect of the proposed correction on the classical $\rls$ coefficient is as strong as that of the corrections previously proposed for other correlation coefficients measuring relationships between vector data \citep[see Figure~\ref{fig:shared_variation}, e.g.][]{Smilde:09:00,SzeKely:13:00}. When applied to univariate data, $\rls$ is equal to the absolute value of the Pearson correlation coefficient; hence, although this was not the initial aim of the coefficient, $\irls$ can also be used to evaluate the correlation between two univariate datasets. For such datasets, $\irls$ corrects the spurious correlations that arise when the number of individuals is small more efficiently than the classical correction \citep[see Figure~\ref{fig:shared_variation_vector},][]{Theil:58:00}.
+The main advantage of $\irls$ over other matrix correlation coefficients is that it allows the variation shared between two matrices to be estimated according to the classical definition of variance partitioning used with linear models. This opens the opportunity to develop linear models explaining the variation of a high-dimensional dataset by a set of other high-dimensional data matrices.
+The second advantage of $\irls$ is that its definition ensures that the variance/covariance matrix of a set of matrices is positive-definite, which allows the matrix of partial correlation coefficients to be estimated by inverting that variance/covariance matrix. The effect of the correction is weaker on such partial coefficients than on full correlations, but partial coefficients that should theoretically be zero seem to be better identified after the correction.
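The univariate equivalence claimed above can be checked numerically. The following is a minimal sketch (in Python rather than the manuscript's R; the `rls` helper is a hypothetical stand-in, not ProcMod's implementation) that computes the Procrustean correlation as the sum of the singular values of $Y^{t}X$ on centred data, normalised by the same quantity computed for each matrix against itself:

```python
import numpy as np

def rls(X, Y):
    """Procrustean correlation sketch: sum of singular values of Y'X
    on column-centred data, normalised by the self-covariance terms."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    covls = lambda A, B: np.linalg.svd(A.T @ B, compute_uv=False).sum()
    return covls(X, Y) / np.sqrt(covls(X, X) * covls(Y, Y))

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 1))                 # univariate data as n x 1 matrices
y = 0.5 * x + rng.normal(size=(30, 1))
pearson = np.corrcoef(x.ravel(), y.ravel())[0, 1]
print(rls(x, y), abs(pearson))               # the two values coincide
```

For one-column centred matrices the single singular value of $Y^{t}X$ is $|\sum_i x_i y_i|$, so the ratio reduces exactly to the absolute Pearson coefficient.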
diff --git a/manuscript/main.pdf b/manuscript/main.pdf
index 83d367a..ba9be6f 100644
Binary files a/manuscript/main.pdf and b/manuscript/main.pdf differ
diff --git a/manuscript/main.tex b/manuscript/main.tex
index ae92c26..fbfffb4 100644
--- a/manuscript/main.tex
+++ b/manuscript/main.tex
@@ -353,7 +353,7 @@ To evaluate relative power of the three considered tests, pairs of to random mat
\begin{table}[!t]
\processtable{Estimation of $\overline{\rcovls(\X,\Y)}$ according to the number of random matrices (k) aligned.\label{tab:mrcovls}}{
% latex table generated in R 3.5.2 by xtable 1.8-4 package
-% Fri Aug 23 14:31:42 2019
+% Mon Sep 2 14:59:46 2019
\begin{tabular}{rrrrrrr}
\hline
 & & \multicolumn{2}{c}{normal} & & \multicolumn{2}{c}{exponential}\\
\cline{3-4} \cline{6-7}
p & k & \multicolumn{1}{c}{mean} & \multicolumn{1}{c}{sd} & \multicolumn{1}{c}{ } & \multicolumn{1}{c}{mean} & \multicolumn{1}{c}{sd}\\
\hline
\multirow{3}{*}{10} & 10 & 0.5746 & $1.3687 \times 10^{-2}$ & & 0.5705 & $1.1714 \times 10^{-2}$ \\
@@ -455,7 +455,7 @@ whatever the $p$ tested (Table~\ref{tab:alpha_pvalue}). This ensure that the pro
of the distribution of $P_{values}$ correlation test to $\mathcal{U}(0,1)$
under the null hypothesis.\label{tab:alpha_pvalue}} {
% latex table generated in R 3.5.2 by xtable 1.8-4 package
-% Fri Aug 23 14:31:45 2019
+% Mon Sep 2 14:59:50 2019
\begin{tabular*}{0.98\linewidth}{@{\extracolsep{\fill}}crrr}
\hline
& \multicolumn{3}{c}{Cram\'er-von Mises p.value} \\
@@ -477,7 +477,7 @@ Power of the $CovLs$ test based on the estimation of $\overline{RCovLs(X,Y)}$ is
\begin{table}[!t]
\processtable{Power estimation of the Procrustean tests for two low levels of shared variation, $5\%$ and $10\%$.\label{tab:power}} {
% latex table generated in R 3.5.2 by xtable 1.8-4 package
-% Fri Aug 23 14:31:45 2019
+% Mon Sep 2 14:59:50 2019
\begin{tabular}{lcrrrrrrrrr}
\hline
& $R^2$ & \multicolumn{4}{c}{5\%} & &\multicolumn{4}{c}{10\%} \\
@@ -509,17 +509,12 @@ Power of the $CovLs$ test based on the estimation of $\overline{RCovLs(X,Y)}$ is
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Discussion}
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
-Text Text Text Text Text Text Text Text.
+Correcting the over-adjustment effect on metrics that assess relationships between high-dimensional datasets has been a constant effort over the past decade, and $\irls$ can be considered a continuation of this extension of the toolbox available to biologists for analyzing their omics data. The effect of the proposed correction on the classical $\rls$ coefficient is as strong as that of the corrections previously proposed for other correlation coefficients measuring relationships between vector data \citep[see Figure~\ref{fig:shared_variation}, e.g.][]{Smilde:09:00,SzeKely:13:00}. When applied to univariate data, $\rls$ is equal to the absolute value of the Pearson correlation coefficient; hence, although this was not the initial aim of the coefficient, $\irls$ can also be used to evaluate the correlation between two univariate datasets. For such datasets, $\irls$ corrects the spurious correlations that arise when the number of individuals is small more efficiently than the classical correction \citep[see Figure~\ref{fig:shared_variation_vector},][]{Theil:58:00}.
+The main advantage of $\irls$ over other matrix correlation coefficients is that it allows the variation shared between two matrices to be estimated according to the classical definition of variance partitioning used with linear models. This opens the opportunity to develop linear models explaining the variation of a high-dimensional dataset by a set of other high-dimensional data matrices.
+
+The second advantage of $\irls$ is that its definition ensures that the variance/covariance matrix of a set of matrices is positive-definite, which allows the matrix of partial correlation coefficients to be estimated by inverting that variance/covariance matrix. The effect of the correction is weaker on such partial coefficients than on full correlations, but partial coefficients that should theoretically be zero seem to be better identified after the correction.
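The inversion step described above is the standard route from a positive-definite variance/covariance matrix to partial correlations. The sketch below (in Python rather than the manuscript's R; the `partial_corr` helper and the 3x3 matrix are illustrative assumptions, not ProcMod's code) shows why coefficients that are theoretically zero are interesting to track:

```python
import numpy as np

def partial_corr(S):
    """Partial correlation matrix from a positive-definite
    variance/covariance matrix S: with precision matrix P = inv(S),
    r_ij = -P_ij / sqrt(P_ii * P_jj)."""
    P = np.linalg.inv(S)
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R

# Toy covariance: x and y independent, z = x + y + noise.
# Marginally corr(x, y) = 0, yet conditioning on z induces
# a negative partial correlation between x and y.
S = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 3.0]])
print(partial_corr(S)[0, 1])  # -0.5
```

An over-adjusted covariance estimate perturbs exactly this inverse, which is why a correction that keeps the matrix well-conditioned helps the theoretically-zero partial coefficients stay near zero.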
@@ -531,10 +526,6 @@ Text Text Text Text Text Text Text Text.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%




\section{Conclusion}
A common approach to estimating the strength of the relationship between two variables is to estimate the part of variation they share. This single value, ranging from zero to one, is easy to interpret. Such a value can also be computed between two sets of variables, but the estimate is then more subject to overestimation than for simple vector data, because of the overfitting phenomenon, which is amplified for high-dimensional data. With $\irls$ and its squared value, we propose easy-to-compute correlation and determination coefficients that are far less biased than the original Procrustean correlation coefficient. All functions needed to estimate the proposed modified versions of these coefficients are included in the R package ProcMod, available for download from the Comprehensive R Archive Network (CRAN).

libgit2 0.26.0