Commit 48e5710c by Eric Coissac

First version with a discussion

parent c1e99829
@@ -1263,24 +1263,12 @@ print(tab,
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Discussion}
\begin{itemize}
\item As with the other previously published coefficients, the method effectively corrects the Procrustean correlation coefficient for high-dimensional data.
\item Note that it also works for linear models fitted on a small number of individuals.
\item Because the squared coefficient represents shares of shared variation, it has the advantage over the other previously corrected coefficients of being usable within an analysis of variance of data tables.
\item The correction is less effective for the estimation of partial coefficients. However, partial coefficients that are theoretically zero are better predicted by our estimator.
\end{itemize}
Correcting the over-fitting effect on metrics assessing the relationship between high-dimensional datasets has been a constant effort over the past decade. $\irls$ can therefore be considered a continuation of the extension of the toolbox available to biologists for analyzing their omics data. The effect of the proposed correction on the classical $\rls$ coefficient is as strong as that of the corrections previously proposed for other correlation coefficients measuring the relationship between vector data \citep[see Figure~\ref{fig:shared_variation}, e.g.][]{Smilde:09:00,SzeKely:13:00}. When applied to univariate data, $\rls$ is equal to the absolute value of the Pearson correlation coefficient; hence, although this is not the initial aim of that coefficient, $\irls$ can also be used to evaluate the correlation between two univariate datasets. Using $\irls$ on such datasets corrects for spurious correlations when the number of individuals is small more efficiently than the classical correction \citep[see Figure~\ref{fig:shared_variation_vector},][]{Theil:58:00}.
The main advantage of $\irls$ over other matrix correlation coefficients is that it allows the shared variation between two matrices to be estimated according to the classical definition of variance partitioning used with linear models. This opens the opportunity to develop linear models that explain the variation of a high-dimensional dataset by a set of other high-dimensional data matrices.
The second advantage of $\irls$ is that its definition implies that the variance/covariance matrix of a set of matrices is positive-definite. This allows a matrix of partial correlation coefficients to be estimated by inverting the variance/covariance matrix. The effect of the correction is weaker on such partial coefficients than on full correlations, but the partial coefficients that should theoretically be zero seem to be better identified after the correction.
@@ -353,7 +353,7 @@ To evaluate relative power of the three considered tests, pairs of to random mat
\begin{table}[!t]
\processtable{Estimation of $\overline{\rcovls(\X,\Y)}$ according to the number of random matrices (k) aligned.\label{tab:mrcovls}}{
% latex table generated in R 3.5.2 by xtable 1.8-4 package
% Fri Aug 23 14:31:42 2019 % Mon Sep 2 14:59:46 2019
\begin{tabular}{rrrrrrr}
\hline
 & & \multicolumn{2}{c}{normal} & & \multicolumn{2}{c}{exponential}\\ \cline{3-4} \cline{6-7}
p & k &\multicolumn{1}{c}{mean} & \multicolumn{1}{c}{sd} & \multicolumn{1}{c}{ } &\multicolumn{1}{c}{mean} & \multicolumn{1}{c}{sd}\\
\hline
\multirow{3}{*}{10} & 10 & 0.5746 & $1.3687 \times 10^{-2}$ & & 0.5705 & $1.1714 \times 10^{-2}$ \\
@@ -455,7 +455,7 @@ whatever the $p$ tested (Table~\ref{tab:alpha_pvalue}). This ensure that the pro
of the distribution of $P_{values}$ correlation test to $\mathcal{U}(0,1)$
under the null hypothesis.\label{tab:alpha_pvalue}} {
% latex table generated in R 3.5.2 by xtable 1.8-4 package
% Fri Aug 23 14:31:45 2019 % Mon Sep 2 14:59:50 2019
\begin{tabular*}{0.98\linewidth}{@{\extracolsep{\fill}}crrr}
\hline
& \multicolumn{3}{c}{Cramer-Von Mises p.value} \\
@@ -477,7 +477,7 @@ Power of the $CovLs$ test based on the estimation of $\overline{RCovLs(X,Y)}$ is
\begin{table}[!t]
\processtable{Power estimation of the Procrustes tests for two low levels of shared variation, $5\%$ and $10\%$.\label{tab:power}} {
% latex table generated in R 3.5.2 by xtable 1.8-4 package
% Fri Aug 23 14:31:45 2019 % Mon Sep 2 14:59:50 2019
\begin{tabular}{lcrrrrrrrrr}
\hline
& $R^2$ & \multicolumn{4}{c}{5\%} & &\multicolumn{4}{c}{10\%} \\
@@ -509,17 +509,12 @@ Power of the $CovLs$ test based on the estimation of $\overline{RCovLs(X,Y)}$ is
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Discussion}
Correcting the over-fitting effect on metrics assessing the relationship between high-dimensional datasets has been a constant effort over the past decade. $\irls$ can therefore be considered a continuation of the extension of the toolbox available to biologists for analyzing their omics data. The effect of the proposed correction on the classical $\rls$ coefficient is as strong as that of the corrections previously proposed for other correlation coefficients measuring the relationship between vector data \citep[see Figure~\ref{fig:shared_variation}, e.g.][]{Smilde:09:00,SzeKely:13:00}. When applied to univariate data, $\rls$ is equal to the absolute value of the Pearson correlation coefficient; hence, although this is not the initial aim of that coefficient, $\irls$ can also be used to evaluate the correlation between two univariate datasets. Using $\irls$ on such datasets corrects for spurious correlations when the number of individuals is small more efficiently than the classical correction \citep[see Figure~\ref{fig:shared_variation_vector},][]{Theil:58:00}.
The main advantage of $\irls$ over other matrix correlation coefficients is that it allows the shared variation between two matrices to be estimated according to the classical definition of variance partitioning used with linear models. This opens the opportunity to develop linear models that explain the variation of a high-dimensional dataset by a set of other high-dimensional data matrices.
The second advantage of $\irls$ is that its definition implies that the variance/covariance matrix of a set of matrices is positive-definite. This allows a matrix of partial correlation coefficients to be estimated by inverting the variance/covariance matrix. The effect of the correction is weaker on such partial coefficients than on full correlations, but the partial coefficients that should theoretically be zero seem to be better identified after the correction.
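The univariate case can be made explicit with a short derivation. It is only a sketch: it assumes that the Procrustean covariance of two column-centered matrices is the sum of the singular values of $\X'\Y$ divided by $n-1$, and the symbols $\mathrm{CovLs}$, $\mathrm{VarLs}$, $x$, $y$ and $r$ are introduced here for illustration and should be checked against the definitions given earlier in the manuscript. For two centered vectors $x$ and $y$ of length $n$, the product $x'y$ is a scalar whose single singular value is $|x'y|$, so that
\begin{equation}
\mathrm{CovLs}(x,y) = \frac{|x'y|}{n-1} = |\mathrm{cov}(x,y)|,
\qquad
\mathrm{VarLs}(x) = \mathrm{CovLs}(x,x) = \frac{x'x}{n-1} = \mathrm{var}(x),
\end{equation}
and therefore
\begin{equation}
\rls(x,y) = \frac{\mathrm{CovLs}(x,y)}{\sqrt{\mathrm{VarLs}(x)\,\mathrm{VarLs}(y)}}
          = \frac{|\mathrm{cov}(x,y)|}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}}
          = |r(x,y)|,
\end{equation}
where $r(x,y)$ is the Pearson correlation coefficient.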
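As an illustration of the inversion step, the standard relation between a positive-definite covariance matrix and partial correlations can be written as follows. The symbols $\Sigma$, $P$ and $\rho_{ij\cdot\mathrm{rest}}$ are notation assumed here for the example and do not appear elsewhere in the manuscript. Let $\Sigma$ be the $k \times k$ variance/covariance matrix of the $k$ data matrices and $P = \Sigma^{-1}$ its inverse. The partial correlation between matrices $i$ and $j$, conditional on the $k-2$ other matrices, is then
\begin{equation}
\rho_{ij\cdot\mathrm{rest}} = -\frac{P_{ij}}{\sqrt{P_{ii}\,P_{jj}}}, \qquad i \neq j .
\end{equation}
Positive-definiteness of $\Sigma$ guarantees that $P$ exists and that $P_{ii} > 0$, so the ratio is always defined.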
@@ -531,10 +526,6 @@ Text Text Text Text Text Text Text Text.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusion}
A common approach to estimating the strength of the relationship between two variables is to estimate the proportion of shared variation. This single value, ranging from zero to one, is easy to interpret. Such a value can also be computed between two sets of variables, but the estimate is more prone to overestimation than for simple vector data, because the over-fitting phenomenon is amplified for high-dimensional data. With $\irls$ and its squared value, we propose easy-to-compute correlation and determination coefficients that are far less biased than the original Procrustean correlation coefficient. All the functions needed to estimate the proposed modified versions of these coefficients are included in the R package ProcMod, available for download from the Comprehensive R Archive Network (CRAN).