Commit c560f73f by Eric Coissac

Corrections on manuscript

parent 56a00fdb
......@@ -5,3 +5,4 @@ main.log
oldsty
SimulationRls.R
main_manuscrit.pdf
bioinfo.log
\def\mode{1}% Class bioinfo if 0; simple article otherwise
\def\mode{0}% Class bioinfo if 0; simple article otherwise
\if 0\mode
\documentclass{bioinfo}%
......@@ -172,7 +172,7 @@ $\mathbf{U}$ and $\mathbf{V}$ are two rotation matrices allowing to compute the
\label{eq:CovLsSVD}
\end{equation}
This expression illustrates that $\covls(\X,\Y)$ is actually the variance of the projections of $\X$ on $\Y$, or of the reciprocal projection. Therefore $\covls(\X,\Y)$ and $\rls(\X,\Y)$ are always positive and rotation independent. Here we propose to partition this variance into two components: a first one corresponding to the actual shared information between $\X$ and $\Y$, and a second one corresponding to what two random matrices of the same structure as $\X$ and $\Y$ share. This second part is estimated as $\overline{\rcovls(\X,\Y)}$, the mean of such random correlations. $\icovls(\X,\Y)$, the informative part of $\covls(\X,\Y)$, is computed using Equation~(\ref{eq:ICovLs}).
This expression illustrates that $\covls(\X,\Y)$ is actually the variance of the projections of $\X$ on $\Y$, or of the reciprocal projection. Therefore $\covls(\X,\Y)$ and $\rls(\X,\Y)$ are always positive and rotation independent. Here we propose to partition $\trace(\mathbf{\Sigma})$, the amount of variation corresponding to $\covls(\X,\Y)$, into two components. The first one corresponds to the actual shared information between $\X$ and $\Y$. The second one corresponds to the over-fitting effect, which can be estimated as the average variation shared by two random matrices of the same structure as $\X$ and $\Y$, denoted $\overline{\rcovls(\X,\Y)}$. $\icovls(\X,\Y)$, the informative part of $\covls(\X,\Y)$, is computed using Equation~(\ref{eq:ICovLs}).
\begin{equation}
\icovls(\X,\Y) = Max \left\{ \begin{aligned}
......@@ -301,8 +301,8 @@ h0_sims = array(0,dim = c(n_sim,length(p_qs),8))
for (k in seq_len(n_sim))
for (i in seq_along(p_qs)) {
X <- simulate_matrix(n_indivdual,p_qs[i],equal_var = TRUE)
Y <- simulate_matrix(n_indivdual,p_qs[i],equal_var = TRUE)
X <- rmatrix(n_indivdual,p_qs[i],equal_var = TRUE)
Y <- rmatrix(n_indivdual,p_qs[i],equal_var = TRUE)
h0_sims[k,i,1] <- ProcMod::corls(X,Y,nrand = 0)[1,2]
h0_sims[k,i,2] <- ProcMod::corls(X,Y,nrand = n_rand)[1,2]
......@@ -396,7 +396,7 @@ initial_var <- 3
n_indivdual <- 20
supplement_vars <- 0:50
X = simulate_matrix(n = n_indivdual,p = initial_var,equal_var = TRUE)
X = rmatrix(n = n_indivdual,p = initial_var,equal_var = TRUE)
Y = simulate_correlation(reference = X, p = initial_var, r2 = 0.4)
h1_sims_over = array(0,dim = c(length(supplement_vars),8))
......@@ -460,7 +460,7 @@ if (compute) {
for (i in seq_along(n_indivduals))
for (j in seq_along(p_qs))
for (r in seq_along(r2s)) {
X <- simulate_matrix(n_indivduals[i],
X <- rmatrix(n_indivduals[i],
p_qs[j],
equal_var = TRUE)
Y <- simulate_correlation(X,
......@@ -564,7 +564,7 @@ r2_sims_vec = array(0,dim = c(n_sim,
for (k in seq_len(n_sim)) {
for (i in seq_along(n_indivduals))
for (r in seq_along(r2s)) {
X <- simulate_matrix(n_indivduals[i],
X <- rmatrix(n_indivduals[i],
1,
equal_var = TRUE)
Y <- simulate_correlation(X,
......@@ -668,7 +668,17 @@ grid.draw(venn.plot)
\label{fig:nested_shared_variation}
\end{figure}
To evaluate the capacity of the partial determination coefficient $\irls_{partial}^2$ to disentangle nested correlations, four matrices $\mathbf{A},\,\mathbf{B},\,\mathbf{C},\,\mathbf{D}$ of size $n \times p = \Sexpr{n_indivdual} \times \Sexpr{p_q}$ are generated according to the following scheme: $\mathbf{A}$ shares $\Sexpr{round(r2_AB *100)}\%$ of variation with $\mathbf{B}$, which shares $\Sexpr{round(r2_BC *100)}\%$ of variation with $\mathbf{C}$, which in turn shares $\Sexpr{round(r2_CD *100)}\%$ of variation with $\mathbf{D}$. These direct correlations induce indirect ones, spreading the total variation among each pair of matrices according to Figure~\ref{fig:nested_shared_variation}. The simulation is repeated $\Sexpr{n_sim}$ times; for every simulation, $\irls_{partial}^2$ and $\rls_{partial}^2$ are estimated for each pair of matrices.
To evaluate the capacity of the partial determination coefficient $\irls_{partial}^2$ to disentangle nested correlations, a set of correlated matrices is generated. To generate two random matrices $\mathbf{A}$ and $\mathbf{B}$ sharing a fraction $w \in [0,1]$ of their variation: i) two independent random matrices $\mathbf{A}$ and $\mathbf{\Delta}$ are generated such that $\varls(\mathbf{A})=1$ and $\varls(\mathbf{\Delta})=1$; ii) the matrix $\mathbf{\Delta}_{rot}$ is computed as the alignment of $\mathbf{\Delta}$ on $\mathbf{A}$ using the optimal Procrustes rotation; iii) $\mathbf{B}$ is then computed using Equation~(\ref{eq:ABCor}):
\begin{equation}
\mathbf{B} = \mathbf{A} \times \sqrt{w} + \mathbf{\Delta}_{rot} \times \sqrt{1 - w}.
\label{eq:ABCor}
\end{equation}
Following this method, four matrices $\mathbf{A},\,\mathbf{B},\,\mathbf{C},\,\text{and} \, \mathbf{D}$ of size $n \times p = \Sexpr{n_indivdual} \times \Sexpr{p_q}$ are generated according to the following scheme: $\mathbf{A}$ shares $\Sexpr{round(r2_AB *100)}\%$ of variation with $\mathbf{B}$, which shares $\Sexpr{round(r2_BC *100)}\%$ of variation with $\mathbf{C}$, which in turn shares $\Sexpr{round(r2_CD *100)}\%$ of variation with $\mathbf{D}$. These direct correlations induce indirect ones, spreading the total variation among each pair of matrices according to Figure~\ref{fig:nested_shared_variation}. The simulation is repeated $\Sexpr{n_sim}$ times; for every simulation, $\irls_{partial}^2$ and $\rls_{partial}^2$ are estimated for each pair of matrices.
<<estimate_partial_r2, cache=TRUE, message=FALSE, warning=FALSE, include=FALSE, dependson="estimate_partial_r2_setting">>=
......@@ -689,7 +699,7 @@ if (compute) {
if (file.exists(filename)) {
partial_r2_sims[k, , , ] <- get(load(filename))
} else {
A <- simulate_matrix(n_indivdual,
A <- rmatrix(n_indivdual,
p_q,
equal_var = TRUE)
......@@ -799,8 +809,8 @@ h0_alpha = array(0,dim = c(n_sim,length(p_qs),3))
for (k in seq_len(n_sim))
for (i in seq_along(p_qs)) {
X <- simulate_matrix(n_indivdual,p_qs[i],equal_var = TRUE)
Y <- simulate_matrix(n_indivdual,p_qs[i],equal_var = TRUE)
X <- rmatrix(n_indivdual,p_qs[i],equal_var = TRUE)
Y <- rmatrix(n_indivdual,p_qs[i],equal_var = TRUE)
h0_alpha[k,i,1] <- attr(corls(X,Y,nrand = n_rand),"p.value")[1,2]
h0_alpha[k,i,2] <- vegan::protest(X,Y,permutations = n_rand)$signif
......@@ -863,7 +873,7 @@ if (compute) {
for (i in seq_along(n_indivduals))
for (j in seq_along(p_qs))
for (r in seq_along(r2s)) {
X <- simulate_matrix(n_indivduals[i],
X <- rmatrix(n_indivduals[i],
p_qs[j],
equal_var = TRUE)
Y <- simulate_correlation(X,
......@@ -1343,6 +1353,7 @@ $\X'$ & The transpose of $\X$. \\
$\X \Y$ & Matrix multiplication of $\X$ and $\Y$. \\
$\diag(\X)$ & A column matrix composed of the diagonal
elements of $\X$. \\
$\X^{1/2}$ & Matrix square root of $\X$. \\
$\trace(\X)$ & The trace of $\X$.
\end{tabular}
......
\def\mode{1}% Class bioinfo if 0; simple article otherwise
\def\mode{0}% Class bioinfo if 0; simple article otherwise
\if 0\mode
\documentclass{bioinfo}\usepackage[]{graphicx}\usepackage[]{color}
......@@ -195,7 +195,7 @@ $\mathbf{U}$ and $\mathbf{V}$ are two rotation matrices allowing to compute the
\label{eq:CovLsSVD}
\end{equation}
This expression illustrates that $\covls(\X,\Y)$ is actually the variance of the projections of $\X$ on $\Y$, or of the reciprocal projection. Therefore $\covls(\X,\Y)$ and $\rls(\X,\Y)$ are always positive and rotation independent. Here we propose to partition this variance into two components: a first one corresponding to the actual shared information between $\X$ and $\Y$, and a second one corresponding to what two random matrices of the same structure as $\X$ and $\Y$ share. This second part is estimated as $\overline{\rcovls(\X,\Y)}$, the mean of such random correlations. $\icovls(\X,\Y)$, the informative part of $\covls(\X,\Y)$, is computed using Equation~(\ref{eq:ICovLs}).
This expression illustrates that $\covls(\X,\Y)$ is actually the variance of the projections of $\X$ on $\Y$, or of the reciprocal projection. Therefore $\covls(\X,\Y)$ and $\rls(\X,\Y)$ are always positive and rotation independent. Here we propose to partition $\trace(\mathbf{\Sigma})$, the amount of variation corresponding to $\covls(\X,\Y)$, into two components. The first one corresponds to the actual shared information between $\X$ and $\Y$. The second one corresponds to the over-fitting effect, which can be estimated as the average variation shared by two random matrices of the same structure as $\X$ and $\Y$, denoted as $\overline{\rcovls(\X,\Y)}$. $\icovls(\X,\Y)$, the informative part of $\covls(\X,\Y)$, is computed using Equation~(\ref{eq:ICovLs}).
\begin{equation}
\icovls(\X,\Y) = Max \left\{ \begin{aligned}
......@@ -319,7 +319,17 @@ To evaluate the strength of that over-estimation and the relative effect of the
\label{fig:nested_shared_variation}
\end{figure}
To evaluate the capacity of the partial determination coefficient $\irls_{partial}^2$ to disentangle nested correlations, four matrices $\mathbf{A},\,\mathbf{B},\,\mathbf{C},\,\mathbf{D}$ of size $n \times p = 20 \times 200$ are generated according to the following scheme: $\mathbf{A}$ shares $80\%$ of variation with $\mathbf{B}$, which shares $40\%$ of variation with $\mathbf{C}$, which in turn shares $20\%$ of variation with $\mathbf{D}$. These direct correlations induce indirect ones, spreading the total variation among each pair of matrices according to Figure~\ref{fig:nested_shared_variation}. The simulation is repeated $100$ times; for every simulation, $\irls_{partial}^2$ and $\rls_{partial}^2$ are estimated for each pair of matrices.
To evaluate the capacity of the partial determination coefficient $\irls_{partial}^2$ to disentangle nested correlations, a set of correlated matrices is generated. To generate two random matrices $\mathbf{A}$ and $\mathbf{B}$ sharing a fraction $w \in [0,1]$ of their variation: i) two independent random matrices $\mathbf{A}$ and $\mathbf{\Delta}$ are generated such that $\varls(\mathbf{A})=1$ and $\varls(\mathbf{\Delta})=1$; ii) the matrix $\mathbf{\Delta}_{rot}$ is computed as the alignment of $\mathbf{\Delta}$ on $\mathbf{A}$ using the optimal Procrustes rotation; iii) $\mathbf{B}$ is then computed using Equation~(\ref{eq:ABCor}):
\begin{equation}
\mathbf{B} = \mathbf{A} \times \sqrt{w} + \mathbf{\Delta}_{rot} \times \sqrt{1 - w}.
\label{eq:ABCor}
\end{equation}
Following this method, four matrices $\mathbf{A},\,\mathbf{B},\,\mathbf{C},\,\text{and} \, \mathbf{D}$ of size $n \times p = 20 \times 200$ are generated according to the following scheme: $\mathbf{A}$ shares $80\%$ of variation with $\mathbf{B}$, which shares $40\%$ of variation with $\mathbf{C}$, which in turn shares $20\%$ of variation with $\mathbf{D}$. These direct correlations induce indirect ones, spreading the total variation among each pair of matrices according to Figure~\ref{fig:nested_shared_variation}. The simulation is repeated $100$ times; for every simulation, $\irls_{partial}^2$ and $\rls_{partial}^2$ are estimated for each pair of matrices.
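The three-step construction of $\mathbf{B}$ described in this hunk (steps i to iii and Equation `eq:ABCor`) can be sketched in base R as follows. This is only an illustration, not the code used in the manuscript: the helpers `rand_unit_matrix` and `procrustes_rotate` are hypothetical names, the sizes `n`, `p` and the weight `w` are arbitrary, and $\varls(\X)$ is assumed here to equal $\trace(\X'\X)/(n-1)$ for a column-centered matrix.

```r
# Sketch: generate B sharing a fraction w of variation with A (Equation ABCor).
# Assumption: varls(X) = trace(X'X) / (n - 1) for a column-centered X.
set.seed(42)

# Hypothetical helper: centered random matrix rescaled so that varls(M) = 1.
rand_unit_matrix <- function(n, p) {
  M <- matrix(rnorm(n * p), nrow = n, ncol = p)
  M <- sweep(M, 2, colMeans(M))      # center each column
  M / sqrt(sum(M^2) / (n - 1))       # rescale the total variance to 1
}

# Hypothetical helper: rotate `delta` onto `target` with the optimal
# orthogonal Procrustes rotation R = U V', where delta' target = U S V'.
procrustes_rotate <- function(delta, target) {
  s <- svd(crossprod(delta, target))
  delta %*% s$u %*% t(s$v)
}

n <- 20; p <- 5; w <- 0.4            # illustrative values only
A         <- rand_unit_matrix(n, p)                 # step i
Delta     <- rand_unit_matrix(n, p)                 # step i
Delta_rot <- procrustes_rotate(Delta, A)            # step ii
B         <- sqrt(w) * A + sqrt(1 - w) * Delta_rot  # step iii

# Total variance under the assumed definition of varls.
varls <- function(M) sum(M^2) / (n - 1)
c(varls_A = varls(A), varls_Delta_rot = varls(Delta_rot))
```

Because the rotation is orthogonal, it leaves the total variance of $\mathbf{\Delta}$ unchanged, so under the assumed definition both values on the last line should equal 1 up to floating-point error.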
......@@ -373,7 +383,7 @@ To evaluate relative the power of the three considered tests, pairs of to random
\begin{table}[!t]
\processtable{Estimation of $\overline{\rcovls(\X,\Y)}$ according to the number of random matrices (k) aligned.\label{tab:mrcovls}}{
% latex table generated in R 3.5.2 by xtable 1.8-4 package
% Tue Oct 1 18:48:35 2019
% Tue Oct 15 16:49:45 2019
\begin{tabular}{rrrrrrr}
\hline
& & \multicolumn{2}{c}{normal} & & \multicolumn{2}{c}{exponential}\\ \cline{3-4} \cline{6-7}p & k &\multicolumn{1}{c}{mean} & \multicolumn{1}{c}{sd} & \multicolumn{1}{c}{ } &\multicolumn{1}{c}{mean} & \multicolumn{1}{c}{sd}\\\hline\multirow{3}{*}{10} & 10 & 0.5746 & $1.3687 \times 10^{-2}$ & & 0.5705 & $1.1714 \times 10^{-2}$ \\
......@@ -475,7 +485,7 @@ whatever the $p$ tested (Table~\ref{tab:alpha_pvalue}). This ensure that the pro
of the distribution of $P_{values}$ correlation test to $\mathcal{U}(0,1)$
under the null hypothesis.\label{tab:alpha_pvalue}} {
% latex table generated in R 3.5.2 by xtable 1.8-4 package
% Tue Oct 1 18:48:38 2019
% Tue Oct 15 16:49:49 2019
\begin{tabular*}{0.98\linewidth}{@{\extracolsep{\fill}}crrr}
\hline
& \multicolumn{3}{c}{Cramer-Von Mises p.value} \\
......@@ -497,7 +507,7 @@ Power of the $CovLs$ test based on the estimation of $\overline{RCovLs(X,Y)}$ is
\begin{table}[!t]
\processtable{Power estimation of the Procrustes tests for two low levels of shared variation, $5\%$ and $10\%$.\label{tab:power}} {
% latex table generated in R 3.5.2 by xtable 1.8-4 package
% Tue Oct 1 18:48:38 2019
% Tue Oct 15 16:49:49 2019
\begin{tabular}{lcrrrrrrrrr}
\hline
& $R^2$ & \multicolumn{4}{c}{5\%} & &\multicolumn{4}{c}{10\%} \\
......@@ -588,6 +598,7 @@ $\X'$ & The transpose of $\X$. \\
$\X \Y$ & Matrix multiplication of $\X$ and $\Y$. \\
$\diag(\X)$ & A column matrix composed of the diagonal
elements of $\X$. \\
$\X^{1/2}$ & Matrix square root of $\X$. \\
$\trace(\X)$ & The trace of $\X$.
\end{tabular}
......