Difference between revisions of "Output files"

Latest revision as of 00:23, 4 November 2022

Default output files

General output files

lmt.log

General log file, always generated. Provides information on the current state of operation.

Operation dependent output files

Solving the mixed model equation system

results.csv

The file contains the solutions for the mixed model equation system. The file has four columns:

factor name,
sub-factor name,
factor level id, and
solution.

Note that factor names are derived according to the lmt factor naming convention, and sub-factor names are user-defined and are extracted from the equation system. Further, the factor level id is the original as provided by the data, pedigree, etc. For variables undergoing polynomial expansions, e.g. sub-factor c in

 y=x*b+age(t(co(p(1,2))))*c+id*u(v(my_var(1)))

the sub-factor name will be expanded as well to "sub-factor name"_id, where "id" is the polynomial id. For the above example c would be expanded to c_1 and c_2.

The file has as many records as the system has equations. Fixed factor levels which have been omitted due to rank deficiencies are not printed.

For a random factor with user-defined name a and several correlated sub-factors, one can recover the factor level effect matrix (one column per sub-factor) with the R code

library(data.table)
results<-fread("results.csv",header=FALSE) # columns V1..V4 as listed above
data<-matrix(results$V4[results$V1=="a"],ncol=length(unique(results$V2[results$V1=="a"])),byrow=FALSE)
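As a hedged illustration of the reshaping above, the following sketch builds a small made-up solution table in memory instead of reading results.csv. The sub-factor names u1 and u2, the level ids and all solution values are invented. It assumes that the records of factor a are ordered by sub-factor and, within sub-factor, by level, which is what the column-wise fill relies on.

```r
# Toy stand-in for results.csv; all values are invented for illustration.
results <- data.frame(
  V1 = c("b", "a", "a", "a", "a", "a", "a"),
  V2 = c("b", "u1", "u1", "u1", "u2", "u2", "u2"),
  V3 = c("1", "101", "102", "103", "101", "102", "103"),
  V4 = c(2.5, 0.1, -0.2, 0.3, 1.1, 1.2, 1.3),
  stringsAsFactors = FALSE
)

is_a <- results$V1 == "a"
subs <- unique(results$V2[is_a])

# Column-wise fill: one column per sub-factor, one row per factor level.
eff <- matrix(results$V4[is_a], ncol = length(subs), byrow = FALSE)
colnames(eff) <- subs
rownames(eff) <- results$V3[is_a & results$V2 == subs[1]]
print(eff)
```

Labelling rows with the original level ids and columns with the sub-factor names makes it easier to spot a violated ordering assumption.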

PCG solver output files

so_conv.csv

Contains the iteration statistics with columns

iteration number
alpha
beta
$$||r_i M r_i||$$, where $$M$$ is the preconditioner matrix and $$r_i=Cx_i-b$$ for a system $$Cx=b$$
convergence criterion CD
convergence criterion CR
seconds per iteration
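A minimal sketch of how such a file might be inspected in R, assuming it is comma-separated without a header line and has the seven columns in the order listed above. The file content, the short column names and the threshold 1e-4 are invented for illustration.

```r
# Write a tiny invented so_conv.csv-like file to a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,0.50,0.00,1.0e+02,1.0e-01,2.0e-01,0.8",
             "2,0.45,0.30,1.0e+00,1.0e-03,4.0e-03,0.7",
             "3,0.40,0.25,1.0e-02,1.0e-05,6.0e-05,0.7"), tmp)

# Hypothetical column names, chosen to mirror the list above.
conv <- read.csv(tmp, header = FALSE,
                 col.names = c("iter", "alpha", "beta", "rMr", "CD", "CR", "sec"))

# Total solver time and the first iteration at which CD drops below the threshold.
total_sec <- sum(conv$sec)
first_ok  <- conv$iter[which(conv$CD < 1e-4)[1]]
```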

Variance component estimation using Gibbs sampling

<u.d. variance name>_sigma_SA.csv

The file contains the column-wise upper-triangular elements of the respective $$\Sigma$$ matrix estimated in each round. The file contains as many rows as sampling rounds. Estimates for a $$\Sigma$$ matrix being part of the variance structure named g, assuming 10,000 samples, a burn-in of 1,000 samples and a thinning interval of 50 samples, can be obtained by

library(data.table)
d<-as.matrix(fread("g_sigma_SA.csv"))
n<-floor(sqrt(ncol(d)*2))
keep<-seq(1001,nrow(d),50) # drop the burn-in of 1,000 samples, then keep every 50th
gMean<-matrix(0,n,n);gSd<-gMean
gMean[upper.tri(gMean,diag=TRUE)]<-colMeans(d[keep,,drop=FALSE])
gSd[upper.tri(gSd,diag=TRUE)]<-apply(d[keep,,drop=FALSE],2,sd)
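The burn-in and thinning arithmetic can be checked on an invented chain. The sketch below simulates samples for a 2 x 2 $$\Sigma$$ (the means 4, 1 and 9 and the spread 0.2 are arbitrary), selects the retained rows and mirrors the upper triangle to obtain a full symmetric matrix. It assumes one row per sample and that the burn-in samples are stored in the file.

```r
set.seed(1)
n_samples <- 10000; burn_in <- 1000; thin <- 50

# Invented chain for a 2 x 2 Sigma: columns are s11, s12, s22
# (column-wise upper triangle).
d <- cbind(rnorm(n_samples, 4, 0.2),
           rnorm(n_samples, 1, 0.2),
           rnorm(n_samples, 9, 0.2))

# Rows retained after burn-in and thinning: 1001, 1051, ..., 9951.
keep <- seq(burn_in + 1, nrow(d), by = thin)
n <- floor(sqrt(ncol(d) * 2))

gMean <- matrix(0, n, n)
gMean[upper.tri(gMean, diag = TRUE)] <- colMeans(d[keep, , drop = FALSE])
# Mirror the upper triangle into the lower one to get the full symmetric matrix.
gMean[lower.tri(gMean)] <- t(gMean)[lower.tri(gMean)]
```

With these settings 180 of the 10,000 samples contribute to the posterior mean.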

Variance component estimation using MC-EM-REML

mcemreml_conv.csv

Contains the iteration statistics with one parameter vector per iteration. The parameter vector contains the following elements:

seconds for the last MC-EM iteration
seconds for solving the equation system
seconds for sampling the traces
convergence criterion cd

<u.d. variance name>_sigma_SA.csv

The file contains the column-wise upper-triangular elements of the respective $$\Sigma$$ matrix estimated in each round. The file contains as many rows as MC-EM-REML iterations. Estimates at convergence for a $$\Sigma$$ matrix being part of the variance structure named g may be read into R by

library(data.table)
d<-as.matrix(fread("g_sigma_SA.csv")) # matrix, so that a row is a plain numeric vector
n<-floor(sqrt(ncol(d)*2))
g<-matrix(0,n,n);g[upper.tri(g,diag=TRUE)]<-d[nrow(d),]
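The expression for n used in these snippets follows from counting the elements of the upper triangle. With $$m$$ columns in the file and an $$n \times n$$ matrix $$\Sigma$$ (the symbol $$m$$ is introduced here only for this derivation):

$$m=\frac{n(n+1)}{2} \;\Rightarrow\; 2m=n^2+n, \qquad n^2 < n^2+n < (n+1)^2 \;\Rightarrow\; n=\lfloor\sqrt{2m}\rfloor$$

For example, a 3 x 3 matrix gives $$m=6$$ columns, and $$\lfloor\sqrt{12}\rfloor=3$$.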

Variance component estimation using AI-REML-C

aic_conv.csv

Contains the iteration statistics with one parameter vector per iteration. The parameter vector contains the following elements:

iteration number
convergence criterion ng
convergence criterion cd
convergence criterion ll
log-likelihood
$$\beta$$
number of Levenberg-Marquardt iterations
seconds for the last AI-REML iteration
$$\alpha$$

<u.d. variance name>_sigma_UPDATE.csv

The file contains the column-wise upper-triangular elements of the respective $$\Sigma$$ matrix estimated in each round. The file contains as many rows as AI-REML-C iterations plus one. The row before the last contains the covariance estimates at convergence. The last row contains the approximate standard errors of the parameter estimates. Estimates at convergence for a $$\Sigma$$ matrix being part of the variance structure named g may be read into R by

library(data.table)
d<-as.matrix(fread("g_sigma_UPDATE.csv")) # matrix, so that a row is a plain numeric vector
n<-floor(sqrt(ncol(d)*2))
g<-matrix(0,n,n);g[upper.tri(g,diag=TRUE)]<-d[nrow(d)-1,]

Note that when restarting, lmt will not overwrite this file. Instead, it will append records.
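The row layout described above can be illustrated with an invented matrix standing in for the file content of a 2 x 2 $$\Sigma$$ after three iterations. The helper unpack_sym is hypothetical and not part of lmt; it only bundles the unpacking steps and additionally mirrors the upper triangle.

```r
# Invented stand-in for g_sigma_UPDATE.csv: rows 1-3 are iterations,
# row 4 holds the approximate standard errors.
d <- rbind(c(3.0, 0.5, 8.0),
           c(3.8, 0.9, 8.9),
           c(4.0, 1.0, 9.0),
           c(0.3, 0.1, 0.6))

# Hypothetical helper: unpack column-wise upper-triangular elements
# into a full symmetric matrix.
unpack_sym <- function(v) {
  n <- floor(sqrt(length(v) * 2))
  m <- matrix(0, n, n)
  m[upper.tri(m, diag = TRUE)] <- v
  m[lower.tri(m)] <- t(m)[lower.tri(m)]
  m
}

est <- unpack_sym(d[nrow(d) - 1, ])  # estimates at convergence
se  <- unpack_sym(d[nrow(d), ])      # approximate standard errors
```

Indexing relative to nrow(d) rather than by a fixed row number keeps the sketch pointing at the final two records when the file has grown through a restart.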

Intermediate output files

Files from processing pedigrees

"u.d. pedigree name"_sorted.csv

File contains a 3-column (ordinary pedigree) or 4-column (probabilistic pedigree) matrix containing the sorted and renumbered pedigree generated from the original pedigree.

"u.d. pedigree name"_crossref.csv

File contains a vector of original ids of individuals in "u.d. pedigree name"_sorted.csv. That is, the original id of individual #1 in "u.d. pedigree name"_sorted.csv is located in record 1 of this file, etc.

"u.d. pedigree name".bin

Block file in binary format containing four blocks:

a: real vector of diagonal elements of $$A$$
ai: real vector of diagonal elements of $$A^{-1}$$
m: real vector of Mendelian sampling terms
pe: 3 column integer matrix of the sorted and renumbered pedigree underlying $$A$$

Requested output files

Files from processing pedigrees

Genetic group regression matrix

lmt can write the genetic group regression matrix (aka $$Q$$) of a pedigree containing phantom parents to a user-defined file. For the necessary key string see the <pedigree_name> section of the parameter file elements page.

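A $$Q$$ matrix written to a file Q.coocsv may be read into R by

library(data.table)
dims<-scan("Q.coocsv",n=2,sep=",") # first record: number of rows and columns
Q<-matrix(0,dims[1],dims[2])
dat<-fread("Q.coocsv",skip=1) # remaining records: row, column, value
Q[cbind(dat$V1,dat$V2)]<-dat$V3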
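The reading steps can be tried on a small invented file. The layout assumed here (a first record with the two dimensions, followed by row, column, value triplets) is inferred from the reading code above and is not a format specification. The sketch uses base R only.

```r
tmp <- tempfile(fileext = ".coocsv")
# Invented 3 x 2 Q matrix: dimensions first, then row,column,value triplets.
writeLines(c("3,2", "1,1,1.0", "2,1,0.5", "2,2,0.5", "3,2,1.0"), tmp)

dims <- scan(tmp, n = 2, sep = ",", quiet = TRUE)
dat  <- read.csv(tmp, header = FALSE, skip = 1)

# Start from a zero matrix and fill in the stored non-zero elements.
Q <- matrix(0, dims[1], dims[2])
Q[cbind(dat$V1, dat$V2)] <- dat$V3
print(Q)
```

Indexing with a two-column matrix of (row, column) pairs assigns each value to its own cell, so elements missing from the file stay zero.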

Files from building GRMs

Genomic relationship matrix

lmt can write a genomic relationship matrix to a user-nominated file after it has been constructed. Supported output file formats are csc and bin, where the latter denotes a block file in binary format. In both cases only the upper triangle is written out, in column-major order. The matrix may be reconstructed in R by

d<-scan("mygrm.csv")
n<-floor(sqrt(length(d)*2))
G<-matrix(0,n,n)
G[upper.tri(G,diag=T)]<-d
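The snippet above fills only the upper triangle of G. A minimal round-trip sketch with an invented 3 x 3 matrix shows the packing order and how the lower triangle can be restored by symmetry. No file is read here, and the values are arbitrary.

```r
# Invented 3 x 3 symmetric matrix standing in for a GRM.
G0 <- matrix(c(1.00, 0.25, 0.10,
               0.25, 1.10, 0.30,
               0.10, 0.30, 0.95), 3, 3)

# Column-major upper triangle: (1,1), (1,2), (2,2), (1,3), (2,3), (3,3).
d <- G0[upper.tri(G0, diag = TRUE)]

n <- floor(sqrt(length(d) * 2))
G <- matrix(0, n, n)
G[upper.tri(G, diag = TRUE)] <- d
G[lower.tri(G)] <- t(G)[lower.tri(G)]   # fill the lower triangle by symmetry
```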

Files from running AI-REML jobs

AI matrix, gradient vector and parameter vector

Upon provision of the respective switch in the instruction file (see the airemlc section of the parameter file elements page), lmt writes the AI matrix, gradient vector and parameter vector to files ai_ai.csv, ai_ja.csv and ai_pa.csv, respectively. Files ai_ai.csv and ai_ja.csv contain as many records as AI-REML iterations. File ai_pa.csv contains as many records as AI-REML iterations + 1, where the first record is the parameter vector at the start.

Each row of file ai_ai.csv contains the upper triangle of the AI matrix in column-major order. The AI matrix of, say, iteration 2 can be reconstructed by

library(data.table)
d<-as.matrix(fread("ai_ai.csv")) # matrix, so that a row is a plain numeric vector
n<-floor(sqrt(ncol(d)*2))
ai<-matrix(0,n,n);ai[upper.tri(ai,diag=TRUE)]<-d[2,]
