Difference between revisions of "Examples"
(168 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
The | The examples provided in this section are meant to provide a practical examples about the {{lmt}} facilities and the parameter file syntax. It is assumed that the reader is familiar with [[Parameterfile1|section]] | ||
== | == Solving linear mixed model equations == | ||
=== Estimating a mean in a uni-variate model === | |||
Estimating a mean is equivalent to obtaining the generalized least square solution $$b=(X'R^{-1}X)^{-1}X'R^{-1}y$$ for model $$y=Xb+e$$, where $$y$$ is a vector of $$n$$ observations, $$X$$ is as single column matrix of $$1$$, $$b$$ is a fixed factor (mean), $$e$$ is the residual and $$y\sim N(Xb,R)$$ where $$R$$ is a $$n \times n$$ co-variance matrix. | |||
From the above it follows that for task of solving for $$b$$ {{lmt}} needs following information: | |||
the data | |||
the residual variance $$R$$ | |||
the model | |||
the solver | |||
Assume we have a data file "data.csv" with content: | Assume we have a data file "data.csv" with content: | ||
Line 17: | Line 22: | ||
where the columns are comma-separated, the first row is commented out with “#” but contains the header, and all other rows contain data records. | where the columns are comma-separated, the first row is commented out with “#” but contains the header, and all other rows contain data records. | ||
A valid {{lmt}} xml parameter file would look like: | A valid {{lmt}} xml parameter file would look like: | ||
<syntaxhighlight lang="xml" line> | <syntaxhighlight lang="xml" line highlight="5,27"> | ||
<root> | <root> | ||
<models> | |||
<eqn attributes="strings"> | |||
y=mu*b | |||
</eqn> | |||
</models> | |||
<data> | <data> | ||
datafile: data.csv | |||
missingthreshold: -50.0 | |||
</data> | </data> | ||
<vars> | <vars> | ||
<res> | <res> | ||
<sigma> | <sigma> | ||
<matrix> | <matrix attributes="matrix"> | ||
5.0 | 5.0 | ||
</matrix> | </matrix> | ||
Line 31: | Line 42: | ||
</res> | </res> | ||
</vars> | </vars> | ||
</root> | </root> | ||
</syntaxhighlight> | </syntaxhighlight> | ||
Following the introduced [[Parameterfile1|parameterfile terminology]] tags {{cc|<data>}}, {{cc|<vars>}} and {{cc|<model>}} are automatic-compulsory. Since {{cc|solve}} is the default job and we are using the default solver in default parameterization no further information about the job or solver is required. | |||
The most important aspect is the model definition in tag <eqn>, nested inside tag | |||
<model> | The most important aspect is the model definition in tag {{cc|<eqn>}}, nested inside tag {{cc|<model>}} $$y=mu*b$$. The variable names used here are either defined by the data file header, or by the user. That is, $$y$$ and $$mu$$ are defined in the data file header, whereas $$b$$ is a user-defined factor name. Translated this means that the content of the data column named $$y$$ should be regressed on the content of the data column named $$mu$$ with the regression coefficient named $$b$$. | ||
by the data file header, or by the user. That is, $$y$$ and $$mu$$ are defined in the data | |||
file header, whereas $$b$$ is a user-defined | Since there are no further specifications supplied about $$y$$, $$mu$$ and $$b$$, it is assumed that $$y$$ is a continuous variable, $$mu$$ is a classification variable, and $$b$$ is fixed factor. | ||
content of the data column named $$y$$ should be regressed on the content of the data | The necessary variances are defined by the content of the automatic-compulsory tag {{cc|<vars>}}. {{lmt}} requires one compulsory variance, the residual variance, which must be specified via tag {{cc|<res>}}. Therefore tag {{cc|res}} is automatic-compulsory. | ||
column named $$mu$$ with the regression coefficient named $$b$$. Since there are no further | |||
specifications supplied about $$y$$, $$mu$$ and $$b$$, it is assumed that $$y$$ is continuous , $$mu$$ is a | The default {{lmt}} variance structure is [https://en.wikipedia.org/wiki/Kronecker_product $$\Gamma\otimes\Sigma$$], where $$\Gamma$$ and $$\Sigma$$ are specified inside tags {{cc|<gamma>}} and {{cc|<sigma>}}, respectively. | ||
classification variable, and $$b$$ is fixed factor. | However, only tag {{cc|<sigma>}} is [[Parameterfile1#Parsing logic|automatic-compulsory]], whereas tag {{cc|<gamma>}} is [[Parameterfile1#Parsing logic|automatic-optional]]. A missing {{cc|<gamma>}} tag implies that [https://en.wikipedia.org/wiki/Identity_matrix $$\Gamma = I$$]. Note that for {{lmt}} $$\Sigma$$ is always a matrix, that is a scalar $$\sigma^2$$ is treated as a matrix $$1 \times 1$$ matrix. | ||
For the above example, the variance specification inside {{cc|<res>}} implies that $$\Gamma\otimes \Sigma \equiv I\otimes \Sigma$$. Since $$\Sigma$$ is a $$1\times 1$$ matrix with $$\Sigma[1,1]=\sigma_e^2$$, $$R$$ reduces to $$I\sigma_e^2$$. | |||
residual variance, which must be specified via tag <res>. | Note tag {{cc|<matrix>}} nested in tag {{cc|<sigma>}}. The content of tag {{cc|<matrix>}} does not comply with the formatting rules as pointed o ut [[Parameterfile1#Key strings|above]]. That is {{cc|5.0}} is not a valid key string. To let {{lmt}} know that the content of tag {{cc|<matrix>}} should not be evaluated as a key string, with a subsequent error message, [[Parameterfile1#Escaping tag content formatting rules|the tag must have attributes]]. In this example {{cc|1=matrix attributes="matrix"}} escapes the content of tag {{cc|<matrix>}} from the formatting rules. | ||
<sigma>. | Further, tag {{cc|<matrix>}} is automatic-optional. This might be confusing because, as pointed out above, $$\Sigma$$ forms an indispensable part of $$\Gamma\otimes \Sigma$$. However, tag {{cc|<matrix>}} belongs to a [[Parameterfile1#Group of mutually exclusive information sources|group of mutually exclusive information sources]] of which members are tag {{cc|<matrix>}} and key string {{cc|file: yourfilename}}. That is, $$\Sigma$$ maybe either embedded in the parameter file or supplied via an external file. | ||
Note that the spelling of | Note that the spelling of most tags used in the above parameter file is determined by {{lmt}} and must be abide by. | ||
{{lmt}} and must be abide by | |||
== Estimating a mean and genetic | === Estimating a fixed mean and a random genetic effect in a uni-variate model === | ||
Consider the linear model $$y=Xb+Zu+e$$ where all variables are those declared in | Consider the linear model $$y=Xb+Zu+e$$ where all variables are those declared in [[#Estimating a mean]], $$u$$ is vector of length $$m$$ of random direct genetic effects and $$Z$$ is a design matrix of dimension $$n \times m$$ linking genetic effects to their respective observations. Note that $$u\sim N(0,A\sigma_a^2)$$ where $$A$$ is the pedigree-derived relationship matrix and forms the $$\Gamma$$ part in $$\Gamma\otimes\Sigma$$. A possible data file for such mode may look like: | ||
#y,mu,id | #y,mu,id | ||
Line 101: | Line 85: | ||
and a valid {{lmt}} parameter file: | and a valid {{lmt}} parameter file: | ||
<syntaxhighlight lang="xml" line> | <syntaxhighlight lang="xml" line highlight="6"> | ||
<root> | <root> | ||
<data> | <data> | ||
Line 107: | Line 91: | ||
</data> | </data> | ||
<pedigrees> | <pedigrees> | ||
pedigrees: | pedigrees: my_ped | ||
< | <my_ped> | ||
file: ped.csv | file: ped.csv | ||
</ | </my_ped> | ||
</pedigrees> | </pedigrees> | ||
<vars> | <vars> | ||
<res> | <res> | ||
<sigma> | <sigma> | ||
<matrix> | <matrix attributes="matrix"> | ||
5.0 | 5.0 | ||
</matrix> | </matrix> | ||
</sigma> | </sigma> | ||
</res> | </res> | ||
vars: myvar | vars: my_var | ||
< | <my_var> | ||
<sigma> | |||
file: sigma.csv | |||
</sigma> | |||
<gamma> | |||
<A> | |||
pedigree: my_ped | |||
</A> | |||
</gamma> | |||
</my_var> | |||
</vars> | |||
<model> | |||
<eqn attributes="strings"> | |||
y = mu*b + id*u(v(my_var(1))) | |||
</eqn> | |||
</model> | |||
</root> | |||
</syntaxhighlight> | |||
Compared with the parameter file in example [[#Estimating a mean]] the one above contains only a few extra elements. One this the automatic-optional {{cc|<pedigrees>}} nested inside tag {{cc|<root>}}. This tag contains a keystring {{cc|pedigrees: myped}}, where the user-defined variable behind {{cc|pedigrees:}} is the name of a nominated-compulsory tag nested inside tag {cc|<pedigrees>}}. This concept allows to supply several pedigrees to lmt (e.g. a normal pedigree and a genetic group pedigree). In our example we have only one pedigree named my_ped, with tag {{cc|<my_ped>}} containing the information about this pedigree. Another additional element is the key string {{cc|vars: my_var}} nested in tag {{cc|<vars>}} where the variable of key string {{cc|vars: my_var}} provides the tag names of nominated-compulsory tags, in this example tag {{cc|<my_var>}}. | |||
Tag {{cc|<myvar>}} consist of two structural components: the automatic-compulsory tag {{cc|<sigma>}} and the automatic-optional {{cc|<gamma>}}. Since the the variance of $$u=A\sigma_a^2$$, where $$A=\Gamma$$ and $$\sigma_a^2=\Sigma$$, a {{cc|<gamma>}} tag must be supplied to fully specify the variance. '''Note that if the {{cc|<gamma>}} tag is missing or miss-spelled {{lmt}} will assume that the variance of $$u=I\sigma_a^2$$'''. Tag {{cc|<gamma>}} contains a automatic-compulsory tag {{cc|<A>}} which specifies the $$\Gamma=A$$. Since $$A$$ is build from a pedigree, tag {{cc|<A>}} contains a compulsory key string {{cc|pedigree: my_ped}} which nominates pedigree in tag {{cc|<my_ped>}} to be used for building $$A$$. | |||
Note the difference between the tags {{cc|<sigma>}} nested in tag {{cc|<res>}} and {{cc|<my_var>}}. The former specifies {{cc|<sigma>}} to be provided by tag {{cc|1=<matrix attributes="matrix">}}, whereas the latter specifies {{cc|<sigma>}} to be provided by a file. | |||
The model section in the above parameter file need to communicate to to {{lmt}} that $$u$$ is a random factor with a variance $$A\sigma_a^2$$. This is done by extending the u.d. factor name {{cc|u}} in {{cc|1=y = mu*b + id*u(v(my_var(1)))}} by a specifier {{cc|(v(my_var(1)))}}. Note that without a specifier {{cc|u}} would be regarded as a fixed factor. The specifier {{cc|u(v)}} communicates that {{cc|u}} has a variance assigned. Further, {{cc|v}} has a specifier assigned via {{cc|v(my_var)}} which communicates that the name of the variance is {{cc|my_var}}. The variance in tag {{cc|<my_var>}} contains a {{cc|<gamma>}} and a {{cc|<sigma>}} component. The integer number inside bracket {{cc|my_var(1)}} communicates that $$\sigma_a^2$$ of {{cc|u}} is located in the first diagonal element of $$\Sigma$$. | |||
In summary construct {{cc|u(v(my_var(1)))}} communicates that | |||
*{{cc|u}} has a variance assigned | |||
*the variance is named {{cc|my_var}} | |||
*the variance is located in the first diagonal element of the $$\Sigma$$ matrix specified in tag {{cc|<sigma>}} nested in tag {{cc|<my_var>>}} | |||
=== Estimating fixed means and a random genetic effects in a multi-variate model === | |||
Consider the linear model | |||
$$ | |||
\left( | |||
\begin{array}{c} | |||
y_1 \\ | |||
y_2 | |||
\end{array} | |||
\right)= | |||
\left( | |||
\begin{array}{cc} | |||
X_1 & 0 \\ | |||
0 & X_2 \\ | |||
\end{array} | |||
\right) | |||
\left( | |||
\begin{array}{c} | |||
b_1 \\ | |||
b_2 | |||
\end{array} | |||
\right) | |||
+ | |||
\left( | |||
\begin{array}{cc} | |||
Z & 0 \\ | |||
0 & Z | |||
\end{array} | |||
\right) | |||
\left( | |||
\begin{array}{c} | |||
u_1 \\ | |||
u_2 | |||
\end{array} | |||
\right) | |||
+ | |||
\left( | |||
\begin{array}{cc} | |||
I & 0 \\ | |||
0 & I | |||
\end{array} | |||
\right) | |||
\left( | |||
\begin{array}{c} | |||
e_1 \\ | |||
e_2 | |||
\end{array} | |||
\right) | |||
$$ | |||
where all variables are those declared in [[#Estimating a mean and a random genetic effect in a uni-variate model|above]], and subscripts $$1$$ and $$2$$ index trait $$1$$ and $$2$$, respectively. | |||
Note that $$[u_1,u_2]\sim N([0,0],A\otimes \Sigma_a)$$ where $$A$$ is the pedigree-derived relationship matrix and | |||
$$ | |||
\Sigma_a= | |||
\left(\begin{array}{cc} | |||
\sigma_{a_1}^2 & \sigma_{a_1,a_2}\\ | |||
\sigma_{a_2,a_1} & \sigma_{a_2}^2 | |||
\end{array}\right) | |||
$$ | |||
Further, $$[e_1,e_2]\sim N([0,0],I\otimes \Sigma_e)$$ with | |||
$$ | |||
\Sigma_e= | |||
\left(\begin{array}{cc} | |||
\sigma_{e_1}^2 & \sigma_{e_1,e_2}\\ | |||
\sigma_{e_2,e_1} & \sigma_{e_2}^2 | |||
\end{array}\right) | |||
$$. | |||
A possible data file for such mode may look like: | |||
#y1,y2,mu,id | |||
25.0,0.8,1,5 | |||
33.1,0.5,1,6 | |||
36.0,1.5,1,7 | |||
28.3,3.6,1,8 | |||
and the pedigree files is that provided in example [[#Estimating a mean and a random genetic effect in a uni-variate model]]. | |||
and a valid {{lmt}} parameter file: | |||
<syntaxhighlight lang="xml" line highlight="6"> | |||
<root> | |||
<data> | |||
file: data.csv | |||
</data> | |||
<pedigrees> | |||
pedigrees: my_ped | |||
<my_ped> | |||
file: ped.csv | |||
</my_ped> | |||
</pedigrees> | |||
<vars> | |||
<res> | |||
<sigma> | |||
<matrix attributes="matrix"> | |||
5.0,0.8 | |||
0.8,1.2 | |||
</matrix> | |||
</sigma> | |||
</res> | |||
vars: my_var | |||
<my_var> | |||
<sigma> | <sigma> | ||
file: | file: sigma.csv | ||
</sigma> | </sigma> | ||
<gamma> | <gamma> | ||
<A> | <A> | ||
pedigree: | pedigree: my_ped | ||
</A> | </A> | ||
</gamma> | </gamma> | ||
</ | </my_var> | ||
</vars> | </vars> | ||
<model> | <model> | ||
<eqn attributes="strings"> | <eqn attributes="strings"> | ||
y1 = mu*b1 + id*u1(v(my_var(1))) | |||
y2 = mu*b2 + id*u2(v(my_var(2))) | |||
</eqn> | </eqn> | ||
</model> | </model> | ||
</root> | |||
</syntaxhighlight> | |||
== Example code chunks == | |||
'''The following code chunks are only subset of a full parameter file. It is assumed that all other parts of the instruction file are functional and all necessary input data are available and the that the data file columns have the respective names.''' | |||
=== Providing pedigrees === | |||
==== Providing a pedigree containing genetic groups ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<pedigrees> | |||
pedigrees: a | |||
<a> | |||
phantomparents: 2 | |||
... | |||
</a> | |||
</pedigrees> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Providing a pedigree containing metafounders ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<pedigrees> | |||
pedigrees: a | |||
<a> | |||
metafile: mymeta.csv | |||
... | |||
</a> | |||
</pedigrees> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Providing a probabilistic pedigree ==== | |||
<syntaxhighlight lang="xml" line highlight="6"> | |||
<root> | |||
... | |||
<pedigrees> | |||
pedigrees: a | |||
<a> | |||
switch: probabilistic | |||
... | |||
</a> | |||
</pedigrees> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Providing several pedigrees ==== | |||
<syntaxhighlight lang="xml"> | |||
<root> | |||
... | |||
<pedigrees> | |||
pedigrees: a,b | |||
<a> | |||
... | |||
</a> | |||
<b> | |||
... | |||
</b> | |||
</pedigrees> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
=== Providing Genotypes === | |||
==== Providing external allele frequencies ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<genotypes> | |||
genotypes: a | |||
<a> | |||
... | |||
pqfile: mypq.csv <!-- file must contain a column vector of 2p --> | |||
</a> | |||
</genotypes> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Providing several genotype files ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<genotypes> | |||
genotypes: a,b | |||
<a> | |||
... | |||
</a> | |||
<b> | |||
... | |||
</b> | |||
</genotypes> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
=== Providing GRMs === | |||
==== Constructing GRM from genotypes ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<genotypes> | |||
genotypes: a | |||
<a> | |||
... | |||
</a> | |||
</genotypes> | |||
<grms> | |||
grms: x | |||
<x> | |||
genotype: a | |||
... | |||
</x> | |||
</grms> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Overriding the default GRM construction method ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<genotypes> | |||
genotypes: a | |||
<a> | |||
... | |||
</a> | |||
</genotypes> | |||
<grms> | |||
grms: x | |||
<x> | |||
genotype: a | |||
method: YA <!-- method is now "Yang"("VanRaden2") --> | |||
... | |||
</x> | |||
</grms> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Constructing a GRM from file ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<grms> | |||
grms: x | |||
<x> | |||
file: mygrm.csv | |||
... | |||
</x> | |||
</grms> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Constructing several GRMs ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<genotypes> | |||
genotypes: a,b | |||
<a> | |||
... | |||
</a> | |||
<b> | |||
... | |||
</b> | |||
</genotypes> | |||
<grms> | |||
grms: x,y | |||
<x> | |||
genotype: a | |||
... | |||
</x> | |||
<y> | |||
genotype: b | |||
... | |||
</y> | |||
</grms> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
=== Single step models === | |||
==== ssGBLUP model with GRM build from genotypes==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
<pedigrees> | |||
pedigrees: a | |||
<a> | |||
... | |||
</a> | |||
<pedigrees> | |||
<genotypes> | |||
genotypes: a | |||
<a> | |||
... | |||
pedigree: a | |||
</a> | |||
</genotypes> | |||
<grms> | |||
grms: x | |||
<x> | |||
genotype: a | |||
... | |||
</x> | |||
</grms> | |||
<vars> | |||
... | |||
vars: g | |||
<g> | |||
... | |||
<gamma> | |||
<H> | |||
... | |||
grm: x | |||
pedigree: a | |||
aweight: 0.05 | |||
</H> | |||
</gamma> | |||
</g> | |||
</vars> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+id*u1(v(g(1)) | |||
y2=mu*b2+id*u1(v(g(2)) | |||
</eqn> | |||
</models> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== ssGBLUP model with GRM supplied externally ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
<pedigrees> | |||
pedigrees: a | |||
<a> | |||
... | |||
</a> | |||
<pedigrees> | |||
<grms> | |||
grms: x | |||
<x> | |||
file: mygrm.bin | |||
pedigree: a | |||
cross: id.csv | |||
</x> | |||
</grms> | |||
<vars> | |||
... | |||
vars: g | |||
<g> | |||
... | |||
<gamma> | |||
<H> | |||
... | |||
grm: x | |||
pedigree: a | |||
aweight: 0.05 | |||
</H> | |||
</gamma> | |||
</g> | |||
</vars> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+id*u1(v(g(1)) | |||
y2=mu*b2+id*u1(v(g(2)) | |||
</eqn> | |||
</models> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== ssGTBLUP model ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
<pedigrees> | |||
pedigrees: a | |||
<a> | |||
file: pedigree.csv | |||
... | |||
</a> | |||
<pedigrees> | |||
<genotypes> | |||
genotypes: a | |||
<a> | |||
file: mygeno.txt | |||
pedigree: a | |||
cross: ids.csv | |||
... | |||
</a> | |||
</genotypes> | |||
<vars> | |||
... | |||
vars: g | |||
<g> | |||
... | |||
<gamma> | |||
<H> | |||
... | |||
type: tblup | |||
genotype: a | |||
pedigree: a | |||
aweight: 0.05 | |||
</H> | |||
</gamma> | |||
</g> | |||
</vars> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+id*u1(v(g(1)) | |||
y2=mu*b2+id*u1(v(g(2)) | |||
</eqn> | |||
</models> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== ssSNPBLUP model ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
<pedigrees> | |||
pedigrees: a | |||
<a> | |||
... | |||
</a> | |||
<pedigrees> | |||
<genotypes> | |||
genotypes: a | |||
<a> | |||
... | |||
pedigree: a | |||
</a> | |||
<vars> | |||
... | |||
vars: g | |||
<g> | |||
type: snpblup1 | |||
genotype: a | |||
aweight: 0.05 | |||
switch: adjustg2a | |||
<sigma> | |||
file: covG.csv | |||
</sigma> | |||
<marker_sb1> | |||
<sigma> | |||
file: covG.csv | |||
</sigma> | |||
</marker_sb1> | |||
</g> | |||
</vars> | |||
... | |||
<models> | |||
<eqn> | |||
y1=mu*b1+individual*u1(v(g(1))+dam*m1(v(g(2)) | |||
y2=mu*b2+individual*u2(v(g(3))+dam*m2(v(g(4)) | |||
</eqn> | |||
</models> | |||
</root> | |||
</syntaxhighlight> | |||
==== ssSNPBLUP model with meta-founders ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
<pedigrees> | |||
pedigrees: a | |||
<a> | |||
... | |||
metafile: mymeta.csv <!-- contains an nxn meta-founder co-variance matrix --> | |||
</a> | |||
<pedigrees> | |||
<genotypes> | |||
genotypes: a | |||
<a> | |||
... | |||
pedigree: a | |||
pqfile: myp.csv <!-- contains a column vector of 1 which implies p=0.5--> | |||
</a> | |||
<vars> | |||
... | |||
vars: g | |||
<g> | |||
type: snpblup1 | |||
genotype: a | |||
aweight: 0.05 | |||
<sigma> | |||
file: covG.csv | |||
</sigma> | |||
<marker_sb1> | |||
<sigma> | |||
file: covG.csv | |||
</sigma> | |||
</marker_sb1> | |||
</g> | |||
</vars> | |||
... | |||
<models> | |||
<eqn> | |||
y1=mu*b1+individual*u1(v(g(1))+dam*m1(v(g(2)) | |||
y2=mu*b2+individual*u2(v(g(3))+dam*m2(v(g(4)) | |||
</eqn> | |||
</models> | |||
</root> | |||
</syntaxhighlight> | |||
==== ssSNPBLUP model with a separate polygenic factor ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
<pedigrees> | |||
pedigrees: a | |||
<a> | |||
... | |||
</a> | |||
<pedigrees> | |||
<genotypes> | |||
genotypes: a | |||
<a> | |||
... | |||
pedigree: a | |||
</a> | |||
<vars> | |||
... | |||
vars: a,g | |||
<a> | |||
<sigma> | |||
file: cov_polygenic.csv <!-- assumes that the polygenic weight has been absorbed into sigma --> | |||
</sigma> | |||
<gamma> | |||
<A> | |||
pedigree: a | |||
</A> | |||
</gamma> | |||
</a> | |||
<g> | |||
type: snpblup1 | |||
genotype: a | |||
aweight: 0.001 <!-- small "dummy" value required for the variance formulation --> | |||
switch: adjustg2a | |||
<sigma> | |||
file: cov_genomic.csv <!-- assumes that the genomic weight has been absorbed into sigma --> | |||
</sigma> | |||
<marker_sb1> | |||
<sigma> | |||
file: cov_genomic.csv | |||
</sigma> | |||
</marker_sb1> | |||
</g> | |||
</vars> | |||
... | |||
<models> | |||
<eqn> | |||
y1=mu*b1+individual*ug1(v(g(1))+dam*mg1(v(g(2))+individual*ua1(v(a(1))+dam*ma1(v(a(2)) | |||
y2=mu*b2+individual*ug2(v(g(3))+dam*mg2(v(g(4))+individual*ua2(v(a(3))+dam*ma2(v(a(4)) | |||
</eqn> | |||
</models> | |||
</root> | |||
</syntaxhighlight> | |||
==== ssGBLUP with two genomic factors ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
<pedigrees> | |||
pedigrees: a | |||
<a> | |||
... | |||
</a> | |||
<pedigrees> | |||
<genotypes> | |||
genotypes: a,b | |||
<a> | |||
... | |||
pedigree: a | |||
</a> | |||
<b> | |||
... | |||
pedigree: a | |||
</b> | |||
</genotypes> | |||
<grms> | |||
grms: x,y | |||
<x> | |||
genotype: a | |||
... | |||
</x> | |||
<y> | |||
genotype: b | |||
... | |||
</y> | |||
</grms> | |||
<vars> | |||
... | |||
vars: g1,g2 | |||
<g1> | |||
... | |||
<gamma> | |||
<H> | |||
... | |||
grm: x | |||
</H> | |||
</gamma> | |||
</g1> | |||
<g2> | |||
... | |||
<gamma> | |||
<H> | |||
... | |||
grm: y | |||
</H> | |||
</gamma> | |||
</g2> | |||
</vars> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+id*u11(v(g1(1))+id*u21(v(g2(1)) | |||
y2=mu*b2+id*u12(v(g1(2))+id*u22(v(g2(2)) | |||
</eqn> | |||
</models> | |||
</root> | |||
</syntaxhighlight> | |||
=== Regression on continuous co-variables === | |||
==== Linear regression ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+age(t(co))*age1 | |||
y2=mu*b2+age(t(co))*age2 | |||
</eqn> | |||
</models> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== User-defined polynomial expansion ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+age(t(co(p(1,2))))*age1 | |||
y2=mu*b2+age(t(co(p(1,2))))*age2 | |||
</eqn> | |||
<eqn attributes="strings"> | |||
x^1 | |||
log(sqrt(x)) | |||
</eqn> | |||
</models> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Using hard-coded Legendre polynomials ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+age(t(co(p(1,2,3))))*age1 | |||
y2=mu*b2+age(t(co(p(1,2))))*age2 | |||
</eqn> | |||
<eqn attributes="strings"> | |||
l0 | |||
l1 | |||
l2 | |||
</eqn> | |||
</models> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Nested co-variables ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
weaningweight=mu*b1+age(t(co(p(1,2);n(sex))))*age | |||
intramuscularfatcontent=mu*b2+weight(t(co(p(1,2);n(sex))))*weight | |||
</eqn> | |||
<eqn attributes="strings"> | |||
x^1 | |||
x^2 | |||
</eqn> | |||
</models> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
=== Random-regression models === | |||
==== Nested continuous random co-variables ==== | |||
{{cc|days}} is a co-variable which is nested within {{cc|individual}} or {{cc|dam}}. | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+days(t(co(n(individual))))*u1(v(g(1))+days(t(co(n(dam))))*m1(v(g(2)) | |||
y2=mu*b2+days(t(co(n(individual))))*u2(v(g(3))+days(t(co(n(dam))))*m2(v(g(4)) | |||
</eqn> | |||
</models> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Nested continuous random co-variables with polynomial expansion ==== | |||
Similar to [[#Nested continuous co-variables|Nested continuous co-variables]] but {{cc|days}} is expanded | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+days(t(co(p(1,2,3);n(individual))))*u1(v(g(1,2,3))+days(t(co(p(1,2,3);n(dam))))*m1(v(g(4,5,6)) | |||
y2=mu*b2+days(t(co(p(1,2,3);n(individual))))*u2(v(g(7,8,9))+days(t(co(p(1,2,3);n(dam))))*m2(v(g(10,11,12)) | |||
</eqn> | |||
<poly attributes="strings"> | |||
<! Legendre polynomials order 0, 1 and 2> | |||
l0 | |||
l1 | |||
l2 | |||
</poly> | |||
</models> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Nested continuous co-variables with polynomial expansion and an integer co-variable ==== | |||
Similar to [[#Nested continuous co-variables with polynomial expansion|Nested continuous co-variables with polynomial expansion]], but an additional information {{cc|t(i)}} is provided telling {{lmt}} that {{cc|days}} is actually an integer. While the results do not differ from [[#Nested continuous co-variables with polynomial expansion|Nested continuous co-variables with polynomial expansion]] {{lmt}} can exploit this information for memory efficiency. | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+days(t(co(t(i);p(1,2,3);n(individual))))*u1(v(g(1,2,3))+days(t(co(t(i);p(1,2,3);n(dam))))*m1(v(g(7,8,9)) | |||
y2=mu*b2+days(t(co(t(i);p(1,2,3);n(individual))))*u2(v(g(4,5,6))+days(t(co(t(i);p(1,2,3);n(dam))))*m2(v(g(10,11,12)) | |||
</eqn> | |||
<poly attributes="strings"> | |||
<! Legendre polynomials of order 0, 1 and 2> | |||
l0 | |||
l1 | |||
l2 | |||
</poly> | |||
</models> | |||
</root> | |||
</syntaxhighlight> | |||
=== Defining equivalent models with genetic groups === | |||
Note that in the parameterization provided below [[#Defining a model with absorbed genetic groups|absorbed genetic groups]] and [[#Defining a model with genetic groups as extra factor|genetic groups as extra factor]] must yield the same results. However, only when using {{cc|absorbed genetic groups}} the factor level solutions are the actual breeding values. When modelling genetic groups as an extra factor genetic factor solutions and genetic group factor solutions must be added by the user. | |||
==== Defining a model with absorbed genetic groups ==== | |||
Note that the only information necessary is the number of phantom parents '''at the top of the pedigree'''({{cc|phantomparents: 10}}) and the information to the variance that the it should be constructed allowing for genetic groups({{cc|switch gg}}). | |||
<syntaxhighlight lang="xml" line highlight="6,19"> | |||
<root> | |||
<pedigrees> | |||
pedigrees: a | |||
<a> | |||
file: myped.csv | |||
phantomparents: 10 | |||
</a> | |||
<pedigrees> | |||
<vars> | |||
... | |||
vars: g | |||
<g> | |||
<sigma> | |||
file: myG.csv <! must contain a 4x4 matrix> | |||
</sigma> | |||
<gamma> | |||
<A> | |||
pedigree: a | |||
switch: gg | |||
</A> | |||
</gamma> | |||
</g> | |||
</vars> | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+id*id1(v(g(1))+dam*dam1(v(g(3)) | |||
y2=mu*b2+id*id2(v(g(2))+dam*dam2(v(g(4)) | |||
</eqn> | |||
</models> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== Defining a model with genetic groups as extra random factor ==== | |||
Genetic groups are defined as an extra factor, which requires an extra variance({{cc|gg}}) and two pedigrees, the genetic group pedigree({{cc|a}}) and the normal pedigree({{cc|b}}). For a model equivalent to [[#Defining a model with absorbed genetic groups|absorption]] pedigree {{cc|b}} must be a subset of pedigree {{cc|a}}. Further, if breeding values are required {{lmt}} can provide the genetic group regression matrix {{cc|qfile: Q.coocsv}}. | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
<pedigrees> | |||
pedigrees: a,b | |||
<a> | |||
file: ggped.csv | |||
phantomparents: 10 | |||
qfile: Q.coocsv | |||
</a> | |||
<b> | |||
file: ped.csv | |||
</b> | |||
<pedigrees> | |||
<vars> | |||
... | |||
vars: g,gg | |||
<g> | |||
<sigma> | |||
file: myG.csv <! must contain a 4x4 matrix> | |||
</sigma> | |||
<gamma> | |||
<A> | |||
pedigree: b | |||
</A> | |||
</gamma> | |||
</g> | |||
<gg> | |||
<sigma> | |||
file: myG.csv <! must contain a 4x4 matrix. should be the same as for "g"> | |||
</sigma> | |||
</gg> | |||
</vars> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+id*id1(v(g(1))+dam*dam1(v(g(3))+id(t(gg(a)))*idgg1(v(gg(1))+dam(t(gg(a)))*damgg1(v(gg(3)) | |||
y2=mu*b2+id*id2(v(g(2))+dam*dam2(v(g(4))+id(t(gg(a)))*idgg2(v(gg(2))+dam(t(gg(a)))*damgg2(v(gg(4)) | |||
</eqn> | |||
</models> | |||
</root> | |||
</syntaxhighlight> | |||
==== Defining a model with fixed genetic groups ==== | |||
Fixed genetic groups are only supported if modeled as an extra factor. Therefore, the model is similar to [[#Defining a model with genetic groups as extra random factor|above]], but the extra variance is omitted. Note that when modeling genetic groups as fixed it is the users responsibility to omit one group from the respective pedigree to ensure that $$X$$ is of full column rank. [[#Linear models in lmt:Column rank of $$X$$|bla]] | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
<pedigrees> | |||
pedigrees: a,b | |||
<a> | |||
file: ggped.csv | |||
phantomparents: 10 | |||
qfile: Q.coocsv | |||
</a> | |||
<b> | |||
file: ped.csv | |||
</b> | |||
<pedigrees> | |||
<vars> | |||
... | |||
vars: g | |||
<g> | |||
<sigma> | |||
file: myG.csv <! must contain a 4x4 matrix> | |||
</sigma> | |||
<gamma> | |||
<A> | |||
pedigree: b | |||
</A> | |||
</gamma> | |||
</g> | |||
</vars> | |||
... | |||
<models> | |||
<eqn attributes="strings"> | |||
y1=mu*b1+id*id1(v(g(1))+dam*dam1(v(g(3))+id(t(gg(a)))*idgg1+dam(t(gg(a)))*damgg1 | |||
y2=mu*b2+id*id2(v(g(2))+dam*dam2(v(g(4))+id(t(gg(a)))*idgg2+dam(t(gg(a)))*damgg2 | |||
</eqn> | |||
</models> | |||
</root> | |||
</syntaxhighlight> | |||
=== Override the default job parameters === | |||
<syntaxhighlight lang="xml" line highlight="6"> | |||
<root> | |||
... | |||
<jobs> | |||
jobs: default | |||
<default> | |||
conv: -9.21 <! log(10e-5)> | |||
</default> | |||
</jobs> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
=== Use job "solve" instead of "default" === | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<jobs> | |||
jobs: solve | |||
<solve> | |||
solver: x | |||
</solve> | |||
</jobs> | |||
<solvers> | |||
solvers: x | |||
<x> | |||
<!-- since is nothing inhere "x" will be of default type: preconditioned gradient solver --> | |||
</x> | |||
</solvers> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
=== Use a direct solver in stead of the default solver === | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<jobs> | <jobs> | ||
jobs: solve | jobs: solve | ||
<solve> | <solve> | ||
solver: | solver: x | ||
</solve> | </solve> | ||
</jobs> | </jobs> | ||
<solvers> | <solvers> | ||
solvers: | solvers: x | ||
< | <x> | ||
<direct> | |||
</direct> | |||
</x> | |||
</solvers> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
=== Estimating variance components === | |||
==== Gibbs sampling ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<jobs> | |||
jobs: sample | |||
<sample> | |||
sampler: x | |||
</sample> | |||
</jobs> | |||
<samplers> | |||
samplers: x | |||
<x> | |||
<blocked> | |||
samples: 100000 | |||
</blocked> | |||
</x> | |||
</samplers> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== MC-EM-REML ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<jobs> | |||
jobs: mcemreml | |||
<mcemreml> | |||
conv: -9.21034 | |||
rounds: 300 | |||
sampler: x | |||
solver: y | |||
</mcemreml> | |||
</jobs> | |||
<solvers> | |||
solvers: y | |||
<y> | |||
<pcgiod> | <pcgiod> | ||
conv: -16.1181 | |||
</pcgiod> | </pcgiod> | ||
</ | </y> | ||
</solvers> | </solvers> | ||
<samplers> | |||
samplers: x | |||
<x> | |||
<pe> | |||
samples: 50 | |||
switch: trace | |||
chains: 36 | |||
</pe> | |||
</x> | |||
</samplers> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== AI-REML-C ==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<jobs> | |||
jobs: airemlc | |||
<airemlc> | |||
</airemlc> | |||
</jobs> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
==== AI-REML-C using single-pass Gibbs sampling to obtain starting values==== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<jobs> | |||
jobs: sample,airemlc | |||
<sample> | |||
sampler: x | |||
</sample> | |||
<airemlc> | |||
</airemlc> | |||
</jobs> | |||
<samplers> | |||
samplers: x | |||
<x> | |||
<singlepass> | |||
samples: 200 | |||
</singlepass> | |||
</x> | |||
</samplers> | |||
... | |||
</root> | </root> | ||
</syntaxhighlight> | </syntaxhighlight> | ||
=== Calculating exact prediction error co-variances using a direct solver=== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<jobs> | |||
jobs: pevsolve | |||
<pevsolve> | |||
solver: a | |||
factor: g <!-- this assumes that a variance named "g" exists which was used in the equations --> | |||
</pevsolve> | |||
</jobs> | |||
<solvers> | |||
solvers: a | |||
<a> | |||
<direct> | |||
</direct> | |||
</a> | |||
</solvers> | |||
... | |||
</root> | |||
</syntaxhighlight> | |||
=== Calculating prediction error co-variances for a target individual=== | |||
<syntaxhighlight lang="xml" line highlight=""> | |||
<root> | |||
... | |||
<jobs> | |||
jobs: pevsolve | |||
<pevsolve> | |||
solver: a | |||
factor: g <!-- this assumes that a variance named "g" exists which was used in the equations --> | |||
levels: 1156679414 <!-- this must be the original factor level, e.g. the original pedigree id --> | |||
</pevsolve> | |||
</jobs> | |||
<solvers> | |||
solvers: a | |||
<a> | |||
<!-- since there is nothing inhere "a" will be of default type: preconditioned gradient method --> | |||
</a> | |||
</solvers> | |||
... | |||
</root> | |||
</syntaxhighlight> |
Latest revision as of 05:04, 7 June 2022
The examples provided in this section are meant to provide a practical examples about the lmt facilities and the parameter file syntax. It is assumed that the reader is familiar with section
Solving linear mixed model equations
Estimating a mean in a uni-variate model
Estimating a mean is equivalent to obtaining the generalized least square solution $$b=(X'R^{-1}X)^{-1}X'R^{-1}y$$ for model $$y=Xb+e$$, where $$y$$ is a vector of $$n$$ observations, $$X$$ is as single column matrix of $$1$$, $$b$$ is a fixed factor (mean), $$e$$ is the residual and $$y\sim N(Xb,R)$$ where $$R$$ is a $$n \times n$$ co-variance matrix.
From the above it follows that for task of solving for $$b$$ lmt needs following information:
the data the residual variance $$R$$ the model the solver
Assume we have a data file "data.csv" with content:
#y,mu 25.0,1 33.1,1 36.0,1 28.3,1
where the columns are comma-separated, the first row is commented out with “#” but contains the header, and all other rows contain data records. A valid lmt xml parameter file would look like:
<root>
<models>
<eqn attributes="strings">
y=mu*b
</eqn>
</models>
<data>
datafile: data.csv
missingthreshold: -50.0
</data>
<vars>
<res>
<sigma>
<matrix attributes="matrix">
5.0
</matrix>
</sigma>
</res>
</vars>
</root>
Following the introduced parameterfile terminology tags , <vars> and <model> are automatic-compulsory. Since solve is the default job and we are using the default solver in default parameterization no further information about the job or solver is required.
The most important aspect is the model definition in tag <eqn> , nested inside tag <model> $$y=mu*b$$. The variable names used here are either defined by the data file header, or by the user. That is, $$y$$ and $$mu$$ are defined in the data file header, whereas $$b$$ is a user-defined factor name. Translated this means that the content of the data column named $$y$$ should be regressed on the content of the data column named $$mu$$ with the regression coefficient named $$b$$.
Since there are no further specifications supplied about $$y$$, $$mu$$ and $$b$$, it is assumed that $$y$$ is a continuous variable, $$mu$$ is a classification variable, and $$b$$ is fixed factor. The necessary variances are defined by the content of the automatic-compulsory tag <vars> . lmt requires one compulsory variance, the residual variance, which must be specified via tag <res> . Therefore tag res is automatic-compulsory.
The default lmt variance structure is $$\Gamma\otimes\Sigma$$, where $$\Gamma$$ and $$\Sigma$$ are specified inside tags <gamma> and <sigma> , respectively. However, only tag <sigma> is automatic-compulsory, whereas tag <gamma> is automatic-optional. A missing <gamma> tag implies that $$\Gamma = I$$. Note that for lmt $$\Sigma$$ is always a matrix, that is a scalar $$\sigma^2$$ is treated as a matrix $$1 \times 1$$ matrix.
For the above example, the variance specification inside <res> implies that $$\Gamma\otimes \Sigma \equiv I\otimes \Sigma$$. Since $$\Sigma$$ is a $$1\times 1$$ matrix with $$\Sigma[1,1]=\sigma_e^2$$, $$R$$ reduces to $$I\sigma_e^2$$.
Note tag <matrix> nested in tag <sigma> . The content of tag <matrix> does not comply with the formatting rules as pointed o ut above. That is 5.0 is not a valid key string. To let lmt know that the content of tag <matrix> should not be evaluated as a key string, with a subsequent error message, the tag must have attributes. In this example matrix attributes="matrix" escapes the content of tag <matrix> from the formatting rules.
Further, tag <matrix> is automatic-optional. This might be confusing because, as pointed out above, $$\Sigma$$ forms an indispensable part of $$\Gamma\otimes \Sigma$$. However, tag <matrix> belongs to a group of mutually exclusive information sources of which members are tag <matrix> and key string file: yourfilename . That is, $$\Sigma$$ maybe either embedded in the parameter file or supplied via an external file.
Note that the spelling of most tags used in the above parameter file is determined by lmt and must be abide by.
Estimating a fixed mean and a random genetic effect in a uni-variate model
Consider the linear model $$y=Xb+Zu+e$$ where all variables are those declared in #Estimating a mean, $$u$$ is vector of length $$m$$ of random direct genetic effects and $$Z$$ is a design matrix of dimension $$n \times m$$ linking genetic effects to their respective observations. Note that $$u\sim N(0,A\sigma_a^2)$$ where $$A$$ is the pedigree-derived relationship matrix and forms the $$\Gamma$$ part in $$\Gamma\otimes\Sigma$$. A possible data file for such mode may look like:
#y,mu,id 25.0,1,5 33.1,1,6 36.0,1,7 28.3,1,8
where the columns are comma-separated, the first row is commented out with “#” but contains the header, and all other rows contain data records. Further assume a pedigree in a file called "ped.csv" with content:
1,0,0 2,0,0 3,1,0 4,0,2 5,3,4 6,0,4 7,5,4 8,5,7
and a valid lmt parameter file:
<root>
<data>
file: data.csv
</data>
<pedigrees>
pedigrees: my_ped
<my_ped>
file: ped.csv
</my_ped>
</pedigrees>
<vars>
<res>
<sigma>
<matrix attributes="matrix">
5.0
</matrix>
</sigma>
</res>
vars: my_var
<my_var>
<sigma>
file: sigma.csv
</sigma>
<gamma>
<A>
pedigree: my_ped
</A>
</gamma>
</my_var>
</vars>
<model>
<eqn attributes="strings">
y = mu*b + id*u(v(my_var(1)))
</eqn>
</model>
</root>
Compared with the parameter file in example #Estimating a mean the one above contains only a few extra elements. One this the automatic-optional <pedigrees> nested inside tag <root> . This tag contains a keystring pedigrees: myped , where the user-defined variable behind pedigrees: is the name of a nominated-compulsory tag nested inside tag {cc|<pedigrees>}}. This concept allows to supply several pedigrees to lmt (e.g. a normal pedigree and a genetic group pedigree). In our example we have only one pedigree named my_ped, with tag <my_ped> containing the information about this pedigree. Another additional element is the key string vars: my_var nested in tag <vars> where the variable of key string vars: my_var provides the tag names of nominated-compulsory tags, in this example tag <my_var> .
Tag <myvar> consist of two structural components: the automatic-compulsory tag <sigma> and the automatic-optional <gamma> . Since the the variance of $$u=A\sigma_a^2$$, where $$A=\Gamma$$ and $$\sigma_a^2=\Sigma$$, a <gamma> tag must be supplied to fully specify the variance. Note that if the <gamma> tag is missing or miss-spelled lmt will assume that the variance of $$u=I\sigma_a^2$$. Tag <gamma> contains a automatic-compulsory tag <A> which specifies the $$\Gamma=A$$. Since $$A$$ is build from a pedigree, tag <A> contains a compulsory key string pedigree: my_ped which nominates pedigree in tag <my_ped> to be used for building $$A$$.
Note the difference between the tags <sigma> nested in tag <res> and <my_var> . The former specifies <sigma> to be provided by tag <matrix attributes="matrix"> , whereas the latter specifies <sigma> to be provided by a file.
The model section in the above parameter file need to communicate to to lmt that $$u$$ is a random factor with a variance $$A\sigma_a^2$$. This is done by extending the u.d. factor name u in y = mu*b + id*u(v(my_var(1))) by a specifier (v(my_var(1))) . Note that without a specifier u would be regarded as a fixed factor. The specifier u(v) communicates that u has a variance assigned. Further, v has a specifier assigned via v(my_var) which communicates that the name of the variance is my_var . The variance in tag <my_var> contains a <gamma> and a <sigma> component. The integer number inside bracket my_var(1) communicates that $$\sigma_a^2$$ of u is located in the first diagonal element of $$\Sigma$$.
In summary construct u(v(my_var(1))) communicates that
- u has a variance assigned
- the variance is named my_var
- the variance is located in the first diagonal element of the $$\Sigma$$ matrix specified in tag <sigma> nested in tag <my_var>>
Estimating fixed means and a random genetic effects in a multi-variate model
Consider the linear model
$$ \left( \begin{array}{c} y_1 \\ y_2 \end{array} \right)= \left( \begin{array}{cc} X_1 & 0 \\ 0 & X_2 \\ \end{array} \right) \left( \begin{array}{c} b_1 \\ b_2 \end{array} \right) + \left( \begin{array}{cc} Z & 0 \\ 0 & Z \end{array} \right) \left( \begin{array}{c} u_1 \\ u_2 \end{array} \right) + \left( \begin{array}{cc} I & 0 \\ 0 & I \end{array} \right) \left( \begin{array}{c} e_1 \\ e_2 \end{array} \right) $$
where all variables are those declared in above, and subscripts $$1$$ and $$2$$ index trait $$1$$ and $$2$$, respectively.
Note that $$[u_1,u_2]\sim N([0,0],A\otimes \Sigma_a)$$ where $$A$$ is the pedigree-derived relationship matrix and $$ \Sigma_a= \left(\begin{array}{cc} \sigma_{a_1}^2 & \sigma_{a_1,a_2}\\ \sigma_{a_2,a_1} & \sigma_{a_2}^2 \end{array}\right) $$ Further, $$[e_1,e_2]\sim N([0,0],I\otimes \Sigma_e)$$ with $$ \Sigma_e= \left(\begin{array}{cc} \sigma_{e_1}^2 & \sigma_{e_1,e_2}\\ \sigma_{e_2,e_1} & \sigma_{e_2}^2 \end{array}\right) $$.
A possible data file for such mode may look like:
#y1,y2,mu,id 25.0,0.8,1,5 33.1,0.5,1,6 36.0,1.5,1,7 28.3,3.6,1,8
and the pedigree files is that provided in example #Estimating a mean and a random genetic effect in a uni-variate model.
and a valid lmt parameter file:
<root>
<data>
file: data.csv
</data>
<pedigrees>
pedigrees: my_ped
<my_ped>
file: ped.csv
</my_ped>
</pedigrees>
<vars>
<res>
<sigma>
<matrix attributes="matrix">
5.0,0.8
0.8,1.2
</matrix>
</sigma>
</res>
vars: my_var
<my_var>
<sigma>
file: sigma.csv
</sigma>
<gamma>
<A>
pedigree: my_ped
</A>
</gamma>
</my_var>
</vars>
<model>
<eqn attributes="strings">
y1 = mu*b1 + id*u1(v(my_var(1)))
y2 = mu*b2 + id*u2(v(my_var(2)))
</eqn>
</model>
</root>
Example code chunks
The following code chunks are only subset of a full parameter file. It is assumed that all other parts of the instruction file are functional and all necessary input data are available and the that the data file columns have the respective names.
Providing pedigrees
Providing a pedigree containing genetic groups
<root>
...
<pedigrees>
pedigrees: a
<a>
phantomparents: 2
...
</a>
</pedigrees>
...
</root>
Providing a pedigree containing metafounders
<root>
...
<pedigrees>
pedigrees: a
<a>
metafile: mymeta.csv
...
</a>
</pedigrees>
...
</root>
Providing a probabilistic pedigree
<root>
...
<pedigrees>
pedigrees: a
<a>
switch: probabilistic
...
</a>
</pedigrees>
...
</root>
Providing several pedigrees
<root>
...
<pedigrees>
pedigrees: a,b
<a>
...
</a>
<b>
...
</b>
</pedigrees>
...
</root>
Providing Genotypes
Providing external allele frequencies
<root>
...
<genotypes>
genotypes: a
<a>
...
pqfile: mypq.csv <!-- file must contain a column vector of 2p -->
</a>
</genotypes>
...
</root>
Providing several genotype files
<root>
...
<genotypes>
genotypes: a,b
<a>
...
</a>
<b>
...
</b>
</genotypes>
...
</root>
Providing GRMs
Constructing GRM from genotypes
<root>
...
<genotypes>
genotypes: a
<a>
...
</a>
</genotypes>
<grms>
grms: x
<x>
genotype: a
...
</x>
</grms>
...
</root>
Overriding the default GRM construction method
<root>
...
<genotypes>
genotypes: a
<a>
...
</a>
</genotypes>
<grms>
grms: x
<x>
genotype: a
method: YA <!-- method is now "Yang"("VanRaden2") -->
...
</x>
</grms>
...
</root>
Constructing a GRM from file
<root>
...
<grms>
grms: x
<x>
file: mygrm.csv
...
</x>
</grms>
...
</root>
Constructing several GRMs
<root>
...
<genotypes>
genotypes: a,b
<a>
...
</a>
<b>
...
</b>
</genotypes>
<grms>
grms: x,y
<x>
genotype: a
...
</x>
<y>
genotype: b
...
</y>
</grms>
...
</root>
Single step models
ssGBLUP model with GRM build from genotypes
<root>
<pedigrees>
pedigrees: a
<a>
...
</a>
<pedigrees>
<genotypes>
genotypes: a
<a>
...
pedigree: a
</a>
</genotypes>
<grms>
grms: x
<x>
genotype: a
...
</x>
</grms>
<vars>
...
vars: g
<g>
...
<gamma>
<H>
...
grm: x
pedigree: a
aweight: 0.05
</H>
</gamma>
</g>
</vars>
...
<models>
<eqn attributes="strings">
y1=mu*b1+id*u1(v(g(1))
y2=mu*b2+id*u1(v(g(2))
</eqn>
</models>
...
</root>
ssGBLUP model with GRM supplied externally
<root>
<pedigrees>
pedigrees: a
<a>
...
</a>
<pedigrees>
<grms>
grms: x
<x>
file: mygrm.bin
pedigree: a
cross: id.csv
</x>
</grms>
<vars>
...
vars: g
<g>
...
<gamma>
<H>
...
grm: x
pedigree: a
aweight: 0.05
</H>
</gamma>
</g>
</vars>
...
<models>
<eqn attributes="strings">
y1=mu*b1+id*u1(v(g(1))
y2=mu*b2+id*u1(v(g(2))
</eqn>
</models>
...
</root>
ssGTBLUP model
<root>
<pedigrees>
pedigrees: a
<a>
file: pedigree.csv
...
</a>
<pedigrees>
<genotypes>
genotypes: a
<a>
file: mygeno.txt
pedigree: a
cross: ids.csv
...
</a>
</genotypes>
<vars>
...
vars: g
<g>
...
<gamma>
<H>
...
type: tblup
genotype: a
pedigree: a
aweight: 0.05
</H>
</gamma>
</g>
</vars>
...
<models>
<eqn attributes="strings">
y1=mu*b1+id*u1(v(g(1))
y2=mu*b2+id*u1(v(g(2))
</eqn>
</models>
...
</root>
ssSNPBLUP model
<root>
<pedigrees>
pedigrees: a
<a>
...
</a>
<pedigrees>
<genotypes>
genotypes: a
<a>
...
pedigree: a
</a>
<vars>
...
vars: g
<g>
type: snpblup1
genotype: a
aweight: 0.05
switch: adjustg2a
<sigma>
file: covG.csv
</sigma>
<marker_sb1>
<sigma>
file: covG.csv
</sigma>
</marker_sb1>
</g>
</vars>
...
<models>
<eqn>
y1=mu*b1+individual*u1(v(g(1))+dam*m1(v(g(2))
y2=mu*b2+individual*u2(v(g(3))+dam*m2(v(g(4))
</eqn>
</models>
</root>
ssSNPBLUP model with meta-founders
<root>
<pedigrees>
pedigrees: a
<a>
...
metafile: mymeta.csv <!-- contains an nxn meta-founder co-variance matrix -->
</a>
<pedigrees>
<genotypes>
genotypes: a
<a>
...
pedigree: a
pqfile: myp.csv <!-- contains a column vector of 1 which implies p=0.5-->
</a>
<vars>
...
vars: g
<g>
type: snpblup1
genotype: a
aweight: 0.05
<sigma>
file: covG.csv
</sigma>
<marker_sb1>
<sigma>
file: covG.csv
</sigma>
</marker_sb1>
</g>
</vars>
...
<models>
<eqn>
y1=mu*b1+individual*u1(v(g(1))+dam*m1(v(g(2))
y2=mu*b2+individual*u2(v(g(3))+dam*m2(v(g(4))
</eqn>
</models>
</root>
ssSNPBLUP model with a separate polygenic factor
<root>
<pedigrees>
pedigrees: a
<a>
...
</a>
<pedigrees>
<genotypes>
genotypes: a
<a>
...
pedigree: a
</a>
<vars>
...
vars: a,g
<a>
<sigma>
file: cov_polygenic.csv <!-- assumes that the polygenic weight has been absorbed into sigma -->
</sigma>
<gamma>
<A>
pedigree: a
</A>
</gamma>
</a>
<g>
type: snpblup1
genotype: a
aweight: 0.001 <!-- small "dummy" value required for the variance formulation -->
switch: adjustg2a
<sigma>
file: cov_genomic.csv <!-- assumes that the genomic weight has been absorbed into sigma -->
</sigma>
<marker_sb1>
<sigma>
file: cov_genomic.csv
</sigma>
</marker_sb1>
</g>
</vars>
...
<models>
<eqn>
y1=mu*b1+individual*ug1(v(g(1))+dam*mg1(v(g(2))+individual*ua1(v(a(1))+dam*ma1(v(a(2))
y2=mu*b2+individual*ug2(v(g(3))+dam*mg2(v(g(4))+individual*ua2(v(a(3))+dam*ma2(v(a(4))
</eqn>
</models>
</root>
ssGBLUP with two genomic factors
<root>
<pedigrees>
pedigrees: a
<a>
...
</a>
<pedigrees>
<genotypes>
genotypes: a,b
<a>
...
pedigree: a
</a>
<b>
...
pedigree: a
</b>
</genotypes>
<grms>
grms: x,y
<x>
genotype: a
...
</x>
<y>
genotype: b
...
</y>
</grms>
<vars>
...
vars: g1,g2
<g1>
...
<gamma>
<H>
...
grm: x
</H>
</gamma>
</g1>
<g2>
...
<gamma>
<H>
...
grm: y
</H>
</gamma>
</g2>
</vars>
...
<models>
<eqn attributes="strings">
y1=mu*b1+id*u11(v(g1(1))+id*u21(v(g2(1))
y2=mu*b2+id*u12(v(g1(2))+id*u22(v(g2(2))
</eqn>
</models>
</root>
Regression on continuous co-variables
Linear regression
<root>
...
<models>
<eqn attributes="strings">
y1=mu*b1+age(t(co))*age1
y2=mu*b2+age(t(co))*age2
</eqn>
</models>
...
</root>
User-defined polynomial expansion
<root>
...
<models>
<eqn attributes="strings">
y1=mu*b1+age(t(co(p(1,2))))*age1
y2=mu*b2+age(t(co(p(1,2))))*age2
</eqn>
<eqn attributes="strings">
x^1
log(sqrt(x))
</eqn>
</models>
...
</root>
Using hard-coded Legendre polynomials
<root>
...
<models>
<eqn attributes="strings">
y1=mu*b1+age(t(co(p(1,2,3))))*age1
y2=mu*b2+age(t(co(p(1,2))))*age2
</eqn>
<eqn attributes="strings">
l0
l1
l2
</eqn>
</models>
...
</root>
Nested co-variables
<root>
...
<models>
<eqn attributes="strings">
weaningweight=mu*b1+age(t(co(p(1,2);n(sex))))*age
intramuscularfatcontent=mu*b2+weight(t(co(p(1,2);n(sex))))*weight
</eqn>
<eqn attributes="strings">
x^1
x^2
</eqn>
</models>
...
</root>
Random-regression models
Nested continuous random co-variables
days is a co-variable which is nested within individual or dam .
<root>
...
<models>
<eqn attributes="strings">
y1=mu*b1+days(t(co(n(individual))))*u1(v(g(1))+days(t(co(n(dam))))*m1(v(g(2))
y2=mu*b2+days(t(co(n(individual))))*u2(v(g(3))+days(t(co(n(dam))))*m2(v(g(4))
</eqn>
</models>
...
</root>
Nested continuous random co-variables with polynomial expansion
Similar to Nested continuous co-variables but days is expanded
<root>
...
<models>
<eqn attributes="strings">
y1=mu*b1+days(t(co(p(1,2,3);n(individual))))*u1(v(g(1,2,3))+days(t(co(p(1,2,3);n(dam))))*m1(v(g(4,5,6))
y2=mu*b2+days(t(co(p(1,2,3);n(individual))))*u2(v(g(7,8,9))+days(t(co(p(1,2,3);n(dam))))*m2(v(g(10,11,12))
</eqn>
<poly attributes="strings">
<! Legendre polynomials order 0, 1 and 2>
l0
l1
l2
</poly>
</models>
...
</root>
Nested continuous co-variables with polynomial expansion and an integer co-variable
Similar to Nested continuous co-variables with polynomial expansion, but an additional information t(i) is provided telling lmt that days is actually an integer. While the results do not differ from Nested continuous co-variables with polynomial expansion lmt can exploit this information for memory efficiency.
<root>
...
<models>
<eqn attributes="strings">
y1=mu*b1+days(t(co(t(i);p(1,2,3);n(individual))))*u1(v(g(1,2,3))+days(t(co(t(i);p(1,2,3);n(dam))))*m1(v(g(7,8,9))
y2=mu*b2+days(t(co(t(i);p(1,2,3);n(individual))))*u2(v(g(4,5,6))+days(t(co(t(i);p(1,2,3);n(dam))))*m2(v(g(10,11,12))
</eqn>
<poly attributes="strings">
<! Legendre polynomials of order 0, 1 and 2>
l0
l1
l2
</poly>
</models>
</root>
Defining equivalent models with genetic groups
Note that in the parameterization provided below absorbed genetic groups and genetic groups as extra factor must yield the same results. However, only when using absorbed genetic groups the factor level solutions are the actual breeding values. When modelling genetic groups as an extra factor genetic factor solutions and genetic group factor solutions must be added by the user.
Defining a model with absorbed genetic groups
Note that the only information necessary is the number of phantom parents at the top of the pedigree( phantomparents: 10 ) and the information to the variance that the it should be constructed allowing for genetic groups( switch gg ).
<root>
<pedigrees>
pedigrees: a
<a>
file: myped.csv
phantomparents: 10
</a>
<pedigrees>
<vars>
...
vars: g
<g>
<sigma>
file: myG.csv <! must contain a 4x4 matrix>
</sigma>
<gamma>
<A>
pedigree: a
switch: gg
</A>
</gamma>
</g>
</vars>
<models>
<eqn attributes="strings">
y1=mu*b1+id*id1(v(g(1))+dam*dam1(v(g(3))
y2=mu*b2+id*id2(v(g(2))+dam*dam2(v(g(4))
</eqn>
</models>
...
</root>
Defining a model with genetic groups as extra random factor
Genetic groups are defined as an extra factor, which requires an extra variance( gg ) and two pedigrees, the genetic group pedigree( a ) and the normal pedigree( b ). For a model equivalent to absorption pedigree b must be a subset of pedigree a . Further, if breeding values are required lmt can provide the genetic group regression matrix qfile: Q.coocsv .
<root>
<pedigrees>
pedigrees: a,b
<a>
file: ggped.csv
phantomparents: 10
qfile: Q.coocsv
</a>
<b>
file: ped.csv
</b>
<pedigrees>
<vars>
...
vars: g,gg
<g>
<sigma>
file: myG.csv <! must contain a 4x4 matrix>
</sigma>
<gamma>
<A>
pedigree: b
</A>
</gamma>
</g>
<gg>
<sigma>
file: myG.csv <! must contain a 4x4 matrix. should be the same as for "g">
</sigma>
</gg>
</vars>
...
<models>
<eqn attributes="strings">
y1=mu*b1+id*id1(v(g(1))+dam*dam1(v(g(3))+id(t(gg(a)))*idgg1(v(gg(1))+dam(t(gg(a)))*damgg1(v(gg(3))
y2=mu*b2+id*id2(v(g(2))+dam*dam2(v(g(4))+id(t(gg(a)))*idgg2(v(gg(2))+dam(t(gg(a)))*damgg2(v(gg(4))
</eqn>
</models>
</root>
Defining a model with fixed genetic groups
Fixed genetic groups are only supported if modeled as an extra factor. Therefore, the model is similar to above, but the extra variance is omitted. Note that when modeling genetic groups as fixed it is the users responsibility to omit one group from the respective pedigree to ensure that $$X$$ is of full column rank. [[#Linear models in lmt:Column rank of $$X$$|bla]]
<root>
<pedigrees>
pedigrees: a,b
<a>
file: ggped.csv
phantomparents: 10
qfile: Q.coocsv
</a>
<b>
file: ped.csv
</b>
<pedigrees>
<vars>
...
vars: g
<g>
<sigma>
file: myG.csv <! must contain a 4x4 matrix>
</sigma>
<gamma>
<A>
pedigree: b
</A>
</gamma>
</g>
</vars>
...
<models>
<eqn attributes="strings">
y1=mu*b1+id*id1(v(g(1))+dam*dam1(v(g(3))+id(t(gg(a)))*idgg1+dam(t(gg(a)))*damgg1
y2=mu*b2+id*id2(v(g(2))+dam*dam2(v(g(4))+id(t(gg(a)))*idgg2+dam(t(gg(a)))*damgg2
</eqn>
</models>
</root>
Override the default job parameters
<root>
...
<jobs>
jobs: default
<default>
conv: -9.21 <! log(10e-5)>
</default>
</jobs>
...
</root>
Use job "solve" instead of "default"
<root>
...
<jobs>
jobs: solve
<solve>
solver: x
</solve>
</jobs>
<solvers>
solvers: x
<x>
<!-- since is nothing inhere "x" will be of default type: preconditioned gradient solver -->
</x>
</solvers>
...
</root>
Use a direct solver in stead of the default solver
<root>
...
<jobs>
jobs: solve
<solve>
solver: x
</solve>
</jobs>
<solvers>
solvers: x
<x>
<direct>
</direct>
</x>
</solvers>
...
</root>
Estimating variance components
Gibbs sampling
<root>
...
<jobs>
jobs: sample
<sample>
sampler: x
</sample>
</jobs>
<samplers>
samplers: x
<x>
<blocked>
samples: 100000
</blocked>
</x>
</samplers>
...
</root>
MC-EM-REML
<root>
...
<jobs>
jobs: mcemreml
<mcemreml>
conv: -9.21034
rounds: 300
sampler: x
solver: y
</mcemreml>
</jobs>
<solvers>
solvers: y
<y>
<pcgiod>
conv: -16.1181
</pcgiod>
</y>
</solvers>
<samplers>
samplers: x
<x>
<pe>
samples: 50
switch: trace
chains: 36
</pe>
</x>
</samplers>
...
</root>
AI-REML-C
<root>
...
<jobs>
jobs: airemlc
<airemlc>
</airemlc>
</jobs>
...
</root>
AI-REML-C using single-pass Gibbs sampling to obtain starting values
<root>
...
<jobs>
jobs: sample,airemlc
<sample>
sampler: x
</sample>
<airemlc>
</airemlc>
</jobs>
<samplers>
samplers: x
<x>
<singlepass>
samples: 200
</singlepass>
</x>
</samplers>
...
</root>
Calculating exact prediction error co-variances using a direct solver
<root>
...
<jobs>
jobs: pevsolve
<pevsolve>
solver: a
factor: g <!-- this assumes that a variance named "g" exists which was used in the equations -->
</pevsolve>
</jobs>
<solvers>
solvers: a
<a>
<direct>
</direct>
</a>
</solvers>
...
</root>
Calculating prediction error co-variances for a target individual
<root>
...
<jobs>
jobs: pevsolve
<pevsolve>
solver: a
factor: g <!-- this assumes that a variance named "g" exists which was used in the equations -->
levels: 1156679414 <!-- this must be the original factor level, e.g. the original pedigree id -->
</pevsolve>
</jobs>
<solvers>
solvers: a
<a>
<!-- since there is nothing inhere "a" will be of default type: preconditioned gradient method -->
</a>
</solvers>
...
</root>