|  | The <b>L</b>inear mixed <b>M</b>odels <b>T</b>oolbox ({{lmt}}) is a stand-alone, single-executable software package for large-scale linear mixed model analysis. |  | The <b>L</b>inear mixed <b>M</b>odels <b>T</b>oolbox ({{lmt}}) is a stand-alone, single-executable software package for large-scale linear mixed model analysis. | 
|  | It is the successor of DMU, the well-known and
 |  | 
|  | widely used software package for linear mixed model analysis developed and maintained
 |  | 
|  | by Per Madsen and Just Jensen.
 |  | 
|  | 
 |  | 
 | 
|  | Since the early days of software development in statistics and quantitative genetics,
 |  | {{lmt}} supports all models commonly used in genetic evaluation and has various options to handle genomic markers. | 
|  | time has moved on in terms of what programming languages are capable of, and therefore
 |  | 
|  | DMU has been given a thorough overhaul.
 |  | 
|  | 
 |  | 
 | 
|  | One result of the overhaul is the new name, {{lmt}}, resulting from the difficulty of translating
 |  | {{lmt}} has been used successfully for genetic evaluation data sets with >>200k genotyped animals, >>15m animals, >>500m equations. | 
|  | the acronym DMU into something which is generally meaningful throughout time. For
 |  | 
|  | those who prefer the acronym DMU, {{lmt}} may be referred to as <b>DMU-next</b>.
 |  | 
|  | 
 |  | 
 | 
|  | The second area of the overhaul is the parameter file interface. {{lmt}} now comes with
 |  | {{lmt}} is only available for 64-bit Linux operating systems, is run from the Linux command line, and uses an [https://www.w3schools.com/xml/ xml] style parameter file, which is intended to be easy for the user to understand. Furthermore, using [https://www.w3schools.com/xml/ xml] comes with support for automated commenting, uncommenting, indentation, code-folding and syntax highlighting by almost every editor, | 
|  | an xml style parameter file which is supposed to allow for a much easier understanding |  | thus making it easier to follow the structure of the parameter file even if it spans several tens of lines of code. | 
|  | by the user. Furthermore, using xml comes with support for automated commenting, uncommenting, |  | 
|  | indentation, code-folding and syntax highlighting by almost every editor,
 |  | 
|  | thus making it easier to follow the structure of the parameter file even if it spans several tens of |  | 
|  | lines of code. |  | 
|  | 
 |  | 
 | 
|  | The third area of the overhaul is the program structure. DMU was structured into
 |  | == Conditions of use == | 
|  | several programs (<i>DMU1, DMU4, DMU5, DMUAI, RJMC</i>). In contrast, {{lmt}} is meant
 |  | {{lmt}} can be used by the scientific community free of charge, but users must credit {{lmt}} | 
|  | to provide the functionalities of all those programs via a single parameter file and a single
 |  | in any publications. | 
|  | executable.
 |  | Commercial users must obtain the explicit approval of the author before using {{lmt}} and must credit {{lmt}} in any publication in scientific journals. | 
|  |   |  | If {{lmt}} cannot be credited via citation the author must become a co-author. | 
|  | While {{lmt}} is ultimately meant to be a full-scale successor of DMU, it does not yet provide
 |  | 
|  | all its functionalities in some areas, while in others it already provides more. More specifically,
 |  | 
|  | there are no REML facilities available yet, but large scale linear mixed model solving
 |  | 
|  | provides Single-Step-T-BLUP facilities, uploading of genotypes and building of genomic
 |  | 
|  | relationship matrices on the fly, etc.
 |  | 
|  |   |  | 
|  | == Supported features ==
 |  | 
|  |   |  | 
|  | === Supported operations ===
 |  | 
|  |   |  | 
|  | Currently {{lmt}} supports the following operations on linear mixed models:
 |  | 
|  | 
 |  | 
 | 
|  | *Solving for BLUP and BLUE solutions conditional on supplied variances for random and fixed factors, respectively;
 |  | == How to get it == | 
|  | *Gibbs sampling of variance components in single pass and blocked mode;
 |  | 
|  | *MC-EM-REML estimation of variance components
 |  | 
|  | *Sampling elements of the inverse of the mixed model coefficient matrix
 |  | 
|  | 
 |  | 
 | 
|  | === Supported factors and variables ===
 |  | {{lmt}} can be obtained '''on request''' from the [mailto:vinzent.boerner@qgg.au.dk author]. | 
|  | {{lmt}} supports |  | 
|  | *fixed factors
 |  | 
|  | *random factors
 |  | 
|  | *classification variables
 |  | 
|  | *continuous co-variables, which can be nested. For continuous co-variables {{lmt}} supports user-defined polynomials and hard coded [https://en.wikipedia.org/wiki/Legendre_polynomials Legendre polynomials] up to order 6.
 |  | 
|  | *genetic group co-variables
 |  | 
|  |   |  | 
|  | All classification variables and co-variables can be associated with a fixed or random factor.
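As an illustration of the polynomial expansion of a continuous co-variable, the following Python sketch evaluates Legendre polynomials up to order 6 with Bonnet's recursion. It assumes that the co-variable is first rescaled to the interval [-1, 1] and it uses the standard, unnormalised polynomials. The function names, the rescaling and the example values are hypothetical; whether {{lmt}} rescales or normalises its hard coded polynomials in this way is not stated here.

```python
def rescale(value, lower, upper):
    """Map value from [lower, upper] onto [-1, 1] (assumed standardisation)."""
    return 2.0 * (value - lower) / (upper - lower) - 1.0


def legendre_values(x, max_order=6):
    """Return [P_0(x), ..., P_max_order(x)] for the unnormalised Legendre polynomials."""
    values = [1.0]
    if max_order >= 1:
        values.append(x)
    # Bonnet's recursion: (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
    for n in range(1, max_order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


# Hypothetical co-variable: value 230 within a recording range of 5 to 305.
x = rescale(230.0, 5.0, 305.0)
covariates = legendre_values(x)
```

Each element of <code>covariates</code> would then act as one regression co-variable for the record in question.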
 |  | 
|  |   |  | 
|  | === Supported variance structures ===
 |  | 
|  | For random factors, {{lmt}} supports variance structures of the form
 |  | 
|  | *[https://en.wikipedia.org/wiki/Kronecker_product $$\Gamma\otimes\Sigma$$], where $$\Sigma$$ is a dense symmetric positive definite matrix, and
 |  | 
|  | *$$\Theta_L(\Gamma\otimes I_{\Sigma})\Theta_L^{'}$$, where $$\Theta$$ is a symmetric positive definite [https://en.wikipedia.org/wiki/Block_matrix#Block_diagonal_matrices block-diagonal matrix] of $$n$$ symmetric positive definite matrices $$\Sigma_i, i=1,..,n$$, $$\Theta_L$$ is the lower [https://en.wikipedia.org/wiki/Cholesky_decomposition Cholesky factor] of $$\Theta$$, and $$I_{\Sigma}$$ is an identity matrix with the dimension of $$\Sigma_i$$.
 |  | 
|  |   |  | 
|  | When solving linear mixed models $$\Sigma$$ and $$\Gamma$$ are user determined constants, whereas when estimating variances $$\Gamma$$ is a user determined constant and $$\Sigma$$ is a function of the data.
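The two variance structures can be compared numerically. The following pure-Python sketch builds both for small, made-up matrices; it assumes that $$n$$ equals the dimension of $$\Gamma$$ and that all $$\Sigma_i$$ have the same dimension. All helper names are hypothetical and this is not how {{lmt}} stores or builds these matrices. When every $$\Sigma_i$$ equals the same $$\Sigma$$, the second structure reduces to $$\Gamma\otimes\Sigma$$, which the sketch checks.

```python
import math


def kron(a, b):
    """Kronecker product of two dense matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    """Dense matrix product a b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def block_diag(blocks):
    """Place square blocks along the diagonal of an otherwise zero matrix."""
    size = sum(len(block) for block in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for block in blocks:
        for i, row in enumerate(block):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(block)
    return out


def cholesky_lower(m):
    """Lower Cholesky factor L with L L' = m for a symmetric positive definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def theta_structure(gamma, sigmas):
    """Theta_L (Gamma kron I) Theta_L' for a list of equally sized Sigma_i."""
    theta_l = cholesky_lower(block_diag(sigmas))
    middle = kron(gamma, identity(len(sigmas[0])))
    return matmul(matmul(theta_l, middle), transpose(theta_l))


# Made-up example matrices (assumptions, not taken from any data set).
gamma = [[1.0, 0.5], [0.5, 1.0]]
sigma = [[4.0, 1.0], [1.0, 2.0]]
sigma_2 = [[1.0, 0.2], [0.2, 3.0]]

first = kron(gamma, sigma)                          # Gamma kron Sigma
second = theta_structure(gamma, [sigma, sigma])     # identical Sigma_i
third = theta_structure(gamma, [sigma, sigma_2])    # heterogeneous Sigma_i
```

In the heterogeneous case the diagonal blocks of <code>third</code> equal $$\Gamma_{ii}\Sigma_i$$, so each level of $$\Gamma$$ carries its own co-variance matrix.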
 |  | 
|  |   |  | 
|  | Supported types for $$\Gamma$$ are
 |  | 
|  | *an [https://en.wikipedia.org/wiki/Identity_matrix identity matrix]
 |  | 
|  | *an arbitrary positive definite [https://en.wikipedia.org/wiki/Diagonal_matrix diagonal matrix]
 |  | 
|  | *a pedigree-based numerator relationship matrix $$A$$ which may contain meta-founders
 |  | 
|  | *a pedigree- and genotype-based relationship matrix $$H$$ which may contain meta-founders
 |  | 
|  | *a user-defined (u.d.) symmetric, positive definite matrix whose inverse is supplied
 |  | 
|  | **as a sparse upper-triangular matrix stored in [https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_row_(CSR,_CRS_or_Yale_format) csr format]
 |  | 
|  | **as a dense matrix
 |  | 
|  | *a co-variance matrix of a selected auto-regressive process
 |  | 
|  |   |  | 
|  | === Supported linear mixed model solvers ===
 |  | 
|  | {{lmt}} supports
 |  | 
|  |   |  | 
|  | *a direct solver, which requires explicitly building the left-hand-side coefficient matrix ($$C$$) of the linear mixed model equations
 |  | 
|  | *an iteration-on-data pre-conditioned gradient solver which '''does not''' require $$C$$
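The idea behind a solver that does not require $$C$$ can be sketched in a few lines of Python. The example below is a minimal illustration only: it assumes a conjugate gradient method with a diagonal (Jacobi) pre-conditioner, a toy model with one fixed and one random classification variable, $$\Gamma=I$$ and a known variance ratio. The data, the names and the choice of pre-conditioner are assumptions and do not describe the actual {{lmt}} implementation. The product $$Cp$$ is accumulated record by record, so $$C$$ is never stored.

```python
def solve_pcg(matvec, rhs, diagonal, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for C x = rhs, using only products C p."""
    x = [0.0] * len(rhs)
    r = list(rhs)  # residual rhs - C x for the starting value x = 0
    z = [ri / di for ri, di in zip(r, diagonal)]
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    rhs_norm = sum(v * v for v in rhs) ** 0.5
    for _ in range(max_iter):
        cp = matvec(p)
        alpha = rz / sum(pi * ci for pi, ci in zip(p, cp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ci for ri, ci in zip(r, cp)]
        if sum(v * v for v in r) ** 0.5 <= tol * rhs_norm:
            break
        z = [ri / di for ri, di in zip(r, diagonal)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Hypothetical data: (level of fixed variable, level of random variable, observation).
records = [(0, 0, 10.0), (0, 1, 12.0), (1, 1, 9.0), (1, 2, 11.0)]
N_FIXED, N_RANDOM = 2, 3
LAMBDA = 2.0  # assumed residual-to-random variance ratio, with Gamma = I


def matvec(p):
    """Compute C p by one pass over the records, without ever storing C."""
    out = [0.0] * (N_FIXED + N_RANDOM)
    for fixed, level, _ in records:
        fitted = p[fixed] + p[N_FIXED + level]
        out[fixed] += fitted
        out[N_FIXED + level] += fitted
    for i in range(N_RANDOM):
        out[N_FIXED + i] += LAMBDA * p[N_FIXED + i]
    return out


# Right-hand side and diagonal of C, also accumulated record by record.
rhs = [0.0] * (N_FIXED + N_RANDOM)
diagonal = [0.0] * (N_FIXED + N_RANDOM)
for fixed, level, y in records:
    rhs[fixed] += y
    rhs[N_FIXED + level] += y
    diagonal[fixed] += 1.0
    diagonal[N_FIXED + level] += 1.0
for i in range(N_RANDOM):
    diagonal[N_FIXED + i] += LAMBDA

solution = solve_pcg(matvec, rhs, diagonal)
```

The first two elements of <code>solution</code> are the solutions for the fixed levels and the remaining three those for the random levels. Memory use grows with the number of equations rather than with the number of non-zero elements of $$C$$, which is the main attraction of this type of solver for large models.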
 |  | 
|  |   |  | 
|  | === Supported features related to genomic data ===
 |  | 
|  | *direct use of genomic marker data
 |  | 
|  | *building of genomic relationship matrices ($$G$$) from supplied genomic data
 |  | 
|  | *uploading of a u.d. $$G$$
 |  | 
|  | *adjustment of $$G$$ to $$A_{gg}$$
 |  | 
|  | *solving Single-Step-G-BLUP models
 |  | 
|  | *sampling variances for Single-Step-G-BLUP models
 |  | 
|  | *solving Single-Step-T-BLUP models
 |  | 
|  | *solving Single-Step-SNP-BLUP models
 |  | 
|  | *all Single-Step models can be run from "bottom-up", that is, the user supplies the genotypes and all necessary ingredients (e.g. $$G$$) are built on the fly.
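To make "building $$G$$ from supplied genomic data" concrete, the following Python sketch computes a genomic relationship matrix from allele counts. It assumes genotypes coded 0, 1 or 2, allele frequencies estimated from the supplied genotypes, and the widely used construction $$G=ZZ'/(2\sum_j p_j(1-p_j))$$ with $$Z$$ the centred allele counts. Which construction and which allele frequencies {{lmt}} actually uses is not stated in this section, so the function and the example data are purely illustrative.

```python
def genomic_relationship(genotypes):
    """G = Z Z' / (2 * sum_j p_j (1 - p_j)), with Z the allele counts centred by 2 p_j."""
    n_animals = len(genotypes)
    n_markers = len(genotypes[0])
    # Allele frequencies estimated from the supplied genotypes (assumption).
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n_animals) for j in range(n_markers)]
    centred = [[row[j] - 2.0 * freqs[j] for j in range(n_markers)] for row in genotypes]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) / scale for row_j in centred]
        for row_i in centred
    ]


# Hypothetical genotypes: 3 animals (rows) by 3 markers (columns).
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1]]
G = genomic_relationship(genotypes)
```

Because the allele counts are centred with frequencies estimated from the same animals, every row of <code>G</code> sums to zero in this sketch.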
 |  | 
|  |   |  | 
|  | === Supported pedigree types ===
 |  | 
|  | *ordinary pedigrees
 |  | 
|  | *probabilistic pedigrees with an unlimited number of parent pairs per individual
 |  | 
|  | *genetic group pedigrees
 |  | 
|  | *meta-founders
 |  | 
|  |   |  | 
|  | == {{lmt}} Linear mixed model terminology ==
 |  | 
|  | === Matrix notation, factors and sub-factors ===
 |  | 
|  |   |  | 
|  | Consider the multi-variate linear mixed model
 |  | 
|  |   |  | 
|  | $$
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{c}
 |  | 
|  | y_1 \\
 |  | 
|  | y_2 \\
 |  | 
|  | y_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | =
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{ccc}
 |  | 
|  | X_1 & 0 & 0 \\
 |  | 
|  | 0 & X_2 & 0 \\
 |  | 
|  | 0 & 0 & X_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{c}
 |  | 
|  | b_1 \\
 |  | 
|  | b_2 \\
 |  | 
|  | b_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | +
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{ccc}
 |  | 
|  | Z_1 & 0 & 0\\
 |  | 
|  | 0 & Z_2 & 0\\
 |  | 
|  | 0 & 0 & Z_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{c}
 |  | 
|  | u_1 \\
 |  | 
|  | u_2 \\
 |  | 
|  | u_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | +
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{c}
 |  | 
|  | e_1 \\
 |  | 
|  | e_2 \\
 |  | 
|  | e_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | $$
 |  | 
|  |   |  | 
|  | where $$(y_1,y_2,y_3)'$$, $$(b_1,b_2,b_3)'$$, $$(u_1,u_2,u_3)'$$ and $$(e_1,e_2,e_3)'$$ are vectors of response variables, effects of fixed factors, effects of random factors and effects of residuals respectively, and matrices
 |  | 
|  | $$\left(
 |  | 
|  | \begin{array}{ccc}
 |  | 
|  | X_1 & 0 & 0 \\
 |  | 
|  | 0 & X_2 & 0 \\
 |  | 
|  | 0 & 0 & X_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)$$, and
 |  | 
|  | $$
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{ccc}
 |  | 
|  | Z_1 & 0 & 0\\
 |  | 
|  | 0 & Z_2 & 0\\
 |  | 
|  | 0 & 0 & Z_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | $$ are block-diagonal design matrices linking effects in the respective vectors to their related response variables. In usual mixed model terminology $$b_1$$, $$b_2$$ and $$b_3$$ are called fixed factors, and $$u_1$$, $$u_2$$ and $$u_3$$ are called random factors. Ignoring the residual, the above model has 6 factors in total.
 |  | 
|  |   |  | 
|  | However, the model may be rewritten in matrix formulation as
 |  | 
|  |   |  | 
|  | $$vec(Y)=Xvec(B)+Zvec(U)+vec(E)$$,
 |  | 
|  |   |  | 
|  | where $$vec$$ is the [https://en.wikipedia.org/wiki/Vectorization_(mathematics) vectorization operator], $$Y=[y_1,y_2,y_3]$$, $$B=[b_1,b_2,b_3]$$, $$U=[u_1,u_2,u_3]$$ and $$E=[e_1,e_2,e_3]$$ are column matrices of response variables, the effects of the fixed and random factors, and the residuals, respectively, and
 |  | 
|  | $$X=\left(
 |  | 
|  | \begin{array}{ccc}
 |  | 
|  | X_1 & 0 & 0 \\
 |  | 
|  | 0 & X_2 & 0 \\
 |  | 
|  | 0 & 0 & X_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)$$, and
 |  | 
|  | $$Z=
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{ccc}
 |  | 
|  | Z_1 & 0 & 0\\
 |  | 
|  | 0 & Z_2 & 0\\
 |  | 
|  | 0 & 0 & Z_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | $$. The distributional assumptions for the random components in the model are $$vec(U^{'})\sim N((0,0,0)',\Gamma_u \otimes \Sigma_u)$$ and $$vec(E^{'})\sim N((0,0,0)',\Gamma_e \otimes \Sigma_e)$$. Note that the column and row dimensions of $$U$$ are determined by the column dimension of $$\Sigma_u$$ and $$\Gamma_u$$, respectively.
 |  | 
|  |   |  | 
|  | Slightly different to the above terminology, {{lmt}} refers to $$B$$ and $$U$$ as factors, and therefore the model has only two factors, whereas the columns in $$B$$ and $$U$$ are referred to as '''sub-factors'''. 
 |  | 
|  |   |  | 
|  | Following the above matrix notation, {{lmt}} will always invoke only one factor for all modelled fixed classification variables and only one factor for all modelled fixed continuous co-variables. Sub-factors are summarized into a single random factor if they share the same $$\Sigma$$ matrix. Thus, {{lmt}} will invoke as many random factors as there are different $$\Gamma \otimes \Sigma$$ constructs. That is, in {{lmt}} terminology the multi-variate model
 |  | 
|  |   |  | 
|  | $$
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{c}
 |  | 
|  | y_1 \\
 |  | 
|  | y_2 \\
 |  | 
|  | y_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | =
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{ccc}
 |  | 
|  | X_1 & 0 & 0 \\
 |  | 
|  | 0 & X_2 & 0 \\
 |  | 
|  | 0 & 0 & X_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{c}
 |  | 
|  | b_1 \\
 |  | 
|  | b_2 \\
 |  | 
|  | b_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | +
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{cccccc}
 |  | 
|  | Z_{d,1} & 0 & 0 & Z_{m,1} & 0 & 0\\
 |  | 
|  | 0 & Z_{d,2} & 0 & 0 & Z_{m,2} & 0\\
 |  | 
|  | 0 & 0 & Z_{d,3} & 0 & 0 & Z_{m,3}\\
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{c}
 |  | 
|  | u_{d,1} \\
 |  | 
|  | u_{d,2} \\
 |  | 
|  | u_{d,3} \\
 |  | 
|  | u_{m,1} \\
 |  | 
|  | u_{m,2} \\
 |  | 
|  | u_{m,3}
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | +
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{ccc}
 |  | 
|  | W_1 & 0 & 0\\
 |  | 
|  | 0 & W_2 & 0\\
 |  | 
|  | 0 & 0 & W_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{c}
 |  | 
|  | v_1 \\
 |  | 
|  | v_2 \\
 |  | 
|  | v_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | +
 |  | 
|  | \left(
 |  | 
|  | \begin{array}{c}
 |  | 
|  | e_1 \\
 |  | 
|  | e_2 \\
 |  | 
|  | e_3
 |  | 
|  | \end{array}
 |  | 
|  | \right)
 |  | 
|  | $$
 |  | 
|  |   |  | 
|  | with
 |  | 
|  | $$(u_{d,1},u_{d,2},u_{d,3},u_{m,1},u_{m,2},u_{m,3})'\sim N((0,0,0,0,0,0)',\Sigma_u \otimes \Gamma_u)$$ and $$(v_1,v_2,v_3)'\sim N((0,0,0)',\Sigma_v \otimes \Gamma_v)$$, rewritten as $$vec(Y)=Xvec(B)+Zvec(U)+Wvec(V)+vec(E)$$, will have only 3 factors, $$B$$, $$U$$ and $$V$$, with $$b_1,b_2,b_3$$, $$u_{d,1},u_{d,2},u_{d,3},u_{m,1},u_{m,2},u_{m,3}$$ and $$v_1,v_2,v_3$$ being sub-factors of $$B$$, $$U$$ and $$V$$, respectively.
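The grouping rule can be mimicked with a small Python sketch: sub-factors that reference the same variance construct end up in the same factor. The construct names <code>sigma_u</code> and <code>sigma_v</code>, the sub-factor labels and the function are hypothetical, and the sketch assumes that all fixed sub-factors belong to classification variables, so that they form a single fixed factor.

```python
# (sub-factor, variance construct); None marks a fixed sub-factor.
sub_factors = [
    ("b_1", None), ("b_2", None), ("b_3", None),
    ("u_d1", "sigma_u"), ("u_d2", "sigma_u"), ("u_d3", "sigma_u"),
    ("u_m1", "sigma_u"), ("u_m2", "sigma_u"), ("u_m3", "sigma_u"),
    ("v_1", "sigma_v"), ("v_2", "sigma_v"), ("v_3", "sigma_v"),
]


def group_into_factors(pairs):
    """Collect sub-factors into factors keyed by their variance construct."""
    factors = {}
    for name, construct in pairs:
        key = "fixed" if construct is None else construct
        factors.setdefault(key, []).append(name)
    return factors


factors = group_into_factors(sub_factors)
```

For the model above this yields three factors, corresponding to $$B$$, $$U$$ and $$V$$, with 3, 6 and 3 sub-factors.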
 |  | 
|  |   |  | 
|  | === Model syntax ===
 |  | 
|  |   |  | 
|  | The syntax for communicating the model to {{lmt}} is effectively '''just write the model'''.
 |  | 
|  |   |  | 
|  | A valid {{lmt}} model string would be {{cc|1=y=mu*b+id*u(v(my_var(1)))}}. The model string consists of
 |  | 
|  | *the response variable {{cc|y}}, which must be a column name in the data file
 |  | 
|  | *variables {{cc|mu}} and {{cc|id}}, which must be column names in the data file
 |  | 
|  | *sub-factors {{cc|b}} and {{cc|u}} which are user-defined alpha-numeric character strings
 |  | 
|  | *relational operators {{cc|1==}}, {{cc|*}} and {{cc|+}}
 |  | 
|  | *a specifier {{cc|(v(my_var(1)))}} used to specify the nature of {{cc|u}}
 |  | 
|  |   |  | 
|  | The rules for using relational operators are
 |  | 
|  | *{{cc|1==}} links the response variable to the model
 |  | 
|  | *{{cc|*}} links a model variable to its sub-factor, which together form a right hand side component
 |  | 
|  | *{{cc|+}} concatenates different right hand side components.
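The rules for the relational operators can be illustrated with a small Python sketch that splits a model string into its response variable and its right hand side components. This is not the parser used by {{lmt}}; the function names are hypothetical, and the sketch only assumes that operators inside parentheses belong to a specifier and must be ignored when splitting.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_model(model):
    """Return the response variable and a list of (variable, sub-factor) pairs."""
    response, rhs = model.split("=", 1)      # "=" links the response to the model
    components = []
    for term in split_top_level(rhs, "+"):   # "+" concatenates components
        variable, sub_factor = split_top_level(term, "*")  # "*" links variable and sub-factor
        components.append((variable.strip(), sub_factor.strip()))
    return response.strip(), components


response, components = parse_model("y=mu*b+id*u(v(my_var(1)))")
```

For the example string, <code>response</code> is <code>y</code> and <code>components</code> holds the pairs <code>mu</code>, <code>b</code> and <code>id</code>, <code>u(v(my_var(1)))</code>, with the specifier still attached to the sub-factor.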
 |  | 
|  |   |  | 
|  | Variables and sub-factors may be accompanied by a specifier. A specifier is a [https://en.wikipedia.org/wiki/Tree_structure#Nested_parentheses tree diagram] in [https://en.wikipedia.org/wiki/Newick_format  Newick format] with all nodes named, where the root node is the variable or sub-factor. It provides additional information about a variable or sub-factor. The {{lmt}} version of the above [https://en.wikipedia.org/wiki/Newick_format  tree diagram] differs in that
 |  | 
|  | *the parent nodes precede child nodes
 |  | 
|  | *child nodes within the same parent node are separated by semicolon
 |  | 
|  | *leaf nodes can contain a bracket space with additional, possibly comma-separated information
 |  | 
|  | '''Without any specifier {{lmt}} assumes that'''
 |  | 
|  | *'''variables are classification variables with the respective columns in the data file containing integer numbers coding for the different levels of the associated sub-factor'''
 |  | 
|  | *'''sub-factors are fixed'''
 |  | 
|  | ==== Sub-factor specifiers ====
 |  | 
|  | Sub-factor specifiers are used to communicate that a sub-factor is random. Following the above example {{cc|u(v(my_var(1)))}}, {{cc|u}} is the root node, {{cc|v}} is a child node of {{cc|u}} with the hard-coded name '''v''', {{cc|my_var}} is a child node of {{cc|v}} with the user-defined name '''my_var''', which references the user-defined name of a $$\Gamma \otimes \Sigma$$ construct, and {{cc|1}} is additional information for {{cc|my_var}} communicating that the diagonal element in $$\Sigma$$ related to {{cc|u}} is diagonal element #1.
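The nesting of a specifier can be made visible with a short Python sketch that turns the string into a tree of (name, children) tuples. The sketch is a simplification and not the {{lmt}} parser: it treats the bracket content of a leaf node, such as {{cc|1}}, as one further child node, splits child nodes on semicolons, and rejects unbalanced parentheses. The function names and the second example string are hypothetical.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_specifier(text):
    """Parse 'name(child;child)' recursively into a (name, children) tuple."""
    text = text.strip()
    if "(" not in text:
        return (text, [])                      # leaf node or bracket information
    name, rest = text.split("(", 1)
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced parentheses in {text!r}")
    children = [parse_specifier(part) for part in split_top_level(rest[:-1], ";")]
    return (name.strip(), children)


# The sub-factor specifier from the text: the parent node precedes its children.
tree = parse_specifier("u(v(my_var(1)))")

# Hypothetical specifier with two child nodes and comma-separated leaf information.
other = parse_specifier("x(a;b(1,2))")
```

Here <code>tree</code> is the chain <code>u</code>, <code>v</code>, <code>my_var</code>, <code>1</code>, each node having the next one as its only child. A string with a missing closing parenthesis raises a <code>ValueError</code>.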
 |  | 
|  | ==== Variable specifiers ====
 |  | 
|  | Variable specifiers are used to communicate further information, for example that the variable
 |  | 
|  | *is continuous and contains real numbers
 |  | 
|  | *is continuous and contains integer numbers
 |  | 
|  | *is a genetic group regression matrix
 |  | 
|  | *undergoes a polynomial expansion
 |  | 
|  | *is associated to a nesting variable
 |  | 
|  | etc.
 |  | 
|  |   |  | 
|  |   |  | 
|  | Variables as well as sub-factors may be used across traits. For example, a two-trait model may use the variables {{cc|mu}} and {{cc|id}} in both traits:
 |  | 
|  |  y1=mu*b1+id*u1(v(sigma(1)))
 |  | 
|  |  y2=mu*b2+id*u2(v(sigma(2)))
 |  | 
|  |   |  | 
|  | == Disclaimer ==
 |  | 
|  | {{lmt}} is under ongoing development and many of its features have been tested only a few
 |  | 
|  | times on a limited number of models and data sets. Thus, users use {{lmt}} entirely
 |  | 
|  | at their own risk. This also applies to any decisions made based on the results provided
 |  | 
|  | by {{lmt}}.
 |  | 
|  |   |  | 
|  | == Conditions of use ==
 |  | 
|  | {{lmt}} can be used by the scientific community free of charge, but users must credit {{lmt}}
 |  | 
|  | in any publications. Commercial users must obtain the explicit approval of the author
 |  | 
|  | before using {{lmt}} and must credit {{lmt}} in any publication in scientific journals.
 |  | 
|  | 
 |  | 
 | 
|  | == Feedback and support == |  | == Feedback and support == | 
|  | However, the author appreciates feedback about the program functionality, possible aborts (segmentation faults), usability of output and comprehensiveness of the manual. |  | However, the author appreciates feedback about the program functionality, possible aborts (segmentation faults), usability of output and comprehensiveness of the manual. | 
|  | 
 |  | 
 | 
|  |  | For feedback, wish lists, questions and support, contact [mailto:vinzent.boerner@qgg.au.dk vinzent.boerner@qgg.au.dk] (infrequently checked) or [mailto:vinzent.boerner@gmx.de vinzent.boerner@gmx.de] (frequently checked). | 
|  | 
 |  | 
 | 
|  |   |  | == [[supported features| Supported features]] == | 
|  | * [http://localhost/mediawiki/index.php/Run_It Run It]
 |  | == [[Algorithms|Algorithms]] == | 
|  | * [http://localhost/mediawiki/index.php/Inputfileformats Input file formats]
 |  | == [[Parameterfile1| Parameter file terminology]] == | 
|  | * [http://localhost/mediawiki/index.php/Parameterfile1 Parameter file terminology part 1]
 |  | == [[linear mixed models in lmt| Linear mixed models in lmt]] == | 
|  | * [http://localhost/mediawiki/index.php/Jump_Start Jump Start]
 |  | == [[Genomic data in lmt| Genomic data in lmt]] == | 
|  |  | == [[File formats|File formats]] == | 
|  |  | == [[Input files|Input files]] == | 
|  |  | == [[Output files|Output files]] == | 
|  |  | == [[Run_It|How to run it]] == | 
|  |  | == [[Examples|Examples]] == | 
|  |  | == [[Parameter file elements| Parameter file elements]] == |