The Linear Mixed Models Toolbox

From Linear Mixed Models Toolbox
File formats for data input

lmt detects the format of input files automatically from the filename extension. Supported extensions are

  • ".csv" for ordinary comma-separated values ASCII text files
  • ".blkcsv" for comma-separated values ASCII text files in block format
  • ".bin" for binary files in block format

".csv" files may contain comment lines, marked by the comment character "#", at the top of the file only.

The type of the file content is determined by its intended use, that is

  • the data file is expected to contain only real (floating-point) numbers, which are converted to integers where required,
  • a file containing an ordinary pedigree is expected to contain only integers,
  • a file containing a missing value indicator matrix is expected to contain only character strings.

lmt accepts only a single file containing the actual data, and that file must contain a comment line holding the column header. Co-variance matrices must be supplied as full, square, symmetric matrices.
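The extension-based detection and the comment convention for data files can be illustrated with a minimal Python sketch. It maps the three documented extensions to a format label and parses a ".csv" data file whose leading "#" lines are comments. The text does not say which comment line holds the column header or how the header is delimited; the sketch assumes the last comment line holds comma-separated column names. The function names are hypothetical and not part of lmt.

```python
import csv
import io
import os

# Extensions documented for lmt; the labels are illustrative only.
FORMATS = {".csv": "csv", ".blkcsv": "block csv", ".bin": "block binary"}


def detect_format(path: str) -> str:
    """Return a format label based solely on the filename extension."""
    ext = os.path.splitext(path)[1]
    if ext not in FORMATS:
        raise ValueError(f"unsupported extension: {ext!r}")
    return FORMATS[ext]


def read_data_csv(text: str):
    """Parse a data file: leading '#' lines are comments, the rest are floats.

    Assumption (not stated for lmt): the last comment line holds the
    comma-separated column header.
    """
    lines = text.splitlines()
    comments = []
    while lines and lines[0].startswith("#"):
        comments.append(lines.pop(0))
    if not comments:
        raise ValueError("data file must contain a commented header line")
    header = [name.strip() for name in comments[-1].lstrip("#").split(",")]
    rows = []
    for record in csv.reader(io.StringIO("\n".join(lines))):
        if not record:
            continue  # skip blank lines
        if record[0].lstrip().startswith("#"):
            raise ValueError("comment lines are only allowed at the top")
        rows.append([float(field) for field in record])
    return header, rows


# Usage: a tiny data file with one free comment and one header comment.
header, rows = read_data_csv("# a comment\n#id,herd,milk\n1,3,25.5\n2,1,30.0\n")
```

Reading every data value as a float mirrors the rule that the data file holds real numbers which are converted to integers only where required.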



Revision as of 10:52, 28 December 2020

Introduction

The Linear Mixed Models Toolbox (lmt) is stand-alone, single-executable software for large-scale linear mixed model analysis. It is the successor of DMU, the well-known and widely used software package for linear mixed model analysis developed and maintained by Per Madsen and Just Jensen.

Since the early days of software development in statistics and quantitative genetics, programming languages have become far more capable, and DMU has therefore been given a thorough overhaul.

One result of the overhaul is the new name, lmt, which stems from the difficulty of expanding the acronym DMU into something that remains generally meaningful over time. Those who prefer the acronym DMU may refer to lmt as DMU-next.

The second area of the overhaul is the parameter file interface. lmt now comes with an XML-style parameter file, which is intended to be much easier for the user to understand. Furthermore, almost every editor supports automated commenting, uncommenting, indentation, code folding and syntax highlighting for XML, which makes it easier to follow the structure of the parameter file even when it spans several tens of lines.

The third area of the overhaul is the program structure. DMU was split into several programs (DMU1, DMU4, DMU5, DMUAI, RJMC). In contrast, lmt is meant to provide the functionality of all those programs through a single parameter file and a single executable.

While lmt is ultimately meant to be a full-scale successor of DMU, it does not yet provide all of DMU's functionality in some areas, while in others it already provides more. More specifically, no REML facilities are available yet, but large-scale linear mixed model solving provides Single-Step-T-BLUP facilities, uploading of genotypes, building of genomic relationship matrices on the fly, and more.

Supported features

Supported operations

Currently, lmt supports the following operations on linear mixed models:

  • Solving for BLUP and BLUE solutions of random and fixed factors, respectively, conditional on supplied variances;
  • Gibbs sampling of variance components in single-pass and blocked mode;
  • MC-EM-REML estimation of variance components;
  • Sampling elements of the inverse of the mixed model coefficient matrix.
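Solving for BLUE and BLUP conditional on known variances amounts to solving Henderson's mixed model equations. For $$y = Xb + Zu + e$$ with $$var(u) = \Gamma\sigma^2_u$$ and $$var(e) = I\sigma^2_e$$ they read $$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + \Gamma^{-1}\lambda \end{bmatrix}\begin{bmatrix} b \\ u \end{bmatrix} = \begin{bmatrix} X'y \\ Z'y \end{bmatrix}$$ with $$\lambda = \sigma^2_e/\sigma^2_u$$. The pure-Python sketch below builds these equations densely for one fixed and one random factor and solves them by Gauss-Jordan elimination. It is a textbook illustration under those assumptions, not lmt's solver, and the function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0:
            raise ValueError("singular coefficient matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mme(records, n_fixed, n_random, gamma_inv, lam):
    """Build and solve Henderson's MME for y = Xb + Zu + e.

    records: (fixed level, random level, y) with 0-based levels.
    gamma_inv: inverse of Gamma (n_random x n_random); lam = s2e / s2u.
    Returns (BLUE of b, BLUP of u).
    """
    n = n_fixed + n_random
    c = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for f, r, y in records:
        cols = (f, n_fixed + r)  # one incidence per factor and record
        for i in cols:
            rhs[i] += y
            for j in cols:
                c[i][j] += 1.0
    for i in range(n_random):  # add Gamma^{-1} * lambda to the random block
        for j in range(n_random):
            c[n_fixed + i][n_fixed + j] += gamma_inv[i][j] * lam
    sol = solve(c, rhs)
    return sol[:n_fixed], sol[n_fixed:]


# Usage: two herds (fixed), three sires (random), Gamma = I, lambda = 2.
records = [(0, 0, 10.0), (0, 1, 12.0), (0, 2, 11.0), (1, 0, 14.0),
           (1, 1, 15.0), (1, 2, 13.0), (1, 0, 16.0)]
identity = [[float(i == j) for j in range(3)] for i in range(3)]
b, u = mme(records, n_fixed=2, n_random=3, gamma_inv=identity, lam=2.0)
```

The fixed factor here has no intercept, so $$X$$ has full column rank and the coefficient matrix is positive definite; with an intercept plus herd levels a constraint would be needed.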

Supported factors and variables

lmt supports

  • fixed factors
  • random factors
  • classification variables
  • continuous co-variables, which can be nested. For continuous co-variables, lmt supports user-defined polynomials and hard-coded Legendre polynomials up to order 6.
  • genetic group co-variables

All classification variables and co-variables can be associated with a fixed or random factor.
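Legendre polynomials for a continuous co-variable are usually evaluated after rescaling the co-variable linearly to the interval $$[-1, 1]$$. A minimal sketch using Bonnet's recursion $$(n+1)P_{n+1}(x) = (2n+1)xP_n(x) - nP_{n-1}(x)$$ follows. Whether lmt uses the raw polynomials or the normalized form $$\sqrt{(2n+1)/2}\,P_n(x)$$ common in random regression models is not stated here; the sketch assumes the normalized form by default, and the function names are hypothetical.

```python
import math


def rescale(t, t_min, t_max):
    """Map t linearly from [t_min, t_max] onto [-1, 1]."""
    return -1.0 + 2.0 * (t - t_min) / (t_max - t_min)


def legendre(x, order, normalized=True):
    """Return [P_0(x), ..., P_order(x)] via Bonnet's recursion.

    If normalized, each P_n is scaled by sqrt((2n + 1) / 2) (assumption).
    """
    if not 0 <= order <= 6:
        raise ValueError("order must lie in 0..6, the range quoted for lmt")
    p = [1.0, x][: order + 1]
    for n in range(1, order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    if normalized:
        p = [math.sqrt((2 * n + 1) / 2.0) * v for n, v in enumerate(p)]
    return p


# Usage: covariates for day in milk 155 on a 5..305 scale, order 2.
covariates = legendre(rescale(155, 5, 305), 2)
```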

Supported variance structures

For random factors, lmt supports variance structures of the form

  • $$\Gamma\otimes\Sigma$$, where $$\Sigma$$ is a dense symmetric positive definite matrix, and
  • $$\Theta_L(\Gamma\otimes I_{\Sigma})\Theta_L^{'}$$, where $$\Theta$$ is a symmetric positive definite block-diagonal matrix of $$n$$ symmetric positive definite matrices $$\Sigma_i, i=1,\dots,n$$, $$\Theta_L$$ is the lower Cholesky factor of $$\Theta$$ and $$I_{\Sigma}$$ is an identity matrix of the same dimension as $$\Sigma_i$$.
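Both variance structures can be checked numerically on small matrices. The pure-Python sketch below builds $$\Gamma\otimes\Sigma$$ and $$\Theta_L(\Gamma\otimes I_{\Sigma})\Theta_L^{'}$$ with $$\Theta = diag(\Sigma_1,\dots,\Sigma_n)$$. It assumes that $$n$$ equals the dimension of $$\Gamma$$, so that the matrices conform; block $$(i,j)$$ of the second structure is then $$\gamma_{ij}L_iL_j^{'}$$, which is $$\gamma_{ii}\Sigma_i$$ on the diagonal and reduces to $$\Gamma\otimes\Sigma$$ when all $$\Sigma_i$$ equal $$\Sigma$$. The helper names are hypothetical.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kron(a, b):
    """Kronecker product a ⊗ b of two dense matrices."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def cholesky(s):
    """Lower Cholesky factor L with L L' = s (s symmetric positive definite)."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def block_diag(blocks):
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    off = 0
    for b in blocks:
        for i, row in enumerate(b):
            out[off + i][off:off + len(row)] = row
        off += len(b)
    return out


def identity(d):
    return [[float(i == j) for j in range(d)] for i in range(d)]


def structured_cov(gamma, sigmas):
    """Theta_L (Gamma ⊗ I) Theta_L' with Theta = diag(sigmas).

    Assumes len(sigmas) == len(gamma) and all sigmas share one dimension.
    """
    theta_l = block_diag([cholesky(s) for s in sigmas])
    middle = kron(gamma, identity(len(sigmas[0])))
    return matmul(matmul(theta_l, middle), transpose(theta_l))


# Usage: two levels of Gamma, each with its own 2 x 2 Sigma_i.
gamma = [[1.0, 0.5], [0.5, 1.0]]
sigma_1 = [[4.0, 1.0], [1.0, 2.0]]
sigma_2 = [[9.0, 0.0], [0.0, 1.0]]
v = structured_cov(gamma, [sigma_1, sigma_2])
```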

In both structures, $$\Gamma$$ is one of the following:

  • an identity matrix
  • an arbitrary positive definite diagonal matrix
  • a pedigree-based numerator relationship matrix $$A$$ which may contain meta-founders
  • a pedigree- and genotype-based relationship matrix $$H$$ which may contain meta-founders
  • a user-defined (u.d.) symmetric, positive definite matrix whose inverse is supplied
    • as a sparse upper-triangular matrix stored in CSR format
    • as a dense matrix
  • a co-variance matrix of a selected auto-regressive process
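For the u.d. option, the inverse is supplied as a sparse upper-triangular matrix in CSR format. The sketch below shows what that layout holds: only the upper triangle (with the diagonal) of a symmetric matrix is kept in three arrays (row pointers, column indices, values), and the full symmetric matrix-vector product is recovered by mirroring the off-diagonal entries. It does not describe lmt's actual file layout; zero-based indexing and the function names are assumptions.

```python
def upper_to_csr(dense, tol=0.0):
    """Store the upper triangle (including the diagonal) of a symmetric
    matrix as CSR arrays: row_ptr, col_idx, values (0-based indices)."""
    row_ptr, col_idx, values = [0], [], []
    n = len(dense)
    for i in range(n):
        for j in range(i, n):
            if abs(dense[i][j]) > tol:
                col_idx.append(j)
                values.append(dense[i][j])
        row_ptr.append(len(values))  # end of row i in col_idx/values
    return row_ptr, col_idx, values


def sym_csr_matvec(row_ptr, col_idx, values, x):
    """Return A x for symmetric A given only its upper triangle in CSR."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j, a = col_idx[k], values[k]
            y[i] += a * x[j]
            if j != i:  # mirror the off-diagonal entry into row j
                y[j] += a * x[i]
    return y


# Usage: a tridiagonal inverse with 5 stored entries instead of 9.
a_inv = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
row_ptr, col_idx, values = upper_to_csr(a_inv)
```

Storing only one triangle roughly halves the memory for the off-diagonal part, which matters for large relationship-matrix inverses.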


Supported linear mixed model solvers

lmt supports

  • a direct solver, which requires explicitly building the left-hand-side coefficient matrix ($$C$$) of the linear mixed model equations
  • an iteration-on-data preconditioned gradient solver, which does not require building $$C$$
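The point of iteration on data is that a gradient solver only needs products $$Cv$$, which can be accumulated record by record without ever storing $$C$$. The text does not say which preconditioned gradient method lmt uses; the sketch below assumes a Jacobi (diagonal) preconditioned conjugate gradient, a common choice for large mixed model equations, for $$y = Xb + Zu + e$$ with one fixed factor, one random factor and $$\Gamma = I$$. All names are hypothetical.

```python
def make_ops(records, n_fixed, n_random, lam):
    """Return (matvec, diag, rhs) for the MME of y = Xb + Zu + e, Gamma = I.

    records: (fixed level, random level, y), 0-based; lam = s2e / s2u.
    C is never formed: each product C v is accumulated record by record.
    """
    n = n_fixed + n_random
    rhs = [0.0] * n
    diag = [0.0] * n
    for f, r, y in records:
        for i in (f, n_fixed + r):
            rhs[i] += y
            diag[i] += 1.0
    for i in range(n_fixed, n):
        diag[i] += lam

    def matvec(v):
        out = [0.0] * n
        for f, r, _ in records:
            fitted = v[f] + v[n_fixed + r]  # (W v) for this record
            out[f] += fitted                # accumulate W'(W v)
            out[n_fixed + r] += fitted
        for i in range(n_fixed, n):
            out[i] += lam * v[i]            # Gamma^{-1} * lambda with Gamma = I
        return out

    return matvec, diag, rhs


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(matvec, diag, rhs, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for C x = rhs."""
    x = [0.0] * len(rhs)
    r = rhs[:]
    z = [ri / di for ri, di in zip(r, diag)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        cp = matvec(p)
        alpha = rz / dot(p, cp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ci for ri, ci in zip(r, cp)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Usage: two herds (fixed), three sires (random), lambda = 2.
records = [(0, 0, 10.0), (0, 1, 12.0), (0, 2, 11.0), (1, 0, 14.0),
           (1, 1, 15.0), (1, 2, 13.0), (1, 0, 16.0)]
matvec, diag, rhs = make_ops(records, n_fixed=2, n_random=3, lam=2.0)
solution = pcg(matvec, diag, rhs)
```

Memory then scales with the number of records and unknowns rather than with the non-zeros of $$C$$, at the cost of one pass over the data per iteration.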

Supported features related to genomic data

  • direct use of genomic marker data
  • building of genomic relationship matrices ($$G$$) from supplied genomic data
  • uploading of a u.d. $$G$$
  • adjustment of $$G$$ to $$A_{gg}$$
  • solving Single-Step-G-BLUP models
  • sampling variances for Single-Step-G-BLUP models
  • solving Single-Step-T-BLUP models
  • solving Single-Step-SNP-BLUP models
  • all Single-Step models can be run "bottom-up", that is, the user supplies the genotypes and all necessary ingredients (e.g. $$G$$) are built on the fly.
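How lmt builds $$G$$ and adjusts it to $$A_{gg}$$ is not specified here. As a hedged illustration, the sketch below uses VanRaden's first method, $$G = MM^{'}/(2\sum_j p_j(1-p_j))$$ with genotypes coded 0/1/2 and centred by twice the allele frequency $$p_j$$, and then applies one common form of adjustment: a linear transformation $$G^{*} = a + bG$$ chosen so that the average diagonal and the average of all elements of $$G^{*}$$ match those of $$A_{gg}$$. Estimating $$p_j$$ from the supplied genotypes is itself an assumption, and the function names are hypothetical.

```python
def vanraden_g(genotypes):
    """G = M M' / (2 * sum p(1 - p)); genotypes coded 0/1/2, rows = animals.

    Allele frequencies are estimated from the genotypes themselves;
    all-monomorphic data would make the scale zero.
    """
    n_animals, n_snp = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2.0 * n_animals) for j in range(n_snp)]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    m = [[row[j] - 2.0 * p[j] for j in range(n_snp)] for row in genotypes]
    return [[sum(a * b for a, b in zip(mi, mk)) / scale for mk in m] for mi in m]


def adjust_to_a(g, a_gg):
    """Return a + b*G so that mean(diag) and mean(all) match those of A_gg."""
    n = len(g)

    def means(mat):
        diag = sum(mat[i][i] for i in range(n)) / n
        full = sum(sum(row) for row in mat) / (n * n)
        return diag, full

    gd, gf = means(g)
    ad, af = means(a_gg)
    b = (ad - af) / (gd - gf)
    a = ad - b * gd
    return [[a + b * x for x in row] for row in g]


# Usage: three genotyped animals, four SNPs, and their pedigree block A_gg.
geno = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1]]
a_gg = [[1.0, 0.25, 0.0], [0.25, 1.0, 0.125], [0.0, 0.125, 1.0]]
g = vanraden_g(geno)
g_adj = adjust_to_a(g, a_gg)
```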

Supported pedigree types

  • ordinary pedigrees
  • probabilistic pedigrees with an unlimited number of parent pairs per individual
  • genetic group pedigrees
  • meta-founders
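For an ordinary pedigree, the numerator relationship matrix $$A$$ can be built with the classical tabular method, processing animals so that parents precede offspring: $$a_{ii} = 1 + a_{sd}/2$$ and $$a_{ij} = (a_{js} + a_{jd})/2$$ for every older animal $$j$$, where $$s$$ and $$d$$ are the parents of $$i$$. The sketch below is a dense textbook illustration for small pedigrees only; it does not cover the probabilistic, genetic-group or meta-founder pedigrees listed above. Coding unknown parents as 0 is an assumption, and the function name is hypothetical.

```python
def numerator_relationship(pedigree):
    """Tabular method for A.

    pedigree: list of (animal, sire, dam), animals numbered 1..n in an order
    where parents precede offspring; 0 marks an unknown parent.
    """
    n = len(pedigree)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]  # row/col 0 = unknown parent
    for k, (animal, sire, dam) in enumerate(pedigree):
        if animal != k + 1 or sire >= animal or dam >= animal:
            raise ValueError("animals must be numbered 1..n, parents first")
        i = animal
        for j in range(1, i):
            a[i][j] = a[j][i] = 0.5 * (a[j][sire] + a[j][dam])
        a[i][i] = 1.0 + 0.5 * a[sire][dam]  # 1 + inbreeding coefficient
    return [row[1:] for row in a[1:]]


# Usage: a six-animal pedigree; animals 5 and 6 are inbred.
pedigree = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 0), (5, 4, 3), (6, 5, 2)]
a_matrix = numerator_relationship(pedigree)
```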

Disclaimer

lmt is under ongoing development, and many of its features have been tested only a few times on a limited number of models and data sets. Users therefore use lmt entirely at their own risk. This also applies to any decisions made based on the results provided by lmt.

Conditions of use

lmt can be used by the scientific community free of charge, but users must credit lmt in any publication. Commercial users must obtain the explicit approval of the author before using lmt and must credit lmt in any publication in scientific journals.

Feedback and support

lmt comes without any guaranteed support and the user is strongly advised to study the manual thoroughly. However, the author appreciates feedback about the program functionality, possible aborts (segmentation faults), usability of output and comprehensiveness of the manual.