Supported features

From Linear Mixed Models Toolbox

Raw input data support

  • Only numeric input is supported. That is, data must be integers or real numbers, not characters. The only exceptions are genomic data and files containing boolean data.
  • Integer data are stored as 64-bit integers. That is, integer data may range from approximately -9.223372e+18 to +9.223372e+18.
  • All input data are renumbered automatically if required by the job, and the respective cross-reference files are provided if necessary.
  • Files containing human-readable array data must be in comma-separated values (CSV) format.
  • Files containing human-readable vector data must contain a single column vector.
  • Pedigree files must be complete. That is, all individuals occurring as parents (second and third columns) must also occur as individuals (first column). All individual ids which occur in the data file must occur in the pedigree.
  • Genomic marker data must be imputed to a common density across all genotypes and must not contain missing markers.
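
The pedigree completeness rule above can be checked before a job is submitted. The following is a minimal sketch, not part of lmt: the function name `check_pedigree` is hypothetical, and the use of `0` as the code for an unknown parent is an assumption that should be adapted to the actual file.

```python
def check_pedigree(pedigree, data_ids, missing_parent=0):
    """Return (parents not listed as individuals, data ids not in the pedigree).

    pedigree: iterable of (individual, parent_1, parent_2) rows.
    missing_parent: code for an unknown parent (assumed to be 0 here).
    """
    individuals = {row[0] for row in pedigree}
    parents = {p for row in pedigree for p in row[1:3] if p != missing_parent}
    missing_parents = sorted(parents - individuals)
    missing_data_ids = sorted(set(data_ids) - individuals)
    return missing_parents, missing_data_ids


# Toy pedigree: individual 4 names parent 5, which has no row of its own.
pedigree = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 3, 5)]
missing_parents, missing_data_ids = check_pedigree(pedigree, data_ids=[3, 4, 6])
print(missing_parents)   # [5]
print(missing_data_ids)  # [6]
```

Both lists must be empty for the pedigree to be complete in the sense described above.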

Supported operations

lmt currently supports the following operations on linear mixed models:

  • Solving for BLUP solutions of random factors and BLUE solutions of fixed factors, conditional on the supplied variances
  • Gibbs sampling of variance components in single pass and blocked mode
  • AI-REML estimation of variance components using the coefficient matrix of the mixed model equation system (AI-REML-C)
  • MC-EM-REML estimation of variance components
  • Sampling (block-)diagonal elements of the inverse of the mixed model coefficient matrix
  • Solving for (block-)diagonal elements of the inverse of the mixed model coefficient matrix

Supported factors and variables

lmt supports

  • fixed factors
  • random factors
  • classification variables
  • continuous co-variables, which can be nested. For continuous co-variables, lmt supports user-defined polynomials (e.g. sin(x) or x^(0.5)) and hard-coded Legendre polynomials up to order 6.
  • genetic group co-variables

All classification variables and co-variables can be associated with a fixed or a random factor.
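
To illustrate what Legendre co-variables up to order 6 look like, the sketch below evaluates the standard (unnormalised) Legendre polynomials with Bonnet's recurrence. It is an illustration only: whether lmt normalises the polynomials, and how it maps a co-variable onto the interval [-1, 1], is not stated here, so both the helper `standardise` and the scaling it applies are assumptions.

```python
def legendre(x, max_order=6):
    """Return [P_0(x), ..., P_max_order(x)] for x in [-1, 1].

    Uses Bonnet's recurrence:
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x).
    """
    values = [1.0, x]
    for n in range(1, max_order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: max_order + 1]


def standardise(value, lower, upper):
    """Map a co-variable from [lower, upper] onto [-1, 1] (assumed scaling)."""
    return 2.0 * (value - lower) / (upper - lower) - 1.0


# Hypothetical example: a co-variable observed at 155 on a scale from 5 to 305.
x = standardise(155, lower=5, upper=305)
covariables = legendre(x)  # one regression co-variable per polynomial order
```

Each element of `covariables` would enter the model as one column of a regression on the co-variable.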

Supported variance structures

For random factors, lmt supports variance structures of the form

  • $$\Gamma\otimes\Sigma$$, where $$\Sigma$$ is a dense symmetric positive definite matrix, and
  • $$\Theta_L(\Gamma\otimes I_{\Sigma})\Theta_L^{'}$$, where $$\Theta$$ is a symmetric positive definite block-diagonal matrix of $$n$$ symmetric positive definite matrices $$\Sigma_i, i=1,\ldots,n$$, $$\Theta_L$$ is the lower Cholesky factor of $$\Theta$$, and $$I_{\Sigma}$$ is an identity matrix with the same dimension as $$\Sigma_i$$.

When solving linear mixed models, $$\Sigma$$ and $$\Gamma$$ are user-determined constants, whereas when estimating variances, $$\Gamma$$ is a user-determined constant and $$\Sigma$$ is a function of the data.
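
The relation between the two structures can be checked numerically. The sketch below uses only the Python standard library and assumes that $$\Gamma$$ has $$n$$ rows (one block $$\Sigma_i$$ per level) and that effects are ordered level by level. Under these assumptions, setting all $$\Sigma_i$$ equal to one $$\Sigma$$ makes the second structure collapse to $$\Gamma\otimes\Sigma$$. All helper names are illustrative and not part of lmt.

```python
import math


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def block_diag(blocks):
    """Block-diagonal matrix built from a list of square blocks."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(b)
    return out


def cholesky_lower(a):
    """Lower Cholesky factor L of a symmetric positive definite matrix (a = L L')."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


gamma = [[1.0, 0.5], [0.5, 1.0]]   # toy relationship matrix for two levels
sigma = [[2.0, 0.6], [0.6, 1.0]]   # toy 2 x 2 co-variance matrix

first = kron(gamma, sigma)                       # Gamma (x) Sigma
theta_l = cholesky_lower(block_diag([sigma, sigma]))
second = matmul(matmul(theta_l, kron(gamma, identity(2))), transpose(theta_l))

max_diff = max(abs(x - y) for r1, r2 in zip(first, second) for x, y in zip(r1, r2))
print(max_diff < 1e-12)  # True: both structures agree when all Sigma_i are equal
```

With different blocks in `block_diag`, the second structure gives each level its own co-variance matrix, which the plain Kronecker form cannot express.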

Supported types of $$\Gamma$$ are

  • an identity matrix
  • an arbitrary positive definite diagonal matrix
  • a pedigree-based numerator relationship matrix $$A$$ which may contain meta-founders[1]
  • a pedigree- and genotype-based relationship matrix $$H$$ which may contain meta-founders
  • genetic groups[2] absorbed into $$A$$ or $$H$$
  • a user-defined (u.d.) symmetric, positive definite matrix whose inverse is supplied
    • as a sparse upper-triangular matrix stored in CSR format
    • as a dense matrix
  • a co-variance matrix of a selected auto-regressive process
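
For the sparse option in the list above, the sketch below shows how the upper triangle of a symmetric inverse maps onto the three arrays of the compressed sparse row (CSR) layout. It illustrates the storage scheme only: the file layout that lmt expects is not described here, and the zero-based indices and the function name `upper_triangle_to_csr` are assumptions.

```python
def upper_triangle_to_csr(matrix, tol=0.0):
    """Store the non-zero upper-triangular elements of a square matrix in CSR form.

    Returns (values, col_idx, row_ptr) with zero-based indices; the elements of
    row i are values[row_ptr[i]:row_ptr[i + 1]].
    """
    values, col_idx, row_ptr = [], [], [0]
    for i, row in enumerate(matrix):
        for j in range(i, len(row)):       # diagonal and above only
            if abs(row[j]) > tol:
                values.append(row[j])
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


# Toy symmetric, positive definite, tridiagonal inverse.
gamma_inverse = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 1.0],
]
values, col_idx, row_ptr = upper_triangle_to_csr(gamma_inverse)
print(values)   # [2.0, -1.0, 2.0, -1.0, 1.0]
print(col_idx)  # [0, 1, 1, 2, 2]
print(row_ptr)  # [0, 2, 4, 5]
```

Because the matrix is symmetric, the lower triangle carries no extra information and storing only the upper triangle roughly halves the memory needed.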

Furthermore, lmt supports special variance structures which are not covered by the above description:

  • SNP-BLUP co-variance structure[3] with the option to model marker co-variances as $$\Theta_L(\Gamma\otimes I_{\Sigma})\Theta_L^{'}$$.

Supported linear mixed model solvers

lmt supports

  • a direct solver, which requires explicitly building the left-hand-side coefficient matrix ($$C$$) of the linear mixed model equations
  • a pre-conditioned gradient solver, which does not require $$C$$ to be built
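
The idea of solving the mixed model equations without building $$C$$ can be sketched with a preconditioned conjugate gradient iteration that only needs the product of $$C$$ with a vector. The example is an assumption-laden toy, not lmt's implementation: it assumes a conjugate gradient method with a diagonal (Jacobi) preconditioner, a model with an overall mean and one random factor with identity $$\Gamma$$, and a hypothetical variance ratio `lam`.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(apply_c, rhs, diag, tol=1e-10, max_iter=1000):
    """Solve C x = rhs given only the product v -> C v and the diagonal of C."""
    x = [0.0] * len(rhs)
    r = list(rhs)                                  # residual for x = 0
    z = [ri / di for ri, di in zip(r, diag)]       # Jacobi-preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        cp = apply_c(p)
        alpha = rz / dot(p, cp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ci for ri, ci in zip(r, cp)]
        if dot(r, r) ** 0.5 < tol:
            break
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Toy data: record i belongs to level levels[i] of the random factor.
levels = [0, 0, 1, 1, 2]
y = [10.0, 12.0, 9.0, 11.0, 14.0]
n_levels = 3
lam = 2.0  # assumed ratio of residual to factor variance


def apply_c(v):
    """Compute C v for v = (mean, level effects) without ever forming C."""
    mean, effects = v[0], v[1:]
    fitted = [mean + effects[lev] for lev in levels]
    out = [sum(fitted)] + [lam * e for e in effects]
    for lev, f in zip(levels, fitted):
        out[1 + lev] += f
    return out


counts = [levels.count(lev) for lev in range(n_levels)]
diag = [float(len(y))] + [c + lam for c in counts]
rhs = [sum(y)] + [sum(yi for yi, lev in zip(y, levels) if lev == k)
                  for k in range(n_levels)]

solution = pcg(apply_c, rhs, diag)  # [mean, effect of level 0, 1, 2]
```

Memory use stays proportional to the number of records and effects, whereas a direct solver must hold $$C$$ or a factorisation of it.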

Supported features related to genomic data

  • direct use of genomic marker data
  • building of genomic relationship matrices ($$G$$) from supplied genomic data
  • uploading of a u.d. $$G$$
  • adjustment of $$G$$ to $$A_{gg}$$ in ssGBLUP and ssSNPBLUP
  • solving ssGBLUP models[4]
  • variance component estimation for ssGBLUP models
  • solving ssGTBLUP models[5]
  • solving ssSNPBLUP models[3]
  • calculation of true H matrix diagonal elements for ssGBLUP models
  • all single-step models can be run "bottom-up", that is, the user supplies the genotypes and all necessary ingredients (e.g. $$G$$) are built on the fly.
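
As an illustration of building $$G$$ from marker data, the sketch below uses one commonly used construction: genotypes coded 0/1/2 are centred by twice the allele frequency and the cross-product is scaled by $$2\sum_j p_j(1-p_j)$$. Which construction lmt actually applies is not stated here, so the formula, the genotype coding and the function name `build_g` are all assumptions.

```python
def build_g(genotypes):
    """Genomic relationship matrix from complete 0/1/2 genotypes (rows = individuals).

    Allele frequencies are estimated from the supplied genotypes themselves.
    """
    n_ind, n_mark = len(genotypes), len(genotypes[0])
    freq = [sum(row[j] for row in genotypes) / (2.0 * n_ind) for j in range(n_mark)]
    centred = [[row[j] - 2.0 * freq[j] for j in range(n_mark)] for row in genotypes]
    scale = 2.0 * sum(p * (1.0 - p) for p in freq)
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
            for zi in centred]


# Toy data: three individuals, three markers, no missing genotypes.
genotypes = [
    [0, 1, 2],
    [1, 1, 0],
    [2, 0, 1],
]
g = build_g(genotypes)
```

The requirement of complete marker data at a common density matters here, because a single missing genotype would leave the centred cross-product undefined.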

Supported pedigree types

  • ordinary pedigrees
  • probabilistic pedigrees with an unlimited number of parent pairs per individual
  • genetic group pedigrees[2]
  • meta-founder pedigrees[1]
  • ignoring of inbreeding
  • iterative derivation of inbreeding coefficients[6]

Supported features related to meta-founders and genetic groups

  • meta-founders can be modeled for all $$\Gamma$$ which contain $$A$$ (e.g. $$A$$, or $$H$$ for ssGBLUP, ssGTBLUP and ssSNPBLUP)
  • genetic groups can be modeled as an extra factor or can be absorbed into all $$\Gamma$$ which contain $$A$$

References

  1. Garcia et al.; Metafounders are related to Fst fixation indices and reduce bias in single-step genomic evaluations; Genetics Selection Evolution; 2017
  2. Westell et al.; Genetic Groups in an Animal Model; Journal of Dairy Science; 1988
  3. Liu et al.; A single-step genomic model with direct estimation of marker effects; Journal of Dairy Science; 2014
  4. Christensen et al.; Genomic prediction when some animals are not genotyped; Genetics Selection Evolution; 2010
  5. Mäntysaari et al.; Efficient single-step genomic evaluation for a multibreed beef cattle population having many genotyped animals; Journal of Animal Science; 2017
  6. VanRaden; Accounting for Inbreeding and Crossbreeding in Genetic Evaluation of Large Populations; Journal of Dairy Science; 1992