Difference between revisions of "Supported features"

From Linear Mixed Models Toolbox

Revision as of 06:00, 8 May 2022

Raw input data support

  • Only numeric input is supported. That is, data must be either integer or real numbers, but not characters. The only exception is genomic data.
  • 64-bit integers are used to store integer data. That is, integer data may range from -9.223372e+18 to +9.223372e+18.
  • All input data are renumbered automatically if required by the job and respective cross-reference files will be provided if necessary.
  • Files containing human-readable array data must be in comma-separated values (CSV) format.
  • Files containing human-readable vector data must contain a column vector.
  • Pedigree files must be complete. That is, all individuals occurring as parents must also occur as individuals, and all individual ids which occur in the data file must occur in the pedigree.
  • Genomic marker data must be imputed to a common density across all genotypes and must contain no missing markers.
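
The pedigree completeness rule above can be checked before a file is handed to lmt. The following is a minimal sketch, not part of lmt: the function name `check_pedigree`, the `(individual, sire, dam)` triple layout and the use of `0` as the unknown-parent code are all assumptions made for illustration.

```python
def check_pedigree(pedigree, data_ids, missing=0):
    """Return (parents_missing, data_ids_missing) for a pedigree.

    pedigree: list of (individual, sire, dam) integer triples.
    data_ids: individual ids that occur in the data file.
    missing:  hypothetical code for an unknown parent (an assumption,
              not taken from the lmt documentation).
    """
    individuals = {ind for ind, _, _ in pedigree}
    parents = {
        parent
        for _, sire, dam in pedigree
        for parent in (sire, dam)
        if parent != missing
    }
    # Parents that never appear in the individual column.
    parents_missing = sorted(parents - individuals)
    # Data-file ids that never appear in the individual column.
    data_missing = sorted(set(data_ids) - individuals)
    return parents_missing, data_missing


pedigree = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 3, 5)]
print(check_pedigree(pedigree, data_ids=[3, 4, 6]))  # ([5], [6])
```

A pedigree is complete in the sense described above when both returned lists are empty.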

Supported operations

Currently lmt supports the following operations on linear mixed models:

  • Solving for BLUP and BLUE solutions, conditional on supplied variances, for random and fixed factors, respectively;
  • Gibbs sampling of variance components in single pass and blocked mode;
  • MC-EM-REML estimation of variance components
  • Sampling (block)diagonal elements of the inverse of the mixed model coefficient matrix
  • Solving for (block)diagonal elements of the inverse of the mixed model coefficient matrix

Supported factors and variables

lmt supports

  • fixed factors
  • random factors
  • classification variables
  • continuous co-variables, which can be nested. For continuous co-variables lmt supports user-defined polynomials and hard-coded Legendre polynomials up to order 6.
  • genetic group co-variables

All classification variables and co-variables can be associated with a fixed or random factor.
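
To illustrate what Legendre polynomial co-variables up to order 6 look like, here is a minimal sketch that maps a co-variable value to the interval [-1, 1] and evaluates the polynomials with Bonnet's recurrence. The function name `legendre_covariables` and the rescaling by a user-supplied minimum and maximum are assumptions; the polynomials are unnormalized, and whether lmt applies a normalization constant is not stated here.

```python
def legendre_covariables(x, x_min, x_max, order=6):
    """Evaluate the unnormalized Legendre polynomials P_0..P_order.

    x is first mapped linearly from [x_min, x_max] to [-1, 1]
    (this rescaling is an assumption made for this sketch).
    """
    if x_max <= x_min:
        raise ValueError("x_max must be larger than x_min")
    t = 2.0 * (x - x_min) / (x_max - x_min) - 1.0
    values = [1.0, t]  # P_0(t) and P_1(t)
    for n in range(1, order):
        # Bonnet's recurrence:
        # (n + 1) P_{n+1}(t) = (2n + 1) t P_n(t) - n P_{n-1}(t)
        values.append(((2 * n + 1) * t * values[n] - n * values[n - 1]) / (n + 1))
    return values[: order + 1]


# Example: days in milk 155 on a hypothetical 5..305 day scale (t = 0).
print(legendre_covariables(155, 5, 305, order=4))
```

Each returned value would serve as one regression co-variable for the record.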

Supported variance structures

For random factors, lmt supports variance structures of the form

  • $$\Gamma\otimes\Sigma$$, where $$\Sigma$$ is a dense symmetric positive definite matrix, and
  • $$\Theta_L(\Gamma\otimes I_{\Sigma})\Theta_L^{'}$$, where $$\Theta$$ is a symmetric positive definite block-diagonal matrix of $$n$$ symmetric positive definite matrices $$\Sigma_i, i=1,\ldots,n$$, $$\Theta_L$$ is the lower Cholesky factor of $$\Theta$$ and $$I_{\Sigma}$$ is an identity matrix with the same dimension as $$\Sigma_i$$.

When solving linear mixed models, $$\Sigma$$ and $$\Gamma$$ are user-determined constants, whereas when estimating variances, $$\Gamma$$ is a user-determined constant and $$\Sigma$$ is a function of the data.
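
The relation between the two structures can be made concrete with a small numerical sketch. It assumes that $$\Gamma$$ has dimension $$n$$, so that each level of $$\Gamma$$ has its own $$\Sigma_i$$ (this is a reading of the definition above, not a statement about lmt internals). Under that reading, setting all $$\Sigma_i=\Sigma$$ makes the second structure equal to $$\Gamma\otimes\Sigma$$. All helper names are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import math


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in a_row for y in b_row] for a_row in a for b_row in b]


def cholesky(m):
    """Lower Cholesky factor of a symmetric positive definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def block_diag(blocks):
    """Block-diagonal matrix built from a list of square matrices."""
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, value in enumerate(row):
                out[offset + i][offset + j] = value
        offset += len(b)
    return out


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def structured_variance(gamma, sigmas):
    """Theta_L (Gamma kron I) Theta_L' with Theta = blockdiag(sigmas)."""
    k = len(sigmas[0])
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    # The Cholesky factor of a block-diagonal matrix is the
    # block-diagonal matrix of the blocks' Cholesky factors.
    theta_l = block_diag([cholesky(s) for s in sigmas])
    return matmul(matmul(theta_l, kron(gamma, identity)), transpose(theta_l))


gamma = [[1.0, 0.5], [0.5, 1.0]]
sigma = [[4.0, 1.0], [1.0, 2.0]]
first = kron(gamma, sigma)
second = structured_variance(gamma, [sigma, sigma])
max_diff = max(abs(x - y) for r1, r2 in zip(first, second) for x, y in zip(r1, r2))
print(max_diff < 1e-12)
```

With different $$\Sigma_i$$ per level, block $$(i,j)$$ of the result is $$\Gamma_{ij}L_iL_j^{'}$$, where $$L_i$$ is the lower Cholesky factor of $$\Sigma_i$$.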

Supported types for $$\Gamma$$ are

  • an identity matrix
  • an arbitrary positive definite diagonal matrix
  • a pedigree-based numerator relationship matrix $$A$$ which may contain meta-founders
  • a pedigree- and genotype-based relationship matrix $$H$$ which may contain meta-founders
  • genetic groups absorbed into $$A$$ or $$H$$
  • a user-defined (u.d.) symmetric, positive definite matrix whose inverse is supplied
    • as a sparse upper-triangular matrix stored in CSR format
    • as a dense matrix
  • a co-variance matrix of a selected auto-regressive process

Furthermore, lmt supports special variance structures which are not covered by the above description:

  • SNP-BLUP variance for the model of Liu and Goddard 2014 with the option to model marker co-variances as above.

Supported linear mixed model solvers

lmt supports

  • a direct solver, which requires explicitly building the left-hand-side coefficient matrix ($$C$$) of the linear mixed model equations
  • an iteration-on-data pre-conditioned gradient solver which does not require $$C$$
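
The idea behind iteration on data is that the product of $$C$$ with a vector is accumulated record by record, so $$C$$ itself is never stored. The sketch below illustrates this for a toy model with an overall mean and one random factor with $$\Gamma=I$$. It assumes a conjugate gradient iteration with a diagonal (Jacobi) preconditioner, which may differ from what lmt implements. The function names, the record layout `(level, y)` and the variance ratio `lam` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mme_product(records, n_levels, lam, v):
    """Compute C @ v from the data records without forming C.

    Unknowns are ordered as [mu, u_0, ..., u_{n_levels - 1}];
    lam is the ratio of residual to random-factor variance.
    """
    out = [0.0] * (n_levels + 1)
    for level, _ in records:
        fitted = v[0] + v[1 + level]
        out[0] += fitted
        out[1 + level] += fitted
    for k in range(n_levels):
        out[1 + k] += lam * v[1 + k]
    return out


def pcg_solve(records, n_levels, lam, tol=1e-12, max_iter=100):
    """Jacobi-preconditioned conjugate gradient on the mixed model equations."""
    n = n_levels + 1
    rhs = [0.0] * n
    diag = [0.0] * n  # diagonal of C, also accumulated from the data
    for level, y in records:
        rhs[0] += y
        rhs[1 + level] += y
        diag[0] += 1.0
        diag[1 + level] += 1.0
    for k in range(n_levels):
        diag[1 + k] += lam
    x = [0.0] * n
    r = rhs[:]
    z = [ri / di for ri, di in zip(r, diag)]
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        cp = mme_product(records, n_levels, lam, p)
        alpha = rz / dot(p, cp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ci for ri, ci in zip(r, cp)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Three records on two levels of the random factor, lam = 1.
records = [(0, 10.0), (0, 12.0), (1, 9.0)]
solution = pcg_solve(records, n_levels=2, lam=1.0)
print([round(s, 6) for s in solution])
```

For this toy data set the exact solution of the 3 x 3 system is 71/7 for the mean and 4/7 and -4/7 for the two random-factor levels, which a direct solver operating on the explicit $$C$$ would return as well.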

Supported features related to genomic data

  • direct use of genomic marker data
  • building of genomic relationship matrices ($$G$$) from supplied genomic data
  • uploading of a u.d. $$G$$
  • adjustment of $$G$$ to $$A_{gg}$$ in ssGBLUP and ssSNPBLUP
  • solving ssGBLUP models
  • variance component estimation for ssGBLUP models
  • solving ssGTBLUP models
  • solving ssSNPBLUP models
  • calculation of true H matrix diagonal elements for ssGBLUP models
  • all single-step models can be run "bottom-up", that is, the user supplies the genotypes and all necessary ingredients (e.g. $$G$$) are built on the fly.

Supported pedigree types

  • ordinary pedigrees
  • probabilistic pedigrees with an unlimited number of parent pairs per individual
  • genetic group pedigrees
  • meta-founder pedigrees
  • ignoring of inbreeding
  • iterative derivation of inbreeding coefficients

Supported features related to meta-founders and genetic groups

  • meta-founders can be modeled for all $$\Gamma$$ which contain $$A$$ (e.g. $$A$$, and $$H$$ for ssGBLUP, ssGTBLUP and ssSNPBLUP)
  • genetic groups can be modeled as an extra factor or can be absorbed into all $$\Gamma$$ which contain $$A$$