Supported features

From Linear Mixed Models Toolbox
Latest revision as of 00:44, 12 May 2022

Raw input data support

  • Only numeric input is supported: data must be integers or real numbers, not characters. The only exceptions are genomic data and files containing boolean data.
  • Integer data are stored as 64-bit signed integers, so integer values may range from -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 (about ±9.223372e+18).
  • All input data are renumbered automatically if the job requires it, and the corresponding cross-reference files are provided where necessary.
  • Files containing human-readable array data must be in comma-separated values (CSV) format.
  • Files containing human-readable vector data must contain a single column vector.
  • Pedigree files must be complete. That is, every individual occurring as a parent (2nd and 3rd columns) must also occur as an individual (1st column), and every individual ID occurring in the data file must occur in the pedigree.
  • Genomic marker data must be imputed to a common density across all genotypes and must not contain missing markers.
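The pedigree completeness rules can be checked before a job is started. The following sketch is not part of lmt: the function `check_pedigree`, the tuple layout `(id, sire, dam)` and the use of 0 for an unknown parent are assumptions made for illustration.

```python
# Hypothetical pre-flight check for the pedigree completeness rules above.
# Assumption: a pedigree is a list of (id, sire, dam) tuples and 0 marks an
# unknown parent; lmt's actual file layout may differ.
from typing import Iterable

UNKNOWN = 0  # assumed code for a missing parent


def check_pedigree(pedigree: list[tuple[int, int, int]],
                   data_ids: Iterable[int]) -> list[str]:
    """Return a list of problems; an empty list means the pedigree is complete."""
    individuals = {ind for ind, _, _ in pedigree}
    problems = []
    for ind, sire, dam in pedigree:
        for role, parent in (("sire", sire), ("dam", dam)):
            # rule 1: every known parent must also appear as an individual
            if parent != UNKNOWN and parent not in individuals:
                problems.append(f"{role} {parent} of {ind} is not listed as an individual")
    for ind in data_ids:
        # rule 2: every id in the data file must appear in the pedigree
        if ind not in individuals:
            problems.append(f"data id {ind} is missing from the pedigree")
    return problems


if __name__ == "__main__":
    ped = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 5)]  # dam 5 is not listed
    for msg in check_pedigree(ped, data_ids=[3, 4, 6]):
        print(msg)
```

For the example it reports that dam 5 of individual 4 is not listed as an individual and that data id 6 is missing from the pedigree.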

Supported operations

lmt currently supports the following operations on linear mixed models:

  • Solving for BLUP and BLUE solutions of random and fixed factors, respectively, conditional on supplied variances
  • Gibbs sampling of variance components in single-pass and blocked mode
  • AI-REML estimation of variance components using the coefficient matrix of the mixed model equation system (AI-REML-C)
  • MC-EM-REML estimation of variance components
  • Sampling (block-)diagonal elements of the inverse of the mixed model coefficient matrix
  • Solving for (block-)diagonal elements of the inverse of the mixed model coefficient matrix
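To illustrate what solving for BLUP and BLUE solutions conditional on supplied variances means, the sketch below sets up Henderson's mixed model equations for one fixed and one random factor, assuming $$\Gamma = I$$, and solves them densely. The helper names (`solve_mme`, `solve`, ...) are hypothetical, and the dense Gaussian elimination stands in for lmt's own solvers.

```python
# Minimal sketch of Henderson's mixed model equations (MME) for
# y = Xb + Zu + e with Var(u) = I*var_u and Var(e) = I*var_e (Gamma = I assumed).
# Dense pure-Python algebra for illustration only; solve_mme is hypothetical.


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, rhs):
    """Gaussian elimination with partial pivoting; a and rhs are not modified."""
    n = len(a)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[piv] = m[piv], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def solve_mme(X, Z, y, var_u, var_e):
    """Return (BLUE of b, BLUP of u) conditional on the supplied variances."""
    lam = var_e / var_u                      # variance ratio lambda
    Xt, Zt = transpose(X), transpose(Z)
    XtX, XtZ, ZtZ = matmul(Xt, X), matmul(Xt, Z), matmul(Zt, Z)
    for i in range(len(ZtZ)):
        ZtZ[i][i] += lam                     # Z'Z + lambda * Gamma^{-1} with Gamma = I
    lhs = [r1 + r2 for r1, r2 in zip(XtX, XtZ)] + \
          [r1 + r2 for r1, r2 in zip(transpose(XtZ), ZtZ)]
    y_col = [[v] for v in y]
    rhs = [r[0] for r in matmul(Xt, y_col)] + [r[0] for r in matmul(Zt, y_col)]
    sol = solve(lhs, rhs)
    p = len(XtX)
    return sol[:p], sol[p:]


if __name__ == "__main__":
    X = [[1.0] for _ in range(4)]                          # overall mean
    Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # two random levels
    b, u = solve_mme(X, Z, [10.0, 12.0, 14.0, 16.0], var_u=1.0, var_e=1.0)
    print(b, u)
```

With four records 10, 12, 14 and 16, an overall mean as the only fixed effect, two random levels and $$\sigma^2_u = \sigma^2_e = 1$$, the solutions are a mean of 13 and random effects of -4/3 and 4/3.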

Supported factors and variables

lmt supports

  • fixed factors
  • random factors
  • classification variables
  • continuous co-variables, which can be nested. For continuous co-variables, lmt supports user-defined functions (e.g. sin(x) or x^(0.5)) and hard-coded Legendre polynomials up to order 6.
  • genetic group co-variables

All classification variables and co-variables can be associated with either a fixed or a random factor.
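Legendre covariates for a continuous co-variable such as age might be generated as in the sketch below. Mapping the co-variable onto [-1, 1] and the optional normalisation factor $$\sqrt{(2n+1)/2}$$, which is common in random regression models, are assumptions here; how lmt scales the co-variable is not stated in this section. The names `standardize` and `legendre` are hypothetical.

```python
# Minimal sketch: Legendre covariates up to order 6 for a continuous co-variable.
# Assumptions (not taken from lmt): the co-variable is mapped linearly onto
# [-1, 1], and the normalisation sqrt((2n+1)/2) is applied only on request.
import math

MAX_ORDER = 6  # the hard-coded limit stated above


def standardize(x, x_min, x_max):
    """Map x from [x_min, x_max] onto [-1, 1]."""
    return -1.0 + 2.0 * (x - x_min) / (x_max - x_min)


def legendre(x, order, normalized=False):
    """Return [P_0(x), ..., P_order(x)] using Bonnet's recursion."""
    if not 0 <= order <= MAX_ORDER:
        raise ValueError(f"order must be between 0 and {MAX_ORDER}")
    p = [1.0, x]
    for n in range(1, order):
        # (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    p = p[: order + 1]
    if normalized:
        p = [math.sqrt((2 * n + 1) / 2) * v for n, v in enumerate(p)]
    return p


if __name__ == "__main__":
    t = standardize(155.0, 5.0, 305.0)  # e.g. day 155 on a 5..305 day scale
    print(t, legendre(t, 3, normalized=True))
```

Bonnet's recursion is used because it is numerically stable and needs only the two previous polynomial values.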

Supported variance structures

For random factors, lmt supports variance structures of the form

  • $$\Gamma\otimes\Sigma$$, where $$\Sigma$$ is a dense symmetric positive definite matrix, and
  • $$\Theta_L(\Gamma\otimes I_{\Sigma})\Theta_L^{'}$$, where $$\Theta$$ is a symmetric positive definite block-diagonal matrix of $$n$$ symmetric positive definite matrices $$\Sigma_i, i=1,\dots,n$$, $$\Theta_L$$ is the lower Cholesky factor of $$\Theta$$, and $$I_{\Sigma}$$ is an identity matrix of the same dimension as $$\Sigma_i$$.

When solving linear mixed models, $$\Sigma$$ and $$\Gamma$$ are user-supplied constants. When estimating variance components, $$\Gamma$$ remains a user-supplied constant while $$\Sigma$$ is estimated from the data.
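Both variance structures can be built explicitly for a small example. The sketch assumes that all $$\Sigma_i$$ have the same dimension. The diagonal block of level $$j$$ in $$\Theta_L(\Gamma\otimes I_{\Sigma})\Theta_L^{'}$$ then equals $$\gamma_{jj}\Sigma_j$$, and if every $$\Sigma_i$$ equals $$\Sigma$$ the structure reduces to $$\Gamma\otimes\Sigma$$ because $$\Theta_L = I_n\otimes\Sigma_L$$, with $$\Sigma_L$$ the lower Cholesky factor of $$\Sigma$$. All function names are hypothetical helpers.

```python
# Sketch of the two variance structures for random factors, using dense
# pure-Python matrices. Assumption: all Sigma_i have the same dimension.
# Function names are hypothetical helpers, not part of lmt.
import math


def kron(a, b):
    """Kronecker product of a and b."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cholesky(m):
    """Lower Cholesky factor of a symmetric positive definite matrix."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def block_diag(blocks):
    size = sum(len(b) for b in blocks)
    out = [[0.0] * size for _ in range(size)]
    off = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                out[off + i][off + j] = v
        off += len(b)
    return out


def homogeneous_structure(gamma, sigma):
    """Gamma kron Sigma."""
    return kron(gamma, sigma)


def heterogeneous_structure(gamma, sigmas):
    """Theta_L (Gamma kron I_Sigma) Theta_L' with Theta = blockdiag(Sigma_1..Sigma_n)."""
    theta_l = cholesky(block_diag(sigmas))
    inner = kron(gamma, identity(len(sigmas[0])))
    return matmul(matmul(theta_l, inner), transpose(theta_l))


if __name__ == "__main__":
    gamma = [[1.0, 0.5], [0.5, 1.0]]
    s1, s2 = [[4.0, 2.0], [2.0, 3.0]], [[1.0, 0.5], [0.5, 2.0]]
    for row in heterogeneous_structure(gamma, [s1, s2]):
        print([round(v, 4) for v in row])
```

The second structure therefore allows a different co-variance matrix per level of $$\Gamma$$ while keeping the correlation pattern given by $$\Gamma$$.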

Supported types of $$\Gamma$$ are

  • an identity matrix
  • an arbitrary positive definite diagonal matrix
  • a pedigree-based numerator relationship matrix $$A$$ which may contain meta-founders[1]
  • a pedigree- and genotype-based relationship matrix $$H$$ which may contain meta-founders
  • genetic groups[2] absorbed into $$A$$ or $$H$$
  • a user-defined (u.d.) symmetric positive definite matrix whose inverse is supplied
    • as a sparse upper-triangular matrix stored in compressed sparse row (CSR) format
    • as a dense matrix
  • a co-variance matrix of a selected auto-regressive process
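Sparse upper-triangular CSR storage can be illustrated with the co-variance matrix of an auto-regressive process. For a first-order process (AR(1), assumed here), the correlation matrix $$R_{ij}=\rho^{|i-j|}$$ has a tridiagonal inverse, so its upper triangle holds only $$2n-1$$ non-zeros. The sketch uses 0-based indices and hypothetical function names; the exact layout lmt expects in its input files is not described in this section.

```python
# Sketch: store the upper triangle of a sparse symmetric matrix in CSR form
# and multiply with it. The example matrix is the inverse of an AR(1)
# correlation matrix R[i][j] = rho**|i-j|, which is tridiagonal.
# Assumptions: 0-based indices; lmt's on-disk CSR layout is not shown here.


def ar1_inverse(n, rho):
    """Dense inverse of the n x n AR(1) correlation matrix (n >= 2)."""
    c = 1.0 / (1.0 - rho * rho)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        inv[i][i] = c * (1.0 if i in (0, n - 1) else 1.0 + rho * rho)
        if i + 1 < n:
            inv[i][i + 1] = inv[i + 1][i] = -c * rho
    return inv


def to_upper_csr(dense):
    """Return (row_ptr, col_idx, values) for non-zero entries with col >= row."""
    row_ptr, col_idx, values = [0], [], []
    for i, row in enumerate(dense):
        for j in range(i, len(row)):
            if row[j] != 0.0:
                col_idx.append(j)
                values.append(row[j])
        row_ptr.append(len(values))
    return row_ptr, col_idx, values


def sym_matvec(csr, x):
    """y = A x using only the stored upper triangle of symmetric A."""
    row_ptr, col_idx, values = csr
    y = [0.0] * len(x)
    for i in range(len(x)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j, v = col_idx[k], values[k]
            y[i] += v * x[j]
            if j != i:
                y[j] += v * x[i]  # mirrored lower-triangle entry
    return y


if __name__ == "__main__":
    csr = to_upper_csr(ar1_inverse(5, 0.6))
    print(csr[0])  # row pointers
    col0 = [0.6 ** i for i in range(5)]  # first column of R
    print([round(v, 12) for v in sym_matvec(csr, col0)])  # first unit vector
```

Multiplying the stored inverse with a column of $$R$$ returns the matching unit vector, which checks both the closed-form inverse and the symmetric product.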

Furthermore, lmt supports special variance structures that are not covered by the description above

  • SNP-BLUP co-variance structure[3] with the option to model marker co-variances as $$\Theta_L(\Gamma\otimes I_{\Sigma})\Theta_L^{'}$$.

Supported linear mixed model solvers

lmt supports

  • a direct solver, which requires the left-hand-side coefficient matrix ($$C$$) of the linear mixed model equations to be built explicitly
  • a preconditioned gradient solver, which does not require $$C$$
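The gradient solver only needs products $$Cx$$, not $$C$$ itself. The sketch below is a preconditioned conjugate gradient with a diagonal (Jacobi) preconditioner; the choice of preconditioner, the stopping rule and the name `pcg` are assumptions, not lmt's documented settings.

```python
# Sketch of a Jacobi-preconditioned conjugate gradient solver. It only needs a
# function computing C @ x and the diagonal of C, so C never has to be stored.
# Assumptions: diagonal preconditioner and relative-residual stopping rule.
from typing import Callable, Sequence


def pcg(matvec: Callable[[list[float]], list[float]], diag: Sequence[float],
        rhs: Sequence[float], tol: float = 1e-10, max_iter: int = 1000) -> list[float]:
    n = len(rhs)
    x = [0.0] * n
    r = list(rhs)                                # residual b - C x for x = 0
    z = [ri / di for ri, di in zip(r, diag)]     # apply M^{-1} = diag(C)^{-1}
    p = z[:]
    rz = sum(a * b for a, b in zip(r, z))
    rhs_norm = sum(v * v for v in rhs) ** 0.5 or 1.0
    for _ in range(max_iter):
        if sum(v * v for v in r) ** 0.5 <= tol * rhs_norm:
            break                                # converged
        cp = matvec(p)
        alpha = rz / sum(a * b for a, b in zip(p, cp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ci for ri, ci in zip(r, cp)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = sum(a * b for a, b in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]  # new search direction
        rz = rz_new
    return x


if __name__ == "__main__":
    C = [[4.0, 2.0, 2.0], [2.0, 3.0, 0.0], [2.0, 0.0, 3.0]]

    def matvec(v):
        # dense product for testing only; a real solver would work from the data
        return [sum(c * vi for c, vi in zip(row, v)) for row in C]

    print(pcg(matvec, [4.0, 3.0, 3.0], [52.0, 22.0, 30.0]))
```

The small system is a set of mixed model equations for an overall mean and two random levels; the solver returns approximately 13, -4/3 and 4/3.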

Supported features related to genomic data

  • direct use of genomic marker data
  • building genomic relationship matrices ($$G$$) from supplied genomic data
  • uploading a u.d. $$G$$
  • adjustment of $$G$$ to $$A_{gg}$$ in ssGBLUP and ssSNPBLUP
  • solving ssGBLUP models[4]
  • variance component estimation for ssGBLUP models
  • solving ssGTBLUP models[5]
  • solving ssSNPBLUP models[3]
  • calculation of the true diagonal elements of $$H$$ for ssGBLUP models
  • all single-step models can be run "bottom-up": the user supplies the genotypes, and all necessary ingredients (e.g. $$G$$) are built on the fly.
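Building $$G$$ from supplied genotypes and adjusting it to $$A_{gg}$$ might look like the following sketch. It assumes VanRaden's first method for $$G$$ with allele frequencies estimated from the genotypes, and an adjustment $$G^{*}=a+bG$$ chosen so that the average diagonal element and the average of all elements of $$G^{*}$$ match those of $$A_{gg}$$. Whether lmt uses exactly these choices is not stated in this section, and the function names are hypothetical.

```python
# Sketch: genomic relationship matrix after VanRaden's first method,
# G = Z Z' / (2 * sum_j p_j (1 - p_j)), with Z = M - 2p and genotypes coded 0/1/2.
# Assumptions: allele frequencies come from the supplied genotypes, and G is
# adjusted to A_gg by G* = a + b*G matching mean(diag) and mean(all elements).


def genomic_relationship(M):
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2.0 * n) for j in range(m)]  # allele freqs
    Z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in M]    # centred genotypes
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return [[sum(zi * zk for zi, zk in zip(Z[i], Z[k])) / scale for k in range(n)]
            for i in range(n)]


def adjust_to_agg(G, A_gg):
    """Return a + b*G whose mean diagonal and overall mean equal those of A_gg."""
    n = len(G)

    def means(X):
        return (sum(X[i][i] for i in range(n)) / n, sum(map(sum, X)) / (n * n))

    d_g, m_g = means(G)
    d_a, m_a = means(A_gg)
    b = (d_a - m_a) / (d_g - m_g)
    a = m_a - b * m_g
    return [[a + b * g for g in row] for row in G]


if __name__ == "__main__":
    M = [[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1]]
    A_gg = [[1.0, 0.25, 0.0], [0.25, 1.0, 0.125], [0.0, 0.125, 1.05]]
    for row in adjust_to_agg(genomic_relationship(M), A_gg):
        print([round(v, 4) for v in row])
```

Because the columns of $$Z$$ are centred with frequencies from the same genotypes, the elements of $$G$$ sum to zero, so the adjustment mainly shifts $$G$$ onto the scale of $$A_{gg}$$.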

Supported pedigree types

  • ordinary pedigrees
  • probabilistic pedigrees with an unlimited number of parent pairs per individual
  • genetic group pedigrees[2]
  • meta-founder pedigrees[1]
  • ignoring inbreeding
  • iterative derivation of inbreeding coefficients[6]
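Inbreeding coefficients follow from the diagonal of the numerator relationship matrix, $$F_i=a_{ii}-1$$. The sketch below uses the classical tabular method, which is dense and only suitable for small pedigrees; it is not the iterative approach cited above. It assumes individuals numbered 1..n with parents preceding their offspring and 0 for an unknown parent; `tabular_a` and `inbreeding` are hypothetical names.

```python
# Sketch of the classical tabular method for the numerator relationship
# matrix A; inbreeding coefficients are F_i = a_ii - 1. This is not the
# iterative approach cited above and is only practical for small pedigrees.
# Assumptions: ids are 1..n, parents precede offspring, 0 = unknown parent.


def tabular_a(pedigree):
    """pedigree: list of (id, sire, dam) tuples sorted by id."""
    n = len(pedigree)
    for pos, (i, s, d) in enumerate(pedigree, start=1):
        if i != pos or s >= i or d >= i:
            raise ValueError(f"individual {i}: ids must be 1..n with parents first")
    # index 0 stands for an unknown parent and stays all zero
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i, s, d in pedigree:
        for j in range(1, i):
            A[i][j] = A[j][i] = 0.5 * (A[j][s] + A[j][d])
        A[i][i] = 1.0 + 0.5 * A[s][d]
    return [row[1:] for row in A[1:]]


def inbreeding(pedigree):
    A = tabular_a(pedigree)
    return [A[k][k] - 1.0 for k in range(len(A))]


if __name__ == "__main__":
    ped = [(1, 0, 0), (2, 0, 0), (3, 1, 2), (4, 1, 0), (5, 3, 4)]
    print(inbreeding(ped))  # parents 3 and 4 of individual 5 share ancestor 1
```

In the example, individuals 3 and 4 are half-sibs through individual 1, so their relationship is 0.25 and individual 5 has an inbreeding coefficient of 0.125.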

Supported features related to meta-founders and genetic groups

  • meta-founders can be modeled for every $$\Gamma$$ that contains $$A$$ (e.g. $$A$$, and $$H$$ for ssGBLUP, ssGTBLUP and ssSNPBLUP)
  • genetic groups can be modeled as an extra factor or absorbed into every $$\Gamma$$ that contains $$A$$
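Absorbing genetic groups into $$A$$ can be illustrated with Henderson's rules for $$A^{-1}$$ extended to groups: each unknown parent is replaced by its group, and groups get their own rows and columns. The sketch ignores inbreeding and treats animals and groups as plain string labels; it illustrates the technique in the spirit of Westell et al.[2] and is not lmt's implementation. `augmented_a_inverse` is a hypothetical name.

```python
# Sketch of Henderson's rules for the inverse of A extended to genetic groups:
# unknown parents are replaced by group labels and groups get their own rows
# and columns. Assumptions: inbreeding is ignored; animals and groups are
# string labels; the result is a dict of non-zero entries. Not lmt's code.
from collections import defaultdict


def augmented_a_inverse(pedigree, groups):
    """pedigree: (animal, parent1, parent2) tuples; unknown parents are group labels."""
    ainv = defaultdict(float)
    for animal, p1, p2 in pedigree:
        n_unknown = (p1 in groups) + (p2 in groups)
        delta = 4.0 / (2 + n_unknown)  # 2, 4/3 or 1 for 0, 1 or 2 unknown parents
        ainv[animal, animal] += delta
        for p in (p1, p2):
            ainv[animal, p] -= delta / 2
            ainv[p, animal] -= delta / 2
            for q in (p1, p2):
                ainv[p, q] += delta / 4
    return dict(ainv)


if __name__ == "__main__":
    ped = [("1", "g1", "g1"), ("2", "g1", "g1"), ("3", "1", "2")]
    for key, value in sorted(augmented_a_inverse(ped, {"g1"}).items()):
        print(key, value)
```

Every row of the resulting matrix sums to zero, which is a convenient sanity check when groups are absorbed this way.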

References

  1. Garcia et al.; Metafounders are related to Fst fixation indices and reduce bias in single-step genomic evaluations; Genetics Selection Evolution; 2017
  2. Westell et al.; Genetic Groups in an Animal Model; Journal of Dairy Science; 1988
  3. Liu et al.; A single-step genomic model with direct estimation of marker effects; Journal of Dairy Science; 2014
  4. Christensen et al.; Genomic prediction when some animals are not genotyped; Genetics Selection Evolution; 2010
  5. Mäntysaari et al.; Efficient single-step genomic evaluation for a multibreed beef cattle population having many genotyped animals; Journal of Animal Science; 2017
  6. PM VanRaden; Accounting for Inbreeding and Crossbreeding in Genetic Evaluation of Large Populations; Journal of Dairy Science; 1992