Package 'TSDFGS' reference manual

Title:	Training Set Determination For Genomic Selection
Description:	We propose an optimality criterion, the r-score, to determine the required training set. The r-score is derived directly from Pearson's correlation between the genomic estimated breeding values (GEBVs) and the phenotypic values of the test set <doi:10.1007/s00122-019-03387-0>. This package provides two main functions: one to determine a good training set and one to determine its size.
Authors:	Jen-Hsiang Ou [aut, cre] , Po-Ya Wu [aut] , Chen-Tuo Liao [aut, ths]
Maintainer:	Jen-Hsiang Ou <[email protected]>
License:	GPL (>= 3)
Version:	2.4.2
Built:	2025-03-10 03:03:18 UTC
Source:	https://github.com/oumarkme/tsdfgs

CD-score

Description

This function calculates the CD score <doi:10.1186/1297-9686-28-4-359> for a given training set and test set.

Usage

cd_score(X, X0)

Arguments

`X`	A numeric matrix. The genotypic information of the training set, given either as a genotype matrix (coded as -1, 0, 1) or as a principal component matrix (rows: samples; columns: markers or PCs).
`X0`	A numeric matrix. The genotypic information of the test set, given either as a genotype matrix (coded as -1, 0, 1) or as a principal component matrix (rows: samples; columns: markers or PCs).

Value

A floating-point number, CD score.

Author(s)

Jen-Hsiang Ou

Examples

data(geno)
## Not run: cd_score(geno[1:50, ], geno[51:100, ])
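
The manual does not spell out the formula behind cd_score(). As a rough illustration only, the sketch below computes a simplified mean coefficient of determination (CD) under an assumed ridge-regression model in which the marker or PC effects are random, with variance ratio lambda = sigma_e^2 / sigma_b^2. It has no intercept and works with individual genomic values rather than the contrasts typically used in the CD literature, so it need not reproduce cd_score() values. The function name cd_sketch and the default lambda = 1 are hypothetical.

```r
# Simplified mean CD under an assumed ridge model (hypothetical; not cd_score()):
#   y = X b + e,  b ~ N(0, sigma_b^2 I),  e ~ N(0, sigma_e^2 I),
#   lambda = sigma_e^2 / sigma_b^2
#   CD(x0) = 1 - lambda * x0' (X'X + lambda I)^-1 x0 / (x0' x0)
cd_sketch <- function(X, X0, lambda = 1) {
  X  <- as.matrix(X)
  X0 <- as.matrix(X0)
  C  <- solve(crossprod(X) + lambda * diag(ncol(X)))  # posterior cov. of b / sigma_e^2
  pev   <- rowSums((X0 %*% C) * X0)  # x0' C x0 for every test row
  prior <- rowSums(X0^2)             # x0' x0 for every test row
  mean(1 - lambda * pev / prior)
}

set.seed(1)
X  <- matrix(rnorm(50 * 10), nrow = 50)  # toy training set: 50 samples x 10 PCs
X0 <- matrix(rnorm(20 * 10), nrow = 20)  # toy test set: 20 samples x 10 PCs
cd_sketch(X, X0)
```

Adding training rows can only shrink (X'X + lambda I)^-1, so this toy CD rises as the training set grows, which is the behaviour a training-set criterion should reward.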

Fit logistic growth curve model

Description

A function for fitting a logistic growth curve model.

Usage

FGCM(geno, nt = NULL, n_iter = NULL, multi.threads = FALSE)

Arguments

`geno`	Genotype information stored as a data frame. Columns represent variants (SNPs or PCs).
`nt`	A numeric vector of training set sizes used to estimate the logistic growth curve parameters.
`n_iter`	Number of simulations for each training set size. If NULL (the default), a suitable number is chosen automatically.
`multi.threads`	Default: FALSE. Multithreading is only available on macOS and Linux.

Value

Estimated parameters of the logistic growth curve.

Examples

data(geno)
## Not run: FGCM(geno)
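
The manual does not state which logistic parameterization FGCM() uses. The sketch below assumes the three-parameter form r(n) = Asym / (1 + exp((xmid - n) / scal)) and fits it with base R's nls() and the self-starting model SSlogis() to simulated (training set size, r-score) pairs. The toy curve and the noise level are invented for illustration.

```r
# Hypothetical sketch (parameterization assumed, not taken from FGCM()):
#   r(n) = Asym / (1 + exp((xmid - n) / scal))
set.seed(42)
nt <- rep(seq(20, 300, by = 20), each = 10)  # 10 toy r-scores per training size
r  <- 0.6 / (1 + exp((60 - nt) / 25)) +      # toy "true" curve ...
      rnorm(length(nt), sd = 0.005)          # ... plus simulation noise
sim <- data.frame(nt = nt, r = r)

# SSlogis() supplies its own starting values, so nls() needs no 'start' list
fit <- nls(r ~ SSlogis(nt, Asym, xmid, scal), data = sim)
coef(fit)  # Asym: plateau; xmid: size at half the plateau; scal: larger = flatter
```

Asym estimates the r-score plateau reachable with a very large training set. Noise is added on purpose, because nls() is not designed for artificial zero-residual data.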

Genotype information

Description

A PCA matrix of rice genotype information. This data was published by Zhao et al. (2011) <doi:10.1038/ncomms1467>

Usage

geno

Format

A numeric matrix of principal component scores with 404 rows (samples) and 404 columns (PCs).

Source

http://www.ricediversity.org/data/

Examples

data(geno)
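
The manual does not describe how the PCs in geno were computed. One common route, sketched below on a toy simulated genotype matrix coded -1, 0, 1, is base R's prcomp(). Whether geno was built with marker centering only, or with other preprocessing, is an assumption.

```r
# Hypothetical sketch: derive a PC score matrix from a -1/0/1 genotype matrix.
set.seed(7)
n <- 40   # toy number of samples
m <- 200  # toy number of SNP markers
G <- matrix(sample(c(-1, 0, 1), n * m, replace = TRUE), nrow = n)
pca <- prcomp(G, center = TRUE, scale. = FALSE)  # centre markers, no scaling (assumed)
PC  <- pca$x                                     # rows: samples; columns: PCs
dim(PC)                                          # 40 40, because min(n, m) = 40
```

prcomp() returns min(n, m) components, which is consistent with geno having as many PC columns as samples (404 x 404) when markers outnumber samples.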

Simulate r-scores of each training set size

Description

Calculates untargeted r-scores, optionally in parallel.

Usage

nt2r(geno, nt, n_iter = 30, multi.threads = FALSE)

Arguments

`geno`	A numeric data frame of genotypes; columns represent sites (genotypes coded as 1, 0, -1).
`nt`	Numeric. The training set size.
`n_iter`	Number of iterations (default = 30).
`multi.threads`	Default: FALSE. Multithreading is only available on macOS and Linux.

Value

A vector of r-scores, one per iteration.

Examples

data(geno)
## Not run: nt2r(geno, 50)
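
nt2r() repeats a random draw of the training set n_iter times. The sketch below assumes that, in the untargeted setting, the rows not drawn into the training set serve as the test set. That reading, the function name simulate_scores and the toy criterion toy_score (negative mean ridge PEV with lambda = 1) are hypothetical stand-ins, not the package's internals.

```r
# Hypothetical sketch: in each iteration draw nt rows as the training set and
# use the remaining rows as the test set. score_fun(X, X0) stands in for a
# criterion such as r_score(); this plug-in design is an assumption.
simulate_scores <- function(geno, nt, score_fun, n_iter = 30) {
  geno <- as.matrix(geno)
  vapply(seq_len(n_iter), function(i) {
    train <- sample(nrow(geno), nt)
    score_fun(geno[train, , drop = FALSE], geno[-train, , drop = FALSE])
  }, numeric(1))
}

# Toy criterion (hypothetical): negative mean ridge PEV of the test set, lambda = 1
toy_score <- function(X, X0) {
  C <- solve(crossprod(X) + diag(ncol(X)))
  -mean(rowSums((X0 %*% C) * X0))
}

set.seed(3)
toy_geno <- matrix(rnorm(100 * 5), nrow = 100)  # toy data: 100 samples x 5 PCs
scores <- simulate_scores(toy_geno, nt = 40, score_fun = toy_score)
summary(scores)
```

The manual's restriction of multi.threads to macOS and Linux is consistent with fork-based tools such as parallel::mclapply(), which does not support mc.cores > 1 on Windows; the manual does not say which mechanism nt2r() actually uses.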

Optimal training set determination

Description

This function determines an optimal training set.

Usage

optTrain(
  geno,
  cand,
  n.train,
  subpop = NULL,
  test = NULL,
  method = "rScore",
  min.iter = NULL,
  console = TRUE
)

Arguments

`geno`	A numeric matrix of principal components (rows: individuals; columns: PCs).
`cand`	An integer vector giving the rows of geno that are candidates for the training set.
`n.train`	The size of the target training set. It can be determined with the help of the SSDFGS function provided in this package.
`subpop`	A character vector of sub-population group names. If NULL, the algorithm ignores population structure.
`test`	An integer vector giving the rows of geno that belong to the test set. If NULL, the algorithm uses an untargeted method.
`method`	One of "rScore", "PEV" and "CD". Default: "rScore".
`min.iter`	Minimum number of iterations, applied to all methods. Always check whether the algorithm has converged. If NULL, a minimum number of iterations is set based on the candidate and test set sizes.
`console`	Default: TRUE. Set to FALSE to stop the function from printing the iteration count.

Value

This function returns three components: OPTtrain (a vector of the chosen optimal training set), TOPscore (the highest score reached up to each iteration), and ITERscore (the criterion score at each iteration).

Author(s)

Jen-Hsiang Ou

Examples

data(geno)
## Not run: optTrain(geno, cand = 1:404, n.train = 100)
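
The manual does not describe the search algorithm inside optTrain(). To show the general idea of optimizing a criterion over training sets, the sketch below uses a simple random-exchange search: swap one training individual for a candidate outside the set and keep the swap only if the criterion improves. The function exchange_search, its max_iter argument and the toy criterion are hypothetical; the running best score in trace plays a role similar to TOPscore.

```r
# Hypothetical random-exchange search (not necessarily optTrain()'s algorithm).
# score_fun(X, X0) returns a criterion to maximize.
exchange_search <- function(geno, cand, n_train, test, score_fun, max_iter = 500) {
  geno <- as.matrix(geno)
  X0 <- geno[test, , drop = FALSE]
  train <- cand[sample.int(length(cand), n_train)]  # random starting set
  start <- best <- score_fun(geno[train, , drop = FALSE], X0)
  trace <- numeric(max_iter)                        # best score so far
  for (it in seq_len(max_iter)) {
    pool <- setdiff(cand, train)                    # candidates not in the set
    proposal <- train
    proposal[sample.int(n_train, 1)] <- pool[sample.int(length(pool), 1)]
    s <- score_fun(geno[proposal, , drop = FALSE], X0)
    if (s > best) {                                 # keep improving swaps only
      train <- proposal
      best <- s
    }
    trace[it] <- best
  }
  list(train = sort(train), start = start, score = best, trace = trace)
}

# Toy criterion (hypothetical): negative mean ridge PEV of the test set, lambda = 1
toy_score <- function(X, X0) {
  C <- solve(crossprod(X) + diag(ncol(X)))
  -mean(rowSums((X0 %*% C) * X0))
}

set.seed(11)
toy_geno <- matrix(rnorm(120 * 5), nrow = 120)  # toy data: 120 samples x 5 PCs
res <- exchange_search(toy_geno, cand = 1:100, n_train = 20, test = 101:120,
                       score_fun = toy_score, max_iter = 300)
res$score - res$start  # improvement over the random start
```

A flat tail in res$trace is a rough sign of convergence, in the spirit of the manual's advice to check it, although a random-exchange search can still stop at a local optimum.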

PEV score

Description

This function calculates the prediction error variance (PEV) score <doi:10.1186/s12711-015-0116-6> for a given training set and test set.

Usage

pev_score(X, X0)

Arguments

`X`	A numeric matrix. The genotypic information of the training set, given either as a genotype matrix (coded as -1, 0, 1) or as a principal component matrix (rows: samples; columns: markers or PCs).
`X0`	A numeric matrix. The genotypic information of the test set, given either as a genotype matrix (coded as -1, 0, 1) or as a principal component matrix (rows: samples; columns: markers or PCs).

Value

A floating-point number, PEV score.

Author(s)

Jen-Hsiang Ou

Examples

data(geno)
## Not run: pev_score(geno[1:50, ], geno[51:100, ])
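
The formula inside pev_score() is not given in this manual. Under an assumed ridge-regression model with no intercept, the prediction error variance of a test individual with genotype row x0 is, up to the factor sigma_e^2, x0' (X'X + lambda I)^-1 x0; the sketch below averages it over the test set, and smaller values are better. The name pev_sketch and the default lambda = 1 are hypothetical, and the result need not match pev_score().

```r
# Hypothetical sketch (not pev_score()): mean PEV of test-set genomic values
# under an assumed ridge model without intercept, up to the factor sigma_e^2:
#   PEV(x0) = x0' (X'X + lambda I)^-1 x0,  lambda = sigma_e^2 / sigma_b^2
pev_sketch <- function(X, X0, lambda = 1) {
  X  <- as.matrix(X)
  X0 <- as.matrix(X0)
  C  <- solve(crossprod(X) + lambda * diag(ncol(X)))
  mean(rowSums((X0 %*% C) * X0))  # average of x0' C x0 over test rows
}

set.seed(5)
X  <- matrix(rnorm(60 * 8), nrow = 60)  # toy training set: 60 samples x 8 PCs
X0 <- matrix(rnorm(15 * 8), nrow = 15)  # toy test set: 15 samples x 8 PCs
pev_sketch(X, X0)          # full training set
pev_sketch(X[1:30, ], X0)  # nested half of it: PEV can only be larger
```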

r-score

Description

This function calculates the r-score <doi:10.1007/s00122-019-03387-0> for a given training set and test set.

Usage

r_score(X, X0)

Arguments

`X`	A numeric matrix. The genotypic information of the training set, given either as a genotype matrix (coded as -1, 0, 1) or as a principal component matrix (rows: samples; columns: markers or PCs).
`X0`	A numeric matrix. The genotypic information of the test set, given either as a genotype matrix (coded as -1, 0, 1) or as a principal component matrix (rows: samples; columns: markers or PCs).

Value

A floating-point number, r-score.

Author(s)

Jen-Hsiang Ou

Examples

data(geno)
## Not run: r_score(geno[1:50, ], geno[51:100, ])
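
The r-score is derived from the Pearson correlation between GEBVs and phenotypic values of the test set. The sketch below does not compute the closed-form r-score; it estimates that correlation by Monte Carlo under an assumed model, y = mu + X b + e with b ~ N(0, sigma_b^2 I) and e ~ N(0, sigma_e^2 I), using ridge (BLUP) estimates of b. The function mc_cor and its defaults are hypothetical.

```r
# Hypothetical Monte Carlo sketch (not the closed-form r_score()).
# Assumed model: y = mu + X b + e, b ~ N(0, sigma_b^2 I), e ~ N(0, sigma_e^2 I).
mc_cor <- function(X, X0, sigma_b = 1, sigma_e = 1, n_rep = 200) {
  X  <- as.matrix(X)
  X0 <- as.matrix(X0)
  p  <- ncol(X)
  lambda <- sigma_e^2 / sigma_b^2
  Xc <- scale(X, scale = FALSE)                          # centring absorbs mu
  A  <- solve(crossprod(Xc) + lambda * diag(p), t(Xc))   # b_hat = A %*% (y - mean(y))
  r <- replicate(n_rep, {
    b  <- rnorm(p, sd = sigma_b)                         # true marker/PC effects
    y  <- drop(X %*% b) + rnorm(nrow(X), sd = sigma_e)   # training phenotypes (mu = 0)
    y0 <- drop(X0 %*% b) + rnorm(nrow(X0), sd = sigma_e) # test phenotypes
    gebv <- drop(X0 %*% (A %*% (y - mean(y))))           # test-set GEBVs (mu_hat omitted)
    cor(gebv, y0)
  })
  mean(r)  # Monte Carlo estimate of the expected test-set correlation
}

set.seed(9)
X  <- matrix(rnorm(200 * 10), nrow = 200)  # toy training pool: 200 samples x 10 PCs
X0 <- matrix(rnorm(50 * 10), nrow = 50)    # toy test set: 50 samples
r_small <- mc_cor(X[1:8, ], X0)            # 8 training samples
r_large <- mc_cor(X, X0)                   # 200 training samples
c(r_small, r_large)
```

Because centred X columns are orthogonal to the intercept, the ridge solution on centred data equals the mixed-model estimate of b; the constant mu_hat is left out of the GEBVs since it does not change a correlation.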

Sample size determination for genomic selection

Description

This function generates an operating curve for sample size determination.

Usage

SSDFGS(geno, nt = NULL, n_iter = NULL, multi.threads = FALSE)

Arguments

`geno`	A numeric data frame containing genotype information (columns: PCs; rows: samples).
`nt`	A numeric vector of training set sizes used for r-score simulation.
`n_iter`	Number of iterations for estimating the parameters.
`multi.threads`	Default: FALSE. If TRUE and the computer has more than 4 threads, this function uses 75% of them. Multithreaded computing is only available on macOS and Linux.

Value

An operating curve and its information.

Author(s)

Jen-Hsiang Ou & Po-Ya Wu

Examples

data(geno)
## Not run: SSDFGS(geno)
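
The manual does not define how SSDFGS() turns its operating curve into a recommended size. One simple reading, sketched below under the assumed logistic form r(n) = Asym / (1 + exp((xmid - n) / scal)), is to take the smallest n whose r-score reaches a chosen fraction q of the plateau Asym. The functions required_size and curve_r, and the target q = 0.95, are hypothetical.

```r
# Hypothetical sketch: read a required training set size off an assumed
# logistic operating curve r(n) = Asym / (1 + exp((xmid - n) / scal)).
required_size <- function(Asym, xmid, scal, q = 0.95) {
  # r(n) >= q * Asym  <=>  n >= xmid + scal * log(q / (1 - q))
  ceiling(xmid + scal * log(q / (1 - q)))
}
curve_r <- function(n, Asym, xmid, scal) Asym / (1 + exp((xmid - n) / scal))

n_req <- required_size(Asym = 0.6, xmid = 60, scal = 25, q = 0.95)
n_req  # 134
```

With these toy parameters, r(134) is about 0.5704, just above the 95% target of 0.57, while r(133) is about 0.5693 and falls short.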

Sub-population information

Description

Sub-population information of samples. This data was published by Zhao et al. (2011) <doi:10.1038/ncomms1467>

Usage

subpop

Format

A character vector.

Source

http://www.ricediversity.org/data/

Examples

data(subpop)

TSDFGS

Description

We propose an optimality criterion, the r-score, to determine the required training set. The r-score is derived directly from Pearson's correlation between the genomic estimated breeding values (GEBVs) and the phenotypic values of the test set <doi:10.1007/s00122-019-03387-0>. This package provides two main functions: one to determine a good training set and one to determine its size.
