Parameters and Configuration
VSOLassoBag
-
mat
Type: matrix; Default: none
Independent variables: a sample matrix in which each column represents a variable and each row represents a sample (data point). All entries must be numeric.
-
out.mat
Type: vector / data.frame; Default: none
Dependent variable(s): a vector, or a data.frame with one or two columns. Its number of rows must equal the number of samples (rows) in mat.
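The input rules for mat and out.mat can be expressed as a few checks. The Python sketch below is an illustration only (VSOLassoBag itself is an R package and performs its own validation, which may differ). It models mat as a list of numeric rows and out.mat as a list of values or of 1- or 2-element tuples; the name `check_inputs` is hypothetical.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def check_inputs(mat: List[List[Number]], out_mat: Sequence) -> None:
    """Raise if mat/out.mat violate the documented shape and type rules."""
    if not mat:
        raise ValueError("mat must contain at least one sample (row)")
    n_vars = len(mat[0])
    for r, row in enumerate(mat):
        # every row (sample) must have one entry per variable (column)
        if len(row) != n_vars:
            raise ValueError(f"row {r} of mat has {len(row)} entries, expected {n_vars}")
        for v in row:
            # bool is a subclass of int in Python, so reject it explicitly
            if isinstance(v, bool) or not isinstance(v, (int, float)):
                raise TypeError(f"non-numeric entry {v!r} in row {r} of mat")
    # out.mat must have one row per sample in mat
    if len(out_mat) != len(mat):
        raise ValueError(f"out.mat has {len(out_mat)} rows, but mat has {len(mat)} samples")
    for r, y in enumerate(out_mat):
        # a list/tuple row models a data.frame row with one or two columns
        if isinstance(y, (list, tuple)) and len(y) not in (1, 2):
            raise ValueError(f"out.mat row {r} has {len(y)} columns, expected 1 or 2")
```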
-
observed.fre
Type: data.frame; Default: NULL
A data.frame with columns variable and Frequency, for example taken from an existing LASSOBag result, to be re-analyzed. Variables in observed.fre that are not found in mat are excluded, and a warning is issued.
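A minimal Python sketch of the exclusion rule described above, assuming observed.fre is modelled as a dict mapping each variable name to its Frequency. The function name `restrict_to_mat` is hypothetical and not part of the package.

```python
import warnings
from typing import Dict, List


def restrict_to_mat(observed_fre: Dict[str, int], mat_columns: List[str]) -> Dict[str, int]:
    """Keep only variables of observed.fre that are columns of mat; warn about the rest."""
    known = set(mat_columns)
    missing = [v for v in observed_fre if v not in known]
    if missing:
        # one warning listing every excluded variable
        warnings.warn("variables not found in mat were excluded: " + ", ".join(missing))
    return {v: f for v, f in observed_fre.items() if v in known}
```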
-
bootN
Type: integer; Default: 1000
The number of re-sampled sample sets used for bagging (i.e. the number of bagging iterations). Smaller values take less processing time but may give less robust results.
-
boot.rep
Type: boolean; Default: TRUE
Whether to sample with replacement in the bagging procedure.
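Taken together, bootN and boot.rep control how the bagged selection frequencies are accumulated. The Python sketch below assumes a hypothetical `select` callback standing in for one cross-validated LASSO fit on the re-sampled rows. The subsample fraction used when boot.rep is FALSE (`subsample_frac`) is also an assumption, since the size of a without-replacement draw is not stated here. Python's random generator differs from R's, so the seed will not reproduce VSOLassoBag results.

```python
import random
from collections import Counter
from typing import Callable, List, Set


def bagging_frequency(n_samples: int,
                      select: Callable[[List[int]], Set[str]],
                      bootN: int = 1000,
                      boot_rep: bool = True,
                      subsample_frac: float = 0.632,
                      seed: int = 10867) -> Counter:
    """Count how often each variable is selected across bootN re-sampled sets."""
    rng = random.Random(seed)
    freq: Counter = Counter()
    for _ in range(bootN):
        if boot_rep:
            # bootstrap: n_samples row indices drawn with replacement
            idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        else:
            # hypothetical: draw a fraction of the rows without replacement
            k = max(1, round(subsample_frac * n_samples))
            idx = rng.sample(range(n_samples), k)
        # `select` stands in for one LASSO fit on the re-sampled rows
        freq.update(select(idx))
    return freq
```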
-
a.family
Type: character; Default: none; Options: "gaussian", "binomial", "poisson", "multinomial", "cox", "mgaussian"
The response type of out.mat; the same family options as used in glmnet.
-
additional.covariable
Type: data.frame; Default: NULL
Additional covariate(s) used to build the Cox model: a data.frame with the same number of rows as mat. Only valid in Cox mode (a.family == "cox").
-
bagFreq.sigMethod
Type: character; Default: "CEP"; Options: "CEP", "PST", "PERT"
The method used to decide the cut-off point for the importance measure (i.e. the observed selection frequency). Supported methods are "Parametric Statistical Test" ("PST"), "Curve Elbow Point Detection" ("CEP") and "Permutation Test" ("PERT"). The default and preferred method is "CEP"; "PERT" is not recommended because of its time and memory requirements.
-
kneedle.S
Type: numeric; Default: 10
The sensitivity parameter S of the Kneedle elbow-detection algorithm, which determines how aggressively elbow points on the curve are called: smaller values are more aggressive and may find more elbow points. The default of 10 is a reasonable starting point, but other values can be tried. Choose kneedle.S based on the shape of the observed frequency curve, starting with larger values.
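To make the role of kneedle.S concrete, here is a simplified Python sketch of offline Kneedle elbow detection on the observed frequency curve sorted in decreasing order. It follows the general Kneedle recipe (normalize, form a difference curve, threshold each local maximum by S times the mean x spacing) and is not VSOLassoBag's own code, whose details may differ.

```python
from typing import List


def kneedle_elbows(freqs: List[float], S: float = 10.0) -> List[int]:
    """Return indices (into the descending-sorted curve) of Kneedle elbow points."""
    y = sorted(freqs, reverse=True)  # observed frequencies, largest first
    n = len(y)
    if n < 3 or max(y) == min(y):
        return []
    # 1. normalize x (rank) and y (frequency) to [0, 1]
    xn = [i / (n - 1) for i in range(n)]
    yn = [(v - y[-1]) / (y[0] - y[-1]) for v in y]
    # 2. difference curve: how far the convex, decreasing curve lies below
    #    the straight line joining its first point (0, 1) and last point (1, 0)
    d = [1.0 - x - v for x, v in zip(xn, yn)]
    step = 1.0 / (n - 1)  # mean spacing of the normalized x values
    # 3. local maxima of the difference curve are candidate elbows
    maxima = [i for i in range(1, n - 1) if d[i] > d[i - 1] and d[i] >= d[i + 1]]
    elbows = []
    for k, i in enumerate(maxima):
        threshold = d[i] - S * step
        nxt = maxima[k + 1] if k + 1 < len(maxima) else n
        # 4. declare an elbow if the difference curve drops below the
        #    threshold before the next candidate appears
        if any(d[j] < threshold for j in range(i + 1, nxt)):
            elbows.append(i)
    return elbows
```

For the curve [10, 10, 10, 10, 1, 1, 1, 1, 1, 1, 0], S = 1 reports an elbow at index 4, while S = 10 reports none, because the threshold d_max - S/(n - 1) then falls below every later point of the difference curve.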
-
auto.loose
Type: boolean; Default: TRUE
If TRUE, kneedle.S is reduced automatically when no elbow point is found with the given kneedle.S. Only valid when bagFreq.sigMethod is "Curve Elbow Point Detection" ("CEP").
-
loosing.factor
Type: numeric; Default: 0.5
A value in the range (0, 1) by which kneedle.S is multiplied each time it is reduced. Only valid when auto.loose is TRUE.
-
min.S
Type: numeric; Default: 0.1
The minimum value to which kneedle.S can be reduced ("loosened"). Only valid when auto.loose is TRUE.
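The interplay of auto.loose, loosing.factor and min.S can be sketched as a retry loop. In this Python illustration, `find_elbows` is a hypothetical callback that runs elbow detection for a given S, and stopping just before S would fall below min.S is an assumption about how the bound is applied.

```python
from typing import Callable, List, Tuple


def loosened_elbows(find_elbows: Callable[[float], List[int]],
                    kneedle_S: float = 10.0,
                    auto_loose: bool = True,
                    loosing_factor: float = 0.5,
                    min_S: float = 0.1) -> Tuple[List[int], float]:
    """Retry elbow detection with a shrinking S; return (elbows, S actually used)."""
    if not 0 < loosing_factor < 1:
        raise ValueError("loosing_factor must lie in (0, 1)")
    S = kneedle_S
    elbows = find_elbows(S)
    # assumption: stop before S would drop below min_S
    while not elbows and auto_loose and S * loosing_factor >= min_S:
        S *= loosing_factor
        elbows = find_elbows(S)
    return elbows, S
```

With the defaults, a detector that never finds an elbow is tried at S = 10, 5, 2.5, ..., 0.15625 and then gives up, since the next value (0.078125) would be below min.S.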
-
use.gpd
Type: boolean; Default: FALSE
Whether to fit a Generalized Pareto Distribution to the permutation results to speed up the process. Only valid when bagFreq.sigMethod is "Permutation Test" ("PERT").
-
fit.pareto
Type: character; Default: "gd"; Options: "gd", "mle"
The method used to fit the Generalized Pareto Distribution: "gd" (Gradient Descent, the default) or "mle" (Maximum Likelihood Estimation). Only valid in "PERT" mode.
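As an illustration of why a GPD fit can speed up the permutation test: a small p-value can be extrapolated from the tail of a modest number of permutation values instead of requiring many more permutations. The Python sketch below fits the tail exceedances by the method of moments, which is neither of the package's "gd" or "mle" options and is used here only to stay dependency-free; the number of tail values used (`n_exceed`) is also an assumption.

```python
import math
from statistics import mean, variance
from typing import List


def gpd_tail_pvalue(null_values: List[float], observed: float, n_exceed: int = 250) -> float:
    """Approximate P(null >= observed), using a GPD tail beyond the top n_exceed values."""
    N = len(null_values)
    empirical = sum(v >= observed for v in null_values) / N
    null_sorted = sorted(null_values, reverse=True)
    k = min(n_exceed, N - 1)
    u = null_sorted[k]  # tail threshold: the (k+1)-th largest null value
    if observed <= u or k < 2:
        return empirical  # enough permutation values: plain empirical p-value
    exceed = [float(v - u) for v in null_sorted[:k]]
    m, s2 = mean(exceed), variance(exceed)
    if s2 == 0:
        return empirical
    # method-of-moments GPD estimates: shape xi and scale sigma
    ratio = m * m / s2
    xi = 0.5 * (1.0 - ratio)
    sigma = 0.5 * m * (1.0 + ratio)
    y = observed - u
    if abs(xi) < 1e-12:
        sf = math.exp(-y / sigma)  # exponential limit of the GPD
    else:
        z = 1.0 + xi * y / sigma
        sf = z ** (-1.0 / xi) if z > 0 else 0.0  # beyond the support when z <= 0
    # P(null > u) * P(exceedance > y)
    return (k / N) * sf
```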
-
imputeN
Type: integer; Default: 1000
The initial number of permutations. Only valid in "PERT" mode.
-
imputeN.max
Type: integer; Default: 2000
The maximum number of permutations, applied regardless of whether the p-value has met the requirement. Only valid in "PERT" mode.
-
permut.increase
Type: integer; Default: 100
If the initial imputeN permutations do not meet the requirement, permut.increase further permutations are added to obtain more random (permutation) values. Only valid in "PERT" mode.
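imputeN, imputeN.max and permut.increase describe an adaptive permutation loop. In the Python sketch below, `draw_null` is a hypothetical stand-in for one permutation round, and the "requirement" is assumed to be that at least `min_hits` permutation values reach the observed one; the criterion VSOLassoBag actually checks is not stated here.

```python
import random
from typing import Callable, Tuple


def adaptive_permutation_pvalue(observed: float,
                                draw_null: Callable[[random.Random], float],
                                imputeN: int = 1000,
                                imputeN_max: int = 2000,
                                permut_increase: int = 100,
                                min_hits: int = 10,
                                seed: int = 10867) -> Tuple[float, int]:
    """Return (permutation p-value, number of permutations used)."""
    rng = random.Random(seed)
    null = [draw_null(rng) for _ in range(imputeN)]
    hits = sum(v >= observed for v in null)
    # hypothetical requirement: at least min_hits null values reach the observed one
    while hits < min_hits and len(null) < imputeN_max:
        batch = min(permut_increase, imputeN_max - len(null))
        new = [draw_null(rng) for _ in range(batch)]
        null.extend(new)
        hits += sum(v >= observed for v in new)
    # add-one estimator keeps the p-value strictly positive
    return (hits + 1) / (len(null) + 1), len(null)
```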
-
parallel
Type: boolean; Default: FALSE
Whether to run in parallel mode; n.cores must also be set to determine how much CPU resource to use.
-
n.cores
Type: integer; Default: 1
The number of threads/processes assigned to this function. More threads require more CPU and memory resources.
-
rd.seed
Type: numeric; Default: 10867
The random seed used by this function, so that experiments can be reproduced.
-
nfolds
Type: integer; Default: 4
The number of folds (an integer > 2) used for n-fold cross-validation of the LASSO in cv.glmnet.
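For reference, a balanced random fold assignment of the kind commonly used for k-fold cross-validation (similar in spirit to what cv.glmnet does when no foldid is supplied) can be sketched in Python as follows. `make_foldid` is a hypothetical helper, and its seed handling does not reproduce R's random numbers.

```python
import random
from typing import List


def make_foldid(n_samples: int, nfolds: int = 4, seed: int = 10867) -> List[int]:
    """Assign each sample a fold label in 1..nfolds, with fold sizes differing by at most 1."""
    if nfolds <= 2:
        raise ValueError("nfolds must be an integer > 2")
    if nfolds > n_samples:
        raise ValueError("nfolds cannot exceed the number of samples")
    # repeat 1..nfolds to length n_samples, then shuffle the labels
    foldid = [i % nfolds + 1 for i in range(n_samples)]
    random.Random(seed).shuffle(foldid)
    return foldid
```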
-
lambda.type
Type: character; Default: "lambda.1se"; Options: "lambda.1se", "lambda.min"
Which model is used to obtain the variables selected in each bagging iteration. The default, "lambda.1se", selects fewer variables and is less likely to over-fit. See the help of cv.glmnet for details.
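The difference between the two choices follows the rule documented for cv.glmnet: lambda.min gives the minimum mean cross-validated error, and lambda.1se is the largest lambda whose error is within one standard error of that minimum. A small Python sketch of the rule, taking precomputed `lambdas`, `cvm` (mean CV error) and `cvsd` (its standard error) as inputs; the function name `pick_lambdas` is hypothetical.

```python
from typing import List, Tuple


def pick_lambdas(lambdas: List[float], cvm: List[float], cvsd: List[float]) -> Tuple[float, float]:
    """Return (lambda.min, lambda.1se) from cross-validation summaries."""
    cv_min = min(cvm)
    # largest lambda attaining the minimum mean CV error
    lambda_min = max(lam for lam, m in zip(lambdas, cvm) if m <= cv_min)
    i_min = lambdas.index(lambda_min)
    # one-standard-error bound around the minimum
    bound = cvm[i_min] + cvsd[i_min]
    # largest (most regularized) lambda whose error is within the bound
    lambda_1se = max(lam for lam, m in zip(lambdas, cvm) if m <= bound)
    return lambda_min, lambda_1se
```

Because a larger lambda shrinks more coefficients to zero, lambda.1se typically keeps fewer variables than lambda.min.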
-
plot.freq
Type: character; Default: "part"; Options: "part", "full", "not"
Which variables to show in the final frequency bar plot: "full" plots all variables (including those with zero frequency), "part" (the default) plots only variables with non-zero frequency, and "not" produces no plot.
-
plot.out
Type: boolean / character; Default: FALSE
The file name of the frequency plot. If FALSE (the default), no plot file is written. When the function is run from the Linux command line, this parameter does not need to be set, because plot.freq writes the plot to the current working directory as "Rplot.pdf".
-
do.plot
Type: boolean; Default: TRUE
If TRUE, generate result plots.
-
output.dir
Type: character; Default: NA
The directory in which to save result files generated by Lasso.bag (created if it does not exist). If NA (the default), files are saved in the current working directory.
-
filter.method
Type: character; Default: "auto"; Options: "auto", "pearson", "spearman", "kendall", "cox", "none"
The filter method applied to the input matrix. The default, "auto", selects the filter method automatically according to the data type of out.mat. The specific methods supported are "pearson", "spearman" and "kendall" (from cor.test), "cox" (from coxph), and "none" (no filter applied).
-
inbag.filter
Type: boolean; Default: TRUE
If TRUE, filters are applied to each re-sampled bagging sample rather than to the original samples.
-
filter.thres.method
Type: character; Default: "fdr"; Options: "fdr", "rank"
The method used to determine the importance threshold in filters. Supported methods are "fdr" and "rank".
-
filter.thres.P
Type: numeric; Default: 0.05
If filter.thres.method is "fdr", filter.thres.P is used as the cut-off (adjusted) p-value.
-
filter.rank.cutoff
Type: numeric; Default: 0.05
If filter.thres.method is "rank", filter.rank.cutoff is used as the cut-off rank.
-
filter.min.variables
Type: integer; Default: -Inf
The minimum number of important variables to be selected by the filters. Useful when building a multi-variable Cox model, since a Cox model can only be built on a limited number of variables. Default is -Inf (not applied).
-
filter.max.variables
Type: integer; Default: Inf
The maximum number of important variables to be selected by the filters. Useful when building a multi-variable Cox model, since a Cox model can only be built on a limited number of variables. Default is Inf (not applied).
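The threshold and size-limit parameters can be combined into one filter step. This Python sketch assumes the filter has already produced one p-value per variable, that "fdr" means Benjamini-Hochberg adjustment (R's p.adjust treats "fdr" as an alias of "BH"), and that filter.rank.cutoff is the fraction of top-ranked variables to keep; all three are assumptions, and `bh_adjust` and `filter_variables` are hypothetical names.

```python
import math
from typing import Dict, List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running = 1.0
    # walk from the largest p-value down, keeping a running minimum (capped at 1)
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * n / rank)
        adjusted[i] = running
    return adjusted


def filter_variables(pvals: Dict[str, float], method: str = "fdr",
                     thres_P: float = 0.05, rank_cutoff: float = 0.05,
                     min_vars: float = -math.inf, max_vars: float = math.inf) -> List[str]:
    """Return the variables kept by a hypothetical filter step, most significant first."""
    names = sorted(pvals, key=lambda v: pvals[v])
    if method == "fdr":
        adj = bh_adjust([pvals[v] for v in names])
        keep = [v for v, q in zip(names, adj) if q < thres_P]
    elif method == "rank":
        # assumption: rank_cutoff is the fraction of top-ranked variables to keep
        keep = names[:math.ceil(rank_cutoff * len(names))]
    else:
        raise ValueError(f"unknown method {method!r}")
    if len(keep) < min_vars:
        # top up with the next most significant variables
        keep = names[:int(min(min_vars, len(names)))]
    if len(keep) > max_vars:
        keep = keep[:int(max_vars)]
    return keep
```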
-
filter.result.report
Type: boolean; Default: TRUE
If TRUE, generate reports of the filter results. Only valid when inbag.filter is FALSE (i.e. in out-bag filter mode, where filters are applied to the original samples).
-
filter.report.all.variables
Type: boolean; Default: TRUE
If TRUE, report all variables in the filter report. Only valid when filter.result.report is TRUE.
-
post.regression
Type: boolean; Default: FALSE
Whether to build a regression model on the variables selected by the LASSOBag process.
-
post.LASSO
Type: boolean; Default: FALSE
Whether to build a LASSO regression model on the variables selected by the LASSOBag process. Only valid when post.regression is TRUE.
-
pvalue.cutoff
Type: numeric; Default: 0.05
The cut-off p-value that determines which variables are considered selected by LASSOBag. Only valid when post.regression is TRUE and bagFreq.sigMethod is "Parametric Statistical Test" or "Permutation Test".
-
used.elbow.point
Type: character; Default: "middle"; Options: "middle", "first", "last"
Which elbow point to use, when multiple elbow points are detected, to decide which variables are selected by LASSOBag: "first", "middle" or "last". The default, "middle", uses the middle one of all detected elbow points. Only valid when post.regression is TRUE and bagFreq.sigMethod is "Curve Elbow Point Detection".
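A Python sketch of how used.elbow.point could turn several detected elbow points into a variable list. It assumes the elbow points are indices into the frequency curve sorted in decreasing order, that the lower-middle point is taken when the number of elbows is even, and that variables at or above the chosen elbow's frequency are selected; none of these details is confirmed by this page, and `select_by_elbow` is a hypothetical name.

```python
from typing import Dict, List


def select_by_elbow(freq: Dict[str, int], elbows: List[int],
                    used_elbow_point: str = "middle") -> List[str]:
    """Variables whose frequency is at least the frequency at the chosen elbow."""
    if not elbows:
        return []
    points = sorted(elbows)
    if used_elbow_point == "first":
        cut = points[0]
    elif used_elbow_point == "last":
        cut = points[-1]
    elif used_elbow_point == "middle":
        # assumption: with an even number of elbows, take the lower-middle one
        cut = points[(len(points) - 1) // 2]
    else:
        raise ValueError(f"unknown option {used_elbow_point!r}")
    ranked = sorted(freq, key=lambda v: freq[v], reverse=True)
    threshold = freq[ranked[cut]]
    # assumption: the elbow point itself counts as selected
    return [v for v in ranked if freq[v] >= threshold]
```

A threshold on frequency (rather than slicing the ranked list) keeps variables with tied frequencies together.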