- Least Absolute Shrinkage and Selection Operator = LASSO
- A supervised machine learning method for prediction.
- Helps when the aim is to select the best subset of predictors for an outcome.
- Determines which predictors are relevant for an outcome by adding a penalty, weighted by a tuning parameter (lambda), on the absolute size of the coefficients to the ordinary least squares (OLS) objective. This shrinks some coefficients to exactly zero, excluding them from the model.
- As lambda increases, more variables get excluded
- Results in a parsimonious model
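The penalty described above can be written out for the linear case. This is a sketch under one common convention: the 1/(2N) scaling and the symbols (N observations, p candidate predictors, intercept \beta_0 left unpenalized) are assumptions for illustration, and other texts scale the terms differently.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \left\{ \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - \mathbf{x}_i^{\top}\beta \right)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

With \lambda = 0 this reduces to OLS; as \lambda grows, the absolute-value penalty pushes more coefficients to exactly zero.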
Cross-validation
- It is a resampling technique that repeatedly splits the training dataset into observations used to fit the model and observations used to evaluate it
- CV is done within the TRAIN dataset only
- Can be done with k folds, e.g. 10-fold cross-validation
- Helps generate a model that is more realistic for new cases by allowing the model to learn from the underlying distribution
- Helps prevent overfitting
- Fitting the model across the k folds allows us to choose the model with the best lambda, or to compare models using criteria such as AIC/BIC
By default, Stata fits up to 100 models over a grid of lambdas, starting from the largest lambda (at which no predictors are selected) and moving to smaller values. With cross-validation, the model with the minimum CV mean prediction error, which is also the one with the largest out-of-sample R-squared, gets selected.
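A minimal sketch of this selection step, assuming the auto dataset that ships with Stata as stand-in data (the outcome and predictor list are illustrative choices, not from the original notes):

```stata
* Assumed example data: the auto dataset shipped with Stata
sysuse auto, clear

* Fit lasso over the lambda grid and pick lambda by 10-fold cross-validation
lasso linear price mpg headroom trunk weight length turn displacement gear_ratio i.foreign, ///
    selection(cv, folds(10)) rseed(1234)

* Table of knots: number of nonzero coefficients, out-of-sample R-squared, and BIC
lassoknots, display(nonzero osr2 bic)

* Plot the CV function against lambda; the selected lambda is marked
cvplot
```

In the lassoknots table, the row marked with an asterisk is the lambda chosen by cross-validation.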
LASSO commands
- splitsample: to generate training and validation/testing/hold-out sample sets
- Estimation:
- lasso
- elasticnet
- sqrtlasso
- Selection methods
- cross-validation
- adaptive
- plugin
- customized
- Graph:
- cvplot: cross-validation plot
- bicplot
- coefpath: coefficient path
- Exploratory tools:
- lassocoef: display lasso coefficients
- lassoinfo: summary of lasso fitting
- lassoknots: detailed table of knots
- lassoselect: manually select a tuning parameter
- Prediction
- lassogof: evaluate in-sample and out-of-sample prediction
- predict: prediction for linear, binary, count, survival data
SSC add-on based methods
- calibrationbelt: GiViTi calibration belt, a test and plot for model validation that compares the observed and predicted probability of the outcome. It gives a test statistic and a p-value in the plot. A large p-value indicates that there is no statistically significant difference between the model predictions and the 45-degree line. The 45-degree line indicates that the observed and predicted rates are the same. We want a large, non-significant p-value.
- cvauroc: AUC and discrimination performance of the model; displays the AUC at each fold and the mean AUC
- Rule of thumb: cvAUC of 0.5 = same as chance, AUC > 0.7 = good model, > 0.8 = strong model, 1 = perfect discrimination
- rocreg: alternative way to estimate AUC using bootstrap replication (an official Stata command rather than an SSC add-on)
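The add-ons need to be installed once before use. The lines below are a sketch that assumes both packages are hosted on SSC under these names, as the heading implies; if either is not found, `search calibrationbelt` or `search cvauroc` can locate it.

```stata
* Assumption: both user-written packages are available from SSC under these names
ssc install calibrationbelt, replace
ssc install cvauroc, replace

* rocreg ships with Stata, so only its help file is needed
help rocreg
```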
Standard Lasso estimation commands
- lasso
- cvplot
- lassoknots
- lassoselect
- lassocoef
- lassogof
- bicplot
Lasso Inference commands
- dsregress, poregress, xporegress
- dslogit, pologit, xpologit
- dspoisson, popoisson, xpopoisson
ds refers to double-selection lasso regression
po refers to partialing-out lasso regression
xpo refers to cross-fit partialing-out lasso regression
Predict after LASSO
Two options:
- Penalized: the default; specifies that penalized coefficients be used to calculate predictions. Penalized coefficients are those estimated by lasso in the calculation of the lasso penalty
- Postselection: specifies that postselection coefficients be used to calculate predictions. Postselection coefficients are calculated by taking the variables selected by lasso and refitting the model with the appropriate ordinary estimator: linear regression for linear models, logistic regression for logit models, probit regression for probit models, Poisson regression for poisson models, and Cox regression for cox models.
In the linear model, postselection coefficients tend to be less biased and may have better out-of-sample prediction performance than the penalized coefficients (see http://fmwww.bc.edu/RePEc/scon2019/chicago19_Liu.pdf).
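The two prediction options can be compared side by side. This is a minimal sketch on assumed stand-in data (the auto dataset shipped with Stata, with an illustrative predictor list); the variable names price_pen and price_post are made up for the example, and the fit statistics here are in-sample only.

```stata
* Assumed example data: the auto dataset shipped with Stata
sysuse auto, clear
lasso linear price mpg headroom trunk weight length turn displacement gear_ratio i.foreign, rseed(1234)

* Predictions from the penalized coefficients (the default)
predict double price_pen, xb penalized

* Predictions from refitting OLS on the variables lasso selected
predict double price_post, xb postselection

* Goodness of fit under each set of coefficients
lassogof, penalized
lassogof, postselection
```

For a fair comparison of out-of-sample performance, the same lassogof calls would be run with over() on a split-sample indicator.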
Sample command sequence
splitsample , generate(sample) nsplit(2) rseed(1234)
keep if sample==1
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4, selection(cv, folds(10)) rseed(1234)
est store model1
lassocoef model1, display(coef, penalized) sort(coef, penalized)
predict double outcome_predicted, pr
calibrationbelt outcome outcome_predicted, devel("internal") clevel1(0.95) clevel2(0.99) maxDeg(4) thres(0.95)
cvauroc outcome outcome_predicted, kfold(10) seed(1972) fit detail graphlowess
rocreg outcome outcome_predicted, bseed(123456)
******************** Example from StataCorp YouTube video
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4, selection(cv, folds(10)) rseed(1234)
est store model1
cvplot // Cross-validation plot: shows the value of lambda at which the cross-validation function is minimized
est store cv
lassoknots, display(nonzero osr2 bic) // displays information about the models fit during CV
* Select a specific model based on BIC or Number of Coef criteria
lassoselect id = 4 // Lowest BIC
cvplot
est store minBIC
** Adaptive LASSO model
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4, selection(adaptive, folds(10)) rseed(1234)
est store model1
est store adaptive
** Compare variables included in various models, with largest standardized coefficients displayed at top
lassocoef cv minBIC adaptive, sort(coef, standardized) nofvlabel
** Goodness of Fit of model on the test sample
lassogof cv minBIC adaptive, over(sample) postselection
* Can choose the model with minimum mean square error and largest r-square in testing dataset
********************************** LASSO INFERENCE
webuse cattaneo2
dsregress .........
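The inference example above is left unfinished. As one hypothetical illustration of what a double-selection call can look like on this dataset, the sketch below assumes birthweight (bweight) as the outcome, maternal smoking (mbsmoke) as the variable of interest, and a hand-picked set of controls; none of these choices come from the original example.

```stata
* Hypothetical illustration: outcome, variable of interest, and controls are assumed
webuse cattaneo2, clear

* Double-selection lasso: inference on mbsmoke, with lasso choosing among the controls
dsregress bweight i.mbsmoke, ///
    controls(mage medu fage fedu nprenatal i.mmarried i.fbaby i.prenatal1)

* Summarize the lassos fit for the outcome and for the variable of interest
lassoinfo
```

The reported coefficient and standard error apply only to the variable of interest; the controls are selected by lasso and are not interpreted individually.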
Sources
The Stata Blog » An introduction to the lasso in Stata
The Stata Blog » Using the lasso for inference in high-dimensional models
Using lasso and related estimators for prediction (stata.com)
Lasso for prediction and model selection | Stata
Predicting the individualized risk of poor adherence to ART medication among adolescents living with HIV in Uganda: the Suubi+Adherence study – PMC (nih.gov) – calibrationbelt
http://fmwww.bc.edu/RePEc/scon2019/chicago19_Liu.pdf