LASSO in Stata

Vivek Gupta, May 24, 2024

Least Absolute Shrinkage and Selection Operator = LASSO
A supervised machine learning method for prediction
Useful when the aim is to select the best subset of predictors for an outcome
Determines which predictors are relevant for an outcome by adding a penalty to the OLS least-squares objective. The penalty is the sum of the absolute values of the coefficients, scaled by the tuning parameter lambda. This shrinks some coefficients exactly to zero, excluding those predictors from the model.
As lambda increases, more variables are excluded
Results in a parsimonious model
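For a linear outcome, the penalized objective described above can be written as follows. This is the standard textbook form, given as a sketch. Stata's own formulation also allows observation weights and predictor-specific penalty loadings, and it standardizes the predictors by default.

```latex
\hat{\boldsymbol{\beta}}(\lambda) = \arg\min_{\beta_0,\,\boldsymbol{\beta}}
\; \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - \mathbf{x}_i\boldsymbol{\beta}\bigr)^2
\;+\; \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
```

With lambda = 0 this is ordinary least squares. Consider the special case of centered, orthonormal predictors, where X'X/N = I. Each lasso coefficient is then the OLS estimate soft-thresholded at lambda. This shows why coefficients become exactly zero rather than merely small:

```latex
\hat\beta_j(\lambda) = \operatorname{sign}\bigl(\hat\beta_j^{\mathrm{OLS}}\bigr)\,
\max\bigl(\lvert\hat\beta_j^{\mathrm{OLS}}\rvert - \lambda,\; 0\bigr)
```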

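Cross-validation

A resampling technique that repeatedly partitions the training dataset, fitting the model on some observations and evaluating it on the rest
CV is done within the TRAIN dataset only
The training data are split into k folds, e.g. 10-fold cross-validation: each fold is held out once while the model is fit on the remaining k - 1 folds
Helps generate a model that performs more realistically on new cases
- by allowing the model to learn the underlying distribution
- helps prevent overfitting
Evaluating each candidate lambda across the k folds allows us to choose the model with the best lambda; information criteria such as AIC/BIC are an alternative way to choose it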
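The cross-validation criterion for a linear model can be sketched as follows. The N training observations are split into K folds F_1, ..., F_K. The term \hat{y}_i^{(-k)}(\lambda) is the prediction for observation i from the model fit with fold k left out. For binary outcomes, a deviance-based loss typically replaces the squared error.

```latex
\mathrm{CV}(\lambda) = \frac{1}{N}\sum_{k=1}^{K}\sum_{i \in F_k}
\bigl(y_i - \hat{y}_i^{(-k)}(\lambda)\bigr)^2,
\qquad
\lambda^{*} = \arg\min_{\lambda}\,\mathrm{CV}(\lambda)
```

With K = 10, each candidate lambda is fit 10 times, each time on roughly 90% of the training data.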

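By default, lasso selects lambda by cross-validation. Stata fits models over a grid of up to 100 lambda values. The grid starts from the largest lambda, at which all coefficients are zero, and moves toward smaller values. Cross-validation then selects the lambda that minimizes the CV mean prediction error. For linear models, this is the same lambda that gives the largest out-of-sample R-squared.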
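Below is a minimal sketch of cross-validated lasso. It uses Stata's cattaneo2 example dataset, which webuse loads over an internet connection. The outcome (bweight) and the candidate predictors are illustrative choices that are not part of the original notes. The split proportions and seeds are also arbitrary.

```stata
* Illustrative only: outcome and predictors chosen from the cattaneo2 example data
webuse cattaneo2, clear

* Hold out 25% of the observations as a test sample (sample==2)
splitsample, generate(sample) split(0.75 0.25) rseed(1234)

* Lasso on the training sample only; lambda chosen by 10-fold CV over the default grid
lasso linear bweight c.mage##c.mage c.medu##c.medu i.mmarried i.fbaby ///
    i.prenatal1 i.mbsmoke if sample == 1, selection(cv, folds(10)) rseed(1234)

* CV function plotted against lambda; the reference line marks the selected lambda
cvplot

* One row per knot: number of nonzero coefficients, CV mean prediction error,
* and out-of-sample R-squared
lassoknots, display(nonzero cvmpe osr2)
```

Because the folds are drawn at random, rseed() is what makes the selected lambda reproducible from run to run.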

LASSO commands

splitsample : to generate training and validation/testing/hold-out sample sets
Estimation:
- lasso
- elasticnet
- sqrtlasso
- Selection methods
  - cross-validation
  - adaptive
  - plugin
  - customized
Graph:
- cvplot: cross-validation plot
- bicplot: Bayesian information criterion (BIC) plot
- coefpath: coefficient path
Exploratory tools:
- lassocoef: display lasso coefficients
- lassoinfo: summary of lasso fitting
- lassoknots: detailed table of knots (the lambda values at which variables enter or leave the model)
- lassoselect: manually select a tuning parameter
Prediction
- lassogof: evaluate in-sample and out-of-sample prediction
- predict: prediction for linear, binary, count, survival data
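The three estimators listed above share the same basic syntax. The sketch below uses the cattaneo2 data with illustrative variable choices. It fits each estimator on a training sample and then compares them on the held-out sample. The alpha() grid for elasticnet is an arbitrary choice for illustration.

```stata
webuse cattaneo2, clear
splitsample, generate(sample) split(0.75 0.25) rseed(1234)

* Candidate predictors (illustrative choice)
local xvars c.mage##c.mage c.medu##c.medu i.mmarried i.fbaby i.prenatal1 i.mbsmoke

* Standard lasso
lasso linear bweight `xvars' if sample == 1, selection(cv, folds(10)) rseed(1234)
estimates store las
lassoinfo las      // summary of the lasso fit
coefpath           // coefficient paths as lambda varies

* Elastic net: mixes the lasso (L1) and ridge (L2) penalties; alpha = 1 is pure lasso
elasticnet linear bweight `xvars' if sample == 1, alpha(0.25 0.5 0.75) rseed(1234)
estimates store enet

* Square-root lasso (linear models only, so no model keyword)
sqrtlasso bweight `xvars' if sample == 1, rseed(1234)
estimates store sqrtl

* Training (sample==1) vs testing (sample==2) MSE and R-squared for each model
lassogof las enet sqrtl, over(sample)
```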

SSC add-on methods

calibrationbelt – GiViTI calibration belt, test, and plot for model validation, comparing the observed and predicted probabilities of the outcome. It reports a test statistic and a p-value in the plot. A large p-value indicates no statistically significant deviation of the model's predictions from the 45-degree line. That line represents perfect agreement between the observed and predicted rates. We therefore want a large, non-significant p-value.
cvauroc – cross-validated AUC and discrimination performance of the model; displays the AUC for each fold and the mean AUC
Rule of thumb: cvAUC = 0.5 is no better than chance, AUC > 0.7 = good model, > 0.8 = strong model, 1 = perfect discrimination
rocreg – Alternative way to estimate AUC – uses bootstrap replication
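The AUC reported by these add-ons can be cross-checked with official Stata. The built-in roctab command computes the nonparametric (empirical) AUC for a binary outcome and a predicted probability. The sketch below uses mbsmoke from cattaneo2 as an illustrative binary outcome, with an arbitrary set of predictors.

```stata
webuse cattaneo2, clear
splitsample, generate(sample) split(0.75 0.25) rseed(1234)

* Illustrative binary outcome: mbsmoke (mother smoked during pregnancy)
lasso logit mbsmoke c.mage##c.mage c.medu##c.medu i.mmarried i.fbaby i.prenatal1 ///
    if sample == 1, selection(cv, folds(10)) rseed(1234)

* Predicted probabilities for all observations, including the test sample
predict double phat, pr

* Empirical AUC in the held-out test sample, with its standard error and CI
roctab mbsmoke phat if sample == 2
display "Test-sample AUC = " %5.3f r(area)
```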

Standard lasso estimation commands

lasso
cvplot
lassoknots
lassoselect
lassocoef
lassogof
bicplot
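bicplot goes with BIC-based selection of lambda. The sketch below assumes a Stata release that supports selection(bic) (Stata 17 or later). It reuses the illustrative cattaneo2 variables.

```stata
webuse cattaneo2, clear

* Lambda chosen by minimizing BIC instead of cross-validation (no random folds)
lasso linear bweight c.mage##c.mage c.medu##c.medu i.mmarried i.fbaby ///
    i.prenatal1 i.mbsmoke, selection(bic)
estimates store bic

bicplot                          // BIC against lambda, selected lambda marked
lassoknots, display(nonzero bic) // knots table with the BIC at each knot
lassocoef bic, display(coef, standardized) sort(coef, standardized)
```

Since BIC selection does not draw random folds, no rseed() is needed for a reproducible result.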

Lasso inference commands

dsregress, poregress, xporegress
dslogit, pologit, xpologit
dspoisson, popoisson, xpopoisson

ds refers to double-selection lasso regression

po refers to partialling-out lasso regression

xpo refers to cross-fit partialling-out lasso regression
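The sketch below compares the three inference approaches for a linear outcome using cattaneo2. It takes mbsmoke as the variable of interest. The set of controls that lasso chooses from is an illustrative choice. By default these commands select lambda with the plugin method. In xporegress, xfolds() and rseed() control the cross-fitting.

```stata
webuse cattaneo2, clear

* Controls among which lasso selects (illustrative choice)
local controls c.mage##c.mage c.medu##c.medu i.mmarried i.fbaby i.prenatal1

* Double-selection
dsregress bweight mbsmoke, controls(`controls')
estimates store ds

* Partialling-out
poregress bweight mbsmoke, controls(`controls')
estimates store po

* Cross-fit partialling-out with 10 cross-fitting folds
xporegress bweight mbsmoke, controls(`controls') xfolds(10) rseed(1234)
estimates store xpo

* Compare the estimated effect of maternal smoking across the three methods
estimates table ds po xpo, keep(mbsmoke) b se
```

Only the coefficient on the variable of interest is reported with a standard error. The selected controls are treated as nuisance terms.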

Predict after LASSO

Two options:

Penalized (default): penalized coefficients are used to calculate predictions. Penalized coefficients are those estimated by lasso with the lasso penalty applied.
Postselection: postselection coefficients are used to calculate predictions. Postselection coefficients are calculated by taking the variables selected by lasso and refitting the model with the appropriate ordinary estimator: linear regression for linear models, logistic regression for logit models, probit regression for probit models, Poisson regression for Poisson models, and Cox regression for Cox models.

In the linear model, postselection coefficients tend to be less biased and may have better out-of-sample prediction performance than the penalized coefficients (Stata Conference 2019 slides: http://fmwww.bc.edu/RePEc/scon2019/chicago19_Liu.pdf)
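Both kinds of prediction can be requested explicitly after lasso logit. The sketch below uses the illustrative cattaneo2 binary outcome mbsmoke with an arbitrary set of predictors.

```stata
webuse cattaneo2, clear
lasso logit mbsmoke c.mage##c.mage c.medu##c.medu i.mmarried i.fbaby i.prenatal1, ///
    selection(cv, folds(10)) rseed(1234)
estimates store cvlogit

* Predicted probabilities from the penalized lasso coefficients (the default)
predict double p_pen, pr penalized

* Predicted probabilities after refitting an ordinary logit on the selected variables
predict double p_post, pr postselection

* Compare the two sets of coefficients and the two sets of predictions
lassocoef cvlogit, display(coef, penalized)
lassocoef cvlogit, display(coef, postselection)
summarize p_pen p_post
```

The postselection coefficients are typically larger in magnitude, because the refit removes the shrinkage toward zero.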

Sample command sequence

splitsample , generate(sample) nsplit(2) rseed(1234)
keep if sample==1
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4, selection(cv, folds(10)) rseed(1234)
est store model1
lassocoef model1, display(coef, penalized) sort(coef, penalized)
predict double outcome_predicted, pr


calibrationbelt outcome outcome_predicted, devel("internal") clevel1(0.95) clevel2(0.99) maxDeg(4) thres(0.95)
cvauroc outcome outcome_predicted, kfold(10) seed(1972) fit detail graphlowess
rocreg outcome outcome_predicted, bseed(123456)


******************** Example from StataCorp YouTube video
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4 if sample==1, selection(cv, folds(10)) rseed(1234)
est store model1
cvplot // Cross-validation plot: shows the value of lambda at which the cross-validation function is minimized
est store cv
lassoknots, display(nonzero osr2 bic) // displays information about all models fit during CV
* Select a specific model based on BIC or on the number of nonzero coefficients
lassoselect id = 4 // ID with the lowest BIC in this run; read it off the lassoknots output for your own data
cvplot
est store minBIC

**  Adaptive LASSO model
lasso logit outcome predictor1 predictor2 predictor3 i.predictor4 if sample==1, selection(adaptive, folds(10)) rseed(1234)
est store adaptive

** Compare the variables included in the various models, with the largest standardized coefficients displayed at the top
lassocoef cv minBIC adaptive, sort(coef, standardized) nofvlabel

** Goodness of Fit of model on the test sample
lassogof cv minBIC adaptive, over(sample) postselection
 * Choose the model with the best fit in the testing sample (for logit models lassogof reports deviance and deviance ratio; for linear models, MSE and R-squared)



********************************** LASSO INFERENCE
webuse cattaneo2
dsregress .........
