Some links to Stata and ML resources
Conference Articles / Presentations / Stata Journal
- 2023 – Presentation_UK_Stata_Conf_2023 – A review of machine learning commands in Stata: Giovanni Cerulli
- 2022 – Applying Machine Learning Techniques in Stata to Predict Health Outcomes Using HIV-related Data (youtube.com) – use of LASSO in Stata in an HIV setting
- 2021 – Cerulli_StataConf2021 : ML using Stata and Python
- 2019 – An Introduction to Machine Learning with Stata – Achim Ahrens
- 2024 – ddml – ddml: Double/debiased machine learning in Stata (sagepub.com)
- 2023 – pystacked – pystacked: Stacking generalization and machine learning in Stata (sagepub.com)
- 2020 – lassopack – lassopack: Model selection and prediction with regularized regression in Stata (sagepub.com)
- 2016 – Support Vector Machines (sagepub.com) svmachines
- ELASTICREGRESS: Stata module to perform elastic net regression, lasso regression, ridge regression (repec.org)
- LASSOPACK: Stata module for lasso, square-root lasso, elastic net, ridge, adaptive lasso estimation and cross-validation (repec.org)
- PDSLASSO: Stata module for post-selection and post-regularization OLS or IV estimation and inference (repec.org)
Stata Blog Articles from 2020
- The Stata Blog » Stata/Python integration part 1: Setting up Stata to use Python
- The Stata Blog » Stata/Python integration part 2: Three ways to use Python in Stata
- The Stata Blog » Stata/Python integration part 3: How to install Python packages
- The Stata Blog » Stata/Python integration part 4: How to use Python packages
- The Stata Blog » Stata/Python integration part 5: Three-dimensional surface plots of marginal predictions
- The Stata Blog » Stata/Python integration part 6: Working with APIs and JSON data
- The Stata Blog » Stata/Python integration part 7: Machine learning with support vector machines
- The Stata Blog » Stata/Python integration part 8: Using the Stata Function Interface to copy data from Stata to Python
- The Stata Blog » Stata/Python integration part 9: Using the Stata Function Interface to copy data from Python to Stata
Other Resources
- User’s corner: Machine learning | Stata News
- Giovanni Cerulli – Machine Learning in Stata (google.com)
- Towards better clinical prediction models: seven steps for development and an ABCD for validation – PMC (nih.gov)
Seven Steps in developing a prediction model
- Problem definition and data inspection / Research question
- What is the precise research question?
- How were patients selected?
- What is already known about the predictors?
- Define the Predictors
- Were the predictors reliably and completely measured?
- Define the outcomes of Interest
- Coding of predictors
- Categorical predictors
- Continuous predictors
- Model specification
- Selection of main effects?
- Assessment of assumptions?
- Overfitting?
- Model estimation – Estimate model parameters
- Shrinkage included?
- Model performance:
- Calibration: calibration plot
- A: alpha – Calibration-in-the-large – Intercept in plot; the agreement between observed endpoints and predictions
- B: beta – Calibration slope – Regression slope in plot; related to shrinkage of regression coefficients
- Discrimination: the ability of the model to distinguish a patient with the endpoint from a patient without
- C: Concordance (c) statistic
- ROC curve – For a binary endpoint, c is identical to the area under the receiver operating characteristic (ROC) curve
- For a time-to-event endpoint, such as survival, the calculation of c may be affected by the amount of incomplete follow-up (censoring)
- Probability of correct classification for a pair of subjects with and without the endpoint
- A better discriminating model has more spread between the predictions than a poorly discriminating model
- Clinical usefulness:
- D – Decision-curve analysis – Net true-positive classification rate by using a model over a range of thresholds
- Net benefit (NB)
- Model validation
- Internal validity
- External validity
- Techniques used: split sample, cross-validation, etc.
- Model presentation
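A rough sketch of the ABCD performance measures listed under "Model performance", written in plain Python so it does not depend on any Stata command or Python package. This is not the paper's code: the function names, the Newton-Raphson fitting and the toy data are assumptions made here for illustration. It assumes a binary endpoint coded 0/1 and predicted probabilities strictly between 0 and 1. A and B are estimated by logistic recalibration on the logit of the prediction, C by pairwise comparison, and net benefit as TP/n minus FP/n weighted by threshold/(1 - threshold).

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def calibration_in_the_large(y, p, max_iter=50):
    """A: intercept a in logit(P(y=1)) = a + logit(p), with the slope fixed at 1."""
    lp = [logit(pi) for pi in p]
    a = 0.0
    for _ in range(max_iter):
        mu = [expit(a + l) for l in lp]
        score = sum(yi - mi for yi, mi in zip(y, mu))
        info = sum(mi * (1.0 - mi) for mi in mu)
        step = score / info
        a += step
        if abs(step) < 1e-10:
            break
    return a

def calibration_slope(y, p, max_iter=50):
    """B: fit logit(P(y=1)) = a + b*logit(p) by Newton-Raphson; returns (a, b)."""
    lp = [logit(pi) for pi in p]
    a, b = 0.0, 1.0
    for _ in range(max_iter):
        mu = [expit(a + b * l) for l in lp]
        w = [mi * (1.0 - mi) for mi in mu]
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * l for yi, mi, l in zip(y, mu, lp))
        h00 = sum(w)
        h01 = sum(wi * l for wi, l in zip(w, lp))
        h11 = sum(wi * l * l for wi, l in zip(w, lp))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < 1e-10:
            break
    return a, b

def c_statistic(y, p):
    """C: share of (event, non-event) pairs ranked correctly; ties count one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))

def net_benefit(y, p, threshold):
    """D: net benefit of treating everyone with prediction >= threshold."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

# Toy data, invented for illustration only (not from any study).
y_toy = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
p_toy = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

print("A (calibration-in-the-large):", calibration_in_the_large(y_toy, p_toy))
print("B (calibration slope):", calibration_slope(y_toy, p_toy)[1])
print("C (c-statistic):", c_statistic(y_toy, p_toy))
for t in (0.2, 0.5):
    print("D (net benefit at threshold", t, "):", net_benefit(y_toy, p_toy, t))
```

Evaluating `net_benefit` over a grid of thresholds and plotting the result against the threshold gives a basic decision curve. For a time-to-event endpoint the pairwise c-statistic above would need to handle censoring, which this sketch does not attempt.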