Epidemiology & Technology

Age-Sex Direct Standardization in Stata

When using person level / individual level data (not aggregated data), quick reference commands are:

  • Age standardization within sex
    • proportion msvi, stdize(ageGrp) stdweight(per) over(sex)
      • per = percent of population in age group calculated separately for males and females
    • dstdize msvi p ageGrp if sex==1, by(sex) using(stdPop) print AND dstdize msvi p ageGrp if sex==2, by(sex) using(stdPop) print
      • p = 1 in main dataset, and
      • p = population size for age-sex group in using/standar population dataset
  • Age-sex standardized prevalence of full sample
    • dstdize msvi p ageGrp sex , by(byVar) using(stdPop) print
      • byVar = a constant within sample
  • Incorrect usage
    • proportion msvi, stdize(ageGrp sex) stdweight(per)
    • dstdize msvi p ageGrp , by(sex) using(stdPop) print
    • dstdize msvi p ageGrp sex , by(sex) using(stdPop) print

Datasets

Main dataset

Working with individual level data, we can use the dstdize command to generate direct standadrized by age-sex results. Variables to be created in main dataset:

  • Variable p=1 to represent each individual
  • ageGrp
  • sex

Also, here msvi is my outcome variable coded as 0 = no outcome /1 = outcome observed. Listing of this file

* Example generated by -dataex-. 
*  To install: ssc install dataex.         
clear   
input long id float p byte(ageGrp sex) float msvi

103845 1 1 1 0
102853 1 1 1 0
103914 1 1 1 0
103936 1 1 1 0
102329 1 1 1 0
100825 1 1 1 1
103638 1 1 1 0
103608 1 1 1 0
101622 1 1 1 0
104438 1 2 1 0
103501 1 2 1 0
104502 1 2 1 0
101033 1 2 1 0
104232 1 2 1 0
104856 1 2 1 0
104713 1 2 1 0
100355 1 2 1 0
104630 1 2 1 0
102349 1 2 1 0
100646 1 2 1 0
100452 1 2 1 1
100116 1 2 1 0
100622 1 2 1 0
100321 1 2 1 0
100501 1 2 1 0
102826 1 2 1 0
100956 1 2 1 0
104755 1 2 1 0
102945 1 2 1 0
104201 1 2 1 0
100918 1 2 1 0
101403 1 2 1 0
101311 1 3 1 0
101501 1 3 1 0
100809 1 3 1 0
103517 1 3 1 0
102358 1 3 1 0
101503 1 3 1 0
100915 1 3 1 0
100505 1 3 1 0
104141 1 3 1 0
102020 1 3 1 0
100827 1 3 1 1
101028 1 3 1 0
103101 1 3 1 0
100944 1 3 1 0
104901 1 3 1 0
101103 1 3 1 0
101535 1 1 2 0
100122 1 1 2 0
104508 1 1 2 0
101035 1 1 2 0
102520 1 1 2 0
104617 1 1 2 0
101732 1 1 2 0
103252 1 1 2 0
104313 1 1 2 0
102212 1 1 2 0
102029 1 1 2 0
102145 1 1 2 0
101606 1 1 2 0
101204 1 1 2 0
101409 1 1 2 0
102238 1 1 2 0
103733 1 1 2 0
102030 1 1 2 0
104014 1 1 2 1
100652 1 1 2 0
102720 1 2 2 0
100654 1 2 2 0
103934 1 2 2 1
104655 1 2 2 1
104709 1 2 2 0
100946 1 2 2 0
103901 1 2 2 0
102524 1 2 2 0
102636 1 2 2 1
101017 1 2 2 0
100939 1 2 2 0
100921 1 2 2 0
101358 1 2 2 0
100222 1 2 2 0
100931 1 2 2 1
103640 1 2 2 0
104344 1 2 2 0
100439 1 2 2 0
101858 1 2 2 0
103122 1 2 2 0
103120 1 2 2 1
103809 1 2 2 0
101723 1 2 2 0
102232 1 2 2 0
100217 1 2 2 0
104925 1 3 2 1
103229 1 3 2 0
102730 1 3 2 0
103527 1 3 2 1
102551 1 3 2 0
103011 1 3 2 0
101533 1 3 2 1
end
label values ageGrp ageGrp
label def ageGrp 1 "50-59", modify
label def ageGrp 2 "60-69", modify
label def ageGrp 3 "70-79", modify
label values sex sexlb
label def sexlb 1 "Male", modify
label def sexlb 2 "Female", modify
label values msvi yn

label def yn 0 "No", modify 

label def yn 1 "Yes", modify 

label var ageGrp "Age"  

label var sex "Gender"  

label var msvi "Mod.-Sev. VI"  

save prevData.dta, replace

table ageGrp msvi sex 

------------------------------------
          | Gender and Mod.-Sev. VI 
          | -- Male --    - Female -
      Age |   No   Yes      No   Yes
----------+-------------------------
    50-59 |    8     1      19     1
    60-69 |   22     1      20     5
    70-79 |   15     1       4     3
------------------------------------

tab ageGrp sex, col nof

           |        Gender
       Age |      Male     Female |     Total
-----------+----------------------+----------
     50-59 |     18.75      38.46 |     29.00 
     60-69 |     47.92      48.08 |     48.00 
     70-79 |     33.33      13.46 |     23.00 
-----------+----------------------+----------
     Total |    100.00     100.00 |    100.00 

Code language: JavaScript (javascript)

In another stdPop.dta file, save the following data about the standard population:

  • p = population size of that age-sex group.
  • ageGrp
  • sex
Standard Population dataset

In another stdPop.dta file, save the following data about the standard population:

  • p = population size of that age-sex group.
  • ageGrp
  • sex

Standard data example

* Example generated by -dataex-. To install: ssc install dataex

clear

input byte(ageGrp sex) double p
1 1 70193
2 1 41400
3 1 21287
1 2 61583
2 2 38746
3 2 21172
end
label values ageGrp ageGrp
label def ageGrp 1 "50-59", modify
label def ageGrp 2 "60-69", modify
label def ageGrp 3 "70-79", modify
label def ageGrp 4 ">=80", modify
label values sex sexlb
label def sexlb 1 "Male", modify
label def sexlb 2 "Female", modify
label var ageGrp "Age" 
label var sex "Gender" 
label var p "Each Dist age-group wise male, female population" 
*************
bysort sex: egen per = pc(p)

save stdPop.dta, replace
Code language: Stata (stata)

As we can see, there are differnces in the age distrbution of males and females in sample compred to the standard population.

Age sex structures of the two datasets

use prevData.dta, clear

tab ageGrp sex, col nof
           |        Gender
       Age |      Male     Female |     Total
-----------+----------------------+----------
     50-59 |     18.75      38.46 |     29.00 
     60-69 |     47.92      48.08 |     48.00 
     70-79 |     33.33      13.46 |     23.00 

-----------+----------------------+----------
     Total |    100.00     100.00 |    100.00 Code language: Stata (stata)

Standard data example

use stdPop.dta, clear
table ageGrp sex , c(min per) format(%5.2f)

--------------------------
          |     Gender    
      Age |   Male  Female
----------+---------------
    50-59 |  52.82   50.69
    60-69 |  31.16   31.89
    70-79 |  16.02   17.43
--------------------------Code language: Stata (stata)

As we can see, there are differnces in the age distrbution of males and females in sample compred to the standard population.

Age Standardization with-in Sex

use prevData.dta, clear

merge m:1 ageGrp sex using stdPop.dta

* CRUDE PREVALENCE
proportion msvi,  over(sex)  

--------------------------------------------------------------
             |                                   Logit
        Over | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
Yes          |
        Male |      .0625   .0349386       .020012    .1787423
      Female |   .1730769   .0524627      .0918451    .3022431

* AGE STANDARDIZATION WITHIN EACH SEX 
* CORRECT WAY
proportion msvi, stdize(ageGrp) stdweight(per) over(sex)  

             |                                   Logit
        Over | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
Yes          |
        Male |   .0822521   .0577207      .0192773    .2900981
      Female |   .1638016   .0482005       .088793    .2825279

* Males
dstdize msvi p  age if sex==1, by(sex) using(stdPop)  print 

Summary of Study Populations:
      sex             N      Crude     Adj_Rate       Confidence Interval
 --------------------------------------------------------------------------
        1            48   0.062500     0.082252    [  0.000000,    0.195383]

* Females
dstdize msvi p  age if sex==2, by(sex) using(stdPop)  print 

Summary of Study Populations:
      sex             N      Crude     Adj_Rate       Confidence Interval
 --------------------------------------------------------------------------
        2            52   0.173077     0.163802    [  0.069330,    0.258273]


Code language: Stata (stata)

Age-sex standardized [full population]

dstdize requires a by(var) mandatorily since usually the point of teh command is to compare prevalence across different populations. If if we are not interested in comparing across populations but just in estimating for our one population, what to do? Is actually quite simple. Create a new variable balled byVar (or any name that you wish) and make it constant across your entire sample.

gen byVar=1

dstdize msvi p  ageGrp sex  , by(byVar) using(stdPop)  print 

    byVar             N      Crude     Adj_Rate       Confidence Interval
 --------------------------------------------------------------------------
        1           100   0.120000     0.121203    [  0.046850,    0.195556]
Code language: Stata (stata)

Wrong commands

***  WRONG COMMANDS that may seem correct when you are typing them out. 
**** Pay careful attention to the standard population output and 
****  these proportions being applied to the stratum-specific prevalences

dstdize msvi p  ageGrp , by(sex) using(stdPop)  print 

      sex             N      Crude     Adj_Rate       Confidence Interval
 --------------------------------------------------------------------------
        1            48   0.062500     0.081689    [  0.000000,    0.193017]
        2            52   0.173077     0.160447    [  0.067534,    0.253361]
** Here , the population age disribution (not the age-sex distribution) is 
** being applied to  both sexes

** ALSO WRONG
dstdize msvi p  ageGrp sex  , by(sex) using(stdPop)  print 

      sex             N      Crude     Adj_Rate       Confidence Interval
 --------------------------------------------------------------------------
     Male            48   0.062500     0.042966    [  0.000000,    0.102061]
  Female            52   0.173077     0.078237    [  0.033115,    0.123360]

* The sum of standard population distrbution of males or females in not 100% 
* but closer to 50%.
* This would explain almost half the prevalence estimate observed here 
* comapred to corerct results

Code language: Stata (stata)

Related Posts