When using person level / individual level data (not aggregated data), quick reference commands are:
- Age standardization within sex
- proportion msvi, stdize(ageGrp) stdweight(per) over(sex)
- per = percent of population in age group calculated separately for males and females
- dstdize msvi p ageGrp if sex==1, by(sex) using(stdPop) print AND dstdize msvi p ageGrp if sex==2, by(sex) using(stdPop) print
- p = 1 in main dataset, and
- p = population size for age-sex group in using/standar population dataset
- proportion msvi, stdize(ageGrp) stdweight(per) over(sex)
- Age-sex standardized prevalence of full sample
- dstdize msvi p ageGrp sex , by(byVar) using(stdPop) print
- byVar = a constant within sample
- dstdize msvi p ageGrp sex , by(byVar) using(stdPop) print
- Incorrect usage
- proportion msvi, stdize(ageGrp sex) stdweight(per)
- dstdize msvi p ageGrp , by(sex) using(stdPop) print
- dstdize msvi p ageGrp sex , by(sex) using(stdPop) print
Datasets
Main dataset
Working with individual level data, we can use the dstdize
command to generate direct standadrized by age-sex results. Variables to be created in main dataset:
- Variable
p=1
to represent each individual ageGrp
sex
Also, here msvi
is my outcome variable coded as 0 = no outcome /1 = outcome observed. Listing of this file
* Example generated by -dataex-.
* To install: ssc install dataex.
clear
input long id float p byte(ageGrp sex) float msvi
103845 1 1 1 0
102853 1 1 1 0
103914 1 1 1 0
103936 1 1 1 0
102329 1 1 1 0
100825 1 1 1 1
103638 1 1 1 0
103608 1 1 1 0
101622 1 1 1 0
104438 1 2 1 0
103501 1 2 1 0
104502 1 2 1 0
101033 1 2 1 0
104232 1 2 1 0
104856 1 2 1 0
104713 1 2 1 0
100355 1 2 1 0
104630 1 2 1 0
102349 1 2 1 0
100646 1 2 1 0
100452 1 2 1 1
100116 1 2 1 0
100622 1 2 1 0
100321 1 2 1 0
100501 1 2 1 0
102826 1 2 1 0
100956 1 2 1 0
104755 1 2 1 0
102945 1 2 1 0
104201 1 2 1 0
100918 1 2 1 0
101403 1 2 1 0
101311 1 3 1 0
101501 1 3 1 0
100809 1 3 1 0
103517 1 3 1 0
102358 1 3 1 0
101503 1 3 1 0
100915 1 3 1 0
100505 1 3 1 0
104141 1 3 1 0
102020 1 3 1 0
100827 1 3 1 1
101028 1 3 1 0
103101 1 3 1 0
100944 1 3 1 0
104901 1 3 1 0
101103 1 3 1 0
101535 1 1 2 0
100122 1 1 2 0
104508 1 1 2 0
101035 1 1 2 0
102520 1 1 2 0
104617 1 1 2 0
101732 1 1 2 0
103252 1 1 2 0
104313 1 1 2 0
102212 1 1 2 0
102029 1 1 2 0
102145 1 1 2 0
101606 1 1 2 0
101204 1 1 2 0
101409 1 1 2 0
102238 1 1 2 0
103733 1 1 2 0
102030 1 1 2 0
104014 1 1 2 1
100652 1 1 2 0
102720 1 2 2 0
100654 1 2 2 0
103934 1 2 2 1
104655 1 2 2 1
104709 1 2 2 0
100946 1 2 2 0
103901 1 2 2 0
102524 1 2 2 0
102636 1 2 2 1
101017 1 2 2 0
100939 1 2 2 0
100921 1 2 2 0
101358 1 2 2 0
100222 1 2 2 0
100931 1 2 2 1
103640 1 2 2 0
104344 1 2 2 0
100439 1 2 2 0
101858 1 2 2 0
103122 1 2 2 0
103120 1 2 2 1
103809 1 2 2 0
101723 1 2 2 0
102232 1 2 2 0
100217 1 2 2 0
104925 1 3 2 1
103229 1 3 2 0
102730 1 3 2 0
103527 1 3 2 1
102551 1 3 2 0
103011 1 3 2 0
101533 1 3 2 1
end
label values ageGrp ageGrp
label def ageGrp 1 "50-59", modify
label def ageGrp 2 "60-69", modify
label def ageGrp 3 "70-79", modify
label values sex sexlb
label def sexlb 1 "Male", modify
label def sexlb 2 "Female", modify
label values msvi yn
label def yn 0 "No", modify
label def yn 1 "Yes", modify
label var ageGrp "Age"
label var sex "Gender"
label var msvi "Mod.-Sev. VI"
save prevData.dta, replace
table ageGrp msvi sex
------------------------------------
| Gender and Mod.-Sev. VI
| -- Male -- - Female -
Age | No Yes No Yes
----------+-------------------------
50-59 | 8 1 19 1
60-69 | 22 1 20 5
70-79 | 15 1 4 3
------------------------------------
tab ageGrp sex, col nof
| Gender
Age | Male Female | Total
-----------+----------------------+----------
50-59 | 18.75 38.46 | 29.00
60-69 | 47.92 48.08 | 48.00
70-79 | 33.33 13.46 | 23.00
-----------+----------------------+----------
Total | 100.00 100.00 | 100.00
Code language: JavaScript (javascript)
In another stdPop.dta
file, save the following data about the standard population:
p
= population size of that age-sex group.ageGrp
sex
Standard Population dataset
In another stdPop.dta
file, save the following data about the standard population:
p
= population size of that age-sex group.ageGrp
sex
Standard data example
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(ageGrp sex) double p
1 1 70193
2 1 41400
3 1 21287
1 2 61583
2 2 38746
3 2 21172
end
label values ageGrp ageGrp
label def ageGrp 1 "50-59", modify
label def ageGrp 2 "60-69", modify
label def ageGrp 3 "70-79", modify
label def ageGrp 4 ">=80", modify
label values sex sexlb
label def sexlb 1 "Male", modify
label def sexlb 2 "Female", modify
label var ageGrp "Age"
label var sex "Gender"
label var p "Each Dist age-group wise male, female population"
*************
bysort sex: egen per = pc(p)
save stdPop.dta, replace
Code language: Stata (stata)
As we can see, there are differnces in the age distrbution of males and females in sample compred to the standard population.
Age sex structures of the two datasets
use prevData.dta, clear
tab ageGrp sex, col nof
| Gender
Age | Male Female | Total
-----------+----------------------+----------
50-59 | 18.75 38.46 | 29.00
60-69 | 47.92 48.08 | 48.00
70-79 | 33.33 13.46 | 23.00
-----------+----------------------+----------
Total | 100.00 100.00 | 100.00
Code language: Stata (stata)
Standard data example
use stdPop.dta, clear
table ageGrp sex , c(min per) format(%5.2f)
--------------------------
| Gender
Age | Male Female
----------+---------------
50-59 | 52.82 50.69
60-69 | 31.16 31.89
70-79 | 16.02 17.43
--------------------------
Code language: Stata (stata)
As we can see, there are differnces in the age distrbution of males and females in sample compred to the standard population.
Age Standardization with-in Sex
use prevData.dta, clear
merge m:1 ageGrp sex using stdPop.dta
* CRUDE PREVALENCE
proportion msvi, over(sex)
--------------------------------------------------------------
| Logit
Over | Proportion Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
Yes |
Male | .0625 .0349386 .020012 .1787423
Female | .1730769 .0524627 .0918451 .3022431
* AGE STANDARDIZATION WITHIN EACH SEX
* CORRECT WAY
proportion msvi, stdize(ageGrp) stdweight(per) over(sex)
| Logit
Over | Proportion Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
Yes |
Male | .0822521 .0577207 .0192773 .2900981
Female | .1638016 .0482005 .088793 .2825279
* Males
dstdize msvi p age if sex==1, by(sex) using(stdPop) print
Summary of Study Populations:
sex N Crude Adj_Rate Confidence Interval
--------------------------------------------------------------------------
1 48 0.062500 0.082252 [ 0.000000, 0.195383]
* Females
dstdize msvi p age if sex==2, by(sex) using(stdPop) print
Summary of Study Populations:
sex N Crude Adj_Rate Confidence Interval
--------------------------------------------------------------------------
2 52 0.173077 0.163802 [ 0.069330, 0.258273]
Code language: Stata (stata)
Age-sex standardized [full population]
dstdize requires a by(var) mandatorily since usually the point of teh command is to compare prevalence across different populations. If if we are not interested in comparing across populations but just in estimating for our one population, what to do? Is actually quite simple. Create a new variable balled byVar (or any name that you wish) and make it constant across your entire sample.
gen byVar=1
dstdize msvi p ageGrp sex , by(byVar) using(stdPop) print
byVar N Crude Adj_Rate Confidence Interval
--------------------------------------------------------------------------
1 100 0.120000 0.121203 [ 0.046850, 0.195556]
Code language: Stata (stata)
Wrong commands
*** WRONG COMMANDS that may seem correct when you are typing them out.
**** Pay careful attention to the standard population output and
**** these proportions being applied to the stratum-specific prevalences
dstdize msvi p ageGrp , by(sex) using(stdPop) print
sex N Crude Adj_Rate Confidence Interval
--------------------------------------------------------------------------
1 48 0.062500 0.081689 [ 0.000000, 0.193017]
2 52 0.173077 0.160447 [ 0.067534, 0.253361]
** Here , the population age disribution (not the age-sex distribution) is
** being applied to both sexes
** ALSO WRONG
dstdize msvi p ageGrp sex , by(sex) using(stdPop) print
sex N Crude Adj_Rate Confidence Interval
--------------------------------------------------------------------------
Male 48 0.062500 0.042966 [ 0.000000, 0.102061]
Female 52 0.173077 0.078237 [ 0.033115, 0.123360]
* The sum of standard population distrbution of males or females in not 100%
* but closer to 50%.
* This would explain almost half the prevalence estimate observed here
* comapred to corerct results
Code language: Stata (stata)