Sample size for Prevalence in R using epiR

Vivek Gupta, March 30, 2025

Simple Random Sampling

Expected prevalence 50%, relative error 10%, Finite population correction – None

library(epiR)  # provides epi.sssimpleestb() and the epi.ssclus*estb() functions

epi.sssimpleestb(
     Py = 0.5, epsilon = 0.10, error = "relative", N = NA,
     se = 1, sp = 1, nfractional = FALSE, conf.level = 0.95)

[1] 385

Expected prevalence 50%, absolute error 10%, Finite population correction – None

epi.sssimpleestb(
     Py = 0.5, epsilon = 0.10, error = "absolute", N = NA,
     se = 1, sp = 1, nfractional = FALSE, conf.level = 0.95)

[1] 97
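
To see where these two numbers come from, here is a minimal base R sketch of the usual normal-approximation formula n = z^2 * p * (1 - p) / e^2. It assumes perfect test sensitivity and specificity (se = sp = 1) and no finite population correction, as in the calls above. The helper name srs_n is hypothetical and is not part of epiR.

```r
# Hypothetical helper (not an epiR function): sample size for estimating a
# prevalence under simple random sampling, assuming se = sp = 1 and no FPC.
srs_n <- function(p, epsilon, error = c("relative", "absolute"),
                  conf.level = 0.95) {
  error <- match.arg(error)
  z <- qnorm(1 - (1 - conf.level) / 2)
  # A relative error is a fraction of the expected prevalence.
  e_abs <- if (error == "relative") epsilon * p else epsilon
  ceiling(z^2 * p * (1 - p) / e_abs^2)
}

n_rel <- srs_n(0.5, 0.10, "relative")  # 385
n_abs <- srs_n(0.5, 0.10, "absolute")  # 97
print(c(relative = n_rel, absolute = n_abs))
```

A relative error of 10% on a prevalence of 50% is an absolute error of only 5 percentage points, which is why the first example needs roughly four times as many subjects as the second.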

One Stage Cluster Sampling

Estimate the number of clusters when expected prevalence = 50%, relative error 10%, cluster size 75, ICC 0.2, 95% confidence. The entire cluster will be selected.

epi.ssclus1estb(
    Py = 0.5, epsilon = 0.10, error = "relative", 
    b = 75, rho = 0.20, N.psu = NA,
    nfractional = FALSE, conf.level = 0.95)

## RESULTS
$n.psu = 81
$n.ssu = 6070
$DEF = 15.8
$rho = 0.2
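
The results above can be checked by hand. The sketch below assumes the standard design effect for equal cluster sizes, DEF = 1 + (b - 1) * rho, applied to the unrounded simple random sampling size. It is an illustration in base R, not a copy of the epiR source, and the variable names are my own.

```r
# Assumed inputs, taken from the example above.
p   <- 0.5    # expected prevalence
rho <- 0.20   # intracluster correlation coefficient (ICC)
b   <- 75     # individuals per cluster (whole cluster sampled)

z     <- qnorm(0.975)                    # 95% confidence
e_abs <- 0.10 * p                        # 10% relative error as absolute error
n_srs <- z^2 * p * (1 - p) / e_abs^2     # unrounded SRS sample size

def       <- 1 + (b - 1) * rho           # design effect
n_ssu_raw <- n_srs * def                 # individuals needed, unrounded
n_ssu     <- ceiling(n_ssu_raw)          # 6070
n_psu     <- ceiling(n_ssu_raw / b)      # 81

print(c(DEF = def, n.ssu = n_ssu, n.psu = n_psu))
```

With an ICC of 0.2 and 75 individuals per cluster, the design effect of 15.8 means the cluster design needs almost 16 times as many individuals as simple random sampling for the same precision.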

Estimate the number of clusters when expected prevalence = 50%, relative error 10%, cluster size 75, ICC 0.2, 95% confidence, and the sampling frame contains 1000 clusters in total.

epi.ssclus1estb(
     Py = 0.5, epsilon = 0.10, error = "relative", 
     b = 75, rho = 0.20, N.psu = 1000,
     nfractional = FALSE, conf.level = 0.95)
$n.psu = 75
$n.ssu = 5621
$DEF = 15.8
$rho = 0.2

If the clusters are not all the same size, we need the mean and SD of cluster size. In that case b is a vector of length two: the first element is the mean number of individual listing units to be sampled from each cluster, and the second element is the standard deviation of the number of individual listing units to be sampled from each cluster.

Example with average cluster size of 75 and SD of 20

epi.ssclus1estb(
     Py = 0.5, epsilon = 0.10, error = "relative", 
     b = c(75, 20), rho = 0.20, N.psu = 1000,
     nfractional = FALSE, conf.level = 0.95)
$n.psu
[1] 80
$n.ssu
[1] 5970
$DEF
[1] 16.86667
$rho
[1] 0.2
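
The larger design effect can be reproduced with a common adjustment for unequal cluster sizes, which inflates the mean cluster size by (1 + CV^2), where CV is the coefficient of variation of cluster size. The sketch below assumes this is the adjustment in play (it does return the DEF of 16.86667 reported above). It does not attempt to reproduce the finite population correction, and the variable names are my own.

```r
# Assumed inputs, taken from the example above.
b_mean <- 75    # mean cluster size
b_sd   <- 20    # standard deviation of cluster size
rho    <- 0.20  # intracluster correlation coefficient (ICC)

cv <- b_sd / b_mean                                  # coefficient of variation

def_equal   <- 1 + (b_mean - 1) * rho                # all clusters the same size
def_unequal <- 1 + ((cv^2 + 1) * b_mean - 1) * rho   # variable cluster size

print(c(equal = def_equal, unequal = def_unequal))
```

Variation in cluster size raises the design effect from 15.8 to about 16.87, so more clusters are needed than in the equal-size case with the same sampling frame.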

Two Stage Cluster Sampling

Use when the entire cluster will NOT be selected but sampling will be done within each cluster.

EXAMPLE 1 (from Bennett et al. 1991 p 102):
The expected prevalence of disease is thought to be around 20%, to be estimated to within +/- 5% (absolute error). ICC (rho) = 0.02, 95% confidence, and the plan is to sample 20 individuals per cluster.

epi.ssclus2estb(
     Py = 0.2, epsilon = 0.05, error = "absolute", 
     b = 20, rho = 0.02, N.psu = NA,
     nfractional = FALSE, conf.level = 0.95)
$n.psu
[1] 17
$n.ssu
[1] 340
$DEF
[1] 1.38
$rho
[1] 0.02

A total of 17 clusters need to be sampled to meet the specifications of this study. epi.ssclus2estb returns a warning message that the number of clusters is less than 25.

Guidance

As per the epi.ssclus2estb help: as a rule of thumb, around 30 PSUs (primary sampling units, i.e. clusters) will provide good estimates of the true population value with an acceptable level of precision (Binkin et al. 1992) when: (1) the true population value is between 10% and 90%; and (2) the desired absolute error is around 5%. For a fixed number of individual listing units selected per cluster (e.g., 10 individuals per cluster or 30 individuals per cluster), collecting information on more than 30 clusters can improve the precision of the final population estimate; however, beyond around 60 clusters the improvement in precision is minimal.
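
The diminishing return from extra clusters can be illustrated with a rough base R sketch. It assumes the normal approximation and the equal-cluster-size design effect DEF = 1 + (b - 1) * rho, and it reuses the values from Example 1 (prevalence 20%, ICC 0.02, 20 individuals per cluster). The helper name abs_error is hypothetical and is not part of epiR.

```r
# Hypothetical helper (not an epiR function): approximate half-width of the
# confidence interval for a prevalence when k clusters of b individuals are
# sampled, assuming equal cluster sizes and no finite population correction.
abs_error <- function(k, b = 20, p = 0.2, rho = 0.02, conf.level = 0.95) {
  z   <- qnorm(1 - (1 - conf.level) / 2)
  def <- 1 + (b - 1) * rho
  z * sqrt(def * p * (1 - p) / (k * b))
}

k   <- c(17, 30, 60, 90, 120)   # candidate numbers of clusters
err <- abs_error(k)

print(data.frame(clusters = k, abs_error = round(err, 4)))
```

Because the error shrinks with the square root of the number of clusters, going from 30 to 60 clusters buys more precision than going from 60 to 90. Under these assumptions, 17 clusters is just enough to keep the absolute error at or below 5%, in line with Example 1.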