Power Analysis for Logistic Regression Coefficient (Wald's Z-Test)

Calculates power or sample size (only one of them can be NULL at a time) to test a single coefficient in logistic regression. power.z.logistic() and power.z.logreg() are the same function, as are pwrss.z.logistic() and pwrss.z.logreg().

The distribution of the predictor variable can be one of c("normal", "poisson", "uniform", "exponential", "binomial", "bernoulli", "lognormal") for the Demidenko (2007) procedure, but only one of c("normal", "binomial", "bernoulli") for the Hsieh et al. (1998) procedure. The default parameters for these distributions are:

distribution = list(dist = "normal", mean = 0, sd = 1)
distribution = list(dist = "poisson", lambda = 1)
distribution = list(dist = "uniform", min = 0, max = 1)
distribution = list(dist = "exponential", rate = 1)
distribution = list(dist = "binomial", size = 1, prob = 0.50)
distribution = list(dist = "bernoulli", prob = 0.50)
distribution = list(dist = "lognormal", meanlog = 0, sdlog = 1)

Parameter values defined in list() form can be modified, but the element names must be kept the same. To use the default parameters, it is sufficient to supply the distribution's name alone (e.g. dist = "normal").
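As a minimal sketch of changing parameter values while leaving the element names untouched, base R's modifyList() can update one of the default lists shown above. The object names default.dist and pred.dist are illustrative only and are not part of the package.

```r
# Default parameters for a normally distributed predictor (copied from the list above)
default.dist <- list(dist = "normal", mean = 0, sd = 1)

# Override the values only; modifyList() keeps the existing element names
pred.dist <- modifyList(default.dist, list(mean = 10, sd = 2))

# Inspect the result: same names (dist, mean, sd), new values
str(pred.dist)
```

The resulting list has the same form as the pred.dist objects passed to the distribution argument in the Examples.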

NOTE: pwrss.z.logistic() and its alias pwrss.z.logreg() are deprecated. However, they will remain available as wrappers for power.z.logistic().

Formulas are validated against G*Power and tables in the PASS documentation.

Usage

power.z.logistic(prob = NULL, base.prob = NULL,
                 odds.ratio  = (prob/(1-prob))/(base.prob/(1-base.prob)),
                 beta0 = log(base.prob/(1-base.prob)), beta1 = log(odds.ratio),
                 n = NULL, power = NULL, r.squared.predictor = 0,
                 alpha = 0.05, alternative = c("two.sided", "one.sided"),
                 method = c("demidenko(vc)", "demidenko", "hsieh"),
                 distribution = "normal", ceiling = TRUE,
                 verbose = TRUE, pretty = FALSE)

Arguments

base.prob: base probability under null hypothesis (probability that an event occurs without the influence of the predictor - or when the value of the predictor is zero).
prob: probability under alternative hypothesis (probability that an event occurs when the value of the predictor is increased from 0 to 1). Warning: This is base probability + incremental increase.
beta0: regression coefficient defined as
beta0 = log(base.prob/(1-base.prob))
beta1: regression coefficient for the predictor X defined as
beta1 = log((prob/(1-prob))/(base.prob/(1-base.prob)))
odds.ratio: odds ratio defined as
odds.ratio = exp(beta1) = (prob/(1-prob))/(base.prob/(1-base.prob))
n: integer; sample size
power: statistical power, defined as the probability of correctly rejecting a false null hypothesis, denoted as \(1 - \beta\).
r.squared.predictor: proportion of variance in the predictor accounted for by other covariates. This is not a pseudo R-squared. To compute it, regress the predictor on the covariates and extract the adjusted R-squared from that model.
alpha: type 1 error rate, defined as the probability of incorrectly rejecting a true null hypothesis, denoted as \(\alpha\).
alternative: character; direction or type of the hypothesis test: "two.sided" or "one.sided".
method: character; analytic method. "demidenko(vc)" stands for the Demidenko (2007) procedure with variance correction; "demidenko" stands for the Demidenko (2007) procedure without variance correction; "hsieh" stands for the Hsieh et al. (1998) procedure. The "demidenko" and "hsieh" methods produce similar results, but "demidenko(vc)" is more precise.
distribution: character; distribution family. Can be one of c("normal", "poisson", "uniform", "exponential", "binomial", "bernoulli", "lognormal") for the Demidenko (2007) procedure, but only one of c("normal", "binomial", "bernoulli") for the Hsieh et al. (1998) procedure.
ceiling: logical; whether sample size should be rounded up. TRUE by default.
verbose: logical; whether the output should be printed on the console. TRUE by default.
pretty: logical; whether the output should show Unicode characters (if encoding allows for it). FALSE by default.
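The definitions of beta0, beta1, and odds.ratio given in the argument list can be checked by hand. The sketch below uses only base R and the probabilities from the Examples (base.prob = 0.15, prob = 0.20); the helper odds() is a hypothetical convenience function, not part of the package.

```r
base.prob <- 0.15  # event probability when the predictor equals 0
prob      <- 0.20  # event probability when the predictor equals 1

# Hypothetical helper: convert a probability to odds
odds <- function(p) p / (1 - p)

odds.ratio <- odds(prob) / odds(base.prob)  # approx. 1.416667
beta0      <- log(odds(base.prob))          # approx. -1.734601
beta1      <- log(odds.ratio)               # approx. 0.3483067

round(c(odds.ratio = odds.ratio, beta0 = beta0, beta1 = beta1), 6)

# Going the other way: the inverse logit of beta0 + beta1 recovers prob
plogis(beta0 + beta1)
```

These are the same values that appear in the odds ratio and regression coefficient specifications in the Examples, which is why those calls describe the same effect as the probability specification.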

Value

parms: list of parameters used in calculation.
test: type of the statistical test (Z-Test).
mean: mean of the alternative distribution.
sd: standard deviation of the alternative distribution.
null.mean: mean of the null distribution.
null.sd: standard deviation of the null distribution.
z.alpha: critical value(s).
power: statistical power \((1-\beta)\).
n: sample size.
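To illustrate how the null and alternative distributions, the critical values, and power relate in a generic two-sided Z-test, here is a minimal base R sketch. It assumes the textbook relationship between these quantities and uses made-up numbers; it is not taken from the package's source code, and the object names alt.mean and alt.sd are illustrative.

```r
# Hypothetical values standing in for the components listed above
null.mean <- 0    # mean of the null distribution
null.sd   <- 1    # standard deviation of the null distribution
alt.mean  <- 2.8  # mean of the alternative distribution (made up)
alt.sd    <- 1    # standard deviation of the alternative distribution (made up)
alpha     <- 0.05

# Two-sided critical values are quantiles of the null distribution
z.alpha <- qnorm(c(alpha / 2, 1 - alpha / 2), mean = null.mean, sd = null.sd)

# Power is the probability, under the alternative distribution,
# of falling outside the critical values
power <- pnorm(z.alpha[1], mean = alt.mean, sd = alt.sd) +
  pnorm(z.alpha[2], mean = alt.mean, sd = alt.sd, lower.tail = FALSE)

power
```

Under this reading, a larger sample size moves the alternative mean further from the null mean, which raises power.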

References

Demidenko, E. (2007). Sample size determination for logistic regression revisited. Statistics in Medicine, 26(18), 3385-3397. doi:10.1002/sim.2771

Hsieh, F. Y., Bloch, D. A., & Larsen, M. D. (1998). A simple method of sample size calculation for linear and logistic regression. Statistics in Medicine, 17(14), 1623-1634.

Examples


###########################################
# predictor X follows normal distribution #
###########################################

## probability specification
power.z.logistic(base.prob = 0.15, prob = 0.20,
                 alpha = 0.05, power = 0.80,
                 dist = "normal")
#> +--------------------------------------------------+
#> |             SAMPLE SIZE CALCULATION              |
#> +--------------------------------------------------+
#> 
#> Logistic Regression Coefficient (Wald's Z-Test)
#> 
#>   Method          : Demidenko (Variance Corrected)
#>   Predictor Dist. : Normal
#> 
#> ---------------------------------------------------
#> Hypotheses
#> ---------------------------------------------------
#>   H0 (Null Claim) : Odds Ratio = 1
#>   H1 (Alt. Claim) : Odds Ratio != 1
#> 
#> ---------------------------------------------------
#> Results
#> ---------------------------------------------------
#>   Sample Size          = 511  <<
#>   Type 1 Error (alpha) = 0.050
#>   Type 2 Error (beta)  = 0.199
#>   Statistical Power    = 0.801
#> 

## odds ratio specification
power.z.logistic(base.prob = 0.15, odds.ratio = 1.416667,
                 alpha = 0.05, power = 0.80,
                 dist = "normal")
#> +--------------------------------------------------+
#> |             SAMPLE SIZE CALCULATION              |
#> +--------------------------------------------------+
#> 
#> Logistic Regression Coefficient (Wald's Z-Test)
#> 
#>   Method          : Demidenko (Variance Corrected)
#>   Predictor Dist. : Normal
#> 
#> ---------------------------------------------------
#> Hypotheses
#> ---------------------------------------------------
#>   H0 (Null Claim) : Odds Ratio = 1
#>   H1 (Alt. Claim) : Odds Ratio != 1
#> 
#> ---------------------------------------------------
#> Results
#> ---------------------------------------------------
#>   Sample Size          = 511  <<
#>   Type 1 Error (alpha) = 0.050
#>   Type 2 Error (beta)  = 0.199
#>   Statistical Power    = 0.801
#> 

## regression coefficient specification
power.z.logistic(beta0 = -1.734601, beta1 = 0.3483067,
                 alpha = 0.05, power = 0.80,
                 dist = "normal")
#> +--------------------------------------------------+
#> |             SAMPLE SIZE CALCULATION              |
#> +--------------------------------------------------+
#> 
#> Logistic Regression Coefficient (Wald's Z-Test)
#> 
#>   Method          : Demidenko (Variance Corrected)
#>   Predictor Dist. : Normal
#> 
#> ---------------------------------------------------
#> Hypotheses
#> ---------------------------------------------------
#>   H0 (Null Claim) : Odds Ratio = 1
#>   H1 (Alt. Claim) : Odds Ratio != 1
#> 
#> ---------------------------------------------------
#> Results
#> ---------------------------------------------------
#>   Sample Size          = 511  <<
#>   Type 1 Error (alpha) = 0.050
#>   Type 2 Error (beta)  = 0.199
#>   Statistical Power    = 0.801
#> 

## change parameters associated with predictor X
pred.dist <- list(dist = "normal", mean = 10, sd = 2)
power.z.logistic(base.prob = 0.15, beta1 = 0.3483067,
                 alpha = 0.05, power = 0.80,
                 dist = pred.dist)
#> +--------------------------------------------------+
#> |             SAMPLE SIZE CALCULATION              |
#> +--------------------------------------------------+
#> 
#> Logistic Regression Coefficient (Wald's Z-Test)
#> 
#>   Method          : Demidenko (Variance Corrected)
#>   Predictor Dist. : Normal
#> 
#> ---------------------------------------------------
#> Hypotheses
#> ---------------------------------------------------
#>   H0 (Null Claim) : Odds Ratio = 1
#>   H1 (Alt. Claim) : Odds Ratio != 1
#> 
#> ---------------------------------------------------
#> Results
#> ---------------------------------------------------
#>   Sample Size          = 134  <<
#>   Type 1 Error (alpha) = 0.050
#>   Type 2 Error (beta)  = 0.199
#>   Statistical Power    = 0.801
#> 


##############################################
# predictor X follows Bernoulli distribution #
# (such as treatment/control groups)         #
##############################################

## odds ratio specification
power.z.logistic(base.prob = 0.15, odds.ratio = 1.416667,
                 alpha = 0.05, power = 0.80,
                 dist = "bernoulli")
#> +--------------------------------------------------+
#> |             SAMPLE SIZE CALCULATION              |
#> +--------------------------------------------------+
#> 
#> Logistic Regression Coefficient (Wald's Z-Test)
#> 
#>   Method          : Demidenko (Variance Corrected)
#>   Predictor Dist. : Bernoulli
#> 
#> ---------------------------------------------------
#> Hypotheses
#> ---------------------------------------------------
#>   H0 (Null Claim) : Odds Ratio = 1
#>   H1 (Alt. Claim) : Odds Ratio != 1
#> 
#> ---------------------------------------------------
#> Results
#> ---------------------------------------------------
#>   Sample Size          = 1816  <<
#>   Type 1 Error (alpha) = 0.050
#>   Type 2 Error (beta)  = 0.200
#>   Statistical Power    = 0.8
#> 

## change parameters associated with predictor X
pred.dist <- list(dist = "bernoulli", prob = 0.30)
power.z.logistic(base.prob = 0.15, odds.ratio = 1.416667,
                 alpha = 0.05, power = 0.80,
                 dist = pred.dist)
#> +--------------------------------------------------+
#> |             SAMPLE SIZE CALCULATION              |
#> +--------------------------------------------------+
#> 
#> Logistic Regression Coefficient (Wald's Z-Test)
#> 
#>   Method          : Demidenko (Variance Corrected)
#>   Predictor Dist. : Bernoulli
#> 
#> ---------------------------------------------------
#> Hypotheses
#> ---------------------------------------------------
#>   H0 (Null Claim) : Odds Ratio = 1
#>   H1 (Alt. Claim) : Odds Ratio != 1
#> 
#> ---------------------------------------------------
#> Results
#> ---------------------------------------------------
#>   Sample Size          = 2114  <<
#>   Type 1 Error (alpha) = 0.050
#>   Type 2 Error (beta)  = 0.200
#>   Statistical Power    = 0.8
#> 

####################################
# predictor X is an ordinal factor #
####################################

## generating an ordinal predictor
x.ord <- sample(
  x = c(1, 2, 3, 4), # levels
  size = 1e5, # sample size large enough to get stable estimates
  prob = c(0.25, 0.25, 0.25, 0.25), # category probabilities
  replace = TRUE
)

## dummy coding the ordinal predictor
x.ord <- factor(x.ord, ordered = TRUE)
contrasts(x.ord) <- contr.treatment(4, base = 4)
x.dummy <- model.matrix( ~ x.ord)[,-1]
x.data <- as.data.frame(x.dummy)

## fit linear regression to get adjusted r-squared
x.fit <- lm(x.ord1 ~ x.ord2 + x.ord3, data = x.data)

## extract parameters
bern.prob <- mean(x.data$x.ord1)
r.squared.pred <- summary(x.fit)$adj.r.squared

## change parameters associated with predictor X
pred.dist <- list(dist = "bernoulli", prob = bern.prob)
power.z.logistic(base.prob = 0.15, odds.ratio = 1.416667,
                 alpha = 0.05, power = 0.80,
                 r.squared.pred = r.squared.pred,
                 dist = pred.dist)
#> +--------------------------------------------------+
#> |             SAMPLE SIZE CALCULATION              |
#> +--------------------------------------------------+
#> 
#> Logistic Regression Coefficient (Wald's Z-Test)
#> 
#>   Method          : Demidenko (Variance Corrected)
#>   Predictor Dist. : Bernoulli
#> 
#> ---------------------------------------------------
#> Hypotheses
#> ---------------------------------------------------
#>   H0 (Null Claim) : Odds Ratio = 1
#>   H1 (Alt. Claim) : Odds Ratio != 1
#> 
#> ---------------------------------------------------
#> Results
#> ---------------------------------------------------
#>   Sample Size          = 3549  <<
#>   Type 1 Error (alpha) = 0.050
#>   Type 2 Error (beta)  = 0.200
#>   Statistical Power    = 0.8
#>