### Abstract

In many observational studies, individuals are measured repeatedly over time, although not necessarily at a set of pre-specified occasions. Instead, individuals may be measured at irregular intervals, with those having a history of poorer health outcomes being measured with somewhat greater frequency and regularity. In this paper, we consider likelihood-based estimation of the regression parameters in marginal models for longitudinal binary data when the follow-up times are not fixed by design, but can depend on previous outcomes. In particular, we consider assumptions regarding the follow-up time process that result in the likelihood function separating into two components: one for the follow-up time process, the other for the outcome measurement process. The practical implication of this separation is that the follow-up time process can be ignored when making likelihood-based inferences about the marginal regression model parameters. That is, maximum likelihood (ML) estimation of the regression parameters relating the probability of success at a given time to covariates does not require that a model for the distribution of follow-up times be specified. However, to obtain consistent parameter estimates, the multinomial distribution for the vector of repeated binary outcomes must be correctly specified. In general, ML estimation requires specification of all higher-order moments and the likelihood for a marginal model can be intractable except in cases where the number of repeated measurements is relatively small. To circumvent these difficulties, we propose a pseudolikelihood for estimation of the marginal model parameters. The pseudolikelihood uses a linear approximation for the conditional distribution of the response at any occasion, given the history of previous responses. The appeal of this approximation is that the conditional distributions are functions of the first two moments of the binary responses only. 
When the follow-up times depend only on the previous outcome, the pseudolikelihood requires correct specification of the conditional distribution of the current outcome given the outcome at the previous occasion only. Results from a simulation study and a study of asymptotic bias are presented. Finally, we illustrate the main results using data from a longitudinal observational study that explored the cardiotoxic effects of doxorubicin chemotherapy for the treatment of acute lymphoblastic leukemia in children.
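The linear approximation described in the abstract can be illustrated for the first-order case, where the current outcome is conditioned on the previous outcome only. The sketch below is not the authors' estimator. It assumes the marginal means `mu` are already computed from some marginal regression model and that a single correlation `rho` applies to every adjacent pair of outcomes. The function names `conditional_success_prob` and `pseudo_loglik` are hypothetical. For a pair of binary outcomes the linear form is exact, and it depends only on the means and the pairwise correlation.

```python
import math


def conditional_success_prob(mu_t, mu_prev, rho, y_prev):
    """P(Y_t = 1 | Y_{t-1} = y_prev), linear in y_prev (first two moments only)."""
    # Covariance of two binary outcomes implied by their correlation rho.
    cov = rho * math.sqrt(mu_t * (1 - mu_t) * mu_prev * (1 - mu_prev))
    # Linear regression of Y_t on Y_{t-1}: marginal mean plus slope times residual.
    p = mu_t + cov / (mu_prev * (1 - mu_prev)) * (y_prev - mu_prev)
    if not 0.0 < p < 1.0:
        raise ValueError("means and correlation imply a probability outside (0, 1)")
    return p


def pseudo_loglik(y, mu, rho):
    """Pseudo-log-likelihood for one subject's binary sequence y.

    The first outcome contributes its marginal Bernoulli term; each later
    outcome contributes a Bernoulli term conditional on the previous outcome.
    Assumption (not from the paper): rho is constant across adjacent pairs.
    """
    ll = y[0] * math.log(mu[0]) + (1 - y[0]) * math.log(1 - mu[0])
    for t in range(1, len(y)):
        p = conditional_success_prob(mu[t], mu[t - 1], rho, y[t - 1])
        ll += y[t] * math.log(p) + (1 - y[t]) * math.log(1 - p)
    return ll


if __name__ == "__main__":
    # Illustrative values only; these are not data from the study.
    outcomes = [1, 0, 0, 1]
    means = [0.4, 0.3, 0.3, 0.35]
    print(pseudo_loglik(outcomes, means, rho=0.2))
```

Averaging the two conditional probabilities over the distribution of the previous outcome returns the marginal mean, so the regression parameters keep their marginal interpretation. With irregular follow-up times, one would likely let the correlation depend on the gap between visits, which this sketch does not attempt.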

| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 469-485 |
| Number of pages | 17 |
| Journal | Biostatistics |
| Volume | 7 |
| Issue number | 3 |
| DOIs | https://doi.org/10.1093/biostatistics/kxj019 |
| State | Published - Jul 1 2006 |

### Keywords

- Follow-up time process
- Generalized estimating equations
- Maximum likelihood
- Multinomial distribution
- Pseudolikelihood

### ASJC Scopus subject areas

- Medicine(all)
- Statistics and Probability
- Statistics, Probability and Uncertainty

### Cite this

Fitzmaurice, G. M., Lipsitz, S. R., Ibrahim, J. G., Gelber, R., & Lipshultz, S. E. (2006). Estimation in regression models for longitudinal binary data with outcome-dependent follow-up. *Biostatistics*, *7*(3), 469-485. https://doi.org/10.1093/biostatistics/kxj019

Research output: Contribution to journal › Article

TY - JOUR

T1 - Estimation in regression models for longitudinal binary data with outcome-dependent follow-up

AU - Fitzmaurice, Garrett M.

AU - Lipsitz, Stuart R.

AU - Ibrahim, Joseph G.

AU - Gelber, Richard

AU - Lipshultz, Steven E.

PY - 2006/7/1

Y1 - 2006/7/1

KW - Follow-up time process

KW - Generalized estimating equations

KW - Maximum likelihood

KW - Multinomial distribution

KW - Pseudolikelihood

UR - http://www.scopus.com/inward/record.url?scp=33745471883&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745471883&partnerID=8YFLogxK

U2 - 10.1093/biostatistics/kxj019

DO - 10.1093/biostatistics/kxj019

M3 - Article

C2 - 16428260

AN - SCOPUS:33745471883

VL - 7

SP - 469

EP - 485

JO - Biostatistics

JF - Biostatistics

SN - 1465-4644

IS - 3

ER -