Factors Associated with HIV Testing Among Participants from Substance Use Disorder Treatment Programs in the US

A Machine Learning Approach

Yue Pan, Hongmei Liu, Lisa R. Metsch, Daniel J Feaster

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

HIV testing is the foundation for consolidated HIV treatment and prevention. In this study, we aim to discover the most relevant variables for predicting HIV testing uptake among substance users in substance use disorder treatment programs by applying random forest (RF), a robust multivariate statistical learning method. We also provide a descriptive introduction to this method for those who are unfamiliar with it. We used data from the National Institute on Drug Abuse Clinical Trials Network HIV testing and counseling study (CTN-0032). A total of 1281 HIV-negative or status-unknown participants from 12 US community-based substance use disorder treatment programs were included and were randomized into three HIV testing and counseling treatment groups. The a priori primary outcome was self-reported receipt of HIV test results. Classification accuracy of RF was compared with logistic regression, a standard statistical approach for binary outcomes. Variable importance measures for the RF model were used to select the most relevant variables. RF-based models produced much higher classification accuracy than those based on logistic regression. Treatment group is the most important predictor among all covariates, with a variable importance index of 12.9%. RF variable importance revealed that several types of condomless sex behaviors, condom use self-efficacy and attitudes towards condom use, and level of depression are the most important predictors of receipt of HIV testing results. There is a non-linear negative relationship between count of condomless sex acts and the receipt of HIV testing. In conclusion, RF seems promising in discovering important factors related to HIV testing uptake among large numbers of predictors and should be encouraged in future HIV prevention and treatment research and intervention program evaluations.
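As a rough illustration of how a random forest can rank predictors, the sketch below fits a small forest to synthetic data and measures permutation importance (the drop in accuracy when one predictor is shuffled). Everything in it is an assumption made for illustration: the data are simulated rather than taken from CTN-0032, the three features and all function names are hypothetical, and permutation importance may differ from the variable importance index reported in the study. It uses only the Python standard library; a real analysis would normally rely on an established RF package.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most common label, used as the prediction at a leaf."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, rng, depth=0, max_depth=4, n_try=2):
    """Grow a tree, considering a random subset of features at each split."""
    if depth == max_depth or len(set(labels)) < 2:
        return majority(labels)
    best = None  # (weighted impurity, feature index, threshold)
    for f in rng.sample(range(len(rows[0])), n_try):
        # Candidate thresholds come from a few randomly chosen rows.
        for r in rng.sample(rows, min(10, len(rows))):
            t = r[f]
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if left and right:
                score = len(left) * gini(left) + len(right) * gini(right)
                if best is None or score < best[0]:
                    best = (score, f, t)
    if best is None:
        return majority(labels)
    _, f, t = best
    lo = [(row, y) for row, y in zip(rows, labels) if row[f] <= t]
    hi = [(row, y) for row, y in zip(rows, labels) if row[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], rng, depth + 1, max_depth, n_try),
            build_tree([r for r, _ in hi], [y for _, y in hi], rng, depth + 1, max_depth, n_try))


def predict_tree(tree, row):
    """Walk from the root to a leaf; internal nodes are tuples."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def fit_forest(rows, labels, n_trees=30, seed=0):
    """Fit each tree on a bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest


def predict(forest, row):
    """Majority vote across trees."""
    votes = sum(predict_tree(tree, row) for tree in forest)
    return 1 if 2 * votes > len(forest) else 0


def accuracy(forest, rows, labels):
    return sum(predict(forest, r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(forest, rows, labels, seed=1):
    """Accuracy lost when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    base = accuracy(forest, rows, labels)
    drops = []
    for f in range(len(rows[0])):
        col = [r[f] for r in rows]
        rng.shuffle(col)
        shuffled = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, col)]
        drops.append(base - accuracy(forest, shuffled, labels))
    return drops


# Synthetic data: the outcome depends non-linearly on feature 0 only;
# features 1 and 2 are pure noise.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(500)]
y = [1 if 0.3 < row[0] < 0.7 else 0 for row in X]
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

forest = fit_forest(X_train, y_train)
test_accuracy = accuracy(forest, X_test, y_test)
importances = permutation_importance(forest, X_test, y_test)
print("held-out accuracy:", round(test_accuracy, 3))
for name, imp in zip(["x0 (informative)", "x1 (noise)", "x2 (noise)"], importances):
    print(name, round(imp, 3))
```

Because the simulated outcome is non-monotonic in the informative feature, shuffling that feature should cost far more accuracy than shuffling either noise feature, which is the kind of ranking a variable importance measure is meant to surface.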

Original language: English (US)
Pages (from-to): 534-546
Number of pages: 13
Journal: AIDS and Behavior
Volume: 21
Issue number: 2
DOI: 10.1007/s10461-016-1628-y
State: Published - Feb 1 2017

Fingerprint

Substance-Related Disorders
HIV
Condoms
Counseling
Machine Learning
National Institute on Drug Abuse (U.S.)
Logistic Models
Program Evaluation
Self Efficacy
Sexual Behavior
Forests
Clinical Trials
Learning
Depression

Keywords

  • HIV testing
  • Multivariate analysis
  • Random forest
  • Sexual risk behaviors
  • Substance use
  • Supervised learning

ASJC Scopus subject areas

  • Social Psychology
  • Public Health, Environmental and Occupational Health
  • Infectious Diseases

Cite this

Factors Associated with HIV Testing Among Participants from Substance Use Disorder Treatment Programs in the US: A Machine Learning Approach. / Pan, Yue; Liu, Hongmei; Metsch, Lisa R.; Feaster, Daniel J.

In: AIDS and Behavior, Vol. 21, No. 2, 01.02.2017, p. 534-546.

@article{7ac6b4d54aaf4e47ac141478bb548d33,
title = "Factors Associated with HIV Testing Among Participants from Substance Use Disorder Treatment Programs in the US: A Machine Learning Approach",
abstract = "HIV testing is the foundation for consolidated HIV treatment and prevention. In this study, we aim to discover the most relevant variables for predicting HIV testing uptake among substance users in substance use disorder treatment programs by applying random forest (RF), a robust multivariate statistical learning method. We also provide a descriptive introduction to this method for those who are unfamiliar with it. We used data from the National Institute on Drug Abuse Clinical Trials Network HIV testing and counseling study (CTN-0032). A total of 1281 HIV-negative or status unknown participants from 12 US community-based substance use disorder treatment programs were included and were randomized into three HIV testing and counseling treatment groups. The a priori primary outcome was self-reported receipt of HIV test results. Classification accuracy of RF was compared to logistic regression, a standard statistical approach for binary outcomes. Variable importance measures for the RF model were used to select the most relevant variables. RF based models produced much higher classification accuracy than those based on logistic regression. Treatment group is the most important predictor among all covariates, with a variable importance index of 12.9{\%}. RF variable importance revealed that several types of condomless sex behaviors, condom use self-efficacy and attitudes towards condom use, and level of depression are the most important predictors of receipt of HIV testing results. There is a non-linear negative relationship between count of condomless sex acts and the receipt of HIV testing. In conclusion, RF seems promising in discovering important factors related to HIV testing uptake among large numbers of predictors and should be encouraged in future HIV prevention and treatment research and intervention program evaluations.",
keywords = "HIV testing, Multivariate analysis, Random forest, Sexual risk behaviors, Substance use, Supervised learning",
author = "Pan, Yue and Liu, Hongmei and Metsch, {Lisa R.} and Feaster, {Daniel J.}",
year = "2017",
month = "2",
day = "1",
doi = "10.1007/s10461-016-1628-y",
language = "English (US)",
volume = "21",
pages = "534--546",
journal = "AIDS and Behavior",
issn = "1090-7165",
publisher = "Springer New York",
number = "2"
}

TY - JOUR

T1 - Factors Associated with HIV Testing Among Participants from Substance Use Disorder Treatment Programs in the US

T2 - A Machine Learning Approach

AU - Pan, Yue

AU - Liu, Hongmei

AU - Metsch, Lisa R.

AU - Feaster, Daniel J

PY - 2017/2/1

Y1 - 2017/2/1

AB - HIV testing is the foundation for consolidated HIV treatment and prevention. In this study, we aim to discover the most relevant variables for predicting HIV testing uptake among substance users in substance use disorder treatment programs by applying random forest (RF), a robust multivariate statistical learning method. We also provide a descriptive introduction to this method for those who are unfamiliar with it. We used data from the National Institute on Drug Abuse Clinical Trials Network HIV testing and counseling study (CTN-0032). A total of 1281 HIV-negative or status unknown participants from 12 US community-based substance use disorder treatment programs were included and were randomized into three HIV testing and counseling treatment groups. The a priori primary outcome was self-reported receipt of HIV test results. Classification accuracy of RF was compared to logistic regression, a standard statistical approach for binary outcomes. Variable importance measures for the RF model were used to select the most relevant variables. RF based models produced much higher classification accuracy than those based on logistic regression. Treatment group is the most important predictor among all covariates, with a variable importance index of 12.9%. RF variable importance revealed that several types of condomless sex behaviors, condom use self-efficacy and attitudes towards condom use, and level of depression are the most important predictors of receipt of HIV testing results. There is a non-linear negative relationship between count of condomless sex acts and the receipt of HIV testing. In conclusion, RF seems promising in discovering important factors related to HIV testing uptake among large numbers of predictors and should be encouraged in future HIV prevention and treatment research and intervention program evaluations.

KW - HIV testing

KW - Multivariate analysis

KW - Random forest

KW - Sexual risk behaviors

KW - Substance use

KW - Supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85001820110&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85001820110&partnerID=8YFLogxK

U2 - 10.1007/s10461-016-1628-y

DO - 10.1007/s10461-016-1628-y

M3 - Article

VL - 21

SP - 534

EP - 546

JO - AIDS and Behavior

JF - AIDS and Behavior

SN - 1090-7165

IS - 2

ER -