Optimizing Identification of People Living with HIV from Electronic Medical Records: Computable Phenotype Development and Validation

Yiyang Liu, Khairul A. Siddiqi, Robert L. Cook, Jiang Bian, Patrick J. Squires, Elizabeth A. Shenkman, Mattia Prosperi, Dushyantha T. Jayaweera

Research output: Contribution to journal › Article › peer-review


Background: Electronic health record (EHR)-based computable phenotype algorithms allow researchers to efficiently identify a large virtual cohort of patients with human immunodeficiency virus (HIV). Building on existing algorithms, we refined, improved, and validated an HIV phenotype algorithm using data from the OneFlorida Data Trust, a repository of linked claims data and EHRs from its clinical partners, which provide care to over 15 million patients across all 67 counties in Florida.
Methods: Our computable phenotype examined information from multiple EHR domains, including clinical encounters with diagnoses, prescription medications, and laboratory tests. To identify an HIV case, the algorithm requires the patient to have at least one diagnostic code for HIV and to meet at least one of the following criteria: at least one positive HIV laboratory result, a prescription for HIV medications, or at least three visits with HIV diagnostic codes. The computable phenotype was validated against a subset of clinical notes.
Results: Among the more than 15 million patients in OneFlorida, we identified 61,313 patients with a confirmed HIV diagnosis. Among them, 8.05% met all four inclusion criteria, 69.7% met the criterion of three or more HIV encounters in addition to having an HIV diagnostic code, and 8.1% met all criteria except for having a positive laboratory result. Our algorithm achieved higher sensitivity (98.9%) and comparable specificity (97.6%) relative to existing algorithms (77-83% sensitivity, 86-100% specificity). The mean age of the sample was 42.7 years; 58% were male and about half were Black or African American. Patients' average follow-up period (the time between the first and last encounter in the EHRs) was approximately 4.6 years. The median numbers of all encounters and of HIV-related encounters were 79 and 21, respectively.
Conclusion: By leveraging EHR data from multiple clinical partners and domains, covering a considerably diverse population, our algorithm applies more flexible criteria than those used in prior studies, which allows it to identify patients with incomplete laboratory test results and medication prescribing histories.
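The case definition given in the Methods can be read as a simple Boolean rule: one required criterion (an HIV diagnostic code) combined with at least one of three supporting criteria. The following is a minimal sketch of that rule only, not the authors' implementation. The `PatientSummary` class, its field names, and the assumption that per-patient counts have already been derived from the EHR tables are all hypothetical; the abstract does not specify code sets, laboratory definitions, or medication lists.

```python
from dataclasses import dataclass


@dataclass
class PatientSummary:
    """Hypothetical per-patient counts, assumed to be precomputed from EHR data."""
    hiv_dx_code_count: int       # HIV diagnostic codes on record
    hiv_dx_visit_count: int      # visits that carry an HIV diagnostic code
    positive_hiv_lab_count: int  # positive HIV laboratory results
    hiv_medication_count: int    # prescriptions for HIV medications


def is_hiv_case(p: PatientSummary) -> bool:
    """Apply the rule described in the abstract.

    Required: at least one HIV diagnostic code.
    Plus at least one of: a positive HIV laboratory result,
    an HIV medication prescription, or three or more visits
    with HIV diagnostic codes.
    """
    has_dx_code = p.hiv_dx_code_count >= 1
    has_supporting_evidence = (
        p.positive_hiv_lab_count >= 1
        or p.hiv_medication_count >= 1
        or p.hiv_dx_visit_count >= 3
    )
    return has_dx_code and has_supporting_evidence


# Illustrative patients (made-up counts, not study data).
code_only = PatientSummary(1, 1, 0, 0)
code_and_visits = PatientSummary(3, 3, 0, 0)
lab_without_code = PatientSummary(0, 0, 1, 0)
```

Under this reading, `code_only` is not classified as a case, `code_and_visits` is classified as a case even though it has no laboratory or medication data, and `lab_without_code` is excluded because the diagnostic code is mandatory.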

Original language: English (US)
Journal: Methods of Information in Medicine
State: Accepted/In press - 2021


  • computable phenotype
  • diagnosis
  • electronic health records
  • HIV
  • virtual cohort

ASJC Scopus subject areas

  • Health Informatics
  • Advanced and Specialized Nursing
  • Health Information Management

