Automatic detection of spontaneous facial Action Units (AUs) in video has many applications, including understanding infants' emotion-mediated interactions and development. The target AUs for detection are those essential to positive and negative emotion (i.e., AU 6, AU 12, and AU 20). Tracking and extraction of facial features is especially challenging in infants. Face shape and texture differ markedly from those of adults, the jaw contour is often obscured, sudden changes in pose and expression are common, and AUs often occur in complex combinations. We investigate the associations among AUs central to positive and negative emotion and propose a methodology for jointly detecting positively correlated facial AUs of infants during spontaneous interactions with their parents. We apply a subject-independent structured output model to (1) recognize combinations of AUs simultaneously, and (2) model the dependencies between AUs. Using this approach, we improved the reliability of automatic detection of AU 12 and AU 20 in a total of 90 minutes of video of infant-parent interaction from 12 infants.
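The joint-detection idea can be illustrated with a minimal sketch: score each combination of AU labels with per-AU (unary) evidence terms plus pairwise terms that reward or penalize co-occurrence, then pick the highest-scoring joint assignment. All scores below are made-up illustrative values, not learned parameters, and the exhaustive inference is only practical because three AUs give just 2^3 = 8 combinations; this is not the paper's actual model, just a toy structured-output formulation.

```python
from itertools import product

# AUs of interest (from the abstract): AU 6, AU 12, AU 20.
AUS = [6, 12, 20]

# Hypothetical unary scores: per-AU evidence from appearance features.
# Positive values favor "AU present". Illustrative only.
unary = {6: 1.2, 12: 0.8, 20: -1.5}

# Hypothetical pairwise terms modeling AU dependencies, e.g. AU 6 and
# AU 12 co-occur in smiles, so their joint presence is rewarded.
pairwise = {(6, 12): 1.0, (6, 20): -0.8, (12, 20): -0.5}

def score(labels):
    """Score one joint assignment {AU: 0/1} with unary + pairwise terms."""
    s = sum(unary[a] * labels[a] for a in AUS)
    for (a, b), w in pairwise.items():
        s += w * labels[a] * labels[b]
    return s

def predict():
    """Exhaustive MAP inference over all 2^3 joint AU labelings."""
    return max(
        (dict(zip(AUS, bits)) for bits in product([0, 1], repeat=len(AUS))),
        key=score,
    )

print(predict())  # with these toy scores: AU 6 and AU 12 present, AU 20 absent
```

With these numbers the pairwise reward pushes AU 6 and AU 12 to be detected together, while AU 20 is suppressed both by its weak unary evidence and by its negative interaction terms, mirroring the intuition that positively correlated AUs should be detected jointly.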