Group detection is a fundamental problem in sociological and behavioral data analysis and has attracted considerable attention in recent years. Most current approaches rely on visual data, such as still images and videos, to detect groups. One of the most important applications of group detection is assisting psychologists in understanding classroom dynamics. However, camera recordings may be unavailable, and it can be infeasible to set up cameras without blind spots. Therefore, as an alternative approach, we propose an audio-based group detection framework that uses multiple synchronized audio streams collected from a wearable device on each subject. In this paper, audio recordings collected from a preschool classroom over multiple days are used to produce group detection results, which are validated by clustering the subject locations collected alongside the audio data. Experiments show an average Normalized Mutual Information (NMI) score of 0.391 between the groups detected by the audio-based framework and those obtained by location-based clustering.
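As an illustration of the validation metric, the sketch below computes NMI between two group labelings in pure Python. The group assignments are hypothetical (not from the paper's data), and the arithmetic-mean normalization used here is one common convention for NMI.

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same
    subjects, normalized by the arithmetic mean of the two entropies."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information I(A; B), summing only over observed label pairs
    mi = sum((nab / n) * log((nab * n) / (ca[a] * cb[b]))
             for (a, b), nab in joint.items())
    # Marginal entropies H(A) and H(B)
    ha = -sum((c / n) * log(c / n) for c in ca.values())
    hb = -sum((c / n) * log(c / n) for c in cb.values())
    denom = (ha + hb) / 2
    return mi / denom if denom > 0 else 1.0

# Hypothetical example: groups detected from audio vs. clusters of
# the subjects' recorded locations (identical labelings give NMI = 1).
audio_groups    = [0, 0, 1, 1, 2, 2]
location_groups = [0, 0, 1, 2, 2, 2]
print(round(nmi(audio_groups, location_groups), 3))  # → 0.74
```

An NMI of 1 indicates the two labelings induce the same partition, while 0 indicates they are statistically independent; the paper's 0.391 average thus reflects partial agreement between the audio-based groups and the location-based clusters.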