Optical and sonar imaging systems are routinely deployed for underwater search, inspection, and scientific surveys of man-made and natural structures. We have previously demonstrated that integrating visual cues from optical and sonar images is an effective strategy for overcoming certain shortcomings of each system alone in 3-D reconstruction from 2-D images. In this work, we address the structure-from-motion problem: the recovery of 3-D motion and scene structure from images taken at different poses relative to the scene. In particular, we explore the bundle adjustment formulation, where motion and structure are estimated over all the data in a batch process. We analyze the 3-D reconstruction accuracy on computer-generated noisy data for varying numbers of tracked points and views. We also present results from an experiment with real data.