Target-based positioning and 3-D target reconstruction are critical capabilities for deploying submersible platforms in a range of underwater applications, e.g., search and inspection missions. While optical cameras provide high resolution and fine target detail, they are constrained by a limited visibility range. In highly turbid waters, targets at distances of up to tens of meters can be imaged by high-frequency (MHz) 2-D sonar imaging systems, which have been introduced to the commercial market in recent years. Because sonar images have lower resolution and SNR and show inferior target detail compared with optical images in favorable visibility conditions, integrating both sensing modalities can enable operation in a wider range of conditions, with generally better performance than deploying either system alone. In this paper, we address the estimation of the 3-D motion of the integrated system and the 3-D reconstruction of scene features. We do not require matches between optical and sonar features, referred to as opti-acoustic correspondences, but only matches within either the sonar or the optical motion sequence. In addition to improving motion estimation accuracy, the system overcomes certain inherent ambiguities of monocular vision, e.g., the scale-factor ambiguity and the dual interpretation of planar scenes. We discuss how the proposed solution provides an effective strategy for addressing the rather complex opti-acoustic stereo matching problem. Experiments with real data demonstrate our technical contribution.
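The scale-factor ambiguity mentioned above can be illustrated with a toy example. This is a minimal sketch, not the method of this paper. It assumes a noise-free pinhole camera with unit focal length and a sonar co-located with the camera at the first view that reports the metric range to each feature. The helper names (`project`, `recover_scale`) are hypothetical. Scaling the scene and the camera translation by any factor `s` leaves both images unchanged, so monocular motion is recoverable only up to scale. A single metric range measurement per feature fixes that scale.

```python
import numpy as np


def project(points, R, t):
    """Pinhole projection (unit focal length) of world points (N x 3)
    into a camera whose frame is given by X_cam = R @ X + t."""
    cam = points @ R.T + t
    return cam[:, :2] / cam[:, 2:3]


def recover_scale(points_up_to_scale, sonar_ranges):
    """Hypothetical helper: metric scale as the median ratio of measured
    sonar range to the range of the up-to-scale reconstruction
    (sonar assumed co-located with the first camera view)."""
    return np.median(sonar_ranges / np.linalg.norm(points_up_to_scale, axis=1))


rng = np.random.default_rng(0)
P = rng.uniform([-2, -2, 5], [2, 2, 10], size=(6, 3))  # true scene points (m)
R = np.eye(3)                                          # no rotation between views
t = np.array([0.5, 0.0, 0.2])                          # true camera translation (m)

# Any scale s reproduces exactly the same pair of images.
s = 3.7
img1_true, img2_true = project(P, np.eye(3), np.zeros(3)), project(P, R, t)
img1_scaled, img2_scaled = project(s * P, np.eye(3), np.zeros(3)), project(s * P, R, s * t)

# A monocular reconstruction is only known up to scale; normalise to a unit baseline.
baseline = np.linalg.norm(t)
P_hat, t_hat = P / baseline, t / baseline

# Sonar ranges (metric) to each feature resolve the unknown scale.
sonar_ranges = np.linalg.norm(P, axis=1)
scale = recover_scale(P_hat, sonar_ranges)
t_metric = scale * t_hat
```

With noise-free data the recovered `scale` equals the true baseline length, so `t_metric` matches `t`. Using the median of per-feature ratios is one simple, outlier-tolerant choice for noisy ranges, not a prescription.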