Using a stereo vision system, the authors show how the translational motion and scene structure can be recovered directly from image gradients and time derivatives, without estimating or establishing correspondences between features across images. The direction of motion is recovered by minimizing the sum of squared errors of a linear constraint equation over the entire image. The magnitude of the motion is estimated from the stereo disparity. The scene structure is then recovered as a depth map from the recovered motion, image gradients, and time derivatives. Experimental results on real images are presented.
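The three-stage pipeline above (direction, then magnitude, then depth) can be illustrated with a minimal NumPy sketch. The abstract does not state the authors' exact linear constraint, so this sketch substitutes a standard formulation from the direct-methods literature and should not be read as the authors' method. It assumes perspective projection with focal length normalized to 1, a camera translating with velocity t = (U, V, W), and brightness constancy, which together give E_t + (s · t)/Z = 0 at every pixel, with s = (-E_x, -E_y, x E_x + y E_y). At pixels where E_t is near zero, s · t is near zero, a linear homogeneous constraint on t, so the direction is taken as the unit vector minimizing the sum of (s · t)^2 over those pixels. The magnitude is then fitted using depth from stereo via Z = f b / d. All function names and parameters (`tau`, `focal_px`, `baseline`) are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating a direct (correspondence-free) estimate of
# pure camera translation; not taken from the source paper.


def constraint_vectors(Ex, Ey, x, y):
    """Per-pixel s = (-Ex, -Ey, x*Ex + y*Ey), shape (N, 3).

    Assumes normalized image coordinates (focal length 1), so that pure
    translation t satisfies Et + (s . t) / Z = 0 under brightness constancy.
    """
    return np.stack([-Ex, -Ey, x * Ex + y * Ey], axis=-1).reshape(-1, 3)


def translation_direction(s, Et, tau=1e-3):
    """Unit t minimizing sum (s . t)^2 over near-stationary pixels (|Et| < tau).

    At those pixels s . t = -Z * Et is close to zero, so the minimizer is the
    eigenvector of sum(s s^T) with the smallest eigenvalue.
    """
    Et = Et.ravel()
    mask = np.abs(Et) < tau  # hypothetical threshold choice
    if mask.sum() < 3:
        raise ValueError("too few near-stationary pixels; increase tau")
    S = s[mask]
    _, eigvecs = np.linalg.eigh(S.T @ S)  # eigenvalues in ascending order
    t_hat = eigvecs[:, 0]
    # Resolve the sign ambiguity: positive depth needs sign(s . t) == -sign(Et).
    if np.sum(-(s @ t_hat) * Et) < 0:
        t_hat = -t_hat
    return t_hat


def depth_from_disparity(d, focal_px, baseline):
    """Stereo depth Z = f * b / d (hypothetical rectified-stereo parameters)."""
    return focal_px * baseline / d


def translation_magnitude(s, Et, t_hat, Z_stereo):
    """Least-squares scale k in Et = -k (s . t_hat) / Z, with Z from stereo."""
    a = (s @ t_hat) / Z_stereo.ravel()
    return -np.dot(a, Et.ravel()) / np.dot(a, a)


def depth_map(s, Et, t, tau=1e-3):
    """Z = -(s . t) / Et; NaN where |Et| is too small to divide by safely."""
    shape = Et.shape
    Et = Et.ravel()
    Z = np.full(Et.shape, np.nan)
    ok = np.abs(Et) > tau
    Z[ok] = -(s[ok] @ t) / Et[ok]
    return Z.reshape(shape)
```

In use, `s` is built once from the image gradients, `translation_direction` gives the unit direction, `translation_magnitude` scales it using depth from `depth_from_disparity`, and `depth_map` recovers Z at every pixel with a usable time derivative. Restricting the direction fit to near-stationary pixels keeps it linear in t without knowing depth. A weighted sum over all pixels is an alternative design choice.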