Each retina receives only a two-dimensional image of the world. ‘The third dimension’ explains how distance is perceived using binocular stereopsis, which relies on retinal disparity, and monocular cues. Monocular cues come from the pictorial content of the image and from the motion of parts of the image when the head and body move. The most common pictorial cues are relative size, perspective, texture, elevation, and occlusion. The velocity flow field, the pattern of retinal motion that results from one's own locomotion, is a powerful cue to distance. How do all of these cues come together to give the convincing, unified picture of our surroundings that we experience?