Monte-Carlo RT is extended concept of Classical ray tracing ( Whitted RT ). And in a few minutes you will know what I mean when I say that.
Classical ray tracing takes one ray, at every intersection of a ray with the scene, a single ray is spawned in three most important ( most significant ) & exact directions: towards the lights, mirror reflections & perfect refractions. This does not handle glossy or diffuse inter-reflections.
Monte-Carlo ray tracing has two main types.
Path tracing takes the approach of having a single ray reflect or refract through the scene, changing direction at each surface intersection. This may sound exactly similar to classical ray tracing, but there is an important difference- Monte-Carlo ray tracing picks us a random direction for each reflection or refraction ray. The direction is obviously not completely random, but is influenced by the BRDF. The BRDF, in some sense, acts like a probability density function. And the BRDFs can be arbitrary. For example, for a glossy surface with a narrow Phong reflection lobe, reflection rays would be much more likely to be spawned near the center of the lobe than anywhere else. Weighing the random direction in this way is called as importance sampling.
Distribution ray tracing spawns many random rays from every surface intersection.
Why distribution ray tracing would give us more photo-realism?
Lets come back to the analogy of BRDF as probability distribution function. This analogy is natural since definition of BRDF essentially is how much (in terms of ratio) of input radiance will be reflected in every specific possible direction. This ratio naturally defines the probability of any given ray getting reflected in that specific direction. So, the point is, BRDF = probability distribution function. So, say given 100 input (i.e. incoming) rays, 40 will go in a preffered mirror reflection dirction and the rest 60 will be reflected in rest of the infinite hemispherical directions. Note that, if we assume that the underlying surface roughly does follow the estimated BRDF, then these rays going in all these directions in all these proportions is physically happening.
Now, if we just ignored 99 of these rays and just followed a single ray in a single specific direction ( which could be random if path tracing, and fixed if classical RT), then, we are clearly losing out on a lot of rays which are physically contributing to the resulting illumination story at that point.
Distribution Ray tracing gives required respect and consideration to these 99 rays. It turns out, that these extra rays help us get closer to the reality, closer to the real picture.
I like that! So why not just use it?
Because, in reality, 99 becomes infinite. And that will make our program finish its computations after lightyears. You might be, but I, personally, am not OK with this. So, how to do distribution ray tracing today?
As you might have guessed, we don't sample all the infinite rays. Instead, we choose some number of random directions. Since we choose, we would like them to be the important ones.
As a final note, I should state this: Monte Carlo ray tracing fully solves Kajiya's rendering equation, given enough time and rays.
Reference: Real-time Rendering, section 9.8.2 ( ray tracing )
Further Reading: