So it's made of two components: a normal-based detector and a depth-based detector. The normal-based detector finds edges within objects, and the depth-based detector finds edges between overlapping objects. Let's say we have a wall and a box in front of it. If the wall and the box are both facing the same direction, then they have the same normal value, so the normal detector won't find an edge between them. That's why you need the depth-based detector.

So imagine a step change in the pixel depth values. The depth values of the wall face are high high high high high, and then when you come to the box in front of the wall, the depth values suddenly become low low low low low. That gives a negative spike in the first derivative: it was zero, then it drops sharply (to negative infinity for an ideal step, or just to a large negative value between two pixels), then it goes back to zero. So the first derivative first goes down and then goes back up, which means there is a zero-crossing in the second derivative (the second derivative is negative while the first derivative is going down, then becomes positive while the first derivative is going back up).
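To make that concrete, here is a minimal sketch on a single scanline. The depth values (10 for the wall, 2 for the box) and the helper names `diff` and `zero_crossings` are made up for illustration, and plain forward differences stand in for whatever derivative filter the real detector uses.

```python
# Depth along one scanline: wall (far), then box (near). Values are made up.
depth = [10.0] * 5 + [2.0] * 5

def diff(values):
    """Forward difference: values[i + 1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]

def zero_crossings(values):
    """Indices i where values[i] and values[i + 1] have opposite signs."""
    return [i for i in range(len(values) - 1) if values[i] * values[i + 1] < 0]

first = diff(depth)    # zero everywhere except one negative spike at the step
second = diff(first)   # negative going into the spike, positive coming out

print(first)                    # [0.0, 0.0, 0.0, 0.0, -8.0, 0.0, 0.0, 0.0, 0.0]
print(second)                   # [0.0, 0.0, 0.0, -8.0, 8.0, 0.0, 0.0, 0.0]
print(zero_crossings(second))   # [3]
```

On a pixel grid the spike is just a large finite value (-8 here) rather than negative infinity, but the pattern is the same: the second derivative flips from negative to positive right at the step.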

So that's why we look for zero-crossings in the second derivative: to look for spikes in the first derivative. We can't just look for high values in the first or second derivative, because if you look at the ground, the depth value goes to infinity as you near the horizon, so the derivatives get large there too even though there is no edge. So you have to look for spikes in the first derivative, which show up as strong zero-crossings in the second derivative.
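Here is a rough sketch of the ground-plane problem. The `ground` profile (depth proportional to 1 / distance-to-horizon), the `min_jump` threshold, and the function names are all assumptions made up for the example, not values from an actual implementation.

```python
def diff(values):
    """Forward difference: values[i + 1] - values[i]."""
    return [b - a for a, b in zip(values, values[1:])]

def strong_zero_crossings(values, min_jump):
    """Indices where the sign flips AND the jump across the flip is large."""
    return [
        i for i in range(len(values) - 1)
        if values[i] * values[i + 1] < 0
        and abs(values[i + 1] - values[i]) >= min_jump
    ]

# Ground seen at a grazing angle: depth blows up as the rows approach the horizon.
ground = [100.0 / (20 - row) for row in range(19)]
# Wall, then a box in front of it: a genuine step in depth.
step = [10.0] * 5 + [2.0] * 5

# The ground has a steeper slope than the real edge, so a plain
# "is the derivative large?" test would flag the wrong thing.
largest_ground_slope = max(abs(v) for v in diff(ground))
largest_step_slope = max(abs(v) for v in diff(step))
print(largest_ground_slope > largest_step_slope)   # True

# The ground's second derivative never changes sign, so no zero-crossing.
ground_edges = strong_zero_crossings(diff(diff(ground)), min_jump=1.0)
step_edges = strong_zero_crossings(diff(diff(step)), min_jump=1.0)
print(ground_edges)   # []
print(step_edges)     # [3]
```

The ground's depth keeps curving the same way all the way to the horizon, so its second derivative stays positive and never crosses zero, while the step produces one strong crossing.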

Note that the zero-crossing depth-based detector doesn't really work well within objects, because there often isn't a step change in the depth value. But that's where the normal-based detector comes in: looking for step changes in the normal values shows us the edges within an object.
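As a sketch of the normal-based side, assuming unit normals per pixel and a made-up dot-product threshold (`max_dot` and `normal_edges` are illustrative names, not from a specific implementation):

```python
def dot(a, b):
    """Dot product of two 3-component vectors."""
    return sum(x * y for x, y in zip(a, b))

def normal_edges(normals, max_dot=0.9):
    """Indices i where neighbouring unit normals i and i + 1 point in clearly different directions."""
    return [
        i for i in range(len(normals) - 1)
        if dot(normals[i], normals[i + 1]) < max_dot
    ]

front = (0.0, 0.0, 1.0)   # face pointing at the camera
side = (1.0, 0.0, 0.0)    # face pointing to the right

# Scanline across the corner of a box: the normal flips from front to side.
box_corner = [front] * 4 + [side] * 4
# Scanline from the wall onto the box in front of it: both face the camera.
wall_then_box = [front] * 8

print(normal_edges(box_corner))      # [3]
print(normal_edges(wall_then_box))   # []
```

The box corner is found from the normals alone, while the wall-to-box boundary is invisible to this detector, which is exactly the case the depth-based detector is there to cover.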