Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix perspective projection transform #465

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

hannobraun
Copy link
Contributor

Please review this commit carefully, as I only partially know what I'm
doing. I believe that setting m44 to one for the perspective
projection matrix is wrong.

First, the sources I can find that explain this all set it to zero.
Granted, I can't find many sources online that explain this clearly, but
the few that I can find support my point (see Wikipedia[1] and this
university lecture[2], for example).

Second, I did some transformation on paper, to better understand this,
and while the approaches in the cited sources starting making sense to
me, I can't make head or tails of that one in the m44 position. It just
seems to get in the way.

So while I can't rule out that the way things currently are make total
sense for reasons I don't understand yet, as of right now, it seems more
like an oversight.

[1] https://en.wikipedia.org/wiki/Transformation_matrix#Perspective_projection
[2]
https://courses.cs.washington.edu/courses/cse455/09wi/Lects/lect5.pdf,
page 23

Please review this commit carefully, as I only partially know what I'm
doing. I believe that setting `m44` to one for the perspective
projection matrix is wrong.

First, the sources I can find that explain this all set it to zero.
Granted, I can't find many sources online that explain this clearly, but
the few that I can find support my point (see Wikipedia[1] and this
university lecture[2], for example).

Second, I did some transformation on paper, to better understand this,
and while the approaches in the cited sources starting making sense to
me, I can't make head or tails of that one in the m44 position. It just
seems to get in the way.

So while I can't rule out that the way things currently are make total
sense for reasons I don't understand yet, as of right now, it seems more
like an oversight.

[1] https://en.wikipedia.org/wiki/Transformation_matrix#Perspective_projection
[2]
https://courses.cs.washington.edu/courses/cse455/09wi/Lects/lect5.pdf,
page 23
@nical
Copy link
Contributor

nical commented Aug 21, 2020

You had me scratch my head for a good hour and a half (and I learned stuff in the process).

Before we go into why, let's start with references (it at least gave me some confidence that we aren't doing something crazy)

Why then ?

It's interesting because your reasoning and your sources do make sense.

My understanding is that it is a question of which plane the perspective transform is projecting things on. In the case of euclid, the matrix projects points onto the plane defined by z = -d. In that wikipedia page, the transform is projecting to the z = 1 plane, same for that washington.edu course (see the "Homogenous coordinates: Geometric intuition" slide).

I think that it makes most sense for euclid to keep perspective as is and continue following the CSS specification. That said this method really need better documentation.
It wouldn't hurt to add other perspective transformations. I don't thing one that projects to z=1 would be particularly useful, but maybe equivalents to gluPerspective and gluLookAt that seem to be popular.

Anyway I'll close this now since I believe that we shouldn't change the behavior of perspective.

@nical nical closed this Aug 21, 2020
bors-servo added a commit that referenced this pull request Aug 21, 2020
Improve the documentation of Transform3D::perspective

There was a bit of confusion about the formulation of the perspective transform in #465. This clarifies things.
@hannobraun
Copy link
Contributor Author

Thank you for your reply! I hate to take up more of your time, but here I go.

Before we go into why, let's start with references (it at least gave me some confidence that we aren't doing something crazy)

Those are certainly some strong sources. If I had been aware of those at an earlier point, I probably wouldn't have even posted this pull request. However, I'm too deep into it by now to just yield to these, especially since I don't quite follow your explanation :-)

My understanding is that it is a question of which plane the perspective transform is projecting things on. In the case of euclid, the matrix projects points onto the plane defined by z = -d. In that wikipedia page, the transform is projecting to the z = 1 plane, same for that washington.edu course (see the "Homogenous coordinates: Geometric intuition" slide).

I think this isn't quite on point, for two reasons.

First, yes, the Wikipedia matrix projects to z = 1, but that is just because a specific value was chosen there. The Washington matrix actually does project into z = -d, as best as I can tell. I believe that the slide you cited is just an example that doesn't have a 1-1 relation to the matrices on the next slide.

Second, I don't believe that projecting to z = -d is what the current Euclid implementation is actually doing. Let me demonstrate.

First, let's take a closer look at the matrices. I'm going to ignore the Wikipedia matrix from here on, as it's basically equivalent to the Washington matrix (modulo the hardcoded distance and different viewing direction), and the Washington matrix is much closer to the Euclid one, hence being better suited to demonstrate the difference.

Here's the Washington matrix:

| 1 0   0  0 |
| 0 1   0  0 |
| 0 0   1  0 |
| 0 0 -1/d 0 |

And here's the Euclid matrix. Please note that the Euclid documentation formats matrices differently, based on the order of their fields in memory. I'm translating it into the format used by all of the sources (including yours) here, for clarity:

| 1 0   0  0 |
| 0 1   0  0 |
| 0 0   1  0 |
| 0 0 -1/d 1 |

The only difference between them is the m44 value in the lower-right corner.

I'm claiming that the Washington matrix projects points into the z = -d plane and that the Euclid matrix doesn't. To demonstrate that, I'm doing a simple test: If a point already is in the z = -d plane, projecting it into that plane should not change the point.

Let's try that for the Washington matrix with d = 1:

| 1 0  0 0 |   |  2 |   |  2 |
| 0 1  0 0 |   |  3 |   |  3 |
| 0 0  1 0 | x | -1 | = | -1 |
| 0 0 -1 0 |   |  1 |   |  1 |

We're getting the same point out that we put in. w is still 1, so when the GPU does the perspective division to transform the point into normalized device coordinates, nothing will change either. To the best of my understanding, this is how it should be.

Let's do the same with the Euclid matrix:

| 1 0  0 0 |   |  2 |   |  2 |
| 0 1  0 0 |   |  3 |   |  3 |
| 0 0  1 0 | x | -1 | = | -1 |
| 0 0 -1 1 |   |  1 |   |  2 |

The 1 in m44 changes the value of w to 2. This means the GPU will divide by 2 for the perspective division, resulting in the following normalized device coordinates:

|  1   |
|  1.5 |
| -0.5 |

Which are not the same coordinates that our original point had, even though we were projecting it into the plane it was already in.

Doing the same for other planes leads to similar results. I've done lots more calculations along the same lines, and the result is always the same: The Washington approach (with 0 at m44) makes sense, the Euclid approach (with 1 at m44) doesn't.

Maybe I'm missing something. Maybe the Euclid approach has some advantage that I'm not seeing. Maybe something else needs to happen before or after the perspective transformation to correct for this (I don't know what that might be, or why it might be an advantage).

Thoughts on your sources

If this was just a Euclid thing, I'd be fully convinced that this is buggy behavior that needs to change. However, that you can point to those sources that are doing it the same way, is a strong argument that there is something going on here that I don't understand.

I don't have much insight into browser and web standards development, so what I'm about to say might be preposterous. Still, let me float this possibility: Is it possible, that this is an error in the spec, and everyone just copied the same mistake from the spec and/or each other?

Please consider the following notes, that could make this theory less preposterous:

  • If this is a bug, it is not an app-breaking one. All it does, as far as I can tell, is making the viewing angle wider than it should be. If I had encountered this bug (if it is one) in a different context, I would not have thought to scrutinize the projection matrix. My reaction would have been more along the lines of, "Oh, this is what a 90° field of view looks like? Guess I'll adjust the value".
  • I don't understand the context of this CSS spec. Is the resulting transformation matrix passed to the GPU directly or at all? I'm passing my matrix directly to the GPU (which does the perspective division). Maybe the same doesn't happen in the browsers.

Again, this might be preposterous, as I don't know how much scrutiny goes into the development of these standards and their implementation.

Additional sources

While articles explaining this subject clearly (and without resorting to "call this perspective function to make the matrix") seem to be rare, all I have seen use a 0 in the m44 position (I didn't save the additional ones I found, unfortunately, but can probably dig some of them up again, if requested).

I figured it wouldn't hurt to dig up some more sources, preferably involving real code. Here's what I came up with:

I didn't omit any sources. I just looked up the math libraries I could come up with off the top of my head, and all of them set m44 to 0. This certainly isn't conclusive evidence, but it does support my "web people got it wrong" theory.

I'm sorry for being so persistent when it's so likely I'm just missing something, but I've become obsessed at this point. If the behavior in Euclid is correct, I absolutely need to know why :-)

@nical
Copy link
Contributor

nical commented Aug 23, 2020

I'm going to have to ponder on this for a bit, don't worry if you don't hear back for a time.
Changing this behavior is definitely app breaking for a browser that has to strictly abide to web specifications, but I'd like to at least better understand what this does and document it properly. And if this behavior only ever make sense for applications tied to CSS specifications, we can always add enough decorum to make it clear and add another function more useful to the wider audience.

@jrmuizel do you have any insight or know who was involved in this spec?

@nical nical reopened this Aug 23, 2020
@yihuang
Copy link

yihuang commented Aug 31, 2020

my two cents on this(after discussed with @lezzokafka):

  • the only effect of the m44 = 1 is accumulate the w component contained in the point.
  • if the point start with w=0, and the transformations before applying perspective don't modify the w component, then the m44 has no effect. so normally the matrix is correct, assuming the initial w component is 0 in web browsers.
  • and one potential use-case for m44 = 1 is that, one can set the w component in previous custom transformation matrix, and the w value will get accumulated(rather than ignored) by the perspective matrix.

@hannobraun
Copy link
Contributor Author

Thank you for your comment, @yihuang. I have some notes on what you wrote.

  • the only effect of the m44 = 1 is accumulate the w component contained in the point.
  • if the point start with w=0, and the transformations before applying perspective don't modify the w component, then the m44 has no effect. so normally the matrix is correct, assuming the initial w component is 0 in web browsers.

As far as I know, it is convention to always set w = 1 for points (i.e. position) and w = 0 for vectors (i.e. velocity, acceleration, ...). This makes sense mathematically, as setting w = 0 means the vector is unaffected by translation (which seems undesirable for points; at least I can think of no reason why you would want a point in space that can't be translated).

I think your comment, that the matrix is correct if w == 0, is true. But I wonder if that reflects a real use case, or if it is a purely theoretical observation. The standard also defines a translation matrix, so translation must play a role in browsers too.

  • and one potential use-case for m44 = 1 is that, one can set the w component in previous custom transformation matrix, and the w value will get accumulated(rather than ignored) by the perspective matrix.

Again, I think this is true, but I wonder if it is a real use case. Setting a (positive) w component to be accumulated would have the effect of moving that point closer to the center of the screen, due to the perspective division. You could create the same effect with a translation along the z axis.

Interesting points, definitely, but I'd be interested if anyone can point to examples that use the perspective matrix in such a way that makes m44 == 1 correct. Doing so seems very convoluted and needless to me, but maybe I am wrong.

@nical
Copy link
Contributor

nical commented Sep 2, 2020

"correct" here is mostly a matter of against what specification. What I'd like to figure out is a proper way to explain what this does (and why as a bonus point). It's too late for web specs to change in any case even in the eventuality that another formulation of perspective transforms would have been a better choice. We need in euclid a perspective method that follows the web specification.

If we can't reconcile the current implementation with something universally useful, we'll probably rename perspective into css_perspective and add another perspective method that has a more common set of parameters (fovy, aspect ratio, near and far planes).

@hannobraun
Copy link
Contributor Author

"correct" here is mostly a matter of against what specification. What I'd like to figure out is a proper way to explain what this does (and why as a bonus point).

In an earlier comment, you claimed that the perspective method creates a transformation that projects points into a plane defined by z = -d. Based on everything that I've learned since I first found that method, this would be the most reasonable expectation for what it should do, and therefore the most reasonable specification of what would be considered "correct" in this context.

Out of all the sources mentioned, the only counterpoint to this interpretation is the CSS Transforms spec and the browsers that follow it. So far, no-one has offered an explanation of what the intention behind this divergence is, nor any evidence that it isn't a simple mistake in the spec.

It's too late for web specs to change in any case even in the eventuality that another formulation of perspective transforms would have been a better choice. We need in euclid a perspective method that follows the web specification.

If we can't reconcile the current implementation with something universally useful, we'll probably rename perspective into css_perspective and add another perspective method that has a more common set of parameters (fovy, aspect ratio, near and far planes).

That the web spec can't be changed, even if this turns out to be a straight mistake, certainly paints a depressing picture, and makes me glad I'm not involved in this world :-)

But as far as Euclid is concerned, I'd be fine with that outcome. I'd like to offer one note though: If a buggy perspective matrix is required (if this turns out to indeed be a bug), it is quite easy to adjust a correct (as per the definition I proposed above) matrix as required (let p = Transform3D::perspective(d); p.m44 = 1.0;).

From my subjective perspective as a non-web user of Euclid, a library that mentions nowhere in its documentation that it is specific to the web or browsers, such a workaround at the use site would be slightly better than a weird method doing the wrong thing (if indeed it turns out to be wrong).

@nical
Copy link
Contributor

nical commented Sep 2, 2020

In an earlier comment, you claimed that the perspective method creates a transformation that projects points into a plane defined by z = -d.

Yeah, my bad, I was trying to piece this mess together and apparently took a wrong shortcut. I'm hopeful that we'll be able to better understand this stuff, though. It's a busy time at mozilla right now, there's only so much of it I can spend figuring this out. Euclid isn't meant to be web-specific but the people who write most of it do web things so it at least needs to meet that use case (no reason we can't add other things that are universally useful).

That the web spec can't be changed, even if this turns out to be a straight mistake, certainly paints a depressing picture, and makes me glad I'm not involved in this world :-)

Glad I can contribute to you knowing how lucky you are :)

I'd like to offer one note though: If a buggy perspective matrix is required

I shouldn't have nitpicked about the word "correct". But again, whatever the outcome of this investigation, best to think about it as a transform that looks like a perspective, quacks like a perspective, actually is a perspective, and is available for use if you need some compatibility with how things look in a browser or webview. The perspective isn't wrong or buggy per se (granted, the doc is). There are many perspective projections and this one, even if not the one you want to use has reasons to exist, if only because it is used by a whole ecosystem of apps by now (for better or worse and before we decide which one it is we have to figure out why).

@hannobraun
Copy link
Contributor Author

hannobraun commented Sep 2, 2020

Yeah, my bad, I was trying to piece this mess together and apparently took a wrong shortcut.

I didn't mean to call you out for that, and I'm sorry if it came across that way. I merely meant to reference that statement, to support my case for what a reasonable definition of correctness is.

I'm hopeful that we'll be able to better understand this stuff, though. It's a busy time at mozilla right now, there's only so much of it I can spend figuring this out. Euclid isn't meant to be web-specific but the people who write most of it do web things so it at least needs to meet that use case (no reason we can't add other things that are universally useful).

I totally understand, and there's no hurry from my side. I too hope we'll be able to clear this up, either by confirming my suspicions about the spec, or by figuring out where I'm wrong or what I'm missing.

Glad I can contribute to you knowing how lucky you are :)

I actually used to be a web developer many years ago. I grew frustrated by living on top of this huge behemoth (the browser) that I could not understand, much less control, and I decided to switch to embedded development. These days I struggle with buggy hardware that has user manuals with hundreds or thousands of pages I need to understand, but still manages to be under-documented :-)

Still, I'm happy about my decision. And these days, when I do web development, it's a lot more fun (not having decisions about browser support, design, and choice of technology dictated by a client does wonders in that regard).

I shouldn't have nitpicked about the word "correct". But again, whatever the outcome of this investigation, best to think about it as a transform that looks like a perspective, quacks like a perspective, actually is a perspective, and is available for use if you need some compatibility with how things look in a browser or webview. The perspective isn't wrong or buggy per se (granted, the doc is). There are many perspective projections and this one, even if not the one you want to use has reasons to exist, if only because it is used by a whole ecosystem of apps by now (for better or worse and before we decide which one it is we have to figure out why).

Yeah, as I said, I'd be fine with the outcome you described. I just wanted to offer my perspective, but I understand that life is full of trade-offs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants