Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bounding_box format #34

Open
carterbox opened this issue Jan 9, 2017 · 16 comments
Open

bounding_box format #34

carterbox opened this issue Jan 9, 2017 · 16 comments

Comments

@carterbox
Copy link
Contributor

carterbox commented Jan 9, 2017

Why is bounding box stored as a 2-tuple of vertical arrays?

For the purpose of applying transformations to the bounding box (#32), I think it would be better to store the bounding box as a single 2xN array where the min corner and max corner are their own rows.

@slivingston
Copy link
Member

slivingston commented Jan 9, 2017

During initial work on the polytope package, the API was intended to follow that of MPT (the Multi-Parametric Toolbox for Matlab). In that Matlab toolbox, the bounding_box method of the Polytope class stores the bounding box as a matrix of size N \times 2.

@slivingston
Copy link
Member

That said, I agree with changing the return type as you propose. I think that the main disadvantage is an extra transpose (or otherwise, change to the shape of) to the corner vertices. However, that cost is small compared to benefits of using transformations in your case.

@carterbox
Copy link
Contributor Author

The question then is whether making the proposed change is going to disrupt anyone's use of polytope. Personally, I don't make any calls to bounding_box from outside polytope, but bounding_box is a public member of the API. Changing what it returns would be a major version increase.

@johnyf
Copy link
Member

johnyf commented Jan 11, 2017

The method bounding_box of the Region (or Polytope) class is being used in tulip, in the following modules of the subpackage tulip.abstract:

In the past, in Matlab code (e.g., numerical_utils and plot_utils), I used to store each set of points as a matrix where each column is a point (or vector). This allowed for vectorized manipulation. The points in such a matrix can be rotated by multiplying from the left with a rotation matrix. Also, vertical arrays are (I think) more common in the literature for representing vectors and point coordinates. The current 2-tuple is of vertical arrays.

The proposed change is to return a 2 x N (numpy ?) array, where each row is a point. What about an N x 2 numpy array?

@carterbox
Copy link
Contributor Author

OK, an Nx2 ndarray seems to be a better choice because of precedence in the literature for vectors to be represented as column vectors, better compatibility with existing code in tulip, and trivial work for me to change what I've already written.

I could switch over all the internal calls to bounding_box for polytope v2.0.0?

@johnyf
Copy link
Member

johnyf commented Jan 11, 2017

Regarding version management, I increment the minor version number for incompatible API changes. The version up to a037b55 has been below 1.0.0, so this practice happens to be compatible with semantic versioning. I do not follow semantic versioning strictly, I agree with several of the principles it describes.

If applied after version 0.1.4 is released, this change would result in an increment to version 0.2.0. I think that 1.0.0 should not be used before the API is overhauled, testing coverage increased to more than 80%, documentation written, and some algorithms reimplemented.

@slivingston
Copy link
Member

I agree with the comment above about requirements before v1.0.0. So, likely the next version (pending this issue) will be 0.2.0.

regarding #34 (comment), @carterbox have you already modified bounding_box to return N \times 2 arrays, or are you proposing that someone else makes the change (to master or dev branch) and then you rebase your work on it?

@carterbox
Copy link
Contributor Author

Yeah, brain fart this morning, I forgot that 0.x.y is pre-release and APIs are considered unstable.

I have not already modified bounding_box or all of the other calls to it. There isn't much difference between the two implementations in terms of length. It just means transforming the limits separately instead of at the same time.

I'm offering to make all of the changes, but I am uncertain whether it would make any significant impact on performance.

@slivingston
Copy link
Member

To be sure that I understand where we are on this issue:

  1. we have agreement that changing to arrays of shape N \times 2 might improve performance.
  2. the change would involve an API break, so the next version should be at least 0.2.0 if this change is applied.
  3. therefore, it remains to demonstrate the hypothesized performance improvement, or to show that code quality is improved.

Here, code quality would mean easier to check and understand matrix computations vs. tuples of vectors.

@johnyf
Copy link
Member

johnyf commented Jan 18, 2017

I agree, with the note that the next version as of 4d541ae will be polytope == 0.1.4, so that the bug fixes become available to tulip == 1.3.0. As implied by a remark above, the API change introduced in this issue will require releasing tulip == 1.3.1.

@carterbox
Copy link
Contributor Author

In the discussion above, we decided that for multiple vectors in one matrix, column vectors are the preference. However, for a lonesome vector, e.g. chebXc, is it preferable to have two dimensions when only one is needed shape (N,) vs shape (N,1)? Because numpy doesn't treat empty dimensions as singleton dimensions, this these two choices do not behave the same.

The problem is that, there is an inconsistent definition of vector type things. Travis tests failing for #39 and the changes at 770a408 are related to this topic.

@slivingston
Copy link
Member

For the question of the opening post and with some relevance to the question about lonesome vectors, I am also in support of using matrices of shape (2, N).

For lonesome vectors, I do not think that we can give a general rule because vectors can be expressed both in the type numpy.ndarray and the type numpy.matrix. It might be worth reviewing the polytope code and ensuring that it uniformly uses numpy.matrix and column vectors (so, having shape (N,1)).

@johnyf
Copy link
Member

johnyf commented Jan 29, 2017

In a bounding box matrix of shape (2, N), the coordinates of each corner form a row. In the literature, we usually transform points represented as matrices of shape (N, 1) by multiplying from the left with a suitable N x N matrix. The shape (2, N) would require multiplying from the right with the transpose of the same matrix. Also, if column matrices are already used for vectors elsewhere in polytope, then the two point representations won't match. The existing interface of bounding_box represents points as vertical matrices.

@carterbox
Copy link
Contributor Author

carterbox commented Jan 29, 2017

I just did some reading about the differences between choosing numpy.ndarray vs numpy.matrix, and from what I read, it seems better that we consistently return numpy.ndarray in shape (N,?) or (N, ).

The reasons cited seem to be:

  1. Faster operations for numpy.ndarray than numpy.matrix for small array sizes.
  2. Scipy prefers numpy.ndarray.
  3. You can now use @ instead of numpy.dot for matrix multiplication with numpy.ndarray.

Additionally, everywhere in polytope we are already using numpy.ndarray.

@slivingston
Copy link
Member

re #34 (comment), I do not think that precedence in the literature is a sufficient argument because in a practical implementation, we must also be concerned with issues of numerical computation, such as speed and precision. However, the fact that previously bounding_box used column matrices is indeed motivation to not change.

re #34 (comment), to be clear, I think that the type to which you refer is numpy.ndarray, not numpy.array. I did not find information about speed of operations in the reference that you (@carterbox) provided. In any case, I think that we can validate the argument by profiling polytope itself.

Another interesting consideration about performance is the configuration in which the arrays are stored. The default of numpy.ndarray is C order, also known as row-major, i.e., where the right-most index changes most quickly when traversing elements. Thus, we might guess that it is better to use numpy.ndarray of shape (N,) for vectors.

@slivingston
Copy link
Member

To be clear, I did not intend to entirely dismiss the argument about following the notation in the literature. Indeed, following it makes understanding easier.

johnyf added a commit that referenced this issue May 6, 2017
Previously, the implementation used right multiplication for
rotation matrices. This is because the source material used
this convention. However, we decided in #34 to use column
vectors and left multiplication in `polytope`, so this commit
converts the internal use of matrix multiplication to reflect
this preference.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants