Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Controlling size for nested data structures? #556

Open
mikera opened this issue Apr 11, 2024 · 5 comments
Open

Controlling size for nested data structures? #556

mikera opened this issue Apr 11, 2024 · 5 comments

Comments

@mikera
Copy link

mikera commented Apr 11, 2024

Love the library! It is proving very useful. Quick question....

SITUATION: I'm writing some custom generators for data structures that potentially need to generate multiple nested structures of the same type.

PROBLEM: There is a certain threshold where the sizes of the generated data structures seem to explode exponentially, probably because the expected number of children increases at each level (which in turn have a larger expected number of children themselves...)

QUESTION: Is there a good way to control the sizes of nested structures? Ideally if I am generating a data structure of size n with m children, it seems like it would be sensible for each child to be generated with size n/m but it's not quite clear how to do this?

@pholser
Copy link
Owner

pholser commented Apr 11, 2024

@mikera Thanks for checking out junit-quickcheck. I don't plan to develop on it any further -- you may want to investigate Jqwik by @jqwik-team.

That said, I'd be interested to check out your custom generators. Can you post some code?

@mikera
Copy link
Author

mikera commented Apr 12, 2024

Thanks for the reference, I will take a look!

Here's an example of a fairly minimal generator that blows up (stack overflow) while creating a proportion of nested instances of the same type (ArrayLists of ArrayLists and Longs):

import java.util.ArrayList;

import com.pholser.junit.quickcheck.generator.GenerationStatus;
import com.pholser.junit.quickcheck.generator.Generator;
import com.pholser.junit.quickcheck.random.SourceOfRandomness;

/**
 * Generator for arbitrary trees of Longs
 */
public class BadListGen extends Generator<Object> {
	public BadListGen() {
		super(Object.class);
	}

	@Override
	public Object generate(SourceOfRandomness r, GenerationStatus status) {
		int size=status.size()+1;
		int type = r.nextInt(10);
		
		switch (type) {
			case 0: 
				int n = r.nextInt(size);
				ArrayList<Object> al=new ArrayList<>();
				for (int i=0; i<n; i++) {
					al.add(generate(r,status));
				}
				return al;
				
			default: 
				Long l=r.nextLong(-size, size);
				return l;
		}
	}
}

Ideally, I think the size in the nested call to generate needs to be reduced to something like size/n on each iteration? The intention would be that the size controls the overall number of elements in the tree, rather than the size of each child

@pholser
Copy link
Owner

pholser commented Apr 12, 2024

@mikera Ah yes, I see what you mean. Sadly, I didn't make accessible GenerationStatus derivatives that would allow you to futz with size, etc. Your best bet may be to make a GenerationStatus subclass where you can control/modulate size in the manner you wish, and you can feed to your recursive calls. There is an internal-but-maybe-still-accessible AbstractGenerationStatus class you can use to get started, if you don't want to implement the interface directly.

@pholser
Copy link
Owner

pholser commented Apr 14, 2024

@mikera What do you think about something like this?

public class BadListGen extends Generator<Object> {
    public BadListGen() {
        super(Object.class);
    }

    @Override
    public Object generate(SourceOfRandomness r, GenerationStatus status) {
        List<Object> root = new ArrayList<>();
        for (int i = 0; i < status.size(); ++i) {
            root.add(_generate(r, status.size(), 1));
        }
        return root;
    }

    private Object _generate(SourceOfRandomness r, int size, int level) {
        int type = r.nextInt(10);

        if (type == 0) {
            int numberOfChildren = size == 0 ? 0 : r.nextInt(size);
            List<Object> children = new ArrayList<>();
            for (int i = 0; i < numberOfChildren; i++) {
                children.add(_generate(r, size / (level + 1), level + 1));
            }
            return children;
        }

        return r.nextLong(-size, size);
    }
}

@mikera
Copy link
Author

mikera commented Apr 15, 2024

Thanks @pholser for taking a look - your suggestion works well in simple cases

Only slight challenge is if the generator is parameterised and needs to generate something (e.g. leaf nodes) with a generic Generator that doesn't have a _generate and therefore needs a GenerationStatus. Perhaps I can play around with some clever subclassing to solve this :-)

Perhaps an interesting research topic for someone would be exploring how test data generators can be structured to provide provable O(size) construction time and space.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants