
Can I use TDigest instead of QDigest? #47

Open
ahmadpriatama opened this issue Apr 22, 2015 · 2 comments

Comments

@ahmadpriatama

I'm calculating quantiles as described in the LiveRamp blog post,

but when I run it on our production server, it fails with:

Caused by: java.lang.IllegalArgumentException: Can only accept values in the range 0..4611686018427387903, got 9223372036854775807
    at com.clearspring.analytics.stream.quantile.QDigest.offer(QDigest.java:125)
    at com.liveramp.cascading_ext.combiner.lib.QuantileExactAggregator.partialAggregate(QuantileExactAggregator.java:38)
    at com.liveramp.cascading_ext.combiner.lib.QuantileExactAggregator.partialAggregate(QuantileExactAggregator.java:17)
    at com.liveramp.cascading_ext.combiner.CombinerFunctionContext.combineAndEvict(CombinerFunctionContext.java:130)
    at com.liveramp.cascading_ext.combiner.CombinerFunction.operate(CombinerFunction.java:130)
    at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:99)
    ... 11 more

@tdunning suggested that I use TDigest instead of QDigest, but cascading_ext depends on a version of stream_lib that doesn't include TDigest. Any idea how I can use TDigest?
I updated my stream_lib dependency to the latest version, which includes TDigest, but cascading_ext apparently has no ExactAggregator that supports TDigest (QDigest uses QuantileExactAggregator). What should I do?
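For context on the stack trace: the rejected value, 9223372036854775807, is exactly Java's Long.MAX_VALUE, and the accepted upper bound, 4611686018427387903, equals Long.MAX_VALUE / 2 (that is, 2^62 - 1). The sketch below reproduces that range check, assuming the bound is derived this way. This is inferred from the error message, not taken from QDigest's source, and the class and method names are hypothetical. It shows that a single Long.MAX_VALUE in the input, possibly a sentinel or an overflowed value, is enough to make QDigest.offer throw.

```java
public class QDigestRangeCheck {
    // Upper bound quoted in the error message (0..4611686018427387903).
    // Assumption: it equals Long.MAX_VALUE / 2, i.e. 2^62 - 1.
    static final long QDIGEST_MAX = Long.MAX_VALUE / 2;

    // Hypothetical helper mirroring the check implied by the exception.
    static boolean acceptedByQDigest(long value) {
        return value >= 0 && value <= QDIGEST_MAX;
    }

    public static void main(String[] args) {
        System.out.println(QDIGEST_MAX);                       // 4611686018427387903
        System.out.println(acceptedByQDigest(Long.MAX_VALUE)); // false: the value from the trace
        System.out.println(acceptedByQDigest(42L));            // true
    }
}
```

A t-digest works on double values rather than a bounded long range, so this particular failure would not apply to it, although it is still worth finding out where a Long.MAX_VALUE is coming from in the data.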

@pwestling
Contributor

The fastest way to get up and running with TDigest is going to be implementing your own ExactAggregator. You can use QuantileExactAggregator as a guide, and I don't think you'll have much trouble with it. Once you have the aggregator, you can pass it to a Combiner the same way as QuantileExactAggregator, and you should get the TDigest object you want at the end. If you run into any specific issues doing that, let us know and we can help.
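A rough sketch of the shape such an aggregator might take follows. The real cascading_ext ExactAggregator interface is not shown in this thread, so PartialFinalAggregator below is a hypothetical stand-in that only models the partial/final split suggested by QuantileExactAggregator.partialAggregate in the stack trace. To keep the example runnable without stream_lib, ExactDigest is also a stand-in: it stores every value and computes exact nearest-rank quantiles, whereas a real TDigest keeps a compact, mergeable summary. In an actual implementation you would swap in TDigest and the real interface methods.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DigestAggregatorSketch {

    // Hypothetical interface; NOT the real cascading_ext ExactAggregator signature.
    interface PartialFinalAggregator<T> {
        T initialize();
        T partialAggregate(T aggregate, double value); // fold one raw value into a partial
        T finalAggregate(T aggregate, T partial);      // merge a partial into the total
    }

    // Stand-in for TDigest: exact, but keeps every value (not memory-bounded).
    static final class ExactDigest {
        private final List<Double> values = new ArrayList<>();

        void add(double x) { values.add(x); }

        void addAll(ExactDigest other) { values.addAll(other.values); }

        // Nearest-rank quantile for q in [0, 1].
        double quantile(double q) {
            if (values.isEmpty()) throw new IllegalStateException("empty digest");
            List<Double> sorted = new ArrayList<>(values);
            Collections.sort(sorted);
            int rank = (int) Math.ceil(q * sorted.size());
            int index = Math.min(Math.max(0, rank - 1), sorted.size() - 1);
            return sorted.get(index);
        }
    }

    static final class DigestAggregator implements PartialFinalAggregator<ExactDigest> {
        public ExactDigest initialize() { return new ExactDigest(); }

        public ExactDigest partialAggregate(ExactDigest aggregate, double value) {
            aggregate.add(value);
            return aggregate;
        }

        public ExactDigest finalAggregate(ExactDigest aggregate, ExactDigest partial) {
            aggregate.addAll(partial);
            return aggregate;
        }
    }

    public static void main(String[] args) {
        DigestAggregator agg = new DigestAggregator();
        // Two partials, as a combiner might build them on the map side...
        ExactDigest p1 = agg.initialize();
        ExactDigest p2 = agg.initialize();
        for (double v = 1; v <= 50; v++) agg.partialAggregate(p1, v);
        for (double v = 51; v <= 100; v++) agg.partialAggregate(p2, v);
        // ...then merged into one result.
        ExactDigest total = agg.finalAggregate(agg.finalAggregate(agg.initialize(), p1), p2);
        System.out.println(total.quantile(0.25)); // 25.0
        System.out.println(total.quantile(0.5));  // 50.0
    }
}
```

Splitting the work into a per-value partialAggregate and a digest-merging finalAggregate is what lets a combiner fold values early and merge the partial results later. T-digests are designed to be merged like this, which is why they fit this pattern.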

TDigest seems pretty interesting; my guess is that @matthagy will want a built-in aggregator for it at some point. I think we were blocked internally on upgrading our version of stream_lib, but maybe we can take a second look at that.

@bpodgursky
Contributor

Yeah, we didn't upgrade the stream_lib version because we have a lot of long-term persisted structs internally, and it's unclear whether some of the changes between 2.4 and master broke backwards compatibility of serialization (that would need more careful testing).

But that shouldn't stop you from using the newest version of stream_lib with cascading_ext in your own project and implementing a new ExactAggregator, as Porter mentioned. If you do end up writing one, we'd be happy to merge it in here once we manage to upgrade.


3 participants