[Bug] DependencyCRF.log_prob returns a positive value when the input arc score is large. #61
Hmm, this is worrisome. We are pretty careful about precision, but it is possible we are losing some bits somewhere. I will try to replicate. This is on CUDA, right? Just for sanity, would you mind printing the values?
That is strange. If possible, could you send me a minimal example, perhaps as a Colab? Is it plausible that your gold tree is invalid somehow (either non-projective, or with multiple heads)? I am not sure why it would be so different from the partition.
My dataset contains a tiny proportion of non-projective trees. If this issue leads to wrong results, I think I should try to convert these trees into projective ones.
I see, so I think there are two things happening here. I would recommend just dropping the non-projective trees; they are not correctly scored by the model, and I don't check for them. If you want to add a check, I would be happy to take a PR (it's a non-trivial check to write fast; you need to check all arc pairs, as in the sketch below).
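Roughly, the all-pairs test I have in mind looks like this (an untested sketch; it assumes a heads list where heads[m] is the head of word m+1, with 0 standing for the root):

```python
def is_projective(heads):
    """O(n^2) projectivity check over all arc pairs.

    heads[m] is the head of word m+1 (words are 1-indexed, 0 is the root).
    Two arcs cross iff exactly one endpoint of one arc lies strictly
    between the endpoints of the other.
    """
    arcs = [(min(h, m), max(h, m)) for m, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

assert is_projective([0, 1, 2])         # a simple chain is projective
assert not is_projective([2, 4, 0, 3])  # arc (2,4) crosses arc (0,3)
```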
Thanks for your suggestions. I may try dropping the non-projective trees in the dataset. By the way, can I use NonProjectiveDependencyCRF instead?
I wouldn't recommend switching between the two. NonProj will be faster and more general, but less accurate in cases where the language is 99% projective. Unfortunately, there is currently no easy way to get an argmax tree! You can try taking the max for each arc, but that will not guarantee it is a tree. If you are interested in implementing the argmax, the algorithm is the Chu-Liu/Edmonds algorithm (rough sketch below). It shouldn't be too hard to implement; it's just a bit different, so I didn't include it.
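For reference, an untested sketch of Chu-Liu/Edmonds over a dense score matrix (scores[h, m] = score of arc h -> m, node 0 = root). This is the simple recursive O(n^3) textbook version, not the optimized Tarjan variant, and it does not enforce a single root attachment:

```python
import numpy as np

def _find_cycle(head):
    """Return one cycle in the head assignment as a list of nodes, or None."""
    color = [0] * len(head)              # 0 = unvisited, 1 = on path, 2 = done
    for start in range(1, len(head)):
        if color[start]:
            continue
        path, v = [], start
        while v != -1 and color[v] == 0:
            color[v] = 1
            path.append(v)
            v = head[v]
        if v != -1 and color[v] == 1:    # walked back into the current path
            cycle, u = [v], head[v]
            while u != v:
                cycle.append(u)
                u = head[u]
            return cycle
        for u in path:
            color[u] = 2
    return None

def chu_liu_edmonds(scores):
    """Maximum spanning arborescence rooted at node 0; returns head[m] per node."""
    scores = np.array(scores, dtype=float)
    n = scores.shape[0]
    np.fill_diagonal(scores, -np.inf)    # no self-loops
    scores[:, 0] = -np.inf               # no arcs into the root

    head = scores.argmax(axis=0)         # greedy best head per node
    head[0] = -1
    cycle = _find_cycle(head)
    if cycle is None:
        return head

    in_cycle = np.zeros(n, dtype=bool)
    in_cycle[cycle] = True
    rest = [v for v in range(n) if not in_cycle[v]]   # rest[0] == 0, the root
    c = len(rest)                                     # index of the contracted node

    # Contract the cycle into one node and build the smaller score matrix.
    sub = np.full((c + 1, c + 1), -np.inf)
    for i, u in enumerate(rest):
        for j, v in enumerate(rest):
            sub[i, j] = scores[u, v]
    out_arc, in_arc = {}, {}
    cycle_score = sum(scores[head[u], u] for u in cycle)
    for j, v in enumerate(rest):
        u = max(cycle, key=lambda x: scores[x, v])    # best arc leaving the cycle
        sub[c, j] = scores[u, v]
        out_arc[v] = u
    for i, u in enumerate(rest):
        v = max(cycle, key=lambda x: scores[u, x] - scores[head[x], x])
        sub[i, c] = cycle_score + scores[u, v] - scores[head[v], v]
        in_arc[u] = v                                 # where u would break the cycle

    sub_head = chu_liu_edmonds(sub)

    # Expand the contracted solution back to the original node set.
    new_head = head.copy()
    for j, v in enumerate(rest[1:], start=1):
        h = sub_head[j]
        new_head[v] = out_arc[v] if h == c else rest[h]
    u = rest[sub_head[c]]                # node chosen to point into the cycle
    new_head[in_arc[u]] = u              # break the cycle at that entry
    return new_head
```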
Well, now I'm trying cases for NonProjectiveDependencyCRF. When the sentence length is long (I tried 50 here), the output looks wrong as well.
No ideas yet, but I will look into both issues. |
Hi, sorry for jumping in, but this issue seems to be caused by this line: torch_struct/deptree.py, line 196 (at 9f5a0f5). It should've used a different operation there. Slightly off topic, but there also seems to be a problem with variable-length inputs.
After setting a breakpoint to find out where the problem occurs in my training code, I found that there is probably something wrong with the partition computation.
I have uploaded the files to OneDrive; I hope this will help us solve the problem quickly.
@kmkurn ah fantastic, I changed that line and made another issue about variable length. |
@wangxinyu0922 thanks for the repro, I will see what I can do to fix it.
@wangxinyu0922 I think your code had a bug in it. Your mask is zeroing out the potentials for the 0th head word. |
In my code, there is a redundant placeholder word at position 0. Why does the third example (which masks the score of this placeholder to 0 for all words, i.e. sets the diagonal of the arc scores to 0) still give the wrong log_prob?
Sorry, I'm still having trouble understanding. I'm really trying to figure it out, but I need a clearer example.
I still do not understand exactly where you think the problem is. Do you think the problem is with the score function? We could try using torch.where instead of torch.mul and see if the value is different. If you think the partition calculation is the issue, can you please send me a minimal working example where the partition is less than the score of a single tree?
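Concretely, here is the difference I mean (the shapes and the masked position are made up for illustration):

```python
import torch

# Multiplying log-potentials by a 0/1 mask sets the masked scores to 0,
# which still counts as a legal zero-score arc inside the partition.
# torch.where can instead push the masked arcs to effectively -inf.
arc_scores = torch.randn(2, 6, 6)                    # [batch, head, modifier]
mask = torch.ones_like(arc_scores, dtype=torch.bool)
mask[:, :, 0] = False                                # e.g. forbid arcs into position 0

masked_mul = arc_scores * mask                       # masked arcs get score 0
masked_where = torch.where(mask, arc_scores, torch.full_like(arc_scores, -1e9))
```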
debug.zip |
Hi, I found that if I input a log_potential into the DependencyCRF with some large values, the marginal distribution will be slightly over 1, and this also results in log_prob returning a very large positive value in training. My code is something like this:
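(Reconstructed sketch of the setup; the shapes, magnitudes, and the gold-tree tensor are illustrative, not the original snippet.)

```python
import torch
from torch_struct import DependencyCRF

batch, N = 4, 50                                    # made-up sizes
arc_scores = torch.randn(batch, N, N) * 1e5         # large-magnitude scores, as described
lengths = torch.full((batch,), N, dtype=torch.long)

dist = DependencyCRF(arc_scores, lengths=lengths)
print(dist.marginals.max())                         # creeps above 1.0 with scores this large
# gold is a 0/1 event tensor with the same shape as arc_scores:
# print(dist.log_prob(gold))                        # comes out large and positive
```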
I found my arc_scores contains a large value like 807687.0625. This results in the marginals containing values like 1.002, and the final log_prob returns a very large positive value of 750633.4375. I think this problem is something like floating-point precision; can you suggest something to me?
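(One untested workaround for precision problems like this, assuming the same dist setup as above, is to build the distribution in double precision:)

```python
# Untested precision workaround: run the CRF in float64.
dist64 = DependencyCRF(arc_scores.double(), lengths=lengths)
```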