Mean and Standard Deviation is Different on Point Plot Log Scale #3661
Hi, yes the statistics are computed in log space when you have |
Thank you so much for the fast response. And I love the seaborn library! How precisely does this change the computation? Can you please point me to the file where this is done? I'm struggling to understand mathematically what is different when computing mean and std in log space. In particular, I am not sure why the mean would change. I am actually measuring the squared cost so all my data lies on |
Probably the best way to think about it is that you should get the same result as if you passed seaborn the log of your data and then modified the tick labels. Your error bars are symmetric around the mean because they are being drawn from |
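To spell out the equivalence described above, here is a short worked version. It assumes a base-10 log scale and an error bar of one standard deviation on each side; the symbols $y_i$, $m$, and $s$ are introduced here for illustration and do not come from the seaborn source.

$$
y_i = \log_{10} x_i, \qquad
m = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad
s = \operatorname{sd}(y_1, \dots, y_n)
$$

$$
\text{plotted point} = 10^{m} = \Big(\prod_{i=1}^{n} x_i\Big)^{1/n}, \qquad
\text{plotted bar} = \big[\,10^{m-s},\; 10^{m+s}\,\big]
$$

Under these assumptions the plotted point is the geometric mean of the data, which is never larger than the arithmetic mean. The bar extends the same distance $s$ above and below $m$ on the log axis, so it looks symmetric there, and both endpoints are powers of ten, so they are always positive.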
Some want to first compute summary statistics and then transform them to the log scale. Others want to first transform data to the log scale and then compute summary statistics. Seaborn appears to do the latter.
In your example, if one were interested in the former, should they plot without the log scale parameter and afterwards manually set the axis to be logarithmic? Potentially relevant Stack Exchange post here. |
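The two orders of operations can be compared numerically with plain NumPy. This is only a sketch: the data values are made up, the variable names are hypothetical, and using the sample standard deviation (`ddof=1`) is an assumption rather than a statement about what seaborn does internally.

```python
import numpy as np

# Hypothetical data: mostly small costs plus one large outlier (made-up values).
costs = np.array([0.001, 0.002, 0.003, 0.004, 100.0])

# Option 1: compute summary statistics first, then show them on a log axis.
linear_mean = costs.mean()
linear_sd = costs.std(ddof=1)  # assumption: sample standard deviation
linear_bar = (linear_mean - linear_sd, linear_mean + linear_sd)

# Option 2: transform to log10 first, compute statistics, then map back.
log_costs = np.log10(costs)
log_mean = log_costs.mean()
log_sd = log_costs.std(ddof=1)
log_space_center = 10 ** log_mean  # this is the geometric mean of the data
log_space_bar = (10 ** (log_mean - log_sd), 10 ** (log_mean + log_sd))

print("linear-space mean and bar:", linear_mean, linear_bar)
print("log-space center and bar: ", log_space_center, log_space_bar)
```

With data like this, the linear-space bar has a negative lower end, which cannot be drawn on a log axis, while the log-space bar stays strictly positive and sits far below the arithmetic mean.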
Lastly, it may be helpful for this to appear somewhere in the docs. It was quite tricky for me to understand and I may not be the only one. Perhaps on the tutorial page for statistical estimation and error bars here? I would consider making a pull request if you're interested. Need to confirm I have time for it though. |
Yes |
@mwaskom Just wanted to bump this in case you didn't see. If you're not interested, no worries! |
I could have sworn the docs already said that somewhere, maybe just in the |
I just updated from 0.12 to 0.13, and the
import matplotlib.pyplot as plt
import seaborn as sns
plt.yscale("log")  # mean in log domain
sns.boxplot(
    x=[0, 0, 0, 0, 0],
    y=[1, 2, 3, 4, 5],
    showmeans=True,
)
# plt.yscale("log")  # mean in linear domain
plt.show()
Putting the When doing |
Problem Description: I have multiple measurements of some cost. Most values are quite small, but there are some enormous outliers: my mean is $1.2$, my standard deviation is just under $10$, and my median is $0.003$.
When I plot the mean and std with a point plot, my error bar correctly spans approximately $(-10, 10)$ with a mean of $1$.
But when I use the log scale, the standard deviation and mean shift!
The mean is located near $10^{-2} = 0.01$ instead of $1$. The standard deviation error bars range from $(10^{-3}, 10^{-1}) = (0.001, 0.1)$ instead of $(-10, 10)$.
Question: Why is this? Are these statistics computed differently in log space? A big red flag is that the standard deviation error bars are symmetric in log space. Another red flag is that the error bars no longer go past zero when they absolutely should.
Code: Here is the code I used to generate these two plots. The only difference is toggling the log_scale parameter between true and false.