Suggestion: More details to treat NA values in Exploring Data Frames #816

Open
cfdroste opened this issue Jan 10, 2023 · 1 comment

Comments

@cfdroste

Hi everyone,
I finished the instructor training and noticed that it might be helpful to explain NA values to students in more detail: why they should try to find the source of NA values, and why they should exclude NA values from an analysis while keeping the records in the data set (https://github.com/swcarpentry/r-novice-gapminder/blob/main/_episodes_rmd/05-data-structures-part2.Rmd). In my experience, several students simply replaced the missing value with 0 in their study, or removed a record that was perfectly good apart from the missing value. Maybe native English-speaking students do not have this problem.

@matthieu-bruneaux
Contributor

Thank you for your input @cfdroste! This is a good point.

Maybe we could address that by adding a short .callout section ## Note about NAs when we first mention NA values in this episode. Something like:

Note about NAs

NA is a special value in R, meaning "not available". It is used to represent unknown values, and most R operations on vectors containing NAs (such as mean() or sd()) produce NA too. This makes sense because the result of an operation on unknown values is typically unknown.

An NA value thus has a precise meaning, and should not be replaced by another value such as 0 simply for convenience as this would introduce erroneous information in your data.
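
The note could be accompanied by a small example along these lines. This is only a sketch: the `weights` vector and its values are made up for illustration and do not come from the episode's data.

```r
# Hypothetical measurements; the third value is unknown
weights <- c(4.2, 5.1, NA, 3.9)

# The mean of a vector containing an unknown value is itself unknown
mean(weights)                 # NA

# na.rm = TRUE skips the NA for this calculation only;
# the vector itself is left unchanged
mean(weights, na.rm = TRUE)   # 4.4

# Replacing NA with 0 invents a measurement and biases the result
weights_zero <- weights
weights_zero[is.na(weights_zero)] <- 0
mean(weights_zero)            # 3.3
```

The difference between the last two results shows why 0 is not a harmless stand-in for a missing value.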

And we could add another note after introducing na.omit() in the same episode:

Using na.omit() will remove an entire row even if it contains only a single NA value. This will probably discard valid values in other columns that you might want to use in other analyses, so it is always important to keep the original version of your dataset stored safely in a file that is kept unmodified.
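
A short illustration of this second note, again as a sketch only. The `cats_example` data frame below is hypothetical and simply stands in for whatever data the episode uses at that point.

```r
# Hypothetical data frame with a single missing weight
cats_example <- data.frame(
  coat   = c("calico", "black", "tabby"),
  weight = c(2.1, NA, 3.2),
  age    = c(2, 3, 5)
)

# na.omit() drops the whole "black" row, including its valid age
cats_complete <- na.omit(cats_example)
nrow(cats_complete)        # 2
mean(cats_complete$age)    # 3.5, computed without the dropped age of 3

# Keeping every row and skipping NA per calculation preserves valid values
mean(cats_example$age)                    # uses all three ages
mean(cats_example$weight, na.rm = TRUE)   # 2.65

# The original data frame is untouched by na.omit()
nrow(cats_example)         # 3
```

Assigning the result of na.omit() to a new object, as done here, keeps the original data available for analyses that do not depend on the incomplete column.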

What do you think?
