Suggestion: More details to treat NA values in Exploring Data Frames #816

cfdroste · 2023-01-10T11:12:50Z

Hi everyone,
I finished the instructor training and noticed that it might be helpful to explain NA values to the students: why they should try to find the source of NA values, and why they should exclude NA values from an analysis while keeping the records in the data set (https://github.com/swcarpentry/r-novice-gapminder/blob/main/_episodes_rmd/05-data-structures-part2.Rmd). In my experience, several students simply replaced the missing value with 0 in their study, or removed a record that was otherwise good apart from the missing value. Maybe native English-speaking students do not have this problem.

matthieu-bruneaux · 2023-02-14T23:58:07Z

Thank you for your input @cfdroste! This is a good point.

Maybe we could address that by adding a short .callout section ## Note about NAs when we first mention NA values in this episode. Something like:

Note about NAs

NA is a special value in R, meaning "not available". It is used to represent unknown values, and most R operations on vectors containing NAs (such as mean() or sd()) produce NA too. This makes sense because the result of an operation on unknown values is typically unknown.

An NA value thus has a precise meaning, and should not be replaced by another value such as 0 simply for convenience, as this would introduce erroneous information into your data.
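
A short example could accompany this note. The following is only a sketch: the vector `heights` and its values are made up for illustration and are not taken from the lesson.

```r
# Hypothetical measurements with one unknown value
heights <- c(150, NA, 170)

mean(heights)                 # NA: the mean of unknown values is unknown
mean(heights, na.rm = TRUE)   # 160: the NA is skipped for this calculation only

# Replacing NA with 0 invents a measurement and biases the result
heights_zero <- replace(heights, is.na(heights), 0)
mean(heights_zero)            # about 106.7, pulled down by the invented 0
```

Using `na.rm = TRUE` leaves `heights` itself untouched, so the missing value stays visible in the data.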

And we could add another note after introducing na.omit() in the same episode:

Using na.omit() will remove an entire row even if it contains only a single NA value. This will probably discard valid values in that row's other columns that you might want to use in other analyses, so it is always important to keep the original version of your dataset stored safely in a file that is kept unmodified.
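
This second note could also come with a small demonstration. Again, this is only a sketch: the data frame `cats` below and its values are invented for illustration and may not match the one used in the episode.

```r
# Hypothetical data frame: two of the three rows contain a single NA
cats <- data.frame(
  coat   = c("calico", "black", "tabby"),
  weight = c(2.1, NA, 3.2),
  age    = c(3, 5, NA)
)

# na.omit() keeps only the rows that contain no NA at all
cats_complete <- na.omit(cats)
nrow(cats)            # 3
nrow(cats_complete)   # 1

# The original `cats` is unchanged, so column-wise analyses
# can still use every valid value
mean(cats$weight, na.rm = TRUE)   # mean of 2.1 and 3.2
mean(cats$age, na.rm = TRUE)      # 4
```

Assigning the result of `na.omit()` to a new name, rather than overwriting `cats`, keeps the full set of records available for other analyses.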

What do you think?
