Hi everyone,
I finished the instructor training and noticed that it might be helpful to explain NA values to the students: why they should try to find the source of NA values, and how to exclude NA values from an analysis while still keeping the records in the data set (https://github.com/swcarpentry/r-novice-gapminder/blob/main/_episodes_rmd/05-data-structures-part2.Rmd). In my experience, several students simply replaced missing values with 0 in their studies, or removed an entire record that was valid apart from the missing value. Maybe native English-speaking students do not have this problem.
Thank you for your input @cfdroste! This is a good point.
Maybe we could address that by adding a short `.callout` section, `## Note about NAs`, when we first mention NA values in this episode. Something like:
Note about NAs
NA is a special value in R meaning "not available". It is used to represent unknown values, and most R operations on vectors containing NAs (such as mean() or sd()) return NA as well, unless you explicitly ask them to skip missing values with na.rm = TRUE. This makes sense: the result of an operation on unknown values is usually unknown.
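For example, a minimal R sketch of this behaviour (the `heights` values are made up for illustration):

```r
# Hypothetical measurements; the second one is unknown (NA)
heights <- c(1.62, NA, 1.75, 1.80)

# Summary functions propagate the NA by default
mean(heights)   # returns NA
sd(heights)     # returns NA

# Skipping the NA is a deliberate choice, made explicitly with na.rm = TRUE
mean(heights, na.rm = TRUE)

# is.na() locates the missing entries, a first step towards finding their source
which(is.na(heights))
```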
An NA value thus has a precise meaning and should not be replaced by another value, such as 0, simply for convenience, because this would introduce erroneous information into your data.
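A small illustration, again with made-up `heights` values, of how replacing an NA with 0 silently changes a result:

```r
# Hypothetical measurements; the second one is unknown (NA)
heights <- c(1.62, NA, 1.75, 1.80)

# Replacing the unknown value with 0 invents a height of 0 m
heights_zero <- heights
heights_zero[is.na(heights_zero)] <- 0

# The invented 0 is averaged in and drags the mean down
mean(heights_zero)

# Skipping the NA uses only the three heights that were actually measured
mean(heights, na.rm = TRUE)
```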
And we could add another note after introducing na.omit() in the same episode:
Using na.omit() on a data frame removes an entire row, even if that row contains only a single NA value. This will probably discard valid values in other columns that you might want to use in other analyses, so always keep the original version of your dataset stored safely in a file that you never modify.
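A minimal sketch of this effect, assuming a small hypothetical data frame `records` with one missing `weight`; the cleaned copy is written to a temporary file only so that the example runs anywhere:

```r
# Hypothetical data: record 2 has a valid height but an unknown weight
records <- data.frame(
  id     = 1:4,
  height = c(1.62, 1.70, 1.75, 1.80),
  weight = c(58, NA, 70, 77)
)

# na.omit() drops record 2 entirely, including its valid height
complete_records <- na.omit(records)
nrow(complete_records)   # 3 rows left

# Handling NAs per analysis keeps every valid value in each column
mean(records$height)                 # uses all 4 heights
mean(records$weight, na.rm = TRUE)   # uses the 3 known weights

# Save the reduced copy under a new name; the original data stays untouched
cleaned_path <- tempfile(fileext = ".csv")
write.csv(complete_records, cleaned_path, row.names = FALSE)
```

Handling NAs per analysis (na.rm = TRUE) rather than dropping rows up front lets each analysis use as much valid data as possible.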