# Reproducible Research: Peer Assessment 1


## Loading and preprocessing the data

The following code reads in the data, formats the date column, creates a weekday column and presents summary statistics:
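```{r}
# Read the raw data; keep strings as character vectors
data <- read.csv(file="activity.csv", stringsAsFactors=FALSE)

# Convert the date column from character to Date
data$date <- as.Date(data$date, format='%Y-%m-%d')

# Day of the week for each date (names depend on the session locale)
data$week <- weekdays(data$date)

str(data)

library(pastecs)

# Avoid scientific notation and limit the number of printed digits
options(scipen=100)
options(digits=2)

# Summary statistics, excluding columns 2 and 4 (the non-numeric date and week columns)
summ <- stat.desc(data[,-c(2,4)])
summ
```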


## What is mean total number of steps taken per day?

We first compute the total steps for each day by summing over the intervals within that day. Missing values are ignored, so a day with no recorded values gets a total of 0.
```{r}
totalstepsday <- aggregate(data$steps,by=list(data$date),function(x) sum(x,na.rm=TRUE))
```
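The effect of `na.rm=TRUE` on a day with only missing values is easiest to see on a small example. The sketch below uses a made-up data frame (`toy` and its values are purely illustrative, not taken from `activity.csv`) and contrasts the call above with a variant, using the hypothetical helper `sum_or_na`, that keeps such days as `NA`.

```{r}
# Toy data, invented for illustration: the second day has only missing values
toy <- data.frame(
  date  = as.Date(c("2012-10-01", "2012-10-01", "2012-10-02", "2012-10-02")),
  steps = c(10, NA, NA, NA)
)

# Same pattern as above: the all-NA day is reported as 0
toy_sum <- aggregate(toy$steps, by = list(toy$date),
                     function(x) sum(x, na.rm = TRUE))

# Variant (hypothetical helper): return NA when a day has no observed values
sum_or_na <- function(x) if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
toy_na <- aggregate(toy$steps, by = list(toy$date), sum_or_na)

toy_sum$x  # 10 0
toy_na$x   # 10 NA
```

Which behaviour is preferable depends on whether a day without data should count as a day with zero steps when computing the mean and median.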

Then we present the boxplot and the histogram of the total steps for each day.

```{r histtotalsteps, fig.width=7, fig.height=5}
# Stack the boxplot above the histogram, both on the same x-axis range
nf <- layout(mat=matrix(c(1,2),2,1,byrow=TRUE), heights=c(1,1.5))
par(mar=c(3,3,.2,.2))
boxplot(totalstepsday$x,horizontal=TRUE,outline=TRUE,ylim=c(0,26000),col="lightblue")
hist(totalstepsday$x,breaks=20,xlab="",ylab="Frequency",col="lightblue",main="",xlim=c(0,26000))
```

The mean total number of steps per day is __`r mean(totalstepsday$x,na.rm=TRUE)`__ and the median is __`r median(totalstepsday$x,na.rm=TRUE)`__.


## What is the average daily activity pattern?

We compute the average number of steps for each interval, across days. 

```{r}
totalinterval <- aggregate(data$steps,by=list(data$interval),function(x) mean(x,na.rm=TRUE))
```

The plot shows the time series of the average number of steps taken (y-axis), averaged across all days, for each 5-minute interval (x-axis).

```{r timeseries, fig.width=7, fig.height=5}
plot(y=totalinterval$x,x=totalinterval$Group.1,xlab="Interval",ylab="Average number of steps",type="l")
```
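The x-axis above uses the raw interval identifiers. If those identifiers encode clock time as `HHMM` (so that 55 is followed by 100), which is an assumption about `activity.csv` and not something stated in this report, the axis has gaps between hours. A minimal sketch of converting identifiers to minutes since midnight and to a clock label, using the hypothetical helpers `interval_to_minutes` and `interval_to_clock`:

```{r}
# Hypothetical helpers; both assume identifiers are coded as HHMM (e.g. 835 = 08:35)
interval_to_minutes <- function(interval) (interval %/% 100) * 60 + interval %% 100
interval_to_clock   <- function(interval) sprintf("%02d:%02d", interval %/% 100, interval %% 100)

ids <- c(0, 55, 100, 835, 2355)   # example identifiers, made up for illustration
interval_to_minutes(ids)          # 0 55 60 515 1435
interval_to_clock(ids)            # "00:00" "00:55" "01:00" "08:35" "23:55"
```

Under that assumption, plotting against `interval_to_minutes(totalinterval$Group.1)` would space the intervals evenly along the x-axis.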

```{r}
activeinterval <- totalinterval$Group.1[which.max(totalinterval$x)]
```

On average across all days in the dataset, the 5-minute interval that contains the maximum number of steps is __`r activeinterval`__.


## Imputing missing values

```{r}
miss <- sum(is.na(data$steps))
```

The dataset has __`r dim(data)[1]`__ observations, of which __`r miss`__ have a missing number of steps.

Each missing value is replaced with the median number of steps for its 5-minute interval, computed across all days.


```{r}
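# Median number of steps per interval, ignoring missing values
imputat <- aggregate(data$steps,by=list(data$interval),function(x) median(x,na.rm=TRUE))

# Split the data into rows with and without a step count
datacomplete <- data[!is.na(data$steps),]
datamissing <- data[is.na(data$steps),]

# Fill each missing row with the median of its interval
dataimputed <- datamissing
dataimputed$steps <- imputat[match(datamissing$interval,imputat[,1]),2]

# Recombine; the imputed rows are appended after the complete ones
datacompleteimputed <- rbind(datacomplete,dataimputed)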
```
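As an alternative sketch (not the method used in this report), `ave` can fill each missing value with its interval median in place, which keeps the original row order. The data frame `toy_imp` and its values are invented for illustration.

```{r}
# Toy data: interval 0 has one missing value, interval 5 has one missing value
toy_imp <- data.frame(
  interval = c(0, 5, 0, 5, 0, 5),
  steps    = c(2, NA, 4, 10, NA, 20)
)

# Median of the observed steps for each interval, repeated for every row
interval_median <- ave(toy_imp$steps, toy_imp$interval,
                       FUN = function(x) median(x, na.rm = TRUE))

# Replace only the missing entries; observed values and row order are unchanged
toy_imp$steps <- ifelse(is.na(toy_imp$steps), interval_median, toy_imp$steps)
toy_imp$steps  # 2 15 4 10 3 20
```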

We then compute the total steps for each day, aggregating the data for the different intervals within each day, now considering the dataset including the imputed values.

```{r}
totalstepsday <- aggregate(datacompleteimputed$steps,by=list(datacompleteimputed$date),sum)
```

We again present the boxplot and the histogram of the total steps for each day, now including the imputed values.

```{r histtotalstepsimput, fig.width=7, fig.height=5}
# Same layout as before: boxplot above the histogram, same x-axis range
nf <- layout(mat=matrix(c(1,2),2,1,byrow=TRUE), heights=c(1,1.5))
par(mar=c(3,3,.2,.2))
boxplot(totalstepsday$x,horizontal=TRUE,outline=TRUE,ylim=c(0,26000),col="lightblue")
hist(totalstepsday$x,breaks=20,xlab="",ylab="Frequency",col="lightblue",main="",xlim=c(0,26000))
```

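The mean is __`r mean(totalstepsday$x,na.rm=TRUE)`__ and the median is __`r median(totalstepsday$x,na.rm=TRUE)`__. The mean is now larger than when the missing values were excluded, because intervals that previously contributed nothing to a day's total now contribute their imputed value. The median is the same.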


## Are there differences in activity patterns between weekdays and weekends?

We create a variable indicating "Weekend" or "Weekday":


```{r}

datacompleteimputed$weekend <- ifelse(datacompleteimputed$week %in% c("Saturday","Sunday"),"Weekend","Weekday")

```
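`weekdays` returns day names in the session's locale, so matching on "Saturday" and "Sunday" assumes an English locale. A locale-independent sketch uses the `wday` field of `as.POSIXlt`, where 0 is Sunday and 6 is Saturday. The helper name `is_weekend` is hypothetical and the dates are examples only.

```{r}
# Hypothetical helper: TRUE for Saturday (wday 6) and Sunday (wday 0)
is_weekend <- function(d) as.POSIXlt(d)$wday %in% c(0, 6)

# Friday to Monday, chosen for illustration
d <- as.Date(c("2012-10-05", "2012-10-06", "2012-10-07", "2012-10-08"))
ifelse(is_weekend(d), "Weekend", "Weekday")
```

Applied to this report, the same test could be run on `datacompleteimputed$date` instead of the `week` column.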


We compute the average number of steps for each interval, across weekend days. 

```{r}
weekenddata <- datacompleteimputed[datacompleteimputed$weekend=="Weekend",]
totalintervalweekend <- aggregate(weekenddata$steps,by=list(weekenddata$interval),mean)
```


We compute the average number of steps for each interval, across weekdays.

```{r}
weekdaydata <- datacompleteimputed[datacompleteimputed$weekend=="Weekday",]
totalintervalweekday <- aggregate(weekdaydata$steps,by=list(weekdaydata$interval),mean)
```
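The two computations above can also be done in a single call. A minimal sketch using the formula interface of `aggregate` on a made-up data frame (`toy_wk` and its values are illustrative only):

```{r}
# Toy data: two intervals, observed on weekdays and on a weekend day
toy_wk <- data.frame(
  interval = c(0, 0, 5, 5, 0, 5),
  weekend  = c("Weekday", "Weekday", "Weekday", "Weekday", "Weekend", "Weekend"),
  steps    = c(2, 4, 10, 20, 7, 9)
)

# One row per (interval, weekend) combination with the mean number of steps
avg_by_type <- aggregate(steps ~ interval + weekend, data = toy_wk, FUN = mean)
avg_by_type
```

The result is in long format, with the grouping variables as columns, which suits a panel plot better than two separate data frames.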


The plot presents the time series of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis), separately for weekends and weekdays.

```{r timeseries2, fig.width=7, fig.height=5}
# Common y-axis limit so the two panels are directly comparable
tmp <- max(c(totalintervalweekend$x,totalintervalweekday$x))+1
nf <- layout(mat=matrix(c(1,2),2,1,byrow=TRUE))
par(mar=c(3,3,3,3))

plot(y=totalintervalweekend$x,x=totalintervalweekend$Group.1,xlab="Interval",ylab="Average number of steps",type="l",main="Weekends",ylim=c(0,tmp))

plot(y=totalintervalweekday$x,x=totalintervalweekday$Group.1,xlab="Interval",ylab="Average number of steps",type="l",main="Weekdays",ylim=c(0,tmp))
```