GitHub - cmchenry/HumanActivityRecognition: This repository is for the Getting & Cleaning Data Course Project submission for Coursera - "getdata-016"

####Overview

This repository provides R code for creating a tidy dataset from the "Human Activity Recognition Using Smartphones Dataset". The "Human Activity Recognition Using Smartphones Dataset" ("source data") is a publicly available dataset which can be obtained at the following website. For your convenience, the dataset is included in this repository in the "UCI HAR Dataset" folder along with the associated "Readme.txt" and "features_info.txt". This dataset is the result of an experiment conducted to capture accelerometer and gyroscope readings from Samsung Galaxy S II smartphones attached to 30 volunteer subjects performing 6 different activities. See References for more details on the source data[1].

The resulting tidy dataset, "tidy_wearable_ordered.txt", contains only the mean() and std() Features, averaged by Activity and Subject. The tidy dataset has the following characteristics:

- 180 observations: 6 Activities performed by each of 30 Subjects. Each observation is in its own row.
- Averages of the 66 mean() and std() Features, selected from the 561 Features in the source data.
- Each Feature is in its own column with a human readable column name.

This dataset fully embodies the requirements of a tidy dataset:

- Each Feature variable is in its own column.
- Each of the 180 observations (one set of averages per Activity and Subject pair) is in its own row.
- There is 1 table for the resulting Feature averages.
- There was no need for multiple tables, since the data is all related to the same observations and has the same characteristics.
- The table has a row at the top with the variable names.
- The variable names are human readable.
- The data is in a single file since there is only one table for the tidy dataset.
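
As a rough illustration, the shape characteristics listed above could be checked along the following lines. This is only a sketch: the data frame is a synthetic stand-in with random values, and the identifier column names "Activity" and "Subject" and the helper name is_tidy_shape are assumptions, not names confirmed by the repository.

```r
# Synthetic stand-in for the tidy dataset: one row per Activity/Subject pair.
activities <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                "SITTING", "STANDING", "LAYING")
tidy <- expand.grid(Subject = 1:30, Activity = activities,
                    stringsAsFactors = FALSE)

# Add 66 placeholder feature columns (random values, for illustration only).
set.seed(1)
for (i in 1:66) {
  tidy[[paste0("Feature", i)]] <- rnorm(nrow(tidy))
}

# Hypothetical helper: checks the shape described in this Readme.
is_tidy_shape <- function(df) {
  ids <- df[, c("Activity", "Subject")]
  nrow(df) == 180 &&          # 6 Activities x 30 Subjects
    ncol(df) == 68 &&         # 2 identifier columns + 66 Feature averages
    !any(duplicated(ids))     # each Activity/Subject pair appears once
}

is_tidy_shape(tidy)
```

If the identifier columns in the real file carry those names, the same helper could be applied to read.table("tidy_wearable_ordered.txt", header = TRUE).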

####List of Files

run_analysis.R
- Main R Script provided to execute the cleanup of the data and create a tidy dataset.
human-activity-recognition.Rproj
- An RStudio project file for the repository.
Readme.md
- This file, which provides an overview of the project, the contents of the repository, how to run the code and the transformations used to create the tidy dataset.
CodeBook.md
- A codebook detailing the structure of the variables in the resulting tidy dataset.
UCI HAR Dataset
- This directory contains the original source data, readme files and codebook.
tidy_wearable_ordered.txt
- The resulting tidy dataset created by running the analysis.

####Running the Code

The following are the required steps to run the script:

1. Download this GitHub repository: https://github.com/cmchenry/HumanActivityRecognition.git
2. Install R 3.1.1 "Sock it to Me"
3. Install RStudio 0.98.1062
4. Open the "human-activity-recognition.Rproj" project file in RStudio.
5. Obtain the "plyr" package by running install.packages("plyr") at the R Console.
6. In the R Console, run source("run_analysis.R")

The output will be a file in the project directory called "tidy_wearable_ordered.txt".

####Transformations to Create Tidy Dataset

The following were the transformations of the source data used to create the resulting tidy dataset:

1. Loaded and combined the source Subject data:
   - subject_test.txt
   - subject_train.txt
2. Loaded and combined the source Activity data:
   - y_test.txt
   - y_train.txt
3. Loaded and combined the source Feature data:
   - X_test.txt
   - X_train.txt
4. Loaded the Feature names data and kept only the mean() and std() features. Also cleaned up the feature names to produce human readable Feature column names. NOTE: It was determined that only the mean() and std() features were required. The meanFreq() features are weighted averages of the frequency components used to obtain a mean frequency, and thus are not features of interest for this tidy dataset:
   - features.txt
5. Using the Feature indexes from step #4, filtered the Feature data loaded in step #3 down to the required mean() and std() features. Also added human readable column names to the Features.
6. Loaded the Activity Labels and joined them to the Activity data loaded in step #2. Assigned a human readable column name to the Activity data:
   - activity_labels.txt
7. Combined the columns of the transformed Subject, Activity and Features datasets to create the raw tidy dataset.
8. Summarized the dataset created in step #7 by calculating the average of each Feature, by Activity and by Subject. Sorted the data by Activity and Subject for easy interpretation.
9. Exported the summarized tidy dataset to the following file:
   - tidy_wearable_ordered.txt
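
The steps above can be sketched on toy data as follows. This is a minimal sketch and not the code in run_analysis.R: the data frames, their values and the name-cleaning rule are made up for illustration, and it uses base R's match() and aggregate() rather than the plyr package that the script itself requires.

```r
# Toy stand-ins for the source files (values are made up for illustration).
features <- data.frame(
  index = 1:4,
  name  = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
            "fBodyAcc-meanFreq()-X", "tBodyAcc-max()-X"),
  stringsAsFactors = FALSE
)
activity_labels <- data.frame(id = 1:2, Activity = c("WALKING", "SITTING"),
                              stringsAsFactors = FALSE)
subject  <- data.frame(Subject = c(1, 1, 2, 2, 1, 2))   # steps 1: combined subjects
activity <- data.frame(id = c(1, 1, 1, 2, 2, 2))        # step 2: combined activities
x <- as.data.frame(matrix(1:24, nrow = 6))              # step 3: 6 rows x 4 features

# Step 4: keep mean() and std() only. Matching the literal "()" right after
# the name excludes meanFreq().
keep <- grep("-(mean|std)\\(\\)", features$name)
clean_names <- gsub("-", "_", gsub("[()]", "", features$name[keep]))  # assumed rule

# Step 5: filter the Feature columns and apply the readable names.
x_sel <- x[, keep, drop = FALSE]
names(x_sel) <- clean_names

# Step 6: attach activity labels by lookup, which preserves the row order.
activity$Activity <- activity_labels$Activity[match(activity$id, activity_labels$id)]

# Step 7: combine Subject, Activity and Features column-wise.
raw <- cbind(subject, Activity = activity$Activity, x_sel)

# Step 8: average every Feature by Activity and Subject, then sort.
tidy <- aggregate(. ~ Activity + Subject, data = raw, FUN = mean)
tidy <- tidy[order(tidy$Activity, tidy$Subject), ]

print(tidy)
```

A lookup with match() is used for the label join because merge() may reorder rows, which would misalign the labels with the Subject and Feature rows when the columns are combined in step 7.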

####References

[1] Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012.
