Data Preprocessing in R

About the project

The project focuses on the topic of meat consumption efficiency in Australia. The project is an example of data preprocessing in R.

Goal

The main goal of the project was to apply data pre-processing to data gathered from various sources.

Data

The project used 7 different data sets from various public sources.

Outcomes

The project revealed that, from 1991 to 2017, an average Australian wasted about 45 kilograms of meat per year (~45 percent).

Limitations

The project did not take into account the usable percentage of each animal; the reported meat wastage may include non-consumable parts such as bones and organs. Therefore, the wasted amount does not necessarily indicate food wastage or financial loss.

What I’ve learned from this project

  1. The process of data preprocessing

    The process is not linear; each step had to be revisited multiple times during the project.

  2. The value of data processing

    Data preprocessing is not only important for modelling. Working with multiple datasets from various sources also presents a tremendous learning opportunity regarding the topic of interest.

  3. The unexpected lesson

    Although the project failed to draw a valid conclusion regarding food wastage, it raised a question about the efficiency of using animals as food, considering the current state of climate change.

The project covers various steps of data preprocessing, including:

  1. Data cleaning.
  2. Data normalisation.
  3. Data transformation.
  4. Dealing with missing values.
  5. Dealing with multiple data sets.
  6. Dealing with date/time and character variables.
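The steps above can be sketched in a few lines of R. The following is a minimal illustration (not the project's actual code) on a small, made-up meat-supply table, assuming the dplyr, tidyr, lubridate and stringr packages are installed; all column names and values are hypothetical.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(stringr)

# Hypothetical raw data: inconsistent country labels, a missing value,
# and dates stored as character strings
supply <- tibble(
  country       = c(" australia", "Australia ", "AUSTRALIA"),
  date          = c("1991-07-01", "2004-07-01", "2017-07-01"),
  kg_per_capita = c(101, NA, 95)
)

clean <- supply %>%
  # Data cleaning: tidy character variables with stringr
  mutate(country = str_to_title(str_trim(country))) %>%
  # Date/time variables: parse text into Date objects with lubridate
  mutate(date = ymd(date), year = year(date)) %>%
  # Missing values: impute with the column mean (one simple strategy)
  mutate(kg_per_capita = replace_na(kg_per_capita,
                                    mean(kg_per_capita, na.rm = TRUE))) %>%
  # Normalisation/transformation: z-score the numeric variable
  mutate(kg_scaled = as.numeric(scale(kg_per_capita)))

# Multiple data sets: combine tables with a key-based join
population <- tibble(year = c(1991, 2004, 2017),
                     population_m = c(17.4, 20.1, 24.6))
result <- left_join(clean, population, by = "year")
print(result)
```

Each `mutate()` step maps onto one item in the list above; in practice these steps were revisited repeatedly rather than run once in sequence.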

You can download the report here.

The project demonstrates the ability to work with R packages including tidyr, dplyr, readr, outliers, lubridate and stringr.
