--- title: "Cleansing the data set" author: "Choonghyun Ryu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Cleansing the data set} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r environment, echo = FALSE, message = FALSE, warning=FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "") ``` ## Preface If you created a data set to develop a classification model, you must perform a cleansing of the data. After you create the data set, you should do the following: * **Cleansing the data set** + **Optional removal of variables, including missing values** + **Remove a variable with one unique number** + **Remove categorical variables with a large number of levels** + **Convert a character variable to a categorical variable** * Split the data into a train set and a test set * Modeling and Evaluate, Predict The alookr package makes these steps fast and easy: ## How to perform cleansing the data set Refer to the following website for information on how to perform cleansing the data set. - [`Cleansing the data set`](https://choonghyunryu.github.io/alookr_vignette/cleansing.html)