--- title: "Splitting the data set" author: "Choonghyun Ryu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Splitting the dataset} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r environment, echo = FALSE, message = FALSE, warning=FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "") ``` ## Preface The original data must be divided into train and test data sets to develop a classification model. You should do the following: * Cleansing the data set * **Split the data into a train set and a test set** + **Split the data.frame or tbl_df into a train set and a test set** + **Compare data set** + **Comparison of categorical variables** + **Comparison of numeric variables** + **Diagnosis of train set and test set** + **Extract train/test data set** + **Extract train set or test set** + **Extract the data to fit the model** * Modeling and Evaluate, Predict The alookr package makes these steps fast and easy: ## How to perform split the data Refer to the following website for information on splitting the data into a train and test set. - [`Splitting the dataset`](https://choonghyunryu.github.io/alookr_vignette/split.html)