Acsc-Stat 300
Lab 10 Assignment
The assignment must be submitted in the correct format.
• All R code must be compiled with R-Markdown, with the output and answers knitted
to MS-Word or PDF format.
– The assignment must be done using R-Markdown. The knitted document must
include all the R code, the R output, and the answers to the questions. Answers must be
written in full sentences.
Questions
Use the data file Lab10.RData for this assignment. Use the load() function to read the data.
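As a small illustration of how load() behaves, the sketch below saves a tiny data frame to a temporary .RData file and reads it back. The file name Lab10_demo.RData and the object name Lab10_demo are hypothetical stand-ins; the names of the objects stored in the real Lab10.RData are not stated in the assignment.

```r
# Hypothetical stand-in for Lab10.RData (the real file's object names are unknown).
demo_file <- file.path(tempdir(), "Lab10_demo.RData")
Lab10_demo <- data.frame(X1 = c(0.1, 0.2), X2 = c(1.5, 2.5), y = c(0, 1))
save(Lab10_demo, file = demo_file)
rm(Lab10_demo)  # remove the object so that load() has to restore it

# load() restores the saved objects under their original names and
# invisibly returns those names as a character vector.
loaded_names <- load(demo_file)
print(loaded_names)
str(get(loaded_names[1]))
```

Capturing the return value of load() is a convenient way to find out which objects a file contained when their names are not known in advance.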
1. The data set has 200 observations. Before starting, convert the variable y to
a factor. It is easiest to create a new data frame D1 containing the factor version of y and the
variables X1 and X2. Separate the data into a training set and a validation set, each with
100 observations. Use a random number as the seed.
2. For the training data, plot X1 versus X2, using a different color for each value of y. Does
the data appear to be separable? Answer with more than just “Yes” or “No”.
3. Use the training data and cross validation to select the cost value for the Support Vector
Classification model.
(a) What is the best value for cost? How many support vectors are there? Plot the
best model on the training data.
(b) Plot the best model from 3(a) on the validation points.
(c) Calculate the predicted values for the validation data and compare these with the
actual values for y (confusion table). What is the misclassification validation error?
4. Use the training data and cross validation to select the cost value and gamma for the
Support Vector Machine using a radial kernel.
(a) What are the recommended values for gamma and cost? Plot the best model on the
training data.
(b) Plot the best model from 4(a) on the validation points.
(c) Calculate the predicted values for the validation data and compare these with the
actual values for y. What is the misclassification validation error?
5. Identify which of the two models you would use on this data and briefly state the reason
for your selection.
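
The workflow described in Questions 1 to 4 could be sketched roughly as follows. This is a minimal sketch under several assumptions, not a model answer: it uses the e1071 package (the assignment does not name a package), it simulates 200 observations instead of loading Lab10.RData, and the seed, the cost grid and the gamma grid are arbitrary choices made for illustration. tune() uses 10-fold cross validation by default.

```r
library(e1071)  # assumed package; provides svm() and tune()

set.seed(300)  # arbitrary seed chosen for this sketch

# Simulated stand-in for the lab data (the values are invented for illustration).
n  <- 200
X1 <- rnorm(n)
X2 <- rnorm(n)
y  <- ifelse(X1^2 + X2^2 > 1.5, 1, 0)
D1 <- data.frame(y = factor(y), X1 = X1, X2 = X2)

# Question 1: random split into 100 training and 100 validation rows.
train_idx <- sample(n, n / 2)
train <- D1[train_idx, ]
valid <- D1[-train_idx, ]

# Question 2: training data coloured by class.
plot(train$X1, train$X2, col = as.integer(train$y) + 1, pch = 19,
     xlab = "X1", ylab = "X2")

# Question 3: support vector classifier (linear kernel), cost chosen by CV.
tune_lin <- tune(svm, y ~ X1 + X2, data = train, kernel = "linear",
                 ranges = list(cost = c(0.01, 0.1, 1, 10, 100)))
best_lin <- tune_lin$best.model
print(tune_lin$best.parameters)  # selected cost
print(best_lin$tot.nSV)          # total number of support vectors
plot(best_lin, train)            # best model on the training data
plot(best_lin, valid)            # best model on the validation points

pred_lin <- predict(best_lin, newdata = valid)
tab_lin  <- table(predicted = pred_lin, actual = valid$y)  # confusion table
err_lin  <- mean(pred_lin != valid$y)                      # misclassification rate
print(tab_lin)
print(err_lin)

# Question 4: support vector machine with a radial kernel, cost and gamma chosen by CV.
tune_rad <- tune(svm, y ~ X1 + X2, data = train, kernel = "radial",
                 ranges = list(cost = c(0.1, 1, 10, 100),
                               gamma = c(0.5, 1, 2, 4)))
best_rad <- tune_rad$best.model
print(tune_rad$best.parameters)  # selected cost and gamma
plot(best_rad, train)
plot(best_rad, valid)

pred_rad <- predict(best_rad, newdata = valid)
tab_rad  <- table(predicted = pred_rad, actual = valid$y)
err_rad  <- mean(pred_rad != valid$y)
print(tab_rad)
print(err_rad)
```

With the real data, the simulation step would be replaced by load("Lab10.RData") and the construction of D1 from the loaded object. Comparing err_lin and err_rad on the validation set is one way to support the choice asked for in Question 5.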