
We will practice building a machine learning algorithm using a new dataset, iris, that provides multiple predictors for us to use to train. To start, we will remove the setosa species and focus on the versicolor and virginica iris species using the following code:
library(caret)
data(iris)
iris <- iris[-which(iris$Species=='setosa') , ]
y <- iris$Species
The following questions all involve work with this dataset.
1. First let us create an even split of the data into train and test partitions using createDataPartition() from the caret package. The code with a missing line is given below:
# set.seed(2) # if using R 3.5 or earlier
set.seed(2, sample.kind="Rounding") # if using R 3.6 or later
# line of code
test <- iris[test_index, ]
train <- iris[-test_index, ]
2. Which code should be used in place of # line of code above?
a. test_index <- createDataPartition(y, times=1, p=0.5)
b. test_index <- sample(2, length(y), replace=FALSE)
c. test_index <- createDataPartition(y, times=1, p=0.5, list=FALSE)
d. test_index <- rep(1, length(y))
Note: for this question, you may ignore any warning message generated by the code. If you have R 3.6 or later, you should always use the sample.kind argument in set.seed for this course.
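The difference between the createDataPartition() options comes down to the shape of the returned index. The sketch below is only an illustration: the names as_list and as_matrix are hypothetical, and the warnings mentioned in the note are suppressed for readability. It assumes R 3.6 or later for the sample.kind argument.

```r
library(caret)
data(iris)
iris <- iris[-which(iris$Species == "setosa"), ]
y <- iris$Species

# sample.kind needs R 3.6 or later; its warning is suppressed here.
suppressWarnings(set.seed(2, sample.kind = "Rounding"))

# Default (list = TRUE): a list holding one integer vector of row positions.
as_list <- suppressWarnings(createDataPartition(y, times = 1, p = 0.5))

# list = FALSE: a one-column integer matrix that can index rows directly.
as_matrix <- suppressWarnings(
  createDataPartition(y, times = 1, p = 0.5, list = FALSE)
)

str(as_list)
dim(as_matrix)

# Row indexing with the matrix form, as in the exercise code.
test <- iris[as_matrix, ]
train <- iris[-as_matrix, ]
table(test$Species)
table(train$Species)
```

A list cannot be used as a row index for a data frame, which is why the form of the return value matters for the two lines that build test and train.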
3. Next we will figure out the single feature in the dataset that yields the greatest overall accuracy when predicting species. You can use the code from the introduction and from Q1 to start your analysis.
Using only the train iris dataset, for each feature, perform a simple search to find the cutoff that produces the highest accuracy, predicting virginica if greater than the cutoff and versicolor otherwise. Use the seq function over the range of each feature by intervals of 0.1 for this search. Which feature produces the highest accuracy?
a. Sepal.Length
b. Sepal.Width
c. Petal.Length
d. Petal.Width
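One way to organize the cutoff search described above is a small helper applied to each feature in turn. This is a sketch, not the course's reference solution: best_cutoff, features, and train_results are hypothetical names, and the split is rebuilt with list = FALSE so the block runs on its own.

```r
library(caret)
data(iris)
iris <- iris[-which(iris$Species == "setosa"), ]
y <- iris$Species

# sample.kind needs R 3.6 or later; warnings are suppressed for readability.
suppressWarnings(set.seed(2, sample.kind = "Rounding"))
test_index <- suppressWarnings(
  createDataPartition(y, times = 1, p = 0.5, list = FALSE)
)
test <- iris[test_index, ]
train <- iris[-test_index, ]

features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Hypothetical helper: scan cutoffs over the range of x in steps of 0.1 and
# return the first cutoff with the highest accuracy, plus that accuracy.
best_cutoff <- function(x, species) {
  cutoffs <- seq(min(x), max(x), by = 0.1)
  accuracy <- sapply(cutoffs, function(cutoff) {
    y_hat <- ifelse(x > cutoff, "virginica", "versicolor")
    mean(y_hat == species)
  })
  best <- which.max(accuracy)
  c(cutoff = cutoffs[best], accuracy = accuracy[best])
}

# One column per feature; rows are the chosen cutoff and its train accuracy.
train_results <- sapply(train[, features], best_cutoff,
                        species = train$Species)
train_results
```

The column with the largest value in the accuracy row is the candidate answer. Ties between cutoffs are broken by taking the smallest one, which is a choice made for this sketch.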
4. For the feature selected in question 3, use the best cutoff value found on the training data to calculate overall accuracy in the test data. What is the overall accuracy?
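A possible way to carry the training cutoff over to the test set is sketched below. The helper best_cutoff and the names best_feature, cutoff, and test_accuracy are hypothetical; the feature is picked programmatically from the training results rather than hard-coded.

```r
library(caret)
data(iris)
iris <- iris[-which(iris$Species == "setosa"), ]
y <- iris$Species

# sample.kind needs R 3.6 or later; warnings are suppressed for readability.
suppressWarnings(set.seed(2, sample.kind = "Rounding"))
test_index <- suppressWarnings(
  createDataPartition(y, times = 1, p = 0.5, list = FALSE)
)
test <- iris[test_index, ]
train <- iris[-test_index, ]

features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Hypothetical helper: best cutoff (steps of 0.1) and its accuracy.
best_cutoff <- function(x, species) {
  cutoffs <- seq(min(x), max(x), by = 0.1)
  accuracy <- sapply(cutoffs, function(cutoff) {
    mean(ifelse(x > cutoff, "virginica", "versicolor") == species)
  })
  best <- which.max(accuracy)
  c(cutoff = cutoffs[best], accuracy = accuracy[best])
}

# Choose the feature and cutoff using the training data only.
train_results <- sapply(train[, features], best_cutoff,
                        species = train$Species)
best_feature <- names(which.max(train_results["accuracy", ]))
cutoff <- train_results["cutoff", best_feature]

# Apply that fixed rule to the test data.
y_hat <- ifelse(test[[best_feature]] > cutoff, "virginica", "versicolor")
test_accuracy <- mean(y_hat == test$Species)
test_accuracy
```

The cutoff is never re-tuned on the test rows here, so test_accuracy is an honest estimate for that one rule.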
Notice that we had an overall accuracy greater than 96% in the training data, but the overall accuracy was lower in the test data. This can happen often if we overtrain. In fact, it could be the case that a single feature is not the best choice. For example, a combination of features might be optimal. Using a single feature and optimizing the cutoff as we did on our training data can lead to overfitting.
Given that we know the test data, we can treat it like we did our training data to see if the same feature with a different cutoff will optimize our predictions. Repeat the analysis in question 3, but this time using the test data instead of the training data. Which feature best optimizes our overall accuracy when using the test set?
a. Sepal.Length
b. Sepal.Width
c. Petal.Length
d. Petal.Width
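To compare the two searches side by side, the same scan can be run once on train and once on test. This is a diagnostic sketch only, with hypothetical names (best_cutoff, on_train, on_test, comparison); tuning a cutoff on the test set would not be a valid way to evaluate a model.

```r
library(caret)
data(iris)
iris <- iris[-which(iris$Species == "setosa"), ]
y <- iris$Species

# sample.kind needs R 3.6 or later; warnings are suppressed for readability.
suppressWarnings(set.seed(2, sample.kind = "Rounding"))
test_index <- suppressWarnings(
  createDataPartition(y, times = 1, p = 0.5, list = FALSE)
)
test <- iris[test_index, ]
train <- iris[-test_index, ]

features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Hypothetical helper: best cutoff (steps of 0.1) and its accuracy.
best_cutoff <- function(x, species) {
  cutoffs <- seq(min(x), max(x), by = 0.1)
  accuracy <- sapply(cutoffs, function(cutoff) {
    mean(ifelse(x > cutoff, "virginica", "versicolor") == species)
  })
  best <- which.max(accuracy)
  c(cutoff = cutoffs[best], accuracy = accuracy[best])
}

on_train <- sapply(train[, features], best_cutoff, species = train$Species)
on_test <- sapply(test[, features], best_cutoff, species = test$Species)

# Best achievable accuracy per feature when tuned on each partition.
comparison <- rbind(train = on_train["accuracy", ],
                    test = on_test["accuracy", ])
comparison
```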
5. Now we will perform some exploratory data analysis on the data.
plot(iris, pch=21, bg=iris$Species)
Notice that Petal.Length and Petal.Width in combination could potentially be more informative than either feature alone. Optimize the cutoffs for Petal.Length and Petal.Width separately in the train dataset by using the seq function with increments of 0.1. Then, report the overall accuracy when applied to the test dataset by creating a rule that predicts virginica if Petal.Length is greater than the length cutoff OR Petal.Width is greater than the width cutoff, and versicolor otherwise. What is the overall accuracy for the test data now?
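The two-feature rule can be sketched as follows. The helper best_cutoff and the names length_cutoff, width_cutoff, and combined_accuracy are hypothetical, and each cutoff is tuned separately on train before the OR rule is applied to test.

```r
library(caret)
data(iris)
iris <- iris[-which(iris$Species == "setosa"), ]
y <- iris$Species

# sample.kind needs R 3.6 or later; warnings are suppressed for readability.
suppressWarnings(set.seed(2, sample.kind = "Rounding"))
test_index <- suppressWarnings(
  createDataPartition(y, times = 1, p = 0.5, list = FALSE)
)
test <- iris[test_index, ]
train <- iris[-test_index, ]

# Hypothetical helper: best cutoff (steps of 0.1) and its accuracy.
best_cutoff <- function(x, species) {
  cutoffs <- seq(min(x), max(x), by = 0.1)
  accuracy <- sapply(cutoffs, function(cutoff) {
    mean(ifelse(x > cutoff, "virginica", "versicolor") == species)
  })
  best <- which.max(accuracy)
  c(cutoff = cutoffs[best], accuracy = accuracy[best])
}

# Tune each petal cutoff independently on the training data.
length_cutoff <- best_cutoff(train$Petal.Length, train$Species)[["cutoff"]]
width_cutoff <- best_cutoff(train$Petal.Width, train$Species)[["cutoff"]]

# Predict virginica if either petal measurement exceeds its cutoff.
y_hat <- ifelse(test$Petal.Length > length_cutoff |
                  test$Petal.Width > width_cutoff,
                "virginica", "versicolor")
combined_accuracy <- mean(y_hat == test$Species)
combined_accuracy
```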

