Highlights
Question 01 - Consider “Boston.csv” dataset to answer the following questions.
crim = per capita crime rate by town
indus = proportion of non-retail business acres per town
rm = average number of rooms per dwelling
dis = weighted distances to five Boston employment centres
medv = median value of owner-occupied homes (in $1000's)
a). Construct the matrix plot and correlation matrix. Comment on the relationship among variables.
b). Derive a multiple linear regression model to describe “median value of owner-occupied homes” (medv) in terms of the other numeric variables and give the resulting model.
c). Add the interaction term crim*indus to the model in part (b) and derive the resulting model.
d). Add the polynomial term rm*rm of order 2 to the model in part (c) and derive the resulting model.
e). Test the significance of each slope parameter of the model and discuss the results.
f). Give the resultant best model and describe its accuracy.
g). List the model assumptions and test for three of these assumptions.
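Parts (b) to (d) build the model up in stages: main effects first, then the crim*indus interaction, then the squared rm term. A minimal sketch of how such a design matrix could be assembled and fitted by ordinary least squares is given below. It assumes NumPy is available and uses noise-free synthetic stand-in data, not the real Boston.csv values; the helper name `design_matrix` and the coefficients in `true_beta` are illustrative assumptions only.

```python
import numpy as np

def design_matrix(crim, indus, rm, dis, interaction=False, quadratic=False):
    """Stack an intercept, the four predictors and the optional extra terms."""
    cols = [np.ones_like(crim), crim, indus, rm, dis]
    if interaction:
        cols.append(crim * indus)  # part (c): interaction term crim*indus
    if quadratic:
        cols.append(rm ** 2)       # part (d): second-order term rm*rm
    return np.column_stack(cols)

# Synthetic stand-in predictors (NOT the Boston.csv values).
rng = np.random.default_rng(0)
n = 200
crim = rng.uniform(0, 10, n)
indus = rng.uniform(0, 25, n)
rm = rng.uniform(4, 9, n)
dis = rng.uniform(1, 12, n)

# Hypothetical coefficients used only to generate a noise-free response.
true_beta = np.array([5.0, -0.3, -0.1, 2.0, 0.5, 0.02, 0.4])
X = design_matrix(crim, indus, rm, dis, interaction=True, quadratic=True)
medv = X @ true_beta

# Ordinary least squares fit and the R-squared of the fitted model.
beta_hat, *_ = np.linalg.lstsq(X, medv, rcond=None)
fitted = X @ beta_hat
r_squared = 1 - np.sum((medv - fitted) ** 2) / np.sum((medv - medv.mean()) ** 2)
```

Calling `design_matrix` with both flags left at `False` gives the part (b) model, and switching on `interaction` alone gives the part (c) model, so the three fits can be compared side by side. The sketch only estimates coefficients; the significance tests in part (e) additionally need standard errors, which a regression package would normally report.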
Question 02 - Consider “Wine_Quality.csv” dataset to answer the following questions.
a). Divide the dataset into two parts; training set with 3000 observations and testing set with the rest of the observations.
b). Build a decision tree model for the training dataset to predict the Quality of Wine.
c). Use cross-validation and choose the best size for the tree in part (b).
d). Build the best tree model and identify the variables that contribute to creating a Quality Wine.
e). Predict the outputs for the testing dataset using the model in part (d) and calculate the Mean Squared Error (MSE).
f). Consider the wine quality as high if WineQuality > 6 and low otherwise. Create a new variable to categorise it as “High” or “Low” and name it “Wine_Cat”. Repeat steps (a) to (e). In step (e), calculate the misclassification rate instead of the MSE.
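The bookkeeping around the tree (the split in part (a), the MSE in part (e), and the Wine_Cat recoding and misclassification rate in part (f)) can be sketched in plain Python as follows. All function names are hypothetical helpers, and the quality scores and predictions are toy values chosen for illustration, not values from Wine_Quality.csv or output from a fitted tree.

```python
import random

def train_test_split(rows, n_train, seed=1):
    """Shuffle a copy of rows and return (training, testing) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def wine_category(quality, threshold=6):
    """Wine_Cat rule from part (f): "High" if quality > 6, else "Low"."""
    return "High" if quality > threshold else "Low"

def misclassification_rate(actual, predicted):
    """Share of observations whose predicted label differs from the actual one."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

# Toy values standing in for a testing set and a regression tree's predictions.
actual_quality = [5, 6, 7, 8, 4]
predicted_quality = [5.5, 6.5, 6.0, 7.5, 4.0]
mse = mean_squared_error(actual_quality, predicted_quality)

# Toy labels standing in for a classification tree's predictions.
actual_cat = [wine_category(q) for q in actual_quality]
predicted_cat = ["Low", "High", "Low", "High", "Low"]
rate = misclassification_rate(actual_cat, predicted_cat)
```

For the assignment itself, `train_test_split(rows, 3000)` would give the 3000-observation training set and leave the remaining observations for testing. Fitting and pruning the tree is left to whichever tree library is used.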
Question 03 - Consider “CPU_Performance.csv” dataset to answer the following questions.
a). Build a linear support vector classifier to classify the CPU Performance.
b). Select the best parameter values for the model in (a) using cross-validation.
c). Discuss the performance of the model in (b) by considering misclassification matrix and misclassification rate.
d). Build a polynomial support vector machine to classify the CPU Performance.
e). Select the best parameter values for the model in (d) using cross-validation.
f). Discuss the performance of the model in (e) by considering misclassification matrix and misclassification rate.
g). Build a radial support vector machine to classify the CPU Performance.
h). Select the best parameter values for the model in (g) using cross-validation.
i). Discuss the performance of the model in (h) by considering misclassification matrix and misclassification rate.
j). Identify the best model out of the three different models obtained in the previous parts. Justify your answer.
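The same three steps (fit, tune by cross-validation, evaluate) repeat for the linear, polynomial and radial kernels, so they can be sketched as one loop. The example below assumes scikit-learn is installed and uses synthetic two-class data in place of CPU_Performance.csv; the parameter grids, the 70/30 split and the 5-fold setting are illustrative assumptions, not values prescribed by the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic two-class data standing in for CPU_Performance.csv.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Hypothetical tuning grids, one per kernel.
param_grids = {
    "linear": {"C": [0.1, 1, 10]},
    "poly": {"C": [0.1, 1, 10], "degree": [2, 3]},
    "rbf": {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
}

results = {}
for kernel, grid in param_grids.items():
    # 5-fold cross-validation over the grid, then refit on the full training set.
    search = GridSearchCV(SVC(kernel=kernel), grid, cv=5)
    search.fit(X_train, y_train)
    predictions = search.predict(X_test)
    matrix = confusion_matrix(y_test, predictions)
    # Off-diagonal share of the confusion matrix is the misclassification rate.
    rate = 1 - matrix.trace() / matrix.sum()
    results[kernel] = {
        "best_params": search.best_params_,
        "confusion_matrix": matrix,
        "misclassification_rate": rate,
    }

# One possible selection rule for part (j): lowest test misclassification rate.
best_kernel = min(results, key=lambda k: results[k]["misclassification_rate"])
```

Picking the kernel with the lowest test-set misclassification rate is only one way to justify a choice; comparing the cross-validated scores in `search.best_score_` is another reasonable option.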
© Copyright 2025 My Uni Papers – Student Hustle Made Hassle Free. All rights reserved.