You will be acting as a data scientist at a consulting company, and you need to make predictions on a dataset. The dataset can be found below.
You need to build classifiers using the techniques covered in the lectures to predict the class attribute. At the very minimum, you need to produce a classifier for each method we have covered (Decision Tree, K-Nearest Neighbour, Random Forest, Support Vector Machine and Neural Network). However, you should be able to get a better mark if you explore the problem thoroughly (as you should in industry): preprocessing the data, looking at different methods, choosing their best parameter settings and identifying the best classifier in a principled and explainable way. Whether you choose KNIME or Python, showing 'expert' use (i.e. exploring multiple classifiers with different settings, optimising and testing different models, choosing the best in a principled way and being able to explain why you built the model the way you did) will attract a better mark.
You need to write a report describing how you solved the problem and the results you found. See below for requirements.
Below you will find 3 datasets: a "Loan Dataset" to build and optimise your model (it contains the target values), an "Unknown Dataset" for the final model assessment (it does not have the target values - you need to predict them) and a "Kaggle Submission Sample" which shows you what the file submitted to Kaggle should look like. In particular, you will need to set the column names in your submission file correctly - that is, "row ID" and "Prediction-Loan-Default".
For Kaggle, you will need to create an account first. Once the account has been created, submit your file to Kaggle; after submission, Kaggle will give you a score. For more information, look at the sample documents provided.
Build a classifier that predicts the “loan_default” attribute, with 0 for No and 1 for Yes.
You can do different data pre-processing and transformations (e.g. grouping values of attributes, converting them to binary, etc.), providing explanations for why you have chosen to do that. You may need to split the training set into training, validation and test sets to accurately set the parameters and evaluate the quality of the classifier.
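One possible way to carry out the split mentioned above is sketched here in Python with scikit-learn. The data is synthetic, and the 60/20/20 proportions and random seeds are assumptions for illustration rather than values from the brief. Stratifying on the target keeps the share of defaults similar in every subset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Loan Dataset: 1000 rows, 5 numeric features,
# and a binary target with roughly 20% positives (assumed, for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.2).astype(int)

# Step 1: hold out 20% of the rows as the final test set.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 2: split the remaining 80% into training (60% of all rows)
# and validation (20% of all rows); 0.25 * 0.8 = 0.2.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42
)

print(len(X_train), len(X_val), len(X_test))
print(y_train.mean(), y_val.mean(), y_test.mean())
```

The validation set is used to choose parameter settings, while the test set is touched only once, to estimate the quality of the final classifier.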
You can use KNIME or Python to build classifiers and explain more about your classifier - and be sure that you are producing valid results! You don't need to limit yourself to the classifiers we used in class, but if you do use other classifiers, you need to describe them in your report and make sure you are producing valid results.
A hint: usually it's not a case of having a 'better' classifier that will produce good results. Rather, it's a case of identifying or generating good features that can be used to solve the problem.
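As a rough illustration of the kind of feature work the hint refers to, the sketch below groups a numeric attribute, converts a categorical attribute to binary and derives a ratio feature with pandas. All column names and values here ("age", "income", "loan_amount", "employment_type") are hypothetical; the actual Loan Dataset columns are not listed in this brief.

```python
import pandas as pd

# Tiny made-up table; column names and values are hypothetical.
df = pd.DataFrame({
    "age": [22, 35, 47, 58, 63],
    "income": [30000, 52000, 61000, 45000, 38000],
    "loan_amount": [12000, 10000, 30000, 9000, 19000],
    "employment_type": ["salaried", "self-employed", "salaried",
                        "unemployed", "retired"],
})

# Grouping values: bin age into three bands (bin edges are assumptions).
df["age_group"] = pd.cut(
    df["age"], bins=[0, 30, 50, 120], labels=["young", "middle", "senior"]
)

# Converting to binary: 1 if the applicant is salaried, else 0.
df["is_salaried"] = (df["employment_type"] == "salaried").astype(int)

# Generating a new feature: loan size relative to income.
df["loan_to_income"] = df["loan_amount"] / df["income"]

# One-hot encode the grouped attribute so any classifier can use it.
features = pd.get_dummies(
    df.drop(columns=["age", "employment_type"]), columns=["age_group"]
)
print(features)
```

Whether any such feature helps is an empirical question, so each transformation should be justified in the report and checked against validation performance.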
Your report should include the following information:
In this assessment, students were required to act as data scientists and develop predictive classification models using the given Loan Dataset. The primary goal was to build and evaluate multiple classifiers to predict the “loan_default” attribute, where 0 = No and 1 = Yes.
The assessment was designed to demonstrate students’ ability to handle an end-to-end machine learning workflow, applying both technical and analytical reasoning in line with industry practices.
The mentor began by helping the student interpret the scenario and define the classification objective: predicting loan defaults. The student was guided to perform exploratory data analysis (EDA) to understand variable types, missing values, class distribution, and correlations. This step emphasized identifying potential issues such as class imbalance or outliers early in the process.
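The EDA checks described above could look roughly like the following pandas sketch. Only the "loan_default" column name comes from the assignment; the other columns and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample; only "loan_default" is a column name from the brief.
df = pd.DataFrame({
    "income": [52000, 61000, np.nan, 48000, 75000, 39000],
    "loan_amount": [10000, 15000, 8000, 12000, 20000, 9000],
    "employment_type": ["salaried", "self-employed", "salaried",
                        None, "salaried", "self-employed"],
    "loan_default": [0, 0, 1, 0, 0, 1],
})

# Variable types.
print(df.dtypes)

# Missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Class distribution as proportions, to spot class imbalance.
class_balance = df["loan_default"].value_counts(normalize=True)
print(class_balance)

# Correlation of each numeric feature with the target.
numeric_corr = (
    df.select_dtypes("number").corr()["loan_default"].drop("loan_default")
)
print(numeric_corr)
```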
Next, the mentor explained the importance of data cleaning and transformation to improve model accuracy. The student learned how to:
The student was then guided to build baseline models with all five classifiers (Decision Tree, KNN, Random Forest, SVM, and Neural Network) in either KNIME or Python.
The mentor explained how each algorithm works, which hyperparameters affect performance, and how to interpret results. Emphasis was placed on applying train-validation-test splits and cross-validation to detect overfitting and obtain reliable performance estimates.
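A minimal sketch of such a baseline comparison in Python with scikit-learn is shown below. It uses synthetic data, default-like settings and 5-fold stratified cross-validation scored by ROC-AUC; all of these choices are assumptions for illustration, not the settings used in the assessment. KNN, SVM and the neural network are wrapped in a scaling pipeline because they are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the Loan Dataset (about 20% positives).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(random_state=0)),
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
    ),
}

# The same folds are reused for every model so the scores are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    results[name] = scores.mean()
    print(f"{name}: mean ROC-AUC = {scores.mean():.3f} (std {scores.std():.3f})")
```

Putting the scaler inside the pipeline means it is refitted on each training fold, which avoids leaking information from the held-out fold into preprocessing.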
The mentor helped the student systematically optimize each model by adjusting parameters such as:
Evaluation metrics such as accuracy, precision, recall, F1-score, and ROC-AUC were used to compare performance. The mentor also guided the student in identifying the most influential features contributing to model decisions.
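The tuning and evaluation steps above might be sketched as follows for one model, here a Random Forest tuned with a grid search. The parameter grid, the F1 scoring choice and the synthetic data are assumptions for illustration; the parameters actually adjusted in the assessment are not listed in this text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the Loan Dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; cross-validation on the training set picks the winner.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, scoring="f1", cv=5
)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)

# Evaluate the refitted best model once on the held-out test set.
best_model = search.best_estimator_
y_pred = best_model.predict(X_test)
y_score = best_model.predict_proba(X_test)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, y_score),
}
for metric_name, value in metrics.items():
    print(f"{metric_name}: {value:.3f}")

# Rank features by the forest's impurity-based importance.
importances = best_model.feature_importances_
ranking = np.argsort(importances)[::-1]
print("Features ranked by importance:", ranking.tolist())
```

On an imbalanced target, accuracy alone can look good for a model that rarely predicts a default, which is why precision, recall, F1 and ROC-AUC are reported alongside it.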
After evaluating all models, the student, under mentor supervision, selected the best-performing classifier based on validation results and interpretability. The model was then used to predict outcomes for the Unknown Dataset, following Kaggle’s submission format.
The mentor ensured that the file structure, column naming (“row ID” and “Prediction-Loan-Default”), and submission steps were correctly followed.
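A small sketch of producing a file with the required column names is given below. The helper name `build_submission` and the row identifiers are hypothetical; the real identifiers should be taken from the Unknown Dataset and checked against the Kaggle Submission Sample.

```python
import pandas as pd


def build_submission(row_ids, predictions):
    """Return a DataFrame with the two column names the brief requires."""
    if len(row_ids) != len(predictions):
        raise ValueError("row_ids and predictions must have the same length")
    return pd.DataFrame({
        "row ID": list(row_ids),
        "Prediction-Loan-Default": [int(p) for p in predictions],
    })


# Hypothetical identifiers and predictions, for illustration only.
row_ids = ["Row0", "Row1", "Row2"]
predictions = [0, 1, 0]

submission = build_submission(row_ids, predictions)

# index=False keeps pandas from writing an extra, unnamed index column.
csv_text = submission.to_csv(index=False)
print(csv_text)

# To write the file for upload instead of printing it:
# submission.to_csv("submission.csv", index=False)
```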
Finally, the mentor guided the student in preparing a structured technical report covering:
The student was also encouraged to reflect on the learning outcomes, including analytical thinking, technical application, and model interpretability.
By the end of the assessment, the student successfully:
© Copyright 2025 My Uni Papers – Student Hustle Made Hassle Free. All rights reserved.