Dataset Attribute Description Assignment 3

Assessment 3 

Scenario

You will be acting as a data scientist at a consulting company, and you need to make predictions on a dataset. The dataset can be found below.

You need to build classifiers using the techniques covered in the lectures to predict the class attribute. At the very minimum, you need to produce a classifier for each method we have covered (Decision Tree, K-Nearest Neighbour, Random Forest, Support Vector Machine and Neural Network). However, if you explore the problem thoroughly (as you should in industry) by preprocessing the data, comparing different methods, choosing their best parameter settings and identifying the best classifier in a principled and explainable way, you should be able to get a better mark. Showing 'expert' use of KNIME or Python (i.e. exploring multiple classifiers with different settings, optimising and testing different models, choosing the best in a principled way and being able to explain why you built the model the way you did) will attract a better mark.

You need to write a report describing how you solved the problem and the results you found. See below for requirements.

Below you will find 3 datasets: a "Loan Dataset" to build and optimise your model (it contains the target values), an "Unknown Dataset" for the final model assessment (it does not have the target values - you need to predict them) and a "Kaggle Submission Sample" which shows you what the file submitted to Kaggle should look like. In particular, you will need to set the column names in your submission file correctly - that is, "row ID" and "Prediction-Loan-Default".

For Kaggle, you will need to create an account first. Once the account has been created, submit your prediction file to Kaggle, which will then give you a score. For more information, look at the sample documents provided.

Classification Task

Build a classifier that classifies the “loan_default” attribute - with 0 if it is No and 1 for Yes.

You can do different data pre-processing and transformations (e.g. grouping values of attributes, converting them to binary, etc.), providing explanations for why you have chosen to do that. You may need to split the training set into training, validation and test sets to accurately set the parameters and evaluate the quality of the classifier.
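
The three-way split mentioned above can be sketched as follows. This is a minimal illustration, assuming Python with scikit-learn; the synthetic data from make_classification stands in for the Loan Dataset, and the 60/20/20 proportions are an arbitrary choice, not a requirement of the brief.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Loan Dataset (assumption: the real file is not used here).
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Hold out 20% as the final test set, stratified so class proportions are preserved.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Split the remaining 80% into 75/25, i.e. 60% train and 20% validation overall.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

The validation set is used to choose parameters, while the test set is touched only once, to estimate how the chosen model is likely to behave on the Unknown Dataset.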

You can use KNIME or Python to build classifiers and explain more about your classifier - and be sure that you are producing valid results! You don't need to limit yourself to the classifiers we used in class, but if you do use other classifiers, you need to describe them in your report and make sure you are producing valid results.

A hint: usually it's not a case of having a 'better' classifier that will produce good results. Rather, it's a case of identifying or generating good features that can be used to solve the problem.

Your report should include the following information:

  • A description of the data mining problem;
  • The data preprocessing and transformations you did (if any);
  • How you went about solving the problem;
  • Classification techniques used and summary of the results and parameter settings;
  • The best classifier that you selected: the type, its performance, and how it solved the problem (if it makes sense for that type of classifier).

Assessment Requirements Summary

In this assessment, students were required to act as data scientists and develop predictive classification models using the given Loan Dataset. The primary goal was to build and evaluate multiple classifiers to predict the “loan_default” attribute, where 0 = No and 1 = Yes.

The key requirements included:

  • Conducting data preprocessing and feature engineering to ensure data quality and model readiness.
  • Building classifiers using at least five core techniques: Decision Tree, K-Nearest Neighbour (KNN), Random Forest, Support Vector Machine (SVM), and Neural Network.
  • Performing parameter tuning, model validation, and comparative evaluation to identify the best-performing model.
  • Submitting predictions for the Unknown Dataset on Kaggle to assess real-world model performance.
  • Writing a detailed report explaining the data mining process, preprocessing steps, classifier selection, parameter optimization, and final model results.

The assessment was designed to demonstrate students’ ability to handle an end-to-end machine learning workflow, applying both technical and analytical reasoning in line with industry practices.

Academic Mentor’s Step-by-Step Guidance Process

Step 1: Understanding the Problem and Data Exploration

The mentor began by helping the student interpret the scenario and define the classification objective: predicting loan defaults. The student was guided to perform exploratory data analysis (EDA) to understand variable types, missing values, class distribution, and correlations. This step emphasized identifying potential issues such as class imbalance or outliers early in the process.
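
A first pass at this kind of EDA might look like the sketch below, assuming Python with pandas. The tiny table and every column name except loan_default are invented for illustration; the real Loan Dataset attributes are not listed in this document.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the loan data; only "loan_default" is named in the brief.
df = pd.DataFrame({
    "income": [42000, 58000, np.nan, 31000, 75000, 39000],
    "loan_amount": [10000, 15000, 8000, 12000, np.nan, 9000],
    "employment_type": ["salaried", "self-employed", "salaried",
                        np.nan, "salaried", "self-employed"],
    "loan_default": [0, 0, 1, 1, 0, 0],
})

# Missing values per column.
missing_per_column = df.isna().sum()

# Class distribution as proportions, to spot imbalance.
class_share = df["loan_default"].value_counts(normalize=True)

# Pairwise correlations between the numeric columns.
numeric_corr = df.select_dtypes("number").corr()

print(df.dtypes)
print(missing_per_column)
print(class_share)
print(numeric_corr["loan_default"])
```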

Step 2: Data Preprocessing and Feature Engineering

Next, the mentor explained the importance of data cleaning and transformation to improve model accuracy. The student learned how to:

  • Handle missing data using imputation or removal techniques.
  • Convert categorical attributes into numerical form (label encoding or one-hot encoding).
  • Normalize or scale features for algorithms like SVM and KNN.
  • Generate new derived features that could improve predictive performance.

The mentor encouraged documenting each preprocessing decision to ensure the approach remained explainable and reproducible.
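
The preprocessing steps listed above can be combined into a single reusable object. The following is a sketch under stated assumptions: Python with scikit-learn, invented column names, median and most-frequent imputation as example strategies, and a loan-to-income ratio as a hypothetical derived feature.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical columns; the real attribute names are not given in this document.
df = pd.DataFrame({
    "income": [42000.0, 58000.0, np.nan, 31000.0],
    "loan_amount": [10000.0, 15000.0, 8000.0, 12000.0],
    "employment_type": ["salaried", "self-employed", np.nan, "salaried"],
})

# Derived feature (illustrative): size of the loan relative to income.
df["loan_to_income"] = df["loan_amount"] / df["income"]

numeric_cols = ["income", "loan_amount", "loan_to_income"]
categorical_cols = ["employment_type"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

X_ready = preprocess.fit_transform(df)
print(X_ready.shape)
```

Keeping the steps in one transformer means the same fitted imputation values and scaling statistics can later be applied to the Unknown Dataset with preprocess.transform, which helps reproducibility.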

Step 3: Model Building Using Multiple Classifiers

The student was then guided to build baseline models with all five classifiers (Decision Tree, KNN, Random Forest, SVM, and Neural Network) in either KNIME or Python.
The mentor explained how each algorithm works, what hyperparameters affect performance, and how to interpret results. Emphasis was placed on applying train-validation-test splits and cross-validation to avoid overfitting.
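
As an illustration of this step, the five baselines can be compared with stratified cross-validation. This is a minimal sketch, assuming Python with scikit-learn, synthetic data in place of the Loan Dataset, default hyperparameters, and ROC-AUC as the comparison score; none of these choices come from the original assessment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the Loan Dataset (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0
)

# Scale-sensitive models (KNN, SVM, neural network) get a scaler inside the
# pipeline so scaling is fitted on each training fold only.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Neural Network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)
    ),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Mean cross-validated ROC-AUC per model.
results = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}

for name, score in results.items():
    print(f"{name}: {score:.3f}")
```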

Step 4: Model Optimization and Performance Evaluation

The mentor helped the student systematically optimize each model by adjusting parameters such as:

  • Tree depth and splitting criteria for Decision Trees,
  • K values for KNN,
  • Number of estimators for Random Forests,
  • Kernel and C-value for SVM, and
  • Network layers, activation functions, and learning rates for Neural Networks.

Evaluation metrics such as accuracy, precision, recall, F1-score, and ROC-AUC were used to compare performance. The mentor also guided the student in identifying the most influential features contributing to model decisions.
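
One way to carry out this kind of tuning and evaluation is a grid search followed by scoring on held-out data. The sketch below assumes Python with scikit-learn and uses a Random Forest on synthetic data; the parameter grid and the choice of F1 as the tuning score are illustrative assumptions, not values used in the original work.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score
)
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the Loan Dataset (assumption).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grid: number of trees and maximum tree depth.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 6, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5, scoring="f1"
)
search.fit(X_train, y_train)

best = search.best_estimator_
pred = best.predict(X_test)
proba = best.predict_proba(X_test)[:, 1]  # probability of class 1 (default)

metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred, zero_division=0),
    "recall": recall_score(y_test, pred, zero_division=0),
    "f1": f1_score(y_test, pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, proba),
}

# Impurity-based importances: one value per feature, summing to 1.
importances = best.feature_importances_

print(search.best_params_)
print(metrics)
print(importances)
```

On an imbalanced target such as loan default, accuracy alone can look high for a model that rarely predicts the minority class, which is why precision, recall, F1 and ROC-AUC are reported alongside it.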

Step 5: Selecting the Best Model and Preparing the Kaggle Submission

After evaluating all models, the student, under mentor supervision, selected the best-performing classifier based on validation results and interpretability. The model was then used to predict outcomes for the Unknown Dataset, following Kaggle’s submission format.
The mentor ensured that the file structure, column naming (“row ID” and “Prediction-Loan-Default”), and submission steps were correctly followed.
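
A submission file with the required column names could be produced as in the sketch below, assuming Python with pandas. The row identifiers and predictions are hypothetical placeholders; the actual identifier format should be taken from the Kaggle Submission Sample.

```python
import io

import pandas as pd

# Hypothetical identifiers and predictions; in practice these come from the
# Unknown Dataset and the selected model's predict() output.
row_ids = ["Row0", "Row1", "Row2"]
predictions = [0, 1, 0]

submission = pd.DataFrame({
    "row ID": row_ids,
    "Prediction-Loan-Default": predictions,
})

# Written to an in-memory buffer here; pass a file path to to_csv to write a
# file instead. index=False keeps pandas from adding an extra unnamed column.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```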

Step 6: Report Compilation and Reflection

Finally, the mentor guided the student in preparing a structured technical report covering:

  • Description of the data mining task
  • Preprocessing and transformation rationale
  • Model development and optimization process
  • Comparative results and final model justification

The student was also encouraged to reflect on the learning outcomes, including analytical thinking, technical application, and model interpretability.

Final Outcome and Learning Achievements

By the end of the assessment, the student successfully:

  • Applied multiple machine learning algorithms to a real-world dataset.
  • Learned to preprocess, optimize, and validate models systematically.
  • Developed an understanding of how different classifiers behave under varying data conditions.
  • Demonstrated data-driven decision-making and technical reporting skills suitable for professional data science environments.
