Task
Predicting Airbnb Listing Prices in Sydney
Overview:
Task 1: Problem Description and Initial Data Analysis
1. Read the Competition Overview on Kaggle
2. Referring to Competition Overview and the data provided on Kaggle write about a 500 words Problem
Task 2: Data Cleaning, Missing Observations and Feature Engineering
Task 2, Question 1 : Clean all numerical features and the target variable price so that they can be used in training algorithms. For instance, host_response_rate feature is in object format containing both numerical values and text. Extract numerical values (or equivalently eliminate the text) so that the numerical values can be used as a regular feature.
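One possible way to approach this cleaning step is sketched below with pandas. The raw string formats (a trailing `%` on host_response_rate, a leading `$` and thousands commas on price) are assumptions made for illustration and should be checked against the actual Kaggle files.

```python
import pandas as pd

# Toy rows; the raw string formats here are assumed, not taken from the real data.
df = pd.DataFrame({
    "host_response_rate": ["95%", "100%", None, "N/A"],
    "price": ["$1,250.00", "$80.00", "$99.50", None],
})

# Drop the percent sign, turn anything non-numeric into NaN, rescale to [0, 1].
df["host_response_rate"] = pd.to_numeric(
    df["host_response_rate"].str.rstrip("%"), errors="coerce"
) / 100

# Remove currency symbols and thousands separators before converting.
df["price"] = pd.to_numeric(
    df["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)

print(df.dtypes)
```

Using `errors="coerce"` leaves unparseable entries as missing values, so they can be handled together with the other missing observations later.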
Task 2, Question 2 : Create at least 4 new features from existing features which contain multiple items of information, e.g. creating email, phone, reviews, jumio, etc. from the feature host_verifications.
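A minimal sketch of this kind of feature extraction follows. It assumes host_verifications is stored as a Python-style list literal inside a string (for example `"['email', 'phone']"`); the helper `parse_list` is a hypothetical name introduced here.

```python
import ast
import pandas as pd

# Toy rows; the list-literal string format is an assumption about the raw data.
df = pd.DataFrame({
    "host_verifications": [
        "['email', 'phone']",
        "['phone', 'jumio', 'reviews']",
        "[]",
        None,
    ]
})

def parse_list(raw):
    """Return the items encoded in a list-like string, or [] if unparseable."""
    if not isinstance(raw, str):
        return []
    try:
        parsed = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []
    return list(parsed) if isinstance(parsed, (list, tuple)) else []

items = df["host_verifications"].apply(parse_list)

# One 0/1 indicator column per verification method of interest.
for name in ["email", "phone", "reviews", "jumio"]:
    df[name] = items.apply(lambda xs, n=name: int(n in xs))

print(df[["email", "phone", "reviews", "jumio"]])
```

Parsing into real lists before testing membership avoids false matches that plain substring checks could produce.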
Task 2, Question 3 : Impute missing values for all features in both training and test datasets.
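As an illustration, one common strategy is median imputation for numerical columns and most-frequent-value imputation for categorical ones, with the fill values computed on the training set only and then reused on the test set. The column names and values below are toy assumptions, and other imputation strategies may suit the real data better.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
train = pd.DataFrame({
    "bedrooms": [1.0, 2.0, None, 3.0],
    "room_type": ["Private room", None, "Entire home/apt", "Private room"],
})
test = pd.DataFrame({
    "bedrooms": [None, 4.0],
    "room_type": [None, "Shared room"],
})

num_cols = ["bedrooms"]
cat_cols = ["room_type"]

# Learn the fill values from the training data only.
fill_values = {c: train[c].median() for c in num_cols}
fill_values.update({c: train[c].mode().iloc[0] for c in cat_cols})

# Apply the same values to both datasets.
train = train.fillna(fill_values)
test = test.fillna(fill_values)

print(fill_values)
```

Reusing the training statistics on the test set keeps information from the test data out of the preprocessing.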
Task 2, Question 4 : Encode all categorical variables appropriately as discussed in class.
Where a categorical feature contains more than 5 unique values, map the feature into its 5 most frequent values + 'other' and then encode appropriately. For instance, you could group and then map property_type into 5 basic types + 'other': [entire rental unit, private room, entire room, entire townhouse, shared room, other] and then encode.
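The grouping described above could be sketched as follows. One-hot encoding is used here as one reasonable choice, not necessarily the encoding discussed in class; the helper names `group_rare` and `one_hot` and the toy category counts are assumptions.

```python
import pandas as pd

# Toy data with 7 distinct property types; counts are made up for illustration.
train = pd.DataFrame({"property_type":
    ["entire rental unit"] * 6 + ["private room"] * 5 + ["entire room"] * 4
    + ["entire townhouse"] * 3 + ["shared room"] * 2 + ["boat", "tent"]})
test = pd.DataFrame({"property_type": ["entire rental unit", "tent", "castle"]})

# The 5 most frequent values are determined on the training set only.
top5 = train["property_type"].value_counts().index[:5]

def group_rare(s):
    """Keep the top-5 values and map everything else to 'other'."""
    return s.where(s.isin(top5), "other")

train["property_type"] = group_rare(train["property_type"])
test["property_type"] = group_rare(test["property_type"])

# A fixed category list guarantees identical columns in train and test.
categories = list(top5) + ["other"]

def one_hot(s):
    typed = s.astype(pd.CategoricalDtype(categories))
    return pd.get_dummies(typed, prefix="property_type", dtype=int)

train_encoded = one_hot(train["property_type"])
test_encoded = one_hot(test["property_type"])
print(test_encoded)
```

Values that appear only in the test set (such as `castle` in the toy data) fall into 'other', so the encoded test matrix never gains columns the model has not seen.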
Task 2, Question 5 : Perform any other actions you think need to be done on the data before constructing predictive models, and clearly explain what you have done.
Task 2, Question 6 : Perform exploratory data analysis to measure the relationship between the features and the target and write up your findings.
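A simple starting point for this analysis is to rank the numerical features by their correlation with price. The sketch below uses made-up values and assumed column names; correlation captures only linear relationships, so plots and group-wise summaries would normally complement it.

```python
import pandas as pd

# Toy data; values and column names are illustrative only.
df = pd.DataFrame({
    "accommodates": [1, 2, 4, 6, 8],
    "bedrooms": [1, 1, 2, 3, 4],
    "minimum_nights": [5, 1, 3, 2, 1],
    "price": [60, 90, 180, 260, 400],
})

# Pearson correlation of each numerical feature with the target,
# ordered by absolute strength.
corr = (
    df.corr(numeric_only=True)["price"]
    .drop("price")
    .sort_values(key=abs, ascending=False)
)
print(corr)
```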
Task 3: Fit and tune a forecasting model/Submit predictions/Report score and ranking
Make sure you clearly explain each step you do, both in text and in the recorded video.
1. Build a machine learning (ML) regression model taking into account the outcomes of Tasks 1 & 2
2. Fit the model and tune hyperparameters via cross-validation: make sure you comment and explain each step clearly
3. Create predictions using the test dataset and submit your predictions on Kaggle's competition page
4. Provide Kaggle ranking and score (screenshot your best submission) and comment
5. Make sure your Python code works, so that a marker can replicate all of your results and obtain the same MSE from Kaggle
Hint: to perform well you will need to iterate Task 3, building and tuning various models in order to find the best one.
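The fit, tune and predict steps listed above might look like the following sketch, which uses scikit-learn's `GridSearchCV` with a Ridge regression on synthetic data. The model choice, the hyperparameter grid, and the `id`/`price` submission columns are all assumptions for illustration; the required submission format should be taken from the Kaggle competition page.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for the cleaned feature matrices (not real Airbnb data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = X_train @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(size=200)
X_test = rng.normal(size=(50, 5))

# Fixed seeds make the folds, and therefore the results, reproducible.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    Ridge(),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximises, so MSE is negated
    cv=cv,
)
search.fit(X_train, y_train)

print("best alpha:", search.best_params_["alpha"])
print("cross-validated MSE:", -search.best_score_)

# The best model is refitted on all training data and used for prediction.
predictions = search.predict(X_test)

# Hypothetical submission layout; the real column names come from Kaggle.
submission = pd.DataFrame({"id": np.arange(len(X_test)), "price": predictions})
```

Repeating the same search with other model families and grids, and comparing the cross-validated MSE, is one way to carry out the iteration suggested in the hint.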