Master the Data Preparation - IT Assignment Help

Assignment Task

    

Task
Data cleaning must take place before you analyse and combine all the information to help answer the question your client is asking.

The source of data for Melbourne listings is a data set adapted from Inside Airbnb, which sources its data from Airbnb. For this assignment we utilise the data sets specific to Melbourne and the detailed listings data compiled on 5 July 2021. The listings dataset provides information about each Melbourne property that hosts make available for rent.

Visually inspect the data provided at the suggested location by loading it into your Excel worksheet ‘raw data’. After you clean the raw data, your worksheet ‘clean data’ should show a count of 18,606 rows and 15 columns.

Use Excel functionality to confirm the data shape (number of rows and columns). For data-intensive projects, consider the rows as observations, while the columns represent the variables.

Ensure that the id column does not have any duplicates. This field is considered a primary key and should be unique within your sheet. The id is a useful proxy for numbering each row of the dataset.

Note: The id column is the ID for the listing. This is not to be confused with the host_id column, the ID of the host. Hosts can have more than one listing.

Review the descriptions of the fields (column names) available to help describe each listing. This will help you get a feel for how Airbnb and hosts describe properties.

A separate spreadsheet workbook has been provided to you HERE, containing a data dictionary (or cheat sheet) with descriptions of all the variables or fields. Here is a list of selected variables and descriptions to get you started.

The raw data contains some 31 variables, and most are not used for this assessment. You will likely focus on 3 to 5 variables, so there is no need to race ahead and clean all the data in advance. The variables provide detailed information about hosts, properties, long-form property descriptions, neighbourhoods, room types, area, the (rental) price and more.
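The shape and primary-key checks described above are done in Excel for this task, but the same logic can be sketched in a few lines of Python for anyone who wants to cross-check a CSV export of the sheet. This is a minimal sketch only: the sample rows, their values, and the helper names data_shape and duplicate_ids are made up for illustration and are not part of the assignment data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a CSV export of the 'raw data' sheet.
# The values are invented; only the id / host_id column names come from the task.
SAMPLE_CSV = """id,host_id,neighbourhood,room_type,price
101,9001,Melbourne,Entire home/apt,150
102,9001,Yarra,Private room,80
103,9002,Port Phillip,Entire home/apt,210
102,9001,Yarra,Private room,80
"""


def data_shape(rows):
    """Return (number of observations, number of variables)."""
    n_cols = len(rows[0]) if rows else 0
    return len(rows), n_cols


def duplicate_ids(rows, key="id"):
    """Return the key values that occur more than once, sorted."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shape = data_shape(rows)
dupes = duplicate_ids(rows)
print("shape:", shape)
print("duplicate ids:", dupes)
```

Note that a repeated host_id is expected (one host can have several listings), so only the id column is treated as the primary key here.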
The data itself likely includes duplicate information, blank or corrupt fields, or N/A instead of actual values. Additionally, some of the variables may not be useful for your investment decision making to help answer the client question. Using a combination of knowledge and assumptions, you can eliminate a number of variables.

For the purpose of this assessment, we are interested in the following variables only:

Beyond duplicates, blank rows and obvious errors, further clean the data with some simple rules: ignore any rows not relating to Melbourne or its neighbourhoods, and ignore columns with lots of empty data. Check the spelling of the Melbourne neighbourhood names to make sure they are correct.

Save your changes and upload a copy of your workbook below (make sure you upload one file only). Note that at this stage, your workbook should contain the following sheets only:

1-Raw data
2- Clean data
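The cleaning rules listed above (drop blank rows and duplicates, ignore rows outside Melbourne neighbourhoods, fix neighbourhood spelling, ignore mostly empty columns) could be sketched in Python as follows. Everything specific here is an assumption for illustration: the sample rows, the neighbourhood list, the spelling map, the licence column and the 50% emptiness threshold are hypothetical, and the assignment itself expects the work to be done in Excel.

```python
import csv
import io

# Hypothetical sample; rows and column names are invented for illustration.
SAMPLE_CSV = """id,neighbourhood,price,licence
1,Melbourne,150,
2,Yarra,80,
2,Yarra,80,
,,,
3,Sydney,300,
4,Melborne,120,
5,Port Phillip,,
"""

# Assumed reference values, not taken from the assignment's data dictionary.
VALID_NEIGHBOURHOODS = {"Melbourne", "Yarra", "Port Phillip"}
SPELLING_FIXES = {"Melborne": "Melbourne"}
MAX_EMPTY_SHARE = 0.5  # drop columns that are more than 50% empty


def clean(rows):
    """Apply the cleaning rules and return a new list of row dicts."""
    seen_ids = set()
    kept = []
    for row in rows:
        values = {k: (v or "").strip() for k, v in row.items()}
        if not any(values.values()):
            continue  # blank row
        name = values["neighbourhood"]
        values["neighbourhood"] = SPELLING_FIXES.get(name, name)
        if values["neighbourhood"] not in VALID_NEIGHBOURHOODS:
            continue  # not a Melbourne neighbourhood
        if values["id"] in seen_ids:
            continue  # duplicate primary key
        seen_ids.add(values["id"])
        kept.append(values)
    if not kept:
        return kept
    # Keep only the columns whose share of empty cells is within the threshold.
    keep_cols = [
        col for col in kept[0]
        if sum(1 for r in kept if not r[col]) / len(kept) <= MAX_EMPTY_SHARE
    ]
    return [{col: r[col] for col in keep_cols} for r in kept]


raw_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = clean(raw_rows)
for r in cleaned:
    print(r)
```

The order of the rules matters in this sketch: spelling is corrected before the neighbourhood filter, so a misspelt but valid neighbourhood is kept rather than discarded.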


While data cleaning is often regarded as boring and monotonous, it is a critical function that can directly improve decision making.
Common data issues:

• Missing values
• Null values
• Partial or incomplete values
• Duplicates
• Ill-formed data
• Strange symbols
• Unstructured data
 
Profile the data
Format as table – table with column headings
1. Benefits – filter the table by values in columns, e.g. property type
2. Sort by individual columns, e.g. price
3. Quickly view the range of values by column to find blanks, outlier values and NULLs
4. Sort/filter/view using tables
5. Sort/filter impacts the entire dataset
6. Duplicates should be easy to determine
7. Infer missing or incomplete values
8. Interpolation – helps determine the most likely value in the field
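Two of the profiling ideas above, counting blanks/NULLs per column and interpolating a missing value, can be illustrated with a short Python sketch. The missing-value tokens, the sample rows and the function names are assumptions made for this example, and simple linear interpolation between neighbouring values is only one possible way to estimate a "most likely" value; whether it suits a given field (such as price) is a judgment call.

```python
# Tokens treated as "missing" in this sketch; the list is an assumption.
MISSING_TOKENS = {"", "null", "n/a", "na"}


def is_missing(value):
    """True for None, blanks, and common NULL / N/A placeholders."""
    return value is None or str(value).strip().lower() in MISSING_TOKENS


def profile(rows):
    """Count missing values per column for a list of row dicts."""
    columns = list(rows[0].keys()) if rows else []
    return {col: sum(is_missing(r.get(col)) for r in rows) for col in columns}


def interpolate(values):
    """Fill interior None gaps by linear interpolation between known neighbours.

    Leading and trailing gaps are left as None because there is no value
    on one side to interpolate from.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Hypothetical rows for illustration only.
sample_rows = [
    {"price": "150", "room_type": "Private room"},
    {"price": "NULL", "room_type": ""},
    {"price": "N/A", "room_type": "Entire home/apt"},
]
missing_counts = profile(sample_rows)
filled_prices = interpolate([100, None, None, 160])
print(missing_counts)
print(filled_prices)
```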
 

 
