Task:
Question 1a - Knowledge and data mining tasks
Explain the fundamental differences between data clustering and data classification. Give one example for each of these data mining tasks and briefly identify the knowledge to be discovered in each task.
Question 1b - DBMS
What are the different types of relationships between entities in an ER model? Give one example for each type.
Question 2 - Data Preparation (10 Marks)
Question 2a - Scaling and standardization
A student realizes that he/she has applied min-max scaling twice to a numeric attribute. Explain whether or not this could change the subsequent clustering result.
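As a minimal sketch of how this question could be checked empirically, the Python snippet below applies min-max scaling once and then a second time to a small attribute column. The function name min_max_scale and the sample values are assumptions made for illustration only; they do not come from the question.

```python
def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical attribute values, invented for this illustration.
raw = [12.0, 15.0, 20.0, 42.0]

once = min_max_scale(raw)    # first application: range becomes [0, 1]
twice = min_max_scale(once)  # second application: min is 0 and max is 1 already

print(once)
print(twice)
print(once == twice)
```

After the first pass the minimum is 0 and the maximum is 1, so the second pass subtracts 0 and divides by 1. In this sketch the two lists are therefore identical, which suggests that a distance-based clustering method would receive the same input either way.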
Question 2b - Data reduction and transformation
Data sampling is one technique commonly used in the data preparation step. Describe and explain one situation that you would consider data sampling, and another situation where you would not.
Question 3 - Data Similarity and Distances (10 Marks)
Question 3a - Discrete Sequences
Compute the Edit distance between
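The sequences for this question are not reproduced here, so the sketch below only illustrates the technique. It computes the Levenshtein edit distance by dynamic programming, assuming unit cost for insertion, deletion and substitution. The function name edit_distance and the strings "kitten" and "sitting" are assumptions chosen for illustration, not the question's input.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning the first i symbols of a into the empty prefix takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]

# Hypothetical example strings, not taken from the question.
print(edit_distance("kitten", "sitting"))  # 3
```

If the course assigns different costs to the three operations, the three terms inside min would need to be adjusted accordingly.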
Question 3b - Text data similarity
Compute the cosine similarity using the raw frequencies for the following book titles:
1. "data mining practical machine learning tools and techniques"
2. "data mining concepts and techniques"
Clearly show your reasoning.
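A hand calculation for this question can be cross-checked with a short script. The sketch below builds raw term-frequency vectors for the two titles and computes their cosine similarity. It assumes whitespace tokenisation and no stop-word removal (so "and" counts as a term); the function name cosine_similarity is an assumption for illustration.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors of two texts."""
    freq_a = Counter(text_a.split())
    freq_b = Counter(text_b.split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    return dot / (norm_a * norm_b)

title_1 = "data mining practical machine learning tools and techniques"
title_2 = "data mining concepts and techniques"

similarity = cosine_similarity(title_1, title_2)
print(round(similarity, 4))  # 0.6325
```

Under these assumptions the titles share four terms (data, mining, and, techniques), the first title has 8 distinct terms and the second has 5, each occurring once, so the value is 4 / (sqrt(8) * sqrt(5)).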
Question 4 - Association Pattern Mining (15 Marks)
Consider the following transaction database of a fictional supermarket:
Transaction ID | Items
1 | a, c, d, f
Question 4a - Frequent patterns
Compute all frequent patterns at an absolute minimum support level of 2.
Question 4b - Maximal frequent patterns
Determine all the maximal frequent patterns at an absolute minimum support level of 2.
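The full transaction table is not reproduced here, so the sketch below uses an invented five-transaction database purely to illustrate Questions 4a and 4b. It enumerates frequent itemsets by brute force and then keeps those with no frequent proper superset. The database, the names frequent_itemsets and maximal_itemsets, and the variable names are all assumptions for illustration.

```python
from itertools import combinations

# Hypothetical toy database; NOT the table from the question.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"b", "c", "d"},
    {"a", "d"},
]
MIN_SUPPORT = 2  # absolute minimum support

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: map each frequent itemset to its absolute support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = sum(1 for t in transactions if set(combo) <= t)
            if count >= min_support:
                result[frozenset(combo)] = count
    return result

def maximal_itemsets(frequent):
    """Keep frequent itemsets that have no frequent proper superset."""
    return [x for x in frequent if not any(x < y for y in frequent)]

frequent = frequent_itemsets(transactions, MIN_SUPPORT)
maximal = maximal_itemsets(frequent)

for itemset, count in frequent.items():
    print(sorted(itemset), count)
print("maximal:", [sorted(m) for m in maximal])
```

On this toy data there are ten frequent itemsets, of which {a, b, c}, {a, d} and {b, d} are maximal. Brute-force enumeration is exponential in the number of items and is only reasonable for exam-sized examples.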
Question 4c - Association rules
A supermarket manager wishes to know which frequent 2-itemsets are most likely to be co-purchased with c (note that a valid frequent 2-itemset must not include c). In other words, if a customer already purchases such a frequent 2-itemset, c will also be purchased with high probability. Suggest the two frequent 2-itemsets with the largest confidence of being co-purchased with c. Assume an absolute minimum support level of 2.
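One possible reading of this question is to rank rules of the form X -> c by confidence, where X is a frequent 2-itemset that does not contain c and confidence(X -> c) = support(X ∪ {c}) / support(X). The sketch below does this on an invented toy database; the data and all names are assumptions for illustration, not the question's table.

```python
from itertools import combinations

# Hypothetical toy database; NOT the table from the question.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"b", "c", "d"},
    {"a", "d"},
]
MIN_SUPPORT = 2
TARGET = "c"

def support(itemset, transactions):
    """Absolute support: number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

other_items = sorted(set().union(*transactions) - {TARGET})

ranked = []
for pair in combinations(other_items, 2):
    antecedent = frozenset(pair)
    sup_x = support(antecedent, transactions)
    if sup_x >= MIN_SUPPORT:  # only frequent 2-itemsets qualify as antecedents
        confidence = support(antecedent | {TARGET}, transactions) / sup_x
        ranked.append((confidence, sorted(antecedent)))

ranked.sort(key=lambda entry: -entry[0])  # highest confidence first
top_two = ranked[:2]
print(top_two)
```

On this toy data the ranking is {a, b} with confidence 2/3, then {b, d} with confidence 1/2. Whether X ∪ {c} must itself meet the minimum support is a matter of interpretation; this sketch only requires X to be frequent, so a stricter reading would add one more support check.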
Question 4d - Apriori algorithm
Show the candidate itemsets and the frequent itemsets in each level-wise pass of the Apriori algorithm. Assume an absolute minimum support level of 2.
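The level-wise passes can be traced with a short script. The sketch below is a simplified Apriori that records, for each pass k, the candidate k-itemsets and the frequent k-itemsets. Candidates for pass k+1 are generated by joining frequent k-itemsets and pruning any candidate that has an infrequent k-subset. The toy database and the function names are assumptions for illustration, not the question's table.

```python
from itertools import combinations

# Hypothetical toy database; NOT the table from the question.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"b", "c", "d"},
    {"a", "d"},
]

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_support):
    """Return a list of (k, candidates, frequent) tuples, one per level-wise pass."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    passes = []
    k = 1
    while candidates:
        frequent = [c for c in candidates
                    if support(c, transactions) >= min_support]
        passes.append((k, candidates, frequent))
        frequent_set = set(frequent)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = sorted(
            (c for c in joined
             if all(frozenset(s) in frequent_set for s in combinations(c, k - 1))),
            key=sorted,
        )
    return passes

passes = apriori(transactions, min_support=2)
for k, cands, freq in passes:
    print(f"pass {k}: candidates={[sorted(c) for c in cands]}")
    print(f"        frequent  ={[sorted(f) for f in freq]}")
```

On this toy data the third pass has candidates {a, b, c} and {a, b, d}; {a, c, d} and {b, c, d} are pruned before counting because {c, d} is not frequent, and only {a, b, c} survives the support count.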
Question 5 - Data Clustering (15 Marks)
Question 5a - External validation
A clustering method produces the following result on a data set of 17 samples. Suppose that the shapes of the data samples indicate, as external information, the class each sample comes from. Compute the following:
• Purity
• F1 measure
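The figure for this question is not reproduced here, so the sketch below uses an invented assignment of 17 samples to three clusters, with class labels "x", "o" and "d" standing in for shapes. Purity is computed as the fraction of samples that belong to the majority class of their cluster. For F1 the sketch assumes the pairwise definition (a pair of samples is a true positive when it shares both cluster and class); if the course defines F1 per class or per cluster instead, the second function would need to change. All data and names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical clustering of 17 samples; NOT the figure from the question.
clusters = [
    ["x"] * 5 + ["o"],
    ["x"] + ["o"] * 4 + ["d"],
    ["x"] * 2 + ["d"] * 3,
]

def purity(clusters):
    """Fraction of samples that match the majority class of their cluster."""
    total = sum(len(c) for c in clusters)
    return sum(max(Counter(c).values()) for c in clusters) / total

def pairwise_f1(clusters):
    """F1 over sample pairs: positive = same cluster, relevant = same class."""
    labelled = [(cid, cls) for cid, members in enumerate(clusters) for cls in members]
    tp = fp = fn = 0
    for (c1, k1), (c2, k2) in combinations(labelled, 2):
        same_cluster, same_class = c1 == c2, k1 == k2
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn)

print(round(purity(clusters), 4))       # 0.7059
print(round(pairwise_f1(clusters), 4))  # 0.4762
```

For this invented data the majority counts are 5, 4 and 3, giving purity 12/17, and the pair counts are TP = 20, FP = 20, FN = 24, giving F1 = 40/84.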