KL7012: Statistical Programming-Report Writing - IT Computer Science Assignment Help


Assignment Task

LEARNING OUTCOMES

The learning outcomes (LOs) for this module are:

Knowledge & Understanding

LO1 Demonstrate knowledge and understanding of the core concepts of machine learning and its underlying mathematical foundations.

LO2 Demonstrate knowledge and understanding of the principal advanced machine learning techniques for solving real world problems.

 

Intellectual / Professional skills & abilities

LO3 Critically evaluate machine learning algorithms and applications.

LO4 Analyse, design and develop machine learning solutions and evaluate their performance.

 

Personal Values Attributes (Global / Cultural awareness, Ethics, Curiosity) (PVA)

LO5 Carry out independent research, individually and as part of a team, and communicate the research findings effectively.

 

Assessment Tasks:

You have been provided with access to four datasets; all are available on Kaggle (Please see links below). The data covers the following scenarios:

• Cell images for detecting Malaria

• ECG Heartbeat classification

• Classification of breast cancer images

 

You are required to choose ONLY one of the above scenarios as your assignment. Your task is to produce a deep learning model that is appropriate to the problem. The model can be your own design or can be built by fine-tuning a pretrained model. You are required to carry out data preparation/transformation to make the data ready for the model. Please note that the content of the report should reflect the Python code. Please also note that you must NOT present any existing online code as your own work. Errors in the code will affect your final mark. The key components you must complete are:

1. Explore the dataset to understand its characteristics

2. Pre-process your data to be suitable for building the model

3. Build the model(s) that perform the task specified for the chosen dataset and that will be used in your comparisons

4. Evaluate the models’ predictions using the metrics stated above.

5. Fine-tune the best model to get better predictions on the test set

6. Present your findings with suitable visualisations that are easy to interpret

7. Critically evaluate and discuss the whole process and the findings and what can be improved

 

 

Assignment Questions:

1. Six months ago, a local gym set up a research programme to find out if gym members who attended exercise classes were more likely to lose weight than those who exercised alone. A census of all participants was conducted. These were the results they recorded:

Image-1

The staff at the gym wants to know which type of exercise – gym-only workouts or attending exercise classes – is most effective in helping individuals lose weight. Prepare a short report (not more than 700 words) which summarises and interprets the findings, using all of the statistics given in the table above.

 

2. There are different ways to deal with missing data values when processing data. Describe one of these ways and discuss the positives and negatives of using it. [No more than 250 words]
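As one illustration of the kind of technique this question refers to, the sketch below shows mean imputation: each missing value is replaced by the mean of the observed values. The function name `impute_mean` and the sample readings are hypothetical and are not part of the assignment data. Mean imputation keeps every record and is simple to apply, but it shrinks the variance of the variable and can distort relationships with other variables.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical readings for illustration only; None marks a missing value.
readings = [4.0, None, 6.0, 8.0, None]
completed = impute_mean(readings)
print(completed)
```

Other approaches (for example, deleting incomplete records) trade off differently, so the choice should be justified against how much data is missing and why.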

 

3. Cystic fibrosis is an inherited condition that causes sticky mucus to build up in the lungs and digestive system. This causes lung infections and problems with digesting food. Below is the data recorded for this disease (detail of variables is given below the table). The data below is also provided in ‘cystfibr.txt’ for copying and processing.

Image-2

Below is the description of all the above variables:

age: age in years.

sex: 0: male, 1: female.

height: height (cm).

weight: weight (kg).

bmp: body mass (% of normal).

fev1: forced expiratory volume.

rv: residual volume.

frc: functional residual capacity.

tlc: total lung capacity.

pemax: maximum expiratory pressure.

(a) Read this data into a data frame and attach the data frame.

(b) Create summaries of the variables in this dataset and comment on them.
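The two steps above could be sketched as follows. This is a minimal sketch only: it assumes that ‘cystfibr.txt’ is a whitespace-delimited text file whose first row holds the variable names and whose remaining rows are all numeric, which should be checked against the actual file. The helper names `read_table` and `summarise` are hypothetical.

```python
import statistics


def read_table(path):
    """Read a whitespace-delimited file with a header row into a dict of columns.

    Assumes every data field is numeric (an assumption about the file layout).
    """
    with open(path) as handle:
        header = handle.readline().split()
        columns = {name: [] for name in header}
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            for name, field in zip(header, fields):
                columns[name].append(float(field))
    return columns


def summarise(values):
    """Return minimum, median, mean and maximum for one variable."""
    ordered = sorted(values)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.mean(ordered),
        "max": ordered[-1],
    }


# Example usage, assuming the file is in the working directory:
# data = read_table("cystfibr.txt")
# for name, values in data.items():
#     print(name, summarise(values))
```

Note that `sex` is a coded category (0 or 1), so a count per category is more meaningful for it than a mean.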

 

4. For the data given in question 3 (‘cystfibr.txt’),

(a) Use scatterplots between the variables to find any clear relationships between the variables and discuss them.

(b) Create boxplots for the variables height, weight, bmp, fev1, rv, frc, tlc and pemax, all stratified by sex. Which of these have evidence of outlying observations?
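A common convention for what a boxplot draws as an outlying observation is any point further than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule to a hypothetical list of values (not taken from the dataset). The quartile method used here is an assumption and may differ slightly from the hinges that a particular plotting tool uses, so borderline points can be classified differently.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical measurements for illustration only.
sample = [10, 12, 13, 14, 15, 16, 40]
print(iqr_outliers(sample))
```

For the stratified boxplots, the same check would be applied separately to the values for each sex.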

 

5. The probability that a patient recovers from a delicate heart operation is 0.88. What is the probability that exactly 5 of the next 8 patients having this operation survive?
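Assuming the patients' outcomes are independent, the number of survivors follows a binomial distribution with n = 8 and p = 0.88, so P(X = 5) = C(8, 5) × 0.88^5 × 0.12^3. A minimal sketch of that calculation is given below; the function name `binomial_pmf` is hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


prob = binomial_pmf(5, 8, 0.88)
print(round(prob, 4))  # about 0.0511
```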

 

6. Northumbria University’s ‘ask4help’ receives 5 emails per minute on average. Find the probability of receiving 7 emails in a given minute.
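If the emails are assumed to arrive independently at a constant average rate, the count per minute can be modelled as Poisson with mean 5, giving P(X = 7) = e^(-5) × 5^7 / 7!. The sketch below evaluates this; the function name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam."""
    return exp(-lam) * lam**k / factorial(k)


prob = poisson_pmf(7, 5)
print(round(prob, 4))  # about 0.1044
```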

 

7. A fuel station sells, on average, 14500 litres of fuel per day with a standard deviation of 2500 litres. If a manager stocks 20000 litres on a particular day, what is the probability that more than 10000 litres will be sold?
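Assuming daily sales are approximately normally distributed (the question does not state the distribution, so this is an assumption), the required probability is P(X > 10000) = P(Z > (10000 - 14500) / 2500) = P(Z > -1.8). Because the stock of 20000 litres exceeds 10000 litres, it does not restrict this particular event. A minimal sketch using the standard library:

```python
from statistics import NormalDist

# Assumed model: daily sales ~ Normal(mean 14500, standard deviation 2500).
sales = NormalDist(mu=14500, sigma=2500)

z = (10000 - sales.mean) / sales.stdev
prob = 1 - sales.cdf(10000)
print(z, round(prob, 4))  # z is -1.8, probability about 0.9641
```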

 

8. A study was made on the amount of converted sugar in a certain process at various temperatures. The data were coded and recorded as follows:

Image-3

(a) Estimate the linear regression line, and provide a chart and summary statistics together with the coefficients.

(b) Estimate the mean amount of converted sugar produced when the coded temperature is 1.76.
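The least-squares line y = a + b·x has slope b = Sxy / Sxx and intercept a = ȳ - b·x̄, and the estimate in part (b) is then a + b × 1.76. The sketch below shows the calculation. The data values are placeholders for illustration only and are NOT the values recorded in Image-3; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Placeholder data only: replace with the coded temperatures and
# converted-sugar amounts from the table in the question.
temperature = [1.0, 1.5, 2.0]
converted_sugar = [8.0, 9.0, 10.0]

intercept, slope = fit_line(temperature, converted_sugar)
estimate = intercept + slope * 1.76
print(intercept, slope, estimate)
```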

 

9. The following data is taken from a company about its advertisements and purchases of the product. Calculate the coefficient of correlation to measure the strength and direction of the relationship between the number of advertisements and purchases made, and comment on it.

Image-4
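Pearson's correlation coefficient is r = Sxy / sqrt(Sxx × Syy); values near +1 or -1 indicate a strong positive or negative linear relationship, and values near 0 indicate little linear relationship. The sketch below computes it. The data values are placeholders for illustration only and are NOT the values in Image-4; the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Placeholder data only: replace with the figures from the question.
advertisements = [1, 2, 3, 4]
purchases = [2, 4, 5, 9]

r = pearson_r(advertisements, purchases)
print(round(r, 3))
```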

 

10. A scenario

You are a director of a major manufacturing organisation and are collecting various pieces of information about your potential customers. One of your major customers, who is based in London, will require delivery lorries to travel the length of the M1. You should only use the source specified. You will need to adopt a sampling approach, and credit will be given for schemes which show that you have considered how to apply the principles of sampling to obtain the best results with the smallest possible dataset.
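One possible way to keep the dataset small while still covering different conditions is stratified random sampling: divide the observation opportunities into groups (strata) and draw a few at random from each. The sketch below is an illustration under stated assumptions only; the strata (hours of the day grouped into periods), the sample size per stratum and the function name `stratified_sample` are all hypothetical and are not prescribed by the brief.

```python
import random


def stratified_sample(strata, per_stratum, seed=0):
    """Draw per_stratum units at random, without replacement, from each stratum."""
    rng = random.Random(seed)  # fixed seed so the plan can be reproduced
    return {
        name: sorted(rng.sample(units, per_stratum))
        for name, units in strata.items()
    }


# Hypothetical strata: hours of the day at which traffic data could be recorded.
strata = {
    "morning_peak": [6, 7, 8, 9],
    "daytime": [10, 11, 12, 13, 14, 15],
    "evening_peak": [16, 17, 18, 19],
    "night": [20, 21, 22, 23, 0, 1, 2, 3, 4, 5],
}

plan = stratified_sample(strata, per_stratum=2)
print(plan)
```

Recording the seed and the strata definitions in the report makes the sampling scheme reproducible and easier to justify.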

 

Report Requirements

Your STATISTICAL report should consist of no more than 2500 words (applies only to question 10) and should be word-processed. Credit will be given for the use of an appropriate technical style of presentation.

Your report should address the following topics:

• Your sampling strategy and how it was devised

• Details of the data collected

• Details of your statistical analysis and its results

• Conclusions drawn

• Any relevant background research

 

Credit will be given for an appropriate use of graphs, tables and charts. All external sources of information must be correctly cited and referenced.

 

You should include a table of all the data you have collected and any calculations performed in RStudio as an appendix to your report. This is not counted in the page limit. Failure to include this will result in the deduction of marks. Also note that the Traffic England website (link given above) allows you to sign in and save data shots, but any updates on their website may lead to the loss of data stored in your account. Thus, it is your responsibility to store this data on your computer(s) and keep it safe for this task.

 
