FIT5149 S2 2019 Assessment 1

Predicting the Critical Temperature of a Superconductor

Aug-2019

Marks: 15% of all marks for the unit

Due Date: 17:00 Friday 13 September 2019

Extension: An extension may be granted in some circumstances. A special consideration application form must be submitted. Please refer to the university webpage on special consideration.

Lateness: For all assessment items handed in after the official due date, and without an agreed extension, a 10% penalty applies to the student’s mark for each day after the due date (including weekends and public holidays) for up to 5 days. Assessment items handed in after 5 days will not be considered.

Authorship: This assignment is an individual assignment and the final submission must be identifiably your own work. Breaches of this requirement will result in an assignment not being accepted for assessment and may result in disciplinary action.

Submission: You are required to submit two files: one is either a Jupyter notebook or an R Markdown file, and the other is the PDF file generated from it. The two files must be submitted via Moodle. Students are required to accept the terms and conditions on the Moodle submission page. A draft submission won’t be marked.

Programming language: R in Jupyter Notebook or R Markdown

Introduction

Superconductivity is a phenomenon of exactly zero electrical resistance and

expulsion of magnetic flux fields occurring in certain materials, called superconductors,

when cooled below a characteristic critical temperature. Superconductors are widely used in industry, e.g. Magnetic Resonance Imaging (MRI) in health care, electricity transmission in the energy industry, and magnetic separation.

Predicting the critical temperature (Tc) of a superconductor is still an open

problem in the scientific community. In the past, simple empirical rules based on

experiments have guided researchers in synthesizing superconducting materials

for many years. Nowadays, features (or predictors) based on the superconductor’s

elemental properties can be generated and used to predict Tc.

In this task, we are going to analyze superconductor data from the Superconducting

Material Database maintained by Japan’s National Institute for Materials

Science (NIMS). The aim is to build statistical models that can predict

Tc based on the material’s chemical properties.

Specifically, you are going to analyse a superconductor data set, which is

based on real world material science data. The problem you are going to solve

is: Can you

• predict the critical temperature Tc given some chemical properties of a

material?

• explain your prediction and the associated findings? For example, describe

the key properties associated with the response variable.

Data set

The data set was originally from the Superconducting Material Database maintained by Japan’s National Institute for Materials Science (NIMS) and was preprocessed by Kam []. It contains 21,263 material records, each of which has 82 columns: 81 columns corresponding to the extracted features and a last column holding the observed Tc values. Among those 81 columns, the first column is the number of elements in the material, and the remaining 80 columns are features extracted from 8 properties (each property has 10 features). The detailed data preparation process can be found in [].

The data set files are hosted on the UCI website below (click the hyperlink to download the data).

superconduct.zip: After you unzip the file, there are two data files:

• train.csv can be used to train and validate prediction models and to build a description (21,263 material records). Each record consists of 82 columns: the number of elements (column 1), features extracted from 8 properties (columns 2-81) and the critical temperature (column 82).

• unique_m.csv tells you the chemical formula of each corresponding material.

To complete the analysis task, you should split the provided train.csv into your own training and testing sets before building the models.
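As a rough illustration of this split, the sketch below holds out a random subset of rows as a testing set. It uses a small synthetic data frame so that it runs on its own. The column names (including critical_temp), the 80/20 ratio and the seed are assumptions made for the example and are not prescribed by this assessment.

```r
# Synthetic stand-in for train.csv so the sketch runs on its own.
# With the real file you would instead read it, e.g.:
#   dat <- read.csv("train.csv")
set.seed(5149)
n <- 200
dat <- data.frame(
  number_of_elements = sample(1:9, n, replace = TRUE),  # assumed column name
  mean_atomic_mass   = runif(n, 10, 200),               # assumed column name
  critical_temp      = runif(n, 0, 150)                 # assumed response name
)

# Hold out 20% of the rows as a testing set (the ratio is an assumption).
test_idx  <- sample(seq_len(nrow(dat)), size = round(0.2 * nrow(dat)))
train_set <- dat[-test_idx, ]
test_set  <- dat[test_idx, ]

cat("training rows:", nrow(train_set), " testing rows:", nrow(test_set), "\n")
```

Fixing the seed makes the split reproducible, which helps when the notebook is re-run to generate the PDF.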

Task description

In this assessment, you will focus on the following two tasks.

Prediction task

For the prediction task, the underlying problem is to estimate the critical temperature given a new superconductor’s properties. There are eight properties that can be used: Atomic Mass, First Ionization Energy, Atomic Radius, Density, Electron Affinity, Fusion Heat, Thermal Conductivity, Valence. For each property, ten features are extracted: Mean, Weighted mean, Geometric mean, Weighted geometric mean, Entropy, Weighted entropy, Range, Weighted range, Standard deviation, Weighted standard deviation. The provided data sets are well organised, so you do not need to wrangle the data. However, make sure you understand the intuition behind these attributes.
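To build some intuition for these attributes, the sketch below computes a few of the features for one property of a hypothetical two-element material. The property values, the element proportions and the exact formulas (in particular, weighting by each element's proportion in the formula) are assumptions for illustration. The definitions actually used to build the data set are given in the data preparation reference.

```r
# Hypothetical two-element material: one property value per element, and
# the proportion of each element in the chemical formula (sums to 1).
# Both vectors are made-up illustration values.
vals  <- c(48, 23)
props <- c(6 / 7, 1 / 7)

features <- c(
  mean      = mean(vals),                   # plain average over elements
  wtd_mean  = sum(props * vals),            # average weighted by proportion
  gmean     = exp(mean(log(vals))),         # geometric mean
  wtd_gmean = exp(sum(props * log(vals))),  # proportion-weighted geometric mean
  range     = max(vals) - min(vals)         # spread between elements
)
print(round(features, 2))
```

The weighted versions pull the summary towards the more abundant element, whereas the unweighted versions treat every element in the formula equally.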

To measure the performance of your model(s), first split the original data into training and testing sets, fit the model using the training set, make predictions on the testing set and compute the Mean Squared Error (MSE).
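The evaluation procedure can be sketched as follows, with the MSE taken as the average of the squared differences between observed and predicted values on the testing set. The data here are simulated, the predictors x1 and x2 are hypothetical, and a plain linear model is used only as a placeholder for whichever models you choose.

```r
# Simulated data standing in for the real features (x1, x2 are hypothetical).
set.seed(1)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
dat <- data.frame(x1, x2,
                  critical_temp = 30 + 10 * x1 - 5 * x2 + rnorm(n, sd = 3))

# Split into training and testing sets.
test_idx  <- sample(seq_len(n), size = 60)
train_set <- dat[-test_idx, ]
test_set  <- dat[test_idx, ]

# Fit on the training set only, then predict on the held-out testing set.
fit  <- lm(critical_temp ~ ., data = train_set)
pred <- predict(fit, newdata = test_set)

# Mean Squared Error on the testing set.
mse <- mean((test_set$critical_temp - pred)^2)
cat("Test MSE:", mse, "\n")
```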

In this task, you are required to develop models that can accurately predict

a superconductor’s critical temperature. To finish the task, you should

1. develop and compare 2 to 3 models;

2. describe and justify the choice of your models;

3. analyze and interpret your results.

Please note that the testing set cannot be used to train your models.
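One possible way to organise such a comparison is to fit every candidate on the same training set and score each on the same testing set. The sketch below does this for three linear-model variants on simulated data. The candidate models, the predictors x1 and x2 and the simulated relationship are all assumptions chosen for illustration and are not a recommendation of which models to use.

```r
# Simulated data with a nonlinear effect of x2 (all names are hypothetical).
set.seed(2)
n  <- 400
x1 <- runif(n, -2, 2)
x2 <- runif(n, -2, 2)
dat <- data.frame(
  x1, x2,
  critical_temp = 40 + 8 * x1 + 4 * x2 + 6 * x2^2 + rnorm(n, sd = 2)
)

test_idx  <- sample(seq_len(n), size = 80)
train_set <- dat[-test_idx, ]
test_set  <- dat[test_idx, ]

# Candidate models, all fitted on the training set only.
models <- list(
  x1_only   = lm(critical_temp ~ x1, data = train_set),
  linear    = lm(critical_temp ~ x1 + x2, data = train_set),
  quadratic = lm(critical_temp ~ x1 + x2 + I(x2^2), data = train_set)
)

# Score every model on the same held-out testing set.
test_mse <- sapply(models, function(m) {
  mean((test_set$critical_temp - predict(m, newdata = test_set))^2)
})
print(round(sort(test_mse), 2))
```

Because every model sees the same training rows and is scored on the same testing rows, differences in test MSE reflect the models rather than the split.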

Description task

The purpose of the description task is to identify the key properties of a superconductor. In other words, which property contributes the most to your model’s performance? Descriptions can be based on variable correlation analysis, regression equations, linguistic descriptions, or any other form. The descriptions and the accompanying interpretation must be comprehensible and useful. To finish this task, you should use proper data analysis techniques (e.g., EDA, statistics) to

1. identify a subset of attributes that have a significant impact on the prediction of the critical temperature;

2. give statistical reasons for your findings.
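As one possible starting point, the sketch below ranks predictors by their absolute correlation with the response and then inspects the coefficient p-values of a linear regression. The data are simulated, and the predictor names and the 0.05 threshold are assumptions for illustration. With the real data, many features are likely to be correlated with one another, so these simple summaries would need to be interpreted with care.

```r
# Simulated data: one strong, one weak and one irrelevant predictor
# (all names are hypothetical).
set.seed(3)
n <- 300
dat <- data.frame(
  x_strong = rnorm(n),
  x_weak   = rnorm(n),
  x_noise  = rnorm(n)
)
dat$critical_temp <- 35 + 12 * dat$x_strong + 2 * dat$x_weak + rnorm(n, sd = 4)

# Correlation of each predictor with the response, ranked by magnitude.
predictors <- setdiff(names(dat), "critical_temp")
cors <- sapply(dat[predictors], function(col) cor(col, dat$critical_temp))
print(round(sort(abs(cors), decreasing = TRUE), 3))

# Regression coefficients with standard errors, t values and p-values.
fit        <- lm(critical_temp ~ ., data = dat)
coef_table <- summary(fit)$coefficients
print(round(coef_table, 4))

# Predictors whose coefficient is significant at the (assumed) 5% level.
significant <- rownames(coef_table)[coef_table[, "Pr(>|t|)"] < 0.05]
significant <- setdiff(significant, "(Intercept)")
print(significant)
```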

Files to be submitted

There are two files required to be submitted, which are

• The R implementation of the two tasks in one file.

– The file must be either a Jupyter notebook or an R Markdown

file. Besides the R code, all the discussions must also be included in

the file.

– The name of the file must be in one of the following formats:

∗ XXXXXXXX FIT5149 Ass1.ipynb

∗ XXXXXXXX FIT5149 Ass1.Rmd

You should replace “XXXXXXXX” with your student ID.

• A PDF file generated by the Jupyter notebook or R Markdown. The name

of the PDF file must be in the following format

– XXXXXXXX FIT5149 Ass1.pdf

Please refer to the Assessment 1 Moodle page for how to submit the two files. Please note that if you do not follow the file naming instructions, a penalty will be applied.

Additional learning resources

• This assessment is based on the paper A Data Driven Statistical Model for Predicting the Critical Temperature of a Superconductor, available at https://arxiv.org/pdf/1803.10260.pdf

• Raw data is available at http://supercon.nims.go.jp/supercon/material_menu

Warning: Monash University takes academic misconduct very seriously. You

can learn from the above materials and understand the principle of how the

analysis was done. However, you must finish this assessment with your own

work.
