# Tutoring for BUCI057H7, C++, and Java Programming

BUCI057H7 page 2 of 9

Question 1 (10 marks)
Provost & Fawcett have defined Data Science in terms of 9 computational problems.
Define the Similarity problem in general and give examples involving multi-dimensional data.

Question 2 (10 marks)
Spectral analysis can be used to reduce data dimensionality. Explain why dimensionality reduction is
desirable and how Spectral analysis can achieve it.

Question 3 (10 marks)
Over D = {a, b, c, d, e}, the observed frequencies give us the following distribution:
P = Pr[X = x_i] = [3/8, 3/16, 1/8, 1/8, 3/16].
To simplify calculations, however, we decide to adopt the “simpler” distribution
Q = Pr[X = x_i] = [1/2, 1/8, 1/8, 1/8, 1/8].
Compute the Kullback-Leibler divergence between P and Q, defined as

$$D_{KL}(P \,\|\, Q) = \sum_{i} P(x_i) \log_2 \frac{P(x_i)}{Q(x_i)}$$

To simplify calculations, assume that log₂ 3 (the base-2 logarithm of 3) equals 1.585, and show the
process by which you calculated the divergence.
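
As a way to check a hand calculation, here is a minimal Python sketch. It assumes the standard base-2 definition, D_KL(P || Q) = sum over i of P(x_i) * log2(P(x_i) / Q(x_i)), with P as the first argument. The function name `kl_divergence` is illustrative and not part of the exam text.

```python
import math

def kl_divergence(p, q):
    """Return D_KL(P || Q) in bits (assumed direction: P relative to Q).

    Terms with p_i == 0 are skipped, following the convention 0 * log(0) = 0.
    Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Distributions over D = {a, b, c, d, e}, as given in the question.
P = [3/8, 3/16, 1/8, 1/8, 3/16]
Q = [1/2, 1/8, 1/8, 1/8, 1/8]

d = kl_divergence(P, Q)
print(round(d, 4))  # approximately 0.0637 bits
```

Only the three entries where P and Q differ contribute. With log₂ 3 = 1.585, the terms are (3/8)(1.585 − 2) and twice (3/16)(1.585 − 1), which sum to about 0.064.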

Question 4 (10 marks)
Define the decision trees employed in the Supervised Segmentation task and describe in words how
the CART algorithm can recursively build a decision tree for a given dataset of labeled Yes/No
examples.

Question 5 (10 marks)
Sports Rating & Ranking: if a function S(i) measures the strength of a team/player i attending a
tournament, how could we predict the outcome of a match between, say, team i and team j?
What method would you use, among those seen in class, to extract function S(i) from a dataset of past
results?

Question 6 (10 marks)
Define the Kernel method for creating a feature space and discuss why it is used in combination with
Support Vector Machines to classify data.