# MSBA7021 Prescriptive Analytics

Fall 2022

Assignment 2

Due Dec 22, 23:59

1. The method of random forests, developed by Leo Breiman and Adele Cutler, was designed to improve the prediction accuracy of CART (classification and regression trees). It works by building a large number of trees, each of which "votes" on the outcome to make a final prediction.

A random forest builds many trees by randomly selecting a subset of observations and a subset of independent variables to use for each tree. We cannot simply run CART many times, because we would get the same tree every time. Instead, each tree is given a random subset of the independent variables from which it can select the splits, and a training set that is a bagged or bootstrapped sample of the full dataset. This means that the training data for each tree is selected randomly with replacement from the full dataset. As an example, if we have five data points in our training set, labeled {1, 2, 3, 4, 5}, we might use the five observations {3, 2, 4, 3, 1} to build the first tree, the five observations {5, 1, 2, 3, 2} to build the second tree, etc. Notice that some observations are repeated, and some are not included in certain training sets. This gives each tree a slightly different training set and helps each tree see different things in the data.
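The bootstrap sampling described above can be sketched in a few lines of Python; the five labeled data points are the example from the text, and the fixed seed is only there to make the draws reproducible:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the draws are reproducible
data = np.array([1, 2, 3, 4, 5])     # the five labeled data points from the example

# Draw a bootstrap sample for each tree: same size as the data, with replacement,
# so some observations repeat and others are left out entirely.
n_trees = 2
for m in range(n_trees):
    sample = rng.choice(data, size=len(data), replace=True)
    print(f"training set for tree {m + 1}: {sample}")
```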

Just like in CART, there are some parameters that need to be selected for a random forest model. One is the number of trees to build, which is typically set to a few hundred. The more trees that are built, the longer the model takes to build; but if not enough trees are built, the full potential of the model might not be realized. Another parameter controls the size of the trees; a popular choice is to set a maximum depth for each tree. Random forest models have been shown to be less sensitive to their parameters, so it is typically sufficient to set the parameters to reasonable values.
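These two parameters correspond to `n_estimators` and `max_depth` in scikit-learn. A minimal sketch, where the feature matrix and target are synthetic placeholders (in the assignment they would come from train.csv):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic placeholder data, just to illustrate the two parameters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=200)

# n_estimators is the number of trees; max_depth caps the size of each tree.
model = RandomForestRegressor(n_estimators=300, max_depth=5, random_state=0)
model.fit(X, y)
print(len(model.estimators_))  # one fitted tree per estimator
```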

In this assignment, we will investigate how the method of random forests performs in contextual decision-making, using the same setup as in the lecture note "From Predictions to Prescriptions", except that we now set the size of the training set to n = 1000 and the size of the testing set to T = 500. The data is given in train.csv and test.csv.

(a) In the lecture note, we use linear regression as the machine learning model to predict the demand for a given observation of the covariates in the predict-then-optimize (PTO) approach. Use the random forest model instead and see how the result changes.

(b) The method of random forests can also be used in the "integrated" approach, as it also provides a weight function. Let M be the number of trees built in a random forest model. For each m = 1, . . . , M, tree m generates a disjoint partition of the value region of the covariates. Then, we may construct a weight function, say $w_m^{\mathrm{tree}}$, for tree m using the partition of the tree. In particular, let $\psi_m$ denote the binning rule of tree m that maps x to the corresponding bin (or leaf). As explained in the lecture note, the weight $w_m^{\mathrm{tree}}$ can be written as

$$w_m^{\mathrm{tree}}(x, x_i) = \frac{\mathbb{I}(\psi_m(x) = \psi_m(x_i))}{\#\{j : \psi_m(x) = \psi_m(x_j)\}}.$$

That is, for tree m, given a new observation of the covariates x, we give equal weights to the observations x_i in the training set that share the same bin (or leaf) as x, and these weights sum to one. On the other hand, the observations x_i in the training set that do not share the same leaf as x are given weight 0.
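Given the leaf index of a new point and the leaf indices of the training points under the same tree, the weight above reduces to a normalized indicator. A minimal sketch; the leaf-index values here are made up purely for illustration:

```python
import numpy as np

def tree_weights(leaf_x, leaves_train):
    """Equal weight on training points in the same leaf as x; zero elsewhere."""
    same_leaf = (leaves_train == leaf_x).astype(float)  # indicator I(psi(x) == psi(x_i))
    return same_leaf / same_leaf.sum()                  # normalize by the leaf count

# Hypothetical leaf assignments for 5 training points; the new point falls in leaf 2.
leaves_train = np.array([2, 7, 2, 5, 2])
w = tree_weights(leaf_x=2, leaves_train=leaves_train)
print(w)  # weight 1/3 on the three points in leaf 2, zero on the others
```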

Note that we can construct such a weight function for each tree in the random forest model. (These weight functions are different because the trees are different.) Recall that when we use the random forest model for a prediction task, the random forest takes a majority vote over the M trees to generate the final prediction. This means that each tree carries equal weight in the final decision. We may apply the same idea here when constructing the weight function for the random forest. Namely, we simply take the average of the weights generated by the trees as the weight generated by the random forest, i.e.,

$$w^{\mathrm{RF}}(x, x_i) = \frac{1}{M}\sum_{m=1}^{M} w_m^{\mathrm{tree}}(x, x_i) = \frac{1}{M}\sum_{m=1}^{M} \frac{\mathbb{I}(\psi_m(x) = \psi_m(x_i))}{\#\{j : \psi_m(x) = \psi_m(x_j)\}}.$$

Use this weight function in the "integrated" approach and see how the result changes.

Hint. The binning rule of each tree of the random forest model is already implemented in scikit-learn. Explore its documentation to find out.
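One way to read off the binning rule in scikit-learn is the fitted forest's `apply` method, which returns one leaf index per tree for each sample. A sketch of the averaged weight function under this assumption, again on synthetic placeholder data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = X_train[:, 0] + rng.normal(scale=0.1, size=100)

forest = RandomForestRegressor(n_estimators=50, max_depth=4, random_state=0)
forest.fit(X_train, y_train)

# apply() gives the leaf index of each sample in every tree: shape (n_samples, M).
leaves_train = forest.apply(X_train)

def rf_weights(x):
    leaves_x = forest.apply(x.reshape(1, -1))[0]     # leaf of x in each of the M trees
    same = (leaves_train == leaves_x).astype(float)  # per-tree indicator, shape (n, M)
    per_tree = same / same.sum(axis=0)               # normalize within each tree
    return per_tree.mean(axis=1)                     # average over the M trees

w = rf_weights(X_train[0])
print(w.sum())  # the averaged weights still sum to one
```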

(c) Compare the results in (a) and (b) to the "integrated" approach based on kernel regression, where the Gaussian kernel is used.

Note. To achieve good performance, you should write your own code and tune the hyperparameters with cross-validation. Students who correctly use Bayesian optimization for tuning will receive a bonus (10 pts at most).
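The Gaussian-kernel weights for (c) can be sketched as below; the bandwidth h is one of the hyperparameters to tune, and the value used here is only a placeholder:

```python
import numpy as np

def gaussian_kernel_weights(x, X_train, h):
    """Nadaraya-Watson style weights with a Gaussian kernel of bandwidth h."""
    sq_dist = np.sum((X_train - x) ** 2, axis=1)  # squared distance to each x_i
    k = np.exp(-sq_dist / (2.0 * h ** 2))         # Gaussian kernel values
    return k / k.sum()                            # normalize so weights sum to one

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
w = gaussian_kernel_weights(X_train[0], X_train, h=0.5)
print(w.sum())  # normalized weights
```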

(d) (Bonus question, 10 pts) Design a new weight function for the "integrated" approach. Implement it and compare the result with (a) to (c).

Hint. You can try local linear regression.

2. Consider the following two functions to be minimized:

$$f(x, y) = (x - 5)^2 + 2(y + 3)^2 + xy,$$
$$g(x, y) = (1 - (y - 3))^2 + 10\left((x + 4) - (y - 3)^2\right)^2.$$

In the following sub-questions, run a gradient descent algorithm from the initial point (x, y) = (0, 2). For function f, T = 10 iterations are allowed, while T = 100 iterations are allowed for function g. Report the function value after each iteration and visualize the convergence process.

(a) Run with a fixed learning rate of η = 0.05 for function f and η = 0.0015 for function g.
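For f, part (a) can be sketched directly from the gradient $\nabla f = (2(x - 5) + y,\; 4(y + 3) + x)$, assuming the form of f given above:

```python
import numpy as np

def f(x, y):
    return (x - 5) ** 2 + 2 * (y + 3) ** 2 + x * y

def grad_f(x, y):
    # Partial derivatives of f with respect to x and y.
    return np.array([2 * (x - 5) + y, 4 * (y + 3) + x])

x, y = 0.0, 2.0      # initial point
eta, T = 0.05, 10    # fixed learning rate, iteration budget
for t in range(1, T + 1):
    x, y = np.array([x, y]) - eta * grad_f(x, y)
    print(f"iter {t}: f = {f(x, y):.4f}")
```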

(b) Run with any variant of gradient descent you want, such as stochastic gradient descent or gradient descent with a learning rate schedule.

Note. Try to get the smallest function value after the T iterations. The performance of your gradient descent algorithm will be graded.
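As one concrete variant for part (b), here is a sketch of gradient descent with momentum on g, using the form of g given above; the momentum coefficient β is a tunable choice, not prescribed by the assignment:

```python
import numpy as np

def g(x, y):
    return (1 - (y - 3)) ** 2 + 10 * ((x + 4) - (y - 3) ** 2) ** 2

def grad_g(x, y):
    # With u = y - 3 and v = x + 4, g = (1 - u)^2 + 10 (v - u^2)^2.
    u, v = y - 3, x + 4
    dgdx = 20 * (v - u ** 2)
    dgdy = -2 * (1 - u) - 40 * u * (v - u ** 2)
    return np.array([dgdx, dgdy])

p = np.array([0.0, 2.0])          # initial point
velocity = np.zeros(2)
eta, beta, T = 0.0015, 0.9, 100   # learning rate, momentum coefficient, budget
for t in range(1, T + 1):
    velocity = beta * velocity - eta * grad_g(*p)  # accumulate a decaying velocity
    p = p + velocity
print(f"g after {T} iterations: {g(*p):.6f}")
```

Momentum helps here because g has a curved, Rosenbrock-like valley in which plain fixed-step descent zigzags.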
