CSCI 3151: Assignment 2
In this assignment you will:
a) review and extend your understanding of vector algebra and derivatives of functions of multiple
variables.
b) experiment building and evaluating various machine learning models on different data sets. You will
learn how to handle the practicalities of running machine learning algorithms, and critically assess their
performance on the given data sets. You will also practice digging into the sklearn documentation and
online resources.
Start working on the assignment as soon as you receive it.
Use the discussion group on Brightspace to post questions you have as new threads. You will get
feedback from classmates and me (the instructor), and you will collect points for class participation.
Q1 [3]. Digital Probabilities.
In this question you will explore various properties of random variables using Python generators for
them.
a) Write a Python program that throws n times a single fair die with f faces, numbered 1..f, with
probability of each face equal to 1/f, and returns the numeric average s of the values of the face up over
the n throws. Run this experiment m times.
(i) Compute the experimental (sample) mean and experimental (sample) variance of s based on the
data from m runs, as a function of m.
(ii) Plot the histogram of s, as a discrete function over the interval 0, 1, 2, ..., n. Discuss the shape of
the resulting histogram as a function of m.
(iii) What are the theoretical values of the mean and variance of s? Explain your answer.
(iv) Plot the absolute difference of the experimental mean and variance of s from their theoretical values
as a function of m. Discuss the result.
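As a starting point for part (a), the experiment could be sketched as follows. This is a minimal sketch, not a reference solution: the function name run_experiment, the seed, and the example values of m, n and f are assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np

def run_experiment(m, n, f, seed=0):
    """Repeat the n-throw experiment m times and return the m averages s.

    Hypothetical helper: name and signature are illustrative only.
    """
    rng = np.random.default_rng(seed)
    # Each row is one run of n throws; faces 1..f are equally likely.
    throws = rng.integers(1, f + 1, size=(m, n))
    # Average face value of each run, computed without an explicit loop.
    return throws.mean(axis=1)

s_values = run_experiment(m=10_000, n=10, f=6)
sample_mean = s_values.mean()
sample_var = s_values.var(ddof=1)  # ddof=1 gives the unbiased sample variance
print(sample_mean, sample_var)
```

Calling run_experiment for an increasing sequence of m values gives the data needed to study the sample mean and variance as a function of m.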
b) Building on the program in (a) for n = 2,
(i) estimate the probability of event A (both throws resulted in an even number), of event B (at least
one throw resulted in an even number) and of the conditional probability of A given B, as a function of
m. Estimating the probability is equivalent to counting the frequency of occurrence of the event in the
number of runs m scaled appropriately to a value in the interval [0, 1].
(ii) Compute the theoretical probabilities of P(A) and P(A|B).
(iii) Plot the absolute difference of the computed values from their theoretical values as a function of m.
(iv) Formulate the estimation of the probabilities in part (i) in the context of the discussion of estimation
in the lecture.
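The frequency-based estimates in part (b) could be computed along these lines. This is a sketch under stated assumptions: a six-faced die (f = 6) is assumed because the question does not fix f, and the function name estimate_even_events is hypothetical.

```python
import numpy as np

def estimate_even_events(m, f=6, seed=0):
    """Estimate P(A), P(B) and P(A|B) from m runs of two throws (n = 2)."""
    rng = np.random.default_rng(seed)
    throws = rng.integers(1, f + 1, size=(m, 2))
    is_even = throws % 2 == 0
    event_a = is_even.all(axis=1)  # A: both throws are even
    event_b = is_even.any(axis=1)  # B: at least one throw is even
    p_a = event_a.mean()           # relative frequency of A, in [0, 1]
    p_b = event_b.mean()
    # A implies B, so the count of "A and B" equals the count of A.
    count_b = event_b.sum()
    p_a_given_b = event_a.sum() / count_b if count_b > 0 else float("nan")
    return p_a, p_b, p_a_given_b

p_a, p_b, p_a_given_b = estimate_even_events(m=100_000)
print(p_a, p_b, p_a_given_b)
```

The conditional probability is estimated by restricting attention to the runs in which B occurred, which mirrors the definition P(A|B) = P(AB)/P(B).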
c) Consider event C (first throw is even) and D (second throw is ≤ 3). Are these two events independent?
(i) Prove your answer theoretically.
(ii) Intuitively verify your theoretical answer by computing the required probabilities as frequencies,
and see if the independence condition is approximately satisfied.
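A frequency check for part (c) might look like the sketch below. It assumes a six-faced die and uses a hypothetical helper named independence_gap; it compares the estimated P(CD) with the product of the estimated P(C) and P(D).

```python
import numpy as np

def independence_gap(m, f=6, seed=0):
    """Return the estimates of P(CD) and P(C)P(D) from m runs of two throws."""
    rng = np.random.default_rng(seed)
    throws = rng.integers(1, f + 1, size=(m, 2))
    event_c = throws[:, 0] % 2 == 0  # C: first throw is even
    event_d = throws[:, 1] <= 3      # D: second throw is <= 3
    p_cd = (event_c & event_d).mean()
    return p_cd, event_c.mean() * event_d.mean()

p_cd, product = independence_gap(m=100_000)
print(abs(p_cd - product))  # should be close to 0 if C and D are independent
```

A small gap is only suggestive: it is consistent with independence but does not replace the theoretical proof asked for in (i).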
d) Given a population, we define two events, H = “Have a headache”, and F = “Coming down with
Flu”. The associated probabilities are P(H) = 1/8, P(F) = 1/30, P(H|F) = 1/3.
i) Calculate theoretically $P(HF)$, $P(H\bar{F})$, $P(\bar{H}F)$, $P(\bar{H}\bar{F})$.
ii) Build a generative model of a population of m persons, according to the above probabilities. Clearly
justify your approach.
iii) Using your population data, estimate using frequencies P(H), P(F) and P(H|F), and plot as a
function of m.
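One possible generative model for part (d) is sketched below. It is an assumption, not the prescribed approach: flu status is sampled first, then headache status is sampled conditionally on it, with P(H | not F) derived from the law of total probability. The names generate_population and P_H_GIVEN_NOT_F are illustrative.

```python
import numpy as np

P_H, P_F, P_H_GIVEN_F = 1 / 8, 1 / 30, 1 / 3
# Law of total probability: P(H) = P(H|F) P(F) + P(H|not F) (1 - P(F)).
P_H_GIVEN_NOT_F = (P_H - P_H_GIVEN_F * P_F) / (1 - P_F)

def generate_population(m, seed=0):
    """Sample m persons; return boolean arrays (has_headache, has_flu)."""
    rng = np.random.default_rng(seed)
    has_flu = rng.random(m) < P_F
    # Each person's headache probability depends on their flu status.
    p_headache = np.where(has_flu, P_H_GIVEN_F, P_H_GIVEN_NOT_F)
    has_headache = rng.random(m) < p_headache
    return has_headache, has_flu

has_headache, has_flu = generate_population(m=200_000)
est_p_h = has_headache.mean()
est_p_f = has_flu.mean()
est_p_h_given_f = has_headache[has_flu].mean()  # frequency of H among flu cases
print(est_p_h, est_p_f, est_p_h_given_f)
```

Because flu is rare, the estimate of P(H|F) rests on far fewer samples than the other two and converges more slowly as m grows.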
Q2 [3]. Optimization in action
In this question you will implement gradient descent in Python.
Consider a function of two variables, $z = f(x, y) = (x - 2)^2 + (y - 3)^2$. Implement in Python based
on first principles the gradient descent algorithm for estimating the minimum of this function. Pick a
random initial point $(x_0, y_0)$, and update it by making a small step of size $\alpha$ in the opposite direction
of the gradient of $f(x, y)$ calculated at $(x_0, y_0)$. Iterate the computation. The new point at each step
is $(x_{i+1}, y_{i+1}) = (x_i, y_i) - \alpha \nabla f(x_i, y_i)$. Organize your code to be as general as possible (with respect
to choice of function). Follow good programming practices: add liberal comments, use good naming
conventions for variables, use matrices and vectors, instead of for loops on their scalar elements. Plot
on the (x, y) plane the trajectories of the points $(x_i, y_i)$ until convergence for different values of $\alpha$.
Convergence is defined as the condition $|(x_{i+1}, y_{i+1}) - (x_i, y_i)| < \epsilon$. Select a meaningful value of $\epsilon$.
Discuss the speed of convergence as $\alpha$ varies.
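The update rule and stopping condition above can be sketched as follows. This is a minimal illustration, not a complete solution: the names grad_f and gradient_descent, the step size 0.1, the tolerance 1e-6, the iteration cap, and the sampling range for the initial point are all assumptions.

```python
import numpy as np

def grad_f(point):
    """Gradient of f(x, y) = (x - 2)^2 + (y - 3)^2 at point = [x, y]."""
    return 2.0 * (point - np.array([2.0, 3.0]))

def gradient_descent(grad, start, alpha, eps=1e-6, max_iter=10_000):
    """Iterate p <- p - alpha * grad(p) until the step length is below eps.

    Returns all visited points as an array of shape (steps + 1, dim).
    """
    points = [np.asarray(start, dtype=float)]
    for _ in range(max_iter):
        new_point = points[-1] - alpha * grad(points[-1])
        points.append(new_point)
        # Convergence test: Euclidean distance between consecutive points.
        if np.linalg.norm(new_point - points[-2]) < eps:
            break
    return np.array(points)

rng = np.random.default_rng(0)
start = rng.uniform(-5.0, 5.0, size=2)  # random initial point (x0, y0)
trajectory = gradient_descent(grad_f, start, alpha=0.1)
print(len(trajectory) - 1, trajectory[-1])
```

Passing the gradient as a function keeps the loop independent of the particular f, and the returned trajectory can be plotted directly for several values of alpha.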
Q3 [3]. Analog Probabilities
In this question, you will review probability basics. The recommended format for your solution is
as markdown cells in your notebook, formatted as markdown text and equations using LaTeX. Use
an online equation editor, like http://www.sciweavers.org/free-online-latex-equation-editor,
click on "Convert" to view the formatted equation, and copy-paste the resulting LaTeX code into your
markdown cell, enclosed in $...$. Or write them by hand and scan them (use the CamScanner app on
your cell phone for this). Save as image on Google Drive or One Drive and link to it from a markdown
cell in your notebook.
a) A box contains three fair coins and one biased coin. For the biased coin, the probability that any
flip will result in a head is 1/3. Al draws two coins from the box, flips each of them once, observes an
outcome of one head and one tail and returns the coins to the box. Bo then draws one coin from the
box and flips it. The result is a tail. Determine the probability that neither Al nor Bo removed the
biased coin from the box.
b) A box contains N items, K of which are defective. A sample of M items is drawn from the box at
random. What is the probability that the sample includes no defective items if the sample is taken:
i) with replacement
ii) without replacement
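A closed-form answer to part (b) can be sanity-checked numerically. The sketch below uses the standard counting argument (independent draws with replacement, combinations without replacement); the function names and the example values N = 10, K = 3, M = 2 are assumptions for illustration.

```python
from math import comb

def p_no_defective_with_replacement(N, K, M):
    """Each of the M independent draws is non-defective with probability (N - K) / N."""
    return ((N - K) / N) ** M

def p_no_defective_without_replacement(N, K, M):
    """Ratio of all-good samples of size M to all samples of size M."""
    return comb(N - K, M) / comb(N, M)

p_with = p_no_defective_with_replacement(10, 3, 2)
p_without = p_no_defective_without_replacement(10, 3, 2)
print(p_with, p_without)
```

Note that math.comb returns 0 when M exceeds N - K, so the second function correctly gives probability 0 when the sample is larger than the number of good items.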
c) Show from first principles that the expected value of the sum of two random variables is equal to the
sum of the expected values of the random variables, i.e. E(x + y) = E(x) + E(y).
d) Show from first principles that the variance of a random variable is $\sigma_x^2 = E\{[x - E(x)]^2\} = E(x^2) - [E(x)]^2$.
e) Show from first principles that the expected value of the product xy is equal to the product of
the expected values of the individual random variables x and y, E(xy) = E(x)E(y), if x and y are
independent.
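The identities in parts (c), (d) and (e) can be checked numerically before proving them. The sketch below is only a sanity check, not a proof; it assumes two independent fair six-faced dice as the random variables, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two independently sampled variables (assumed here: fair six-faced dice).
x = rng.integers(1, 7, size=200_000).astype(float)
y = rng.integers(1, 7, size=200_000).astype(float)

# (c) E(x + y) versus E(x) + E(y)
sum_gap = abs((x + y).mean() - (x.mean() + y.mean()))
# (d) E{[x - E(x)]^2} versus E(x^2) - [E(x)]^2
var_gap = abs(((x - x.mean()) ** 2).mean() - ((x ** 2).mean() - x.mean() ** 2))
# (e) E(xy) versus E(x) E(y), which relies on independence
prod_gap = abs((x * y).mean() - x.mean() * y.mean())
print(sum_gap, var_gap, prod_gap)
```

The first two gaps are zero up to floating-point rounding because those identities hold for sample averages as well, while the third is only approximately zero and shrinks as the sample grows.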
Marking the assignment
Refer to the rubric that will be used for marking the assignment.
Submitting the assignment
1. Your assignment as a single .ipynb file including your answers to both the math and the experimental
questions, in the correct order, should be submitted before the deadline on Brightspace.
Use markdown syntax to format your answers
For equations, you can either
a. Format them using latex (enclose latex code in $...$ for inline equations and $$...$$ for
displayed equations). For a quick reference of latex syntax, visit here.
b. Write them with neat handwriting, scan into a png file, and include the png file in your notebook,
using this syntax: ![alt text](imageURL)
Consider the CamScanner app on your mobile phone for scanning.
2. You can submit multiple editions of your assignment. Only the last one will be marked. It is
recommended to upload a complete submission, even if you are still improving it, so that you have
something in the system if your computer fails for whatever reason.
3. IMPORTANT: PLEASE NAME YOUR PYTHON NOTEBOOK FILE AS:
LastName-FirstName-Assignment-N.ipynb, for example
Milios-Evangelos-Assignment-1.ipynb
A 10% penalty to the assignment mark will be applied for a misnamed notebook file, i.e. your
mark will be multiplied by 0.9.
4. In addition to your .ipynb file, please upload a blank rubric file, which you download from this
URL. A 10% penalty to the assignment mark will be applied for uploading a zip file, instead of
two separate files (notebook + rubric).
5. The markers will enter your marks and their overall feedback in the rubric file on Brightspace, and
they will upload your Python notebook file with comments on specific cells, as a new markdown
cell below the cell being commented on.