代写Adaptive Signal Processing and Machine Intelligence Coursework代做R编程

- 首页 >> Java编程

Adaptive Signal Processing and Machine Intelligence Coursework

March 15, 2024

Guidelines

The coursework comprises four assignments, whose individual scores yield 80% of the final mark.  The remaining 20% accounts for presentation and organisation. Students are allowed to discuss the coursework but must code their own MATLAB scripts, produce their own figures and tables, and provide their own discussion of the coursework assignments.

General directions and notation:

。The simulations should be coded in MATLAB, a de facto standard in the implementation and validation of signal processing algorithms.

。The report should be clear, well-presented, and should include the answers to the assignments in a chronological order and with appropriate labelling. Students are encouraged to submit through Blackboard (in PDF format only), although a hardcopy submission at the undergraduate office will also be accepted.

。The report should document the results and the analysis in the assignments, in the form of figures (plots), tables, and equations, and not by listing MATLAB code as a proof of correct implementation.

。The students should use the following notation: boldface lowercase letters (e.g.  x) for vectors, lowercase letters with a (time) argument (x(n)) for scalar realisations of random variables and for elements of a vector, and uppercase letters (X) for random variables. Column vectors will be assumed unless otherwise stated, that is, x 2 RNx1 .

。In this Coursework, the typewriter font, e.g. mean, is used for MATLAB functions.

Presentation:

。The length limit for the report is 42 pages. This corresponds to ten pages per assignment in addition to one page for front cover and one page for the table of contents, however, there are no page restrictions per assignment but only for the full-report (42 pages).

。The final mark also considers the presentation of the report, this includes:  legible and correct figures, tables, and captions, appropriate titles, table of contents, and front cover with student information.

。The figures and code snippets (only if necessary) included in the report must be carefully chosen, for clarity and to meet the page limit.

。Do not insert unnecessary MATLAB code or the statements of the assignment questions in the report.

。For figures, (i) decide which type of plot is the most appropriate for each signal (e.g.  solid line, non-connected points, stems), (ii) export figuresin a correct format: without grey borders and with legible captions and lines, and (iii) avoid the use of screenshots when providing plots and data, use figures and tables instead.

。Avoid terms like good estimate, is (very) close, somewhat similar, etc - use formal language and quantify your statements (e.g. in dB, seconds, samples, etc).

。Note that you should submit two files to Blackboard:  the report in PDF format and all the MATLAB code files compressed in a ZIP/RAR format.  Name the MATLAB script files according to the part they correspond to (e.g. SEASP_Part_X-Y-Z.m).

Honour code:

Students are strictly required to adhere to the College policies on students responsibili- ties. The College has zero tolerance to plagiarism.  Any suspected plagiarism or cheating (prohibited collaboration on the coursework, including posting your solutions on internet domains and similar) will lead to a formal academic dishonesty investigation. Being found responsible for an academic dishonesty violation results in a discipline file for the student and penalties, ranging from severe reduction in marks to expulsion from College.

1    Classical and Modern Spectrum Estimation

Aims: Students will learn to:

. Understand the challenges in spectrum estimation of real–valued data.

. Consider spectral estimation as a dimensionality reduction problem and learn how to benefit from a Machine Intel- ligence approach to spectral estimation.

. Perform. practical spectrum estimation using parametric models, understand the issues with the resolution, bias and variance, so as to motivate modern subspace approaches.

. Understand the pivotal role of correct estimates of second order statistics in applications of spectral estimations, and the effects that biased vs.  unbiased estimates of the correlation function have on practical estimates of power spectra.

. Understand the benefits and drawbacks of model–based parametric and line spectra.   Learn how these spectra mitigate the problems with the bias and variance of classic spectral estimators.

. Use dimensionality reduction techniques to resolve the time–frequency uncertainty issues in frequency–based Ma- chine Intelligence.

. Learn how to deal with the problems of ill–conditioning of data correlation matrices and perform. advanced Principal Component Regression type of estimation.

. Verify these concepts on real world examples in Brain Computer Interface, and in the conditioning and estimation parameters of your own Electrocardiogram (ECG) and respiration signals.

. Understand how the above concepts can be utilised in the future eHealth.

Background.  1 jx(n)j 2  < 1, the Discrete Time Fourier Transform (DTFT) is defined as

X(ω) =     x(n)eωn      (DTFT).                                                         (1)

n= 1

We often use the symbol X(ω) to replace the more cumbersome X(eω ). The corresponding inverse DTFT is given by

x(n) = ππ X(ω)eωndω        (inverse  DTFT).                                              (2)

This can be verified by substituting (2) into (1). The energy spectral density is then dened as

S(ω) = jX(ω)j 2           (Energy  Spectral  Density).                                              (3)

A straightforward calculation gives

一π(π) S(ω)dω = 一π(π) nm1 x(n)x(m)eω(n m)dω = nm1 x(n)x(m) 一π(π) eω(n m)dω] = n jx(n)j 2 .

In the process, we have used the equality   111 e|ω(n m)dω = δn,m  (the Kronecker delta).  Equation (4) can be now restated as

n=Σ11 jx(n)j 2  = ππ S(ω)       (Parseval\ s theorem).                                        (5)

For random sequences we cannot guarantee finite energy for every realisation (and hence no DTFT). However, a random signal usually has a nite average power, and can therefore be characterised by average power spectral density (PSD). We assume zero mean data, Efx(n)g = 0, so that the autocovariance function (ACF) of a random signal x(n) is defined as

r(k) = Efx(k)x* (k - m)g       (Autocovariance  function  ACF).                               (6)

The Power Spectral Density (PSD) is defined as the DTFT of the ACF in the following way

P (ω) =     r(k)eωk           Deinition 1 of  Power  Spectral  Density.                             (7)

k= 1

ππ P (ω)ekω ππ P (ω)ekω dω = Σ 1 r(l) ππ e(k l)ωdω] = r(k).


Observe that

r(0) = ππ P (ω)dω .                                                                   (8)

Since from (6) r(0) = Efjx(n)j 2 g measures the (average) signal power, the name PSD for P (ω) is fully justified, as from (8) it represents the distribution of the (average) signal power over frequencies. The second definition of PSD is given by

!     ' N -1                          ' 2 J

P (ω) =  E  Σ x(n)e-nω '    {             Deinition 2 of  Power  Spectral  Density.           (9)

(     ' n=0                          '  )

1.1    Properties of Power Spectral Density (PSD)

Approximation in the definition of PSD.

Show analytically and through simulations that the definition of PSD in (7) is equivalent to that in (9) under a mild    [5] assumption that the covariance sequence r(k) decays rapidly, that is [1]

           jkjjr(k)j = 0.                                                           (10)

k= - (N -1)

Provide a simulation for the case when this equivalence does not hold. Explain the reasons.

1.2    Periodogram-based Methods Applied to RealWorld Data

Now consider two real–world datasets: a) The sunspot time series and b) an electroencephalogram (EEG) experiment.

a)  Apply one periodogram-based spectral estimation technique (possibly after some preprocessing) to the sunspot time    [10] series. Explain what aspect of the spectral estimate changes when the mean and trend from the data are removed

(use the MATLAB commands mean and detrend).  Explain how the perception of the periodicities in the data changes when the data is transformed by first applying the logarithm to each data sample and then subtracting the sample mean from this logarithmic data.

The basis for brain computer interface (BCI).

b)  The electroencephalogram (EEG) signal was recorded from an electrode located at the posterior/occipital (POz)    [10] region of the head.  The subject observed a flashing visual stimulus (flashing at a fixed rate of X Hz, where X is

some integer value in the range [11, . . . , 20]). This induced a response in the EEG, known as the steady state visual evoked potential (SSVEP), at the same frequency.  Spectral analysis is required to determine the value of ‘X’. The recording is contained in the EEG_Data_Assignment1.mat file which contains the following elements:

。POz – Vector containing the EEG samples (expressed in Volts) obtained from the POz location on the scalp,

fs Scalar denoting the sampling frequency (1200 Hz in this case).

Read the readme_Assignment1.txt file for more information.

Apply the standard periodogram approach to the entire recording, as well as the averaged periodogram with differ- ent window lengths (10 s, 5 s, 1 s) to the EEG data. Can you identify the the peaks in the spectrum corresponding to SSVEP? There should be a peak at the same frequency as the frequency of the flashing stimulus (integer X in the range [11, . . . , 20]), known as the fundamental frequency response peak, and at some integer multiples of this value, known as the harmonics of the response. It is important to note that the subject was tired during the recording which induced a strong response within 8-10 Hz (so called alpha-rhythm), this is not the SSVEP. Also note that a power-line interference was induced in the recording apparatus at 50 Hz, and this too is not the SSVEP. To enable a fair comparison across all spectral analysis approaches, you should keep the number of frequency bins the same.

Hint: It is recommended to have 5 DFT samples per Hz.

How does the standard periodogram approach compare with the averaged periodogram of window length 10 s?

Hint: Observe how straightforward it is to distinguish the estimated SSVEP

peaks from other spurious EEG activity in the surrounding spectrum.

In the case of averaged periodogram, what is the effect of making the window size very small, e.g.  1 s?

1.3    Correlation Estimation

Unbiased correlation estimation and preservation of non-negative spectra. Recall that the correlation-based definition of the PSD leads to the so-called correlogram spectral estimator given by

P (ω) =          ˆ(r)(k)ejωk                                                                                             (11)

k= - (N -1)

where the estimated autocorrelation functionˆ(r)(k) can be computed using the biased or unbiased estimators given by

Biased:  ˆ(r)(k) =  x(n)x* (n - k)                 (12)

n=k+1

Unbiased:  ˆ(r)(k) = nx(n)x* (n - k)       (0     k     N - 1).           (13)

Although it may seem that the unbiased estimate is more appropriate as its mean matches the true mean of PSD, observe that this estimate (despite being exact) can be highly erratic for larger lags k (close to N), where fewer samples are available to estimate the PSD. As a consequence, the ACF may not be positive definite, resulting in negative PSD values.

a)  Write a MATLAB script which calculates both biased and unbiased ACF estimates of a signal and then use these     [10]

ACF estimates to compute the corresponding correlogram in Eq.  (11).  Validate your code for different signals e.g. WGN, noisy sinusoidal signals and filtered WGN. Explain how the spectral estimates based on (12)-(13) differ from one another?  In particular, how does the correlogram corresponding to the unbiased ACF estimates behave for large lags (i.e. kclose to N)? Does the unbiased ACF estimate result in negative values for the estimated PSD?

Plotting the PSD in dB. Depending on the estimation approach, the spectral estimate P(ˆ)(ω) can be asymptotically

unbiased with variance μP2 (ω), where μ > 0 is a constant.  When several realisations of a random signal are available,

it is possible to present the estimate PSD as a condence interval dened by P(ˆ)(ω) μˆ(σ)P (ω), where P(ˆ)(ω) and ˆ(σ)P (ω)

are respectively the mean and standard deviation of the estimated PSDs of the available observations.  A drawback of this approach is that, as stated earlier, the standard deviation is proportional to the value of the PSD and therefore the confidence interval widens in zones where the PSD increases, and it is these parts that we are particularly interested in. Fig. 1 shows an overlay plot of 100 realisations of the PSD of two sinusoids immersed in i.i.d. WGN showing the mean (top), and the standard deviation of the set (bottom).

For ease of presentation, by plotting the PSD estimates in decibels we observe a more condensed realisation due to the contraction property of the logarithm.

b)  Use your code from the previous section (only the biased ACF estimator) to generate the PSD estimate of several     [5] realisations of a random process and plot them as in Fig.  1.  Generate different signals composed of sinusoids

corrupted by noise and elaborate on how disperse are the different realisation of the spectral estimate. Hint: use the fft and fftshift commands in MATLAB.

c)  Plot your estimates in dB, together with their associated standard deviation (again as in Fig. 1 for comparison).    [5]

How much spread out are the estimates now? Comment on the benefits of this representation.

Frequency estimation by MUSIC. In order to accurately estimate the spectrum of closely-spaced sine waves using the periodogram, a large number of samples N is required since the frequency resolution of the periodogram is proportion- ate to 1/N. On the other hand, subspace methods assume a harmonic model consisting of a sum of sine waves, possibly complex, in additive noise. In this setting, the noise is also complex-valued.

For illustration, consider a complex-valued signal of 30 samples in length, generated using the following code: n  =  0:30;

noise  =  0.2/sqrt(2) *(randn(size(n))+1j *randn(size(n)));

x  =  exp(1j *2 *pi *0.3 *n)+exp(1j *2 *pi *0.32 *n)+  noise;

The signal consists of two complex exponentials (sine waves) with frequencies of 0.3 Hz and 0.32 Hz and additive complex white Gaussian noise. The noise has zero mean and variance of 0.2.

The spectral estimate using the periodogram (rectangular window, 128 frequency bins and unit sampling rate) is shown in Fig. 2. Observe that the periodogram was not able to identify the two lines in the spectrum; this is due to the resolution of the periodogram being proportionate to 1/N, which is greater than the separation between the two frequencies.

PSD estimates (diferent realisations and mean)

300

 

200

 

100

 

0

0            5           10          15          20          25          30          35          40

Frequency [π radians ]

Standard deviation of the PSD estimate

40

20

00            5           10          15          20          25          30          35          40

Frequency [π radians ]

Figure 1: PSD estimates of two sinusoids immersed in noise.  Top: An overlay plot of 100 realisations and their mean. Bottom: Standard deviation of the 100 estimates.

Periodogram Power Spectral Density Estimate

15

 

10

 

5

 

0

 

−5

 

−10

 

−15

 

−20

 

−25

400

Frequency (mHz)

Figure 2: Periodogram of two complex exponentials with closely-spaced frequencies.

d)  Familiarise yourself with the generation of complex exponential signals, and generate signals of different frequen-     [5]

cies and length.  Verify that by considering more data samples the periodogram starts showing the correct line spectra.

e)  Use the following code to find the desired line spectra using the MUSIC method.

[X,R]  =  corrmtx(x,14,’mod’);

[S,F]  =  pmusic(R,2,[  ],1,’corr’);

plot(F,S,’linewidth’,2);  set(gca,’xlim’,[0.25  0.40]);

grid  on;  xlabel(’Hz’);  ylabel(’Pseudospectrum’);

Explain the operation of the first three lines in the code using the MATLAB documentation and the lecture notes.     [10]

What is the meaning of the input arguments for the functions corrmtx and pmusic? Does the spectrum estimated using the MUSIC algorithm provide more detailed information? State briefly the advantages and disadvantages of the periodogram and the MUSIC algorithms and comment on the bias and variance. How accurate would a general spectrum estimate be when using MUSIC?

1.4    Spectrum of Autoregressive Processes

In many spectrum estimation applications, only short data lengths are available; thus, classical spectrum estimation tech- niques based on the Fourier transform. will not be able to resolve frequency elements spaced close to one another. In order to solve this problem, we can use modern spectrum estimation methods based on the pole-zero modelling of the data.

Consider a general ARMA(p, q) process given by

y(n) = a1 y(n — 1) + · · · + apy(n — p) + w(n) + b1 w(n — 1) + · · · + bq w(n — q)

The power spectrum of y has the form.

Py (ejω ) =    jPk(q)=1 bke-jkω j 2     

Thus, the power spectrum can be estimated through the parameters (a1 , ..., ap , b1 , .., bq ).  The assumption of an under- lying model for the data is the key difference between classical and modern spectrum estimation methods.

For an AR process in particular, the power spectrum is the output of an all-pole filter given by

The parameters σw(2) and a =

Py (ejω ) = 

[a1      . . .   ap]T can be estimated by a set of (p + 1) linear equations

r(r)x(x) 1(0)x(x) 0(1)       .(.)    .(.)    .(.)   rxp(x)(p)1)      a11                        0(1)  

.              .           .                       .                    .        = σw(2)        .(.)

rx p)   rx (p.1)    .    .    .(.)       rx 0)          a.p                        0(.)  

where rx (k) could be calculated using the biased autocorrelation estimate

rx (k) =  NX(-1-)k x(n + k)x(n)

n=0

or the unbiased autocorrelation estimate

rx (k) =  Nk x(n + k)x(n)

a)  Based on your answers in Section 2.1, elaborate on the shortcomings of using the unbiased ACF estimate when     [5] finding the AR parameters? [see Eq. (13)]

b)  Generate 1000 samples of data in MATLAB, according to the following equation                                                            [10]

x(n) = 2.76x(n — 1) — 3.81x(n — 2) + 2.65x(n — 3) — 0.92x(n — 4) + w(n)

where w    Ⅵ (0, 1) and discard the rst 500 samples (x=x(500:end)) to remove the transient output of the lter. Estimate the power spectrum density of the signal using model orders p  = 2, ..., 14 and comment on the effects of increasing the order of the (assumed) underlying model by comparing the estimation to the true Power Spectral Density. Only plot the results of the model orders which produced the best results.

c)  Repeat the experiment in b) for data length of 10, 000 samples. What happens to the PSD when the chosen model     [5] order is lower (under-modelling) or higher (over-modelling) than the correct AR(4) model order?

1.5    Real World Signals: Respiratory Sinus Arrhythmia from RR-Intervals

Important change Section 1.5 of the CW:

If you have taken Statistical Signal Processing & Inference last year, then you already have your own ECG data from the wrists, and please proceed with this Assignment;

If you do not have your own ECG recordings, then we will provide the data.  We will an email to the class with the data and explanations.

Respiratory sinus arrhythmia (RSA) refers to the modulation of cardiac function by respiratory effort. This can be readily observed by the speeding up of heart rate during inspiration (“breathing in”) and the slowing down of heart rate during expiration (“breathing out”). The strength of RSA in an individual can be used to assess cardiovascular health. Breathing at regular rates will highlight the presence of RSA in the cardiac (ECG) data.

a)  Apply the standard periodogram as well as the averaged periodogram with different window lengths (e.g. 50 s, 150     [10] s ) to obtain the power spectral density of the RRI data. Plot the PSDs of the RRI data obtained from the three trials

separately.

b)  Explain the differences between the PSD estimates of the RRI data from the three trials? Can you identify the peaks    [5] in the spectrum corresponding to frequencies of respiration for the three experiments?

c)  Plot the AR spectrum estimate for the RRI signals for the three trials.   To find the optimal AR model order,    [10] experiment with your model order until you observe a peak in the spectrum (approximately) corresponding to

the theoretical respiration rate.  List the differences you observe between your estimated AR spectrum and the periodogram estimate in Part a).


站长地图