Lab Week 7: Background Reading
Part A. Parsing Dates and Times:
Important: Make sure you have done the week 7 reading from the textbook before working through this notebook, as it uses loop syntax reviewed in that reading.
The code cell below contains an example of converting numeric date information (years, months, days) into a so-called datetime object.
datetime is a core Python library used to handle basic date and time formatting. From the datetime library, we will use the datetime.datetime class, which combines date and time information. It is most convenient for us to import only the datetime class from the datetime library, i.e.:
from datetime import datetime
Note that, for this example and for the lab, the datetime.datetime class sets the time to midnight (00:00:00) when no time is provided.
Many other Python libraries are compatible with the datetime library, which makes it especially useful alongside libraries such as matplotlib for creating nicely labelled axes when plotting time series.
Date and time information can be accessed easily from a datetime object as attributes, using dot notation. For example, for a datetime object named dtime, we can access the year as dtime.year or the month as dtime.month.
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
# example array of dates (rows = [year, month, day]) as integers:
dates_int = np.array([[1960, 12, 1], [1998, 2, 14], [2001, 10, 28]])
print('Starting data:')
for ii, date in enumerate(dates_int):  # see Tues class notes for `enumerate`
    print(f'row {ii}: {date}')
n_measurements = dates_int.shape[0]
print('\nConverting to datetime objects:')
# initialize an empty list to append datetime objects to:
my_datetimes = []
# loop through each row in array
for ii, date in enumerate(dates_int):
    the_date = datetime(date[0], date[1], date[2])  # passing in yr, mo, day
    print(f'row {ii}: {the_date}')
    my_datetimes.append(the_date)  # append each date one at a time to `my_datetimes`
# I can now turn this list into an array using np.array
# (to do this I need to specify the data type of my array as object)
my_datetimes_arr = np.array(my_datetimes, dtype=object)
print('\nAfter converting:')
print(f'{my_datetimes = }')
print('\nMore context:')
print(f'{type(my_datetimes) = }')
print(f'{type(my_datetimes[0]) = }')
print(f'{my_datetimes[0] = }')
print(f'{my_datetimes[0].year = }')
print(f'{my_datetimes[0].month = }')
print(f'{my_datetimes[0].day = }')
print('\nCheck my array of datetime objects')
print(f'{type(my_datetimes_arr)=}')
print(f'{my_datetimes_arr.dtype=}')
print(f'{my_datetimes_arr[0] = }')
Starting data:
row 0: [1960 12 1]
row 1: [1998 2 14]
row 2: [2001 10 28]
Converting to datetime objects:
row 0: 1960-12-01 00:00:00
row 1: 1998-02-14 00:00:00
row 2: 2001-10-28 00:00:00
After converting:
my_datetimes = [datetime.datetime(1960, 12, 1, 0, 0), datetime.datetime(1998, 2, 14, 0, 0), datetime.datetime(2001, 10, 28, 0, 0)]
More context:
type(my_datetimes) = <class 'list'>
type(my_datetimes[0]) = <class 'datetime.datetime'>
my_datetimes[0] = datetime.datetime(1960, 12, 1, 0, 0)
my_datetimes[0].year = 1960
my_datetimes[0].month = 12
my_datetimes[0].day = 1
Check my array of datetime objects
type(my_datetimes_arr)=<class 'numpy.ndarray'>
my_datetimes_arr.dtype=dtype('O')
my_datetimes_arr[0] = datetime.datetime(1960, 12, 1, 0, 0)
ALTERNATIVE APPROACH
See Tues class on pre-allocating arrays
I could also have done this by pre-allocating an object array and then filling it with datetime objects:
# array pre-allocation
my_datetimes_arr2 = np.full(dates_int.shape[0], 'nan', dtype='O')
for ii, date in enumerate(dates_int):
    my_datetimes_arr2[ii] = datetime(date[0], date[1], date[2])
print('\nSolution 2:')
print(f'{my_datetimes_arr2 = }')
# check the solutions are the same
dbool = my_datetimes_arr == my_datetimes_arr2
print(dbool) #check you understand dtype and shape
Solution 2:
my_datetimes_arr2 = array([datetime.datetime(1960, 12, 1, 0, 0),
datetime.datetime(1998, 2, 14, 0, 0),
datetime.datetime(2001, 10, 28, 0, 0)], dtype=object)
[ True True True]
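As a further variation, the same conversion can be sketched with a list comprehension. This is only an illustrative alternative, not an approach the lab requires; the name `my_datetimes_arr3` is made up here, and `*row` unpacks each `[year, month, day]` row into the three arguments of `datetime`.

```python
from datetime import datetime
import numpy as np

# same example dates as above (rows = [year, month, day])
dates_int = np.array([[1960, 12, 1], [1998, 2, 14], [2001, 10, 28]])

# build one datetime per row; *row unpacks year, month, day
my_datetimes_arr3 = np.array([datetime(*row) for row in dates_int], dtype=object)

print(my_datetimes_arr3.shape)
print(my_datetimes_arr3[1].year)
```

The explicit loop versions above are easier to step through when debugging; the comprehension is just more compact.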
Part B. Background Reading for Lab
Creating a Running Mean Algorithm
It is often very useful to be able to smooth a time series to better display broad trends by averaging away the short-timescale variability. A simple, effective way of smoothing a time series is to use a running mean.
Let's say we have $N$ data points recorded at some even time interval. We will number them $1, 2, 3, \ldots, i-1, i, i+1, \ldots, N$. (In a numpy array they will have indices $0, 1, 2, \ldots, N-1$.)
For a 5-point running mean, the 5th point in the smoothed time series is the average of points 3, 4, 5, 6 and 7 from the original time series (i.e. averaging 5 points, centered on the 5th point).
The 6th point in the smoothed time series will be the average of points 4, 5, 6, 7, and 8 from the original time series.
The 7th point will be the average of points 5, 6, 7, 8, and 9.
Figure 1: An example of the 5 point running mean. The $i$th element of $z$ will be 9.2, an average of 1, 12, 21, 7, and 5. The next element will be 9.4, an average of 12, 21, 7, 5, and 2, and so on.
We call the number of points to be averaged our window length, and the term running mean comes from the fact that we are iteratively moving this window along our original array, one point at a time, each time averaging a fixed number of points (5 in this case), centered on the current point (Figure 1 above).
The averaging and the sliding of the window produces a smoothed version of the data.
The more points in our window, the more we smooth the original data set.
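The two averages quoted in the Figure 1 caption can be reproduced with array slicing. This is a small sketch using only the six values named in the caption; the names `x_fig`, `window_1` and `window_2` are made up for this illustration.

```python
import numpy as np

# the six values quoted in the Figure 1 caption
x_fig = np.array([1, 12, 21, 7, 5, 2])

window_1 = x_fig[0:5]  # 1, 12, 21, 7, 5
window_2 = x_fig[1:6]  # same window shifted one point to the right

print(window_1.mean())  # 9.2
print(window_2.mean())  # 9.4
```

Shifting the slice by one index is exactly the "sliding" of the window described above.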
Figure 2 shows an example with some synthetic data representing a measurement taken at every hour for 5 days.
The original data are a noisy sine wave with a period of 1 day, and are shown with a blue line and asterisks (*) at the times they were measured.
The 5-point (i.e. 5-hr in this case) running mean and 25-point (25 hour) running mean are shown.
You can see that smoothing over 25 hrs (a whole day) basically gives the daily average, whereas smoothing over 5 hrs retains the daily sine wave signal but with the noise smoothed out.
You'll also notice that the smoothed data aren't plotted near the ends of the time series, and this effect gets bigger as the window length increases: the orange curve (5-point) starts 2 points in from each end of the original data set, while the green curve (25-point) starts 12 points in from each end. This is because, as the window approaches the ends of the time series, it runs out of points to either the left or the right of the mid-point. The ends need special treatment; you'll deal with this in the lab.
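The number of points lost at each end follows directly from the window length. Here is a minimal sketch, assuming 120 hourly samples (5 days × 24 hours; the exact sample count behind Figure 2 is not stated) and a hypothetical helper named `half_width`:

```python
def half_width(window_length):
    """Number of points on each side of the centre point (window_length must be odd)."""
    if window_length % 2 == 0:
        raise ValueError("window_length must be odd")
    return (window_length - 1) // 2

n_points = 120  # assumed: one sample per hour for 5 days

for window_length in (5, 25):
    w = half_width(window_length)
    n_full = n_points - 2 * w  # points whose window fits entirely inside the data
    print(f"L = {window_length}: skip {w} points at each end, {n_full} full windows")
```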
Figure 2: Example of running mean with smaller (5 pt) and larger (25 pt) window length.
How do we describe this mathematically?
We start with a time series $x$, whose $i$'th element (in Python) is $x_i$, where $i = 0, \ldots, N-1$.
Then we form a new time series $z$, again with $N$ elements, for which the $i$'th point $z_i$ is an average of the points in $x$ within the window centered on $x_i$.
For example, if the window has a length of 5:
$$ z_i = \frac{x_{i-2} + x_{i-1} + x_{i} + x_{i+1} + x_{i+2}}{5}\tag{1} $$
$$ z_i = \frac{1}{5}\sum_{k=-2}^2 x_{i+k}\tag{2} $$
More generally, for a window length $L$ (where $L$ is an odd number):
$$ z_i = \frac{1}{L}\sum_{k=-W}^Wx_{i+k}\tag{3} $$
where:
$$ W = \frac{L-1}{2}\tag{4} $$
Check for yourself that a window length of $L = 5$ gives $W = 2$ (equation 4), so that equation 3 is equivalent to equation 2.
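Equations 3 and 4 translate into a short loop. The following is a minimal sketch rather than the lab solution: the function name `running_mean_interior` is hypothetical, and it simply leaves the first and last $W$ points as NaN, since the treatment of the ends is left to the lab.

```python
import numpy as np

def running_mean_interior(x, window_length):
    """Centred running mean (equation 3); points without a full window stay NaN."""
    if window_length % 2 == 0:
        raise ValueError("window_length must be odd")
    x = np.asarray(x, dtype=float)
    n = x.size
    w = (window_length - 1) // 2   # equation 4
    z = np.full(n, np.nan)         # pre-allocate the output array
    for i in range(w, n - w):      # only indices where the whole window fits
        z[i] = x[i - w : i + w + 1].sum() / window_length
    return z

# the six values quoted in the Figure 1 caption
z = running_mean_interior([1, 12, 21, 7, 5, 2], 5)
print(z)
```

With these six values only the elements at indices 2 and 3 have a full 5-point window, so the other four entries remain NaN.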
Part C: Calculation by hand to use in the lab.
Set a time series $x$ to be the array x = np.array([1, 5, 3, 7, 9, 4, 6, 9, 7, 4, 10, 6, 2]). This will allow you to test and debug your code by doing some simple calculations by hand.
Assume a window length $L = 3$. Calculate by hand the value of the running mean at the 3rd position in $z$ (the element with index 2), i.e. $z_i = z_2$.
Write down the formula for $z_2$ and compute what is in it given the values in $x$.
Similarly, what should be in $z_{11}$?
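Once you have your hand-calculated values, one way to check them is to average the relevant slice of `x` directly. This is a small sketch; the helper name `window_mean` is made up for this illustration, and it assumes the index is far enough from the ends for a full window.

```python
import numpy as np

x = np.array([1, 5, 3, 7, 9, 4, 6, 9, 7, 4, 10, 6, 2])

def window_mean(x, i, window_length):
    """Mean of the window_length points of x centred on index i."""
    w = (window_length - 1) // 2
    if i - w < 0 or i + w >= len(x):
        raise IndexError("window does not fit inside x at this index")
    return x[i - w : i + w + 1].mean()

print(window_mean(x, 2, 3))   # compare with your hand-calculated z_2
print(window_mean(x, 11, 3))  # compare with your hand-calculated z_11
```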
	
	
	
