代做APS106 – Lab #9代做留学生SQL语言程序

- 首页 >> Matlab编程

APS106 – Lab #9

In this (final!) lab of APS106, you will practice using the pandas package and analyze Toronto bikeshare data from 2017-2018.

Lab Objectives

Load data from csv files into pandas DataFrames

Add columns to pandas DataFrames

Concatenate pandas DataFrames with similar columns

Extract information from datasets using pandas DataFrame. methods

Lab Deliverables

The following files must be submitted to Gradescope prior to the assignment deadline:

lab9.py

For this assignment, you should complete the following functions:

create_bikeshare_df (BONUS MARKS ONLY – See below)

rank_station_usage

average_trip_duration_by_membership

A Note on the Gradescope autograder. Five testcases are provided on Gradescope to help you prepare your solution. Passing all these testcases does not guarantee your code is correct.

You will need to develop your own testcases to verify your solution works. Your programs will be graded using ten secret testcases. These testcases will be released after the assignment

deadline.

IMPORTANT:

•    Do not change the filename or function names

•    Do not use input() inside your program

Problem

This week we will imagine that you have been contracted by a bikeshare company in Toronto to

analyze data regarding their bike and station usage in 2017 and 2018. Specifically, they would like to know the following:

1.   Within 2017 and 2018, which stations were used most?

2.   Within 2017 and 2018, what was the average trip duration for members and non-members? The bikeshare company has recorded the following information for each trip in 2017 and 2018:

1.   trip_id – a number identifying the trip

2.   trip_start_time – the date and time of the start of the trip

3.   trip_stop_time the date and time of the end of the trip

4.   trip_duration_seconds the length of the trip in seconds

5.   from_station_id – the ID number of station where the bike was taken from at the start of the trip

6.   from_station_name – the name of the station where the bike was taken from at the start of the trip

7.   to_station_id – the ID number of the station where the bike was returned at the end of the trip

8.   to_station_name – the name of the station where the bike was returned at the end of the trip 9.   user_type – Whether the trip was taken by a member or a non-member

The bikeshare company has provided you with 8 csv files containing this trip data. Each file

contains trip information for one quarter (three month period) within 2017 and 2018. The files are

named using the following structure: “Bikeshare Ridership (year Qquarter number).csv” where year and quarter number are replaced with the year and quarter number when the data was collected.    You can download the files from Quercus. We recommend that you open the files and review their   structure before moving to the next part of this lab.

(OPTIONAL BONUS PART) Part 1 - Create a single DataFrame for all the data

Update April 9, 2024 – Using the pandas concat function was removed from the content of

APS106 W2024 after the lab was designed. You are NOT responsible for understanding how to use this function for the final exam.

Because this part of the lab was designed to have you use the concat function to create a single DataFrame with all the data from 2017 and 2018, we are making this part of the lab optional. If

you complete this function and pass the tests on Gradescope, you may receive up to 4 bonus

points on this lab that can be applied to increase your grade on any other labs where you did not

receive full marks. The bonus marks will not be applied to any other portion of the course (i.e., term tests, final exam, reflections) and your final total lab grade cannot exceed 100%.

If you choose not to complete this part of the lab, you may download the file

all_bikeshare_data.csv from quercus to complete and test the remainder of the lab.

In this part of the lab you will complete the create_bikeshare_df function. This function accepts a list of filename strings as input and returns a pandas DataFrame. containing the following columns:

Column name

Type

Description

trip_id

int

A number identifying the trip

trip_start_time

str

The date and time of the start of the trip

trip_stop_time

str

The date and time of the end of the trip

trip_duration_seconds

int

The length of the trip in seconds

from_station_id

float

The ID number of station where the bike was taken from at the start of the trip

from_station_name

str

The name of the station where the bike was taken from at the start of the trip

to_station_id

float

The ID number of the station where the bike was returned at the end of the trip

to_station_name

str

The name of the station where the bike was returned at the end of the trip

user_type

str

Either “Member” or “Casual” (i.e., non-member)

quarter

int

The quarter during which the trip was taken

year

int

The year during which the trip was taken

Columns 1-9 come directly from reading the csv files. Your function will need to add columns 10 (quarter) and 11 (year) to the DataFrame. The quarter value in each row should be an int (either 1, 2, 3, or 4) and the year should be an int (either 2017 or 2018). Your function should return a single DataFrame with all these columns. Your function should read each file within the list of files input to the function and concatenate the data from each file into a single DataFrame.

Example usage

Part 2 - Rank stations by usage

In this part of the lab, you will complete the rank_station_usage function. This function will analyze bikeshare data within a specific year and quarter to determine the stations that were used most frequently within that year and quarter. The function takes the following inputs:

1.   bikeshare_df- A pandas DataFrame. containing bikeshare data formatted like the DataFrame. returned by the create_bikeshare_df function

2.   year- an int specifying the year we want to analyze

3.   quarter- an int specifying the quarter we want to analyze

The function should return a DataFrame. that is indexed by station names and has the following columns:

Column name

Type

Description

departure_count

int

The total number of trips that started at   that station during the specified year and quarter

return_count

Int

The total number of trips that ended at that station during the specified year and quarter

combined_departure_return_count

int

The sum of the departure and return count columns

The returned DataFrame’s rows should be sorted in descending order according to the

combined_departure_return_count values (i.e., the station with the highest total usage should be in the first row of the returned DataFrame).

Example usage

Part 3 - Average trip length by membership status

In this part, you will complete the average_trip_duration_by_membership function. This function will compute the average trip length for both members and non-members (labeled as

“Casual” under the user_type column) during each year and quarter in the dataset.

The function will accept a single DataFrame. input, bikeshare_df,which will be formatted like the DataFrame. returned by the function in part 1.

The function should return a DataFrame with the following columns:

Column name

Type

Description

year

int

The year for which the average trip length was

calculated

quarter

int

The quarter for which the the average trip length was calculated

member_length

float

The average trip length in seconds for members during the year and quarter

non_member_length

float

The average trip length in seconds for non-members during the year and quarter

Example usage





站长地图