代写COMP207The Epature bus company database
- 首页 >> CSCOMP207 – Assignment 1: The Epature bus company database
There are 100 points you can get in this assignment. Recall that the assignment contributes 25% to your overall grade in the course.
The assignment is to create a database for a bus company called Epature: The initial task will focus on building the database (i.e., tables and so on), while the latter tasks focus on creating useful queries.
Questions 2-7 give some number of points each as is marked on them.
Public data: For each question, 2 points are given for getting the right output on the public test data described in an additional file. The question will specify what this right output is, and you will also be able to see it when you upload your solution to CodeGrade (since you can upload any number of times before the deadline, I would suggest using it to test your solution against the test data)
Hidden data: The remaining points are given for getting the right output on another data set. The latter set is kept hidden to avoid you hardcoding the right output in the queries. It will follow the same form as the test data though and unless you hardcode your solution for the test data, it is very likely that if you get points for one part you will get points for the other. (Note that you do get the grade at once for task 1).
Because I won’t have much time to help each of you individually (I think there will be around 550 of you so you do the math), I would prefer that you put any questions you had on the discussion board. That said, you are NOT meant to include your code on the discussion board – simply reference the code you uploaded to CodeGrade if needed. This also means that I would prefer that you try first on your own and/or check the discussion board (because with so many of you, most questions will have been asked and answered at the time most of you think of them and it is faster to look it up for you than waiting for me to answer it again!) and then ask.
How much help you can expect if you ask for it: I am willing to help you a lot if you get stuck in questions 1-2 (but please, spend some time on them first – nearly everybody – think >95% - won’t have much trouble with them). For questions 3-5 I will come up with suggestions and ideas to help guide you to a solution (but again, please try first! The questions already have some hints so check those first). Questions 6-7 are meant to be hard. If you do the questions in order, these questions will get your grade from 80 to 100. According to the university’s marking scheme, you need to be able to answer every question perfectly to get a grade in the range 70-80(!). Therefore, I will only help clarify what the questions are asking you to do but not actually help you solve them.
Each question from 2-7 requires you to create a view with a specified (in the question) name. You may use as many views as you wish to solve each question (well, except question 1, since it would not be helpful), but there should be one with the specified name, which is the one that is getting checked for having the right output. E.g., you could have view1ForQuestion2, view2ForQuestion2 and view3ForQuestion2, if you feel it would help to do question 2 – most other names are fine too!
ChatGPT: You are not allowed to use ChatGPT.
Group work: While you might discuss with your friends/colleagues, you must hand in a solution only you worked on, and plagiarism checks will be run.
Deadline and feedback
The deadline for the assignment is Wednesday the 8th of November at 17:00. General feedback for the assignment will be given on Tuesday the 21st of November. The relatively long period between those is because you can get an ELP and they only last until I give feedback. To accommodate people with ELPs as well as I can, I will therefore first give feedback nearly 2 weeks after the deadline (2 weeks is the maximum amount of time an ELP can give you).
If you have specific concerns about your grade or similar for the first assignment, then, after the general feedback has been released, I will answer questions about your solution and grade over email.
Format
The assignment should be done in .sql format (i.e., the output format from MySQL’s workbench) – it is really just a basic text file with the SQL commands written in it and you could do it by writing the file directly in a basic text editor if you wish (Notepad in Windows or TextEdit on Mac – if you select Make Plain Text in Format).
The name of the file should be epature.sql: You can hand in precisely 1 file, and it must have precisely that name (you can submit as many times as you wish until the deadline, but only the most recent version counts).
And each line should contain only the following:
1.CREATE TABLE statements for question 1 (6 in total)
2.CREATE VIEW statements for questions 2-7 (the number of views depends on how you solve the questions and how many you solve, if not all). Note, that you may use any positive number of views to solve each question, but each question’s specified view should have the properties requested.
3.SQL comments, i.e. the part of lines after “-- “, i.e. double - followed by space. You do not need to make any, but may do so if you wish.
In particular, do not include CREATE DATABASE and USE statements. They will make the tests on the hidden data not work (technically, I create two databases based on your construction, one with the public data given in the additional file and one with the hidden data. If you use CREATE DATABASE or USE statements, in essence only 1 is made and it will be marked as if you did not do the other one). I have written tests that test for this so if you do not do something really complex to avoid these, you should at least be warned about it.
You can include INSERT statements, but the database will be emptied before checking, so it serves little purpose (often, many people will submit with the test data already inserted, but, because the databases will be emptied, it will not matter).
It is very unlikely that you would want to remove any views after you have made them. Unless you have a very good reason, do not remove them.
Make sure that you can run the full file through MySQL when using the Epature database (starting with an empty Epature database) and after having done so, the Epature database should contain the tables and views required from the questions you solved (and perhaps some more views if you feel it would be convenient). This means that you should remove any statement that causes errors before handing in the assignment, because MySQL stops when it encounters an error (meaning that the last statements are not executed)! If you do not, you risk getting a far lower grade than otherwise (because the part of your hand-in after the first error will not be graded).
You can submit any number of times before the deadline: We are using CodeGrade for checking these things and whenever you submit, you will see whether your file works for the public dataset. I suggest using it…
Do not do the following: Any of the following should not be done:
End by removing the database (i.e. DROP DATABASE Epature; or similar). It would be the same as handing in an empty file. It will be very easy to see on CodeGrade since everything will stop working.
Create comments like “------------". MySQL Workbench will accept it, but the command line version of MySQL does not, which is what is used to check your file... Just insert an extra space after the second -.
Use any other order for the columns than what is specified. Since the insert command does not state which columns they insert into, you will put the information in the wrong column and then get hard-to-understand issues when you attempt to solve the questions.
Use PARTITION OVER or other new commands. It was not taught in class and is specific to newer versions of MySQL (like the one you would install on your own laptop). Unfortunately, CodeGrade is using an old version of Ubuntu (or at least did last year), where the newest versions of MySQL do not work. Therefore, it will not work when we grade you and you will fail that (and later questions – MySQL will report an error on that line and will not run the rest of your file).
Question 1 – (Easy) (worth 18 points – 3 points for each table)
Make the set of tables that match the following set of schemas.
Customers(birth_day, first_name, last_name, c_id)
Employees(birth_day, first_name, last_name, e_id)
BusType(capacity,type)
BusTrip(start_time,route_no,type*,e_id*,b_id)
TicketCosts(cost,duration)
CustomerTrip(from_stop_no,to_stop_no,b_id*,c_id*)
Each underlined attribute should be the primary key for the table (it happens to be the last attribute in each table besides CustomerTrip where it is the last two together) and each attribute with * should have a foreign key to the table with a primary key of just that name, e.g. if the tables were R(a,b) and S(b*,c), b in R and c in S should be the primary keys and b in S should reference b in R as a foreign key. More directly, type and e_id in BusTrip and b_id and c_id in CustomerTrip should reference the primary keys of BusType, Employees, BusTrip and Customers respectively.
Only use data types in the following list: INT, VARCHAR(20), DATE, DATETIME. Instead of specifying the datatypes explicitly, ensure that the test data defined in the additional file gets inserted correctly (it seems very likely that you would also guess the same datatypes as these suggest – except perhaps duration which is a VARCHAR(20) – it will be one of the strings single, day or week) and use DATE or DATETIME if all entries are dates or date-times. If you follow all of these requirements, each attribute should have a clear, unique datatype (which happens to likely be what you would guess it to be). As an aside, the costs are measured in pennies (and not directly pounds), to avoid precision issues with floating point numbers.
Question 2 – (Easy) (worth 21 points – 2 points for getting the right output on the test data and another 20 for the hidden data – see the first page for more detail!)
We are considering giving Louise Davies a raise but want to check how many bus trips she has made in September 2023 first. More precisely, you are asked to create a view LouiseTrips with number_of_trips which should be how many trips were done by Louise Davies in September 2023 (you may assume she did some and that she is the only employee with that full name – you should NOT assume that her employee id is 4 just because it is in the public data!).
Note that in the test data, Louise did 11 trips, of which 8 where in September 2023, 1 in September 2022, 1 in August 2023 and 1 in October.
HINT: Recall that COUNT(*) will count how many rows you have in the output, so you just need to make a query that finds the bus trips that were done in September 2023 by Louise Davies and then use that.
The view should be called LouiseTrips and be such that the output of
SELECT * FROM LouiseTrips;
when run on the Epature database (after inserting the test data given in an additional file) should be:
number_of_trips
8
Question 3 – (Easy-medium) (worth 15 points – 2 points for getting the right output on the test data and another 13 for the hidden data – see the first page for more detail!)
The bus company want to change route 102 but wants to hear from employees and customers who have been on that route first. You are meant to help with that and thus should find the customers and employees that have been on route 102 and you should return the birth_day, first_name,last_name of those people. HINT: You should likely use UNION.
The view should called be PeopleOnRoute and be such that the output of
SELECT * FROM PeopleOnRoute ORDER BY last_name, first_name;
when run on the Epature database (after inserting the test data given in an additional file) should be:
birth_dayfirst_namelast_name
1992-12-17SofiaAdams
1989-12-31ChloeAllen
1986-03-15ChristopherAnderson
1985-10-23AveryBaker
1992-09-05EmilyBrown
1984-09-02AvaClark
1992-11-10LouiseDavies
1991-02-18SarahDavis
1994-09-29EthanGreen
1988-06-19LilyHall
1996-08-17AndrewHarris
1985-11-25DavidJones
1993-04-06JacobKing
1990-11-11WilliamLewis
1988-04-30MatthewMiller
1989-10-12DanielMoore
1987-01-04MiaRobinson
1986-07-07GraceScott
1985-08-20JaneSmith
1990-01-15JohnSmith
1994-12-08SophiaTaylor
1993-05-28EmmaThomas
1995-02-25JamesWalker
1987-06-10MichaelWilliams
1997-07-22OliviaWilson
1991-08-14BenjaminYoung
(it is Jane Smith and Louise Davies as employees and all customers besides Alice Smith)
Question 4 – (Medium) (worth 15 points – 2 points for getting the right output on the test data and another 13 for the hidden data – see the first page for more detail!)
A manager wants to be able to wish each employee happy birthday on their birthdays. Output all the employees (birth_day,first_name,last_name) sorted by how far in the future their birthday is, from the 8th of November (i.e. the day you are meant to hand in) – so any on the 8th of November would be the first, followed by any on the 9th of November and so on and any on the 31st of December would be before any on the 1st of January and so on. Finally, any on the 7th of November would be last.
HINT: Given a date D, you can get which month it is in using MONTH(D) and the day it is in that month using DAYOFMONTH(D). There is a similar function for the day in the year, but if you used that you run into issues with people born in a leap years (specifically birthdays in March or later would be off-by-1). One way to do the query is to find the people that still have their birthday this year in some view where you make a constant 1 in a new column and then find the ones that first have their birthday next year and give them the constant 2 in the same new column and putting those together with UNION. You can then order the UNION by the new column, the month and the dayofmonth of the birthdays.
Note, your query is NOT meant to use the current date, but specifically the 8th of November – it is likely easy to convert it to the current date, using CURDATE() but it would make the query change depending on the day it gets checked which would be annoying for me.
The view you create should be called UpcomingBirthdays and it should be such that
SELECT * FROM UpcomingBirthdays;
when run on the Epature database (after inserting the test data given in an additional file) should be:
birth_dayfirst_namelast_name
1992-11-10LouiseDavies
1988-12-22EmilyDoe
1995-04-10AliceJohnson
1990-05-15JohnDoe
1985-08-20JaneSmith
Question 5 – (Medium-Hard) (worth 10 points – 2 points for getting the right output on the test data and another 8 for the hidden data – see the first page for more detail!)
We want to ensure that the buses do not have too many passengers at any time. For each bus trip, i.e. b_id, determine if there is a stop_no so that the number of passengers that were on the bus at that time, (i.e. passengers in CustomerTrip so that from_stop_no<=stop_no<to_stop_no) is more than the bus capacity (in BusType). You can assume that the buses are empty at the start of the trip and at the end (i.e. for each bus trip in BusTrip).
HINT: You can do it with just 1 SELECT query fairly directly, but you will need to use WHERE, GROUP BY, HAVING in that case and need to have CustomerTrip on the FROM line twice (and BusType and BusTrip once). That said, it might be conceptually easier to understand if you made a view that gives the stop_no’s (i.e. both from_stop_no and to_stop_no in one column – that said, if you think about it, you only need to check for whether the bus is full on a from_stop_no – if none got on at a given stop, it can only be overfull if it was overfull already, so you really only need from_stop_no) and use that instead of one of the CustomerTrips (you will still need WHERE, GROUP BY, HAVING).
The view you create should be called OverfullBuses and it should be such that
SELECT * FROM OverfullBuses ORDER BY b_id;
when run on the Epature database (after inserting the test data given in an additional file) should be:
b_id
27
28
42
44
49
50
Question 6 – (Hard) (worth 10 points – 2 points for getting the right output on the test data and another 8 for the hidden data – see the first page for more detail!)
The bus company wants to use a scan and drive system: When you, as a customer, go on the bus, you scan your bank card and then end up having to pay for the cheapest tickets that could be used for your trips – whether they are single, day or weekly tickets.
To make it slightly easier to solve, this and the next question will together solve that.
Here, in this question, we will only look for the best tickets between single and day tickets.
For each date in which the customer took at least one trip, determine what the customer should pay for all their trips that day in total (you can see the prices of the different ticket types in TicketCosts). The output should be c_id for the customer, date for the date and cost for the total cost.
As an example, if price for a single ticket is 200 and the price for a day ticket is 500, then, if you went on two trips that day, you would pay 200*2=400, while if you went on three or more you would pay 500.
To make it a bit easier, the buses stop before midnight. Given a datetime D, you can get the corresponding date using the function DATE(D)
The view should be called PricePerDay. Because the output is very large on the example database, it is presented at the end, from pages 13 to 19.
Question 7 – (Hard) (worth 10 points – 2 points for getting the right output on the test data and another 8 for the hidden data – see the first page for more detail!)
Note, that this question is a continuation of question 6.
You are meant to extend your solution in question 6 to also handle weekly tickets. You are meant to output c_id, week, year, cost for the cost for that week in that year for that customer.
To make it easier, weekly tickets run from Monday to Sunday. Also, to make it less tedious you may assume that none takes the bus during the last and first week of the year.
HINT: You can find which week a datetime D is in using WEEK(D,7) – the 7 is for Monday. Also, you can find the year using YEAR(D).
E.g., if the weekly ticket cost was 1600, the daily ticket cost was 500 and the single ticket cost was 200, then it would be cheaper to pay 3 day tickets if you went on 3 trips on each of those days (1500), when one weekly ticket (1600). On the other hand, it would be better to buy a weekly ticket than buying 2 single tickets each day (2800 vs 1600).
The view should be called PricePerWeek and be such that the output of
SELECT * FROM PricePerWeek ORDER BY c_id, week, year;
when run on the Epature database (after inserting the test data given in an additional file) should be:
c_idweekyearcost
1202022200
1352021200
1352023200
1362023200
13720231300
1382023200
1512022200
212022200
2332023200
2362023900
2372023400
2402023200
2412022200
382023200
3162021200
3332023200
3352021200
3362023200
33720231500
4242021200
4362023200
4372023800
4382023400
4402021200
4512022200
5162021200
5332023200
5362022200
5362023200
5372023400
5462020200
5482020200
662021200
6152023200
6242021200
6332023200
6362022200
6362023400
6372023800
6402023200
6482020200
7162021200
7202022200
7352023200
7362022200
7362023400
7372023800
7402021200
7482020200
8202022200
8362023800
8372023400
8402021200
9162021200
9202022200
9332023200
9362022200
9362023600
9372023400
9402023200
9462020200
9512022200
1062021200
10362023600
10372020200
10372023600
10412022200
10512022200
11112022200
11352021200
11362022200
113620231300
11372020200
113720231600
11382023200
11462020200
11482020200
1262021200
1282023200
12152023200
12202022200
12242021200
12352023200
12362022200
12362023200
12372023800
12402023200
13242023200
13362023400
13372023600
13382023200
13482020200
1462021200
1482023200
14112022200
14332023200
14352023400
143620231100
14372020200
143720231300
14382023400
14482020200
14512022200
1512022200
15152023200
15352021200
15352023200
15362022200
15362023800
153720231000
15382023200
15402021200
15462020200
15482020200
15512022200
16112022200
16242023200
16372023400
16482020200
1712022200
1762021200
17112022200
17162021200
17242023200
17352023500
17362022200
173620231400
173720231600
17382023200
17512022200
18352023200
18362023200
183720231600
18512022200
19332023200
19362023200
19372023600
19482020200
2012022200
20202022200
20352021200
203620231200
203720231600
20382023200
20402023200
20412022200
21352021200
21362023200
213720231000
21382023400
21402021200
21402023200
22242023200
22332023200
22352023200
223620231200
223720231300
22462020200
22512022200
23112022200
23332023200
23352021200
23362023600
23372020200
23372023600
23412022200
2412022200
2482023200
24112022200
24202022200
24242021200
24352021200
24352023200
24362023500
24372023800
24382023400
24402021200
24412022200
2562021200
25112022200
25242021200
25352021200
25352023200
253620231500
253720231600
25402023200
25512022200
Output for question 6)
The view in question 6 should be called PricePerDay and be such that the output of
SELECT * FROM PricePerDay ORDER BY c_id,date;
when run on the Epature database (after inserting the test data given in an additional file) should be:
c_iddatecost
12021-08-30200
12022-05-22200
12022-12-20200
12023-09-03200
12023-09-08200
12023-09-11500
12023-09-12200
12023-09-13200
12023-09-15200
12023-09-17200
12023-09-20200
22022-01-09200
22022-10-15200
22023-08-20200
22023-09-05200
22023-09-08200
22023-09-09500
22023-09-11200
22023-09-17200
22023-10-05200
32021-04-22200
32021-08-30200
32023-02-25200
32023-08-20200
32023-09-09200
32023-09-11400
32023-09-12200
32023-09-13200
32023-09-15500
32023-09-17200
42021-06-15200
42021-10-10200
42022-12-20200
42023-09-09200
42023-09-11200
42023-09-12200
42023-09-13400
42023-09-18200
42023-09-20200
52020-11-20200
52020-12-05200
52021-04-22200
52022-09-05200
52023-08-20200
52023-09-09200
52023-09-11200
52023-09-12200
62020-12-05200
62021-02-10200
62021-06-15200
62022-09-05200
62023-04-10200
62023-08-20200
62023-09-05200
62023-09-09200
62023-09-11400
62023-09-15400
62023-10-05200
72020-12-05200
72021-04-22200
72021-10-10200
72022-05-22200
72022-09-05200
72023-09-03200
72023-09-05200
72023-09-08200
72023-09-12200
72023-09-13400
72023-09-15200
82021-10-10200
82022-05-22200
82023-09-07200
82023-09-08400
82023-09-09200
82023-09-13200
82023-09-15200
92020-11-20200
92021-04-22200
92022-05-22200
92022-09-05200
92022-12-20200
92023-08-20200
92023-09-07200
92023-09-08200
92023-09-09200
92023-09-12200
92023-09-14200
92023-10-05200
102020-09-15200
102021-02-10200
102022-10-15200
102022-12-20200
102023-09-07200
102023-09-09400
102023-09-13200
102023-09-15200
102023-09-17200
112020-09-15200
112020-11-20200
112020-12-05200
112021-08-30200
112022-03-18200
112022-09-05200
112023-09-04200
112023-09-07200
112023-09-08500
112023-09-09400
112023-09-11400
112023-09-12400
112023-09-13200
112023-09-14400
112023-09-17200
112023-09-20200
122021-02-10200
122021-06-15200
122022-05-22200
122022-09-05200
122023-02-25200
122023-04-10200
122023-09-03200
122023-09-08200
122023-09-11200
122023-09-12200
122023-09-13200
122023-09-15200
122023-10-05200
132020-12-05200
132023-06-15200
132023-09-05200
132023-09-08200
132023-09-11200
132023-09-15200
132023-09-17200
132023-09-20200
142020-09-15200
142020-12-05200
142021-02-10200
142022-03-18200
142022-12-20200
142023-02-25200
142023-08-20200
142023-09-03400
142023-09-05200
142023-09-08400
142023-09-09500
142023-09-11200
142023-09-12500
142023-09-15400
142023-09-16200
142023-09-18200
142023-09-20200
152020-11-20200
152020-12-05200
152021-08-30200
152021-10-10200
152022-01-09200
152022-09-05200
152022-12-20200
152023-04-10200
152023-09-03200
152023-09-05400
152023-09-08400
152023-09-12200
152023-09-13200
152023-09-14200
152023-09-16200
152023-09-17200
152023-09-18200
162020-12-05200
162022-03-18200
162023-06-15200
162023-09-13200
162023-09-17200
172021-02-10200
172021-04-22200
172022-01-09200
172022-03-18200
172022-09-05200
172022-12-20200
172023-06-15200
172023-09-03500
172023-09-04200
172023-09-05200
172023-09-08500
172023-09-09500
172023-09-11400
172023-09-12500
172023-09-13200
172023-09-14200
172023-09-15200
172023-09-17200
172023-09-20200
182022-12-20200
182023-09-03200
182023-09-09200
182023-09-11400
182023-09-12200
182023-09-13200
182023-09-14200
182023-09-15400
182023-09-16200
192020-12-05200
192023-08-20200
192023-09-07200
192023-09-11200
192023-09-15400
202021-08-30200
202022-01-09200
202022-05-22200
202022-10-15200
202023-09-07200
202023-09-08500
202023-09-09500
202023-09-11500
202023-09-12400
202023-09-13200
202023-09-15500
202023-09-16200
202023-09-18200
202023-10-05200
212021-08-30200
212021-10-10200
212023-09-08200
212023-09-11200
212023-09-12400
212023-09-15200
212023-09-16200
212023-09-18200
212023-09-20200
212023-10-05200
222020-11-20200
222022-12-20200
222023-06-15200
222023-08-20200
222023-09-03200
222023-09-05200
222023-09-07200
222023-09-08400
222023-09-09400
222023-09-11200
222023-09-12200
222023-09-13400
222023-09-15500
232020-09-15200
232021-08-30200
232022-03-18200
232022-10-15200
232023-08-20200
232023-09-05200
232023-09-08200
232023-09-09200
232023-09-13200
232023-09-14200
232023-09-15200
242021-06-15200
242021-08-30200
242021-10-10200
242022-01-09200
242022-03-18200
242022-05-22200
242022-10-15200
242023-02-25200
242023-09-03200
242023-09-09500
242023-09-11400
242023-09-12200
242023-09-14200
242023-09-18200
242023-09-20200
252021-02-10200
252021-06-15200
252021-08-30200
252022-03-18200
252022-12-20200
252023-09-03200
252023-09-04200
252023-09-05200
252023-09-07200
252023-09-08500
252023-09-09400
252023-09-11500
252023-09-12200
252023-09-13200
252023-09-14400
252023-09-15400
252023-10-05200