Database Systems & Applications
Course Project 2022
⚫ Build a database driven web application from the ground up.
⚫ To develop the project, you are encouraged to use the Docker platform.
⚫ If you wish, you may use any languages, tools, microservice architecture,
or web application frameworks, and run servers on your own machines or
⚫ Submit a report, together with your data and code.
2. Project Requirements
The project is to build a database driven web application. Specifically, you will
need to complete the following tasks over the course of this semester.
1) Pick your favorite data management application. It should be
relatively substantial, but not too enormous. Several project ideas are
described at the end of this document, but you are encouraged to come
up with your own. When picking an application, keep the following
questions in mind:
⚫ How do you plan to acquire the data to populate your database?
Use of real datasets is highly recommended. You may use program
generated “fake” datasets if real ones are too difficult to obtain.
⚫ How are you going to use the data? What kind of queries do you
want to ask?
⚫ How is the data updated? Your application should support both
queries and updates
2) Design the database schema. Start with an E/R diagram and convert
it to a relational schema. Identify any constraints that hold in your
application domain, and code them as database constraints. If you plan
to work with real datasets, it is important to go over some samples of
real data to validate your design (in fact, you should start Task 7 below
as early as possible, in parallel to Tasks 3~6). Do not forget to apply
database design theory and check for redundancies.
3) Create a sample database using a small dataset. You may generate
this small dataset by hand. You will find this sample database very
useful in testing, because large datasets make debugging difficult. It is
a good idea to write some scripts to create/load/destroy the sample
database automatically; they will save you lots of typing when
4) Design a web-based user interface for your application. Think
about how a typical user would use your application. Optionally, it might
be useful to build a “canned” demo version of the site first (i.e., with
hardcoded rather than dynamically generated responses), while you
brush up your application design skills at the same time. Do not spend
too much time on refining the look of your interface; you just need to
understand the basic “flow” in order to figure out what database
operations are needed in each step of the user interaction.
5) Write SQL queries that will supply dynamic contents for the web pages
you designed for Task 4. Also write SQL code that modifies the
database on behalf of the user. You may hardcode the query and
update parameters. Test these SQL statements in the sample database.
6) Choose an appropriate platform for your application. Python or
type of simple database driven web application, deploy it in your
development environment, and make sure that all parts are working
7) Acquire the large “production” dataset, either by downloading it
from a real data source or by generating it using a program. Make sure
the dataset fits your schema. For real datasets, you might need to write
programs/scripts to transform them into a form that is appropriate for
loading into a database. For program generated datasets, make sure
they contain enough interesting “links” across rows of different tables,
or else all your join queries may return non-empty results.
8) Test the SQL statements you developed for Task 5 in the large
database. Do you run into any performance problems? Try creating
some additional indexes to improve performance.
9) Implement and debug the application and the web interface. Test
your application with the smaller sample database first. You may need
to iterate the design and implementation several times in order to
correct any unforeseen problems.
10) Test your application with the production dataset. Resolve any
11) Polish the web interface. You may add as many bells and whistles as
you like, though they are optional because they are not the main focus
of this course.
3. Milestone 1
You should have completed Tasks 1~5 and have started thinking about 6 and
7. If you plan to work with real data, you should also have made significant
progress on Task 7 (you should at least ensure that it is feasible to obtain the
real dataset, transform it, and load it into your database).
4. Milestone 2
You should have completed Tasks 1~8 and have made good progress on 9.
5. Project Demo
At the end of the semester, you will need to present a working demo of your
system. Prior to your demo, submit the following:
1) A final project report, including
⚫ A brief description of your application.
⚫ A plan for getting the data to populate your database, as well as
some sample data.
⚫ A list of assumptions that you are making about the data being
⚫ An E/R diagram for your database design.
⚫ A list of database tables with keys declared.
⚫ A description of the Web interface. You can write a brief
description of how users interact with the interface (e.g., “the user
selects a car model from a pulldown menu, clicks on the ‘go’ button,
and a new page will display all cars of this model that are available
⚫ A brief description of the platform you chose in Task 6.
⚫ Changes you made to the database during performance tuning in
Task 8, e.g., additional indexes created.
2) A .zip or .tar.gz archive of your source code. The source code directory
should at least contain:
⚫ A file describing how to create and load your sample database.
⚫ Files containing the SQL code used for creating tables, constraints,
stored procedures and triggers (if any).
⚫ A file test-sample.sql containing the SQL statements you wrote for
⚫ A file test-sample.out showing the results of running testsample.sql over your sample database.
⚫ If applicable, any code for downloading/scraping/transforming real
data that you have written for Task 7.
⚫ A file describing how to generate the “production” dataset and load
it into your database. Do not submit the production dataset itself
through if it is too big; instead, submit the URL where you
download/scrape the raw data (if applicable), and the code that
extracts and transforms (or generates) the production dataset.
⚫ A file test-production.sql containing the SQL statements you
wrote for Task 5. You may wish to modify some queries to return
only the top 10 result rows instead of all result rows (there might be
lots for large datasets).
⚫ A file test-production.out showing the results of running testproduction.sql over the production dataset.
⚫ All your source code.
⚫ a README file describing how to set up your servers and database,
and how to compile and deploy your application.
6. Project Ideas
Below is a list of possible project ideas for which high quality datasets exist. Of
course, you are welcome to come up with your own.
Examples include those that allow visitors to explore information about movies,
music, sports, stocks, etc. There are already many commercial offerings for
such purposes. While there is less room for innovation, there are plenty of
examples of what a good web application would look like, as well as high quality,
well formatted datasets.
⚫ IMDb Datasets (http://www.imdb.com/interfaces );
⚫ Sports Statistics (https://sports-statistics.com/sports-data/);
⚫ Million Song Dataset (http://millionsongdataset.com/);
⚫ Finance datasets available on data.world
⚫ Airbnb public datasets (http://insideairbnb.com/get-the-data.html);
⚫ Tianchi Data Sets (https://tianchi.aliyun.com/dataset/);
⚫ TPC-C benchmark (http://www.tpc.org/tpcc/);
⚫ TPC-H benchmark (http://www.tpc.org/tpch/);
These projects are well suited for those who just want to learn how to build
database backed web applications as beginners. You can always spice things
up by adding features that you wish those websites had (e.g., different ways for
summarizing, exploring, and visualizing the data).
Your task would be to take one of such datasets, design a good relational schema,
clean up/restructure the data, and build a web application for the public to explore
Database Systems & Applications