代做COMP7104-DASC7104 Practical Assignment 2 Query optimization (Oracle)-Preliminaries代做留学生Java程序
- 首页 >> C/C++编程COMP7104-DASC7104
Practical Assignment 2
Query optimization (Oracle)-Preliminaries
To be completed / tested on own laptop before Tuesday 24th of March
(nothing to be submitted on Moodle, just make sure all steps work on your laptop)
The second practical assignment will be a concrete application of the concepts, structures, and algorithms presented in the lectures on relational DBMSs, and on query execution and query optimization.
We will use the ORACLE XE Database Management System. This system provides a good example of a sophisticated optimizer based on index structures and comprehensive evaluation algorithms. All the join algorithms described in class are indeed implemented in ORACLE XE.
In addition, Oracle offers simple and practical tools (AUTOTRACE and EXPLAIN) to understand the execution plan chosen by the optimizer, as well as to obtain performance statistics (I/O cost and CPU cost, among others).
For starters, make sure you have the Docker environment installed on your laptop, as described in the Docker-preliminaries document on Moodle; you should be able to run the following command:
docker run hello-world
Installing the Oracle-XE DBMS docker
The following steps, once completed, will ensure that you have a full-fledged, advanced relational DBMS (Oracle-XE) on your laptop.
In general, when installing a new database system by a Docker image, we will start by searching for and choosing an image on http://hub.docker.com, which is a kind of catalog of Docker images for the Docker community (it requires your Docker login).
Start by doing a keyword search on http://hub.docker.com for “oracle-xe”, and you should find this one:
https://hub.docker.com/r/gvenzl/oracle-xe
Read its description and then get this image by executing the following command on your laptop:
docker pull gvenzl/oracle-xe
We will use this container / docker image for the Oracle XE DBMS as it has the advantage of working on all laptop configurations, including the Apple Mac Mi chips.
Installing the Oracle SQL Developer client
Connecting to the Oracle server is done via a client, such as the SQL Developer, that you can get here:
https://www.oracle.com/tools/downloads/sqldev-downloads.html
(this requires a login, an Oracle Web account).
After installing SQL Developer (you may need to install Java JDK before, if you do not have it already), you can launch it and connect to the Oracle server by clicking on the + icon and filling the connection form as described later, using your chosen password; notice also all the options you need to select in the connection window below.
Starting the Oracle-XE server (all machines including Windows, Linux, Apple Intel, but excluding Apple M1/M2 chips)
You will launch the Oracle server (its Docker image) on your laptop by running a command similar to the following one (which runs on my laptop):
docker run -d -p 1521:1521 -e ORACLE_PASSWORD=BoGDdanC1 -v
/Users/bogdan/data/OracleDBData:/opt/oracle/oradata --name oraxe gvenzl/oracle-xe
In this command, you must replace the directory “/Users/bogdan/data/OracleDBData” with a directory that you created beforehand on your machine. This is the directory where the Oracle database will be saved whenever you turn off completely (as in remove and delete) the Oracle server (its Docker image); in this way, after each restart you can reuse the database. This may be handy for instance if you turn off your machine, which shuts down Docker and all the running images.
In Windows, you would need to write the path “/Users/bogdan/data/OracleDBData” in Windows style, as in “C:\Users\bogdan\data\OracleDBData” .
The -v option is optional and it may not work for some of you (it does not work for Apple Mi chips for instance), and it’s OK if it does not work. This option does the directory “sharing” between your machine and the Docker image, and it works in both directions; your local directory will be mapped to the oradata directory on the Docker image. It is useful to use this option if you want the database to be saved and persist on your machine each time you may turn off the Oracle docker image or the Docker desktop altogether (or your laptop!). Without it, a stop and removal of the oraxe docker image will make disappear all the data from your database, meaning you will start with a fresh and empty database next time.
In your command, you can replace, if you want, the password by the one of your choosing. Note: this is not a sensitive access, my advice is to keep the one given for illustration, to avoid other issues, with unaccepted characters, formats, etc. If you really insist on changing it, keep in mind the password should be at least 8 characters long, with at least one upper-case, at least one digit, and at least one special character.
The -p option indicates that the container’s port 1521 (the one usually used by the Oracle server) is mapped to the same port 1521 of the Docker (the one we will use); same for 5500.
Once this is is done, if you run a docker ps command, you should see the “oraxe” image up and running:
Starting the Oracle-XE server (Apple Mi chips)
(try first the steps below, if they do not work you can also contact the TA Mingruo on the Moodle Forum or by email u300843[email protected])
If you have an Mi Apple laptop, you must use the Colima environment, as described here:
https://hub.docker.com/r/gvenzl/oracle-xe (section Oracle XE on Apple M chips)
Regarding the “start container as usual” part, a few remarks are in order:
• Do not use the -v option (it will not work)
• go to where you have installed SQL Developer (for instance on my machine)
cd
cd .sqldeveloper
cd 23.1.0
and add the following lines in the file product.conf (e.g., using the vim text editor) of that directory:
AddVMOption -Duser.timezone="+02:00"
AddVMOption -Duser.language=en
AddVMOption -Duser.region=us
Save and exit this file.
if you do not have vim, you may have vi, which is the same thing.
here are some vi/vim commands:
https://coderwall.com/p/adv71w/basic-vim-commands-for-getting-started
Connecting to the Oracle-XE server
Connecting to the Oracle server is done via an Oracle client SQL Developer. You can launch SQL Developer and connect to the Oracle server by clicking on the + icon, then filling the connection form as follows (using your chosen password, or mine for example: BoGDdanC1):
If everything went well you can now connect and run a test query for asking the time, such as
SELECT SYSDATE FROM DUAL;
The database: creating the tables & indexes, populating the database
Follow in SQL Developer the following steps:
Step 1:
Execute all the commands IN ORDER from schema_stage1_2024.sql, as follows: copy/paste and execute in SQL Developer each block of commands delimited by (----), one block at a time, and observe whether all is OK after its execution. You can simply select with the mouse in SQL Developer the block of commands you want to execute and click on the large green rectangle button (as illustrated below); in this way, all and only the mouse-selected commands will be executed, one after the other. None of the commands should raise errors. This stage of execution should be rather fast, creating the tables.
Step 2:
Execute the following commands in SQL Developer, which will populate the tables we created at the previous stage. This stage is slow, may take hours (on my machine it takes roughly 12 hours).
Make sure you disable sleep mode on your computer and that it is not running on battery but on wall electricity; my advice is to run this stage overnight.
In the SQL Developer window, run the following as block of instructions, that is select with the mouse in SQL Developer the entire block of four commands and click on the large green rectangle button (as illustrated below);
@/your_own_path_here/data1.sql;
commit;
@/your_own_path_here/data2.sql;
commit;
Note: Instead of /your_own_path_here/ you must put the path where you saved the data1.sql and data2.sql scripts.
Note: these two files are large, do not try to open them in a text editor (it will likely crash), you can for instance take a peek and see a few lines from them with the “more” or “tail” commands instead.
Note: In the data directory on Moodle, the files data1-1 to data1-10 are a backup for those who may not be able not load data1 and data2 directly because too large, in which case you can load these smaller files one at a time: data1-1 to data1-10 and then data2-1 to data2-10.
Step 3:
Once the previous stage is finally done, test a few SQL queries on the resulting tables and check the results correspond to those in the table of Section 1 hereafter:
SELECT COUNT(*) FROM ARTIST;
SELECT COUNT(*) FROM MOVIE1;
…
Then, like at Step 1, execute all the commands from schema_stage2_2024.sql: copy/paste and execute in SQL Developer each block of commands delimited by (----), one block at a time, and observe whether
all is OK. Some of the steps here may take a little while (few minutes), be patient. You must execute all the commands, no exceptions.
The database schema
The schema of the database we will use is the following:
ARTIST (IDARTIST, LASTNAME, FIRSTNAME, DOB)
COUNTRY (CODE, NAME, LANGUAGE)
MOVIE (IDMOVIE, TITLE, YEAR, IDMES, GENRE, SUMMARY, CODECOUNTRY)
• IDMES is the identifier of the movie director
• We will have four movie tables, called Moviei, for i=1,2,3,4
ROLE (IDMOVIE, IDACTOR, ROLENAME) -- each IDACTOR value is an IDARTIST value
MOVIERATER (EMAIL, LASTNAME, FIRSTNAME, PROVINCE)
RATING (IDMOVIE, EMAIL, RATE)
Oracle’s AUTOTRACE tool
We can observe the execution plan Oracle choses for a given query by using the menu AUTOTRACE, by pressing F6. For that, you must select with the mouse the query to be executed in autotrace mode
And then hit the AUTOTRACE button below (or F6):
The plan that was actually executed (the trace thereof) will be displayed as follows:
There are many available statistics we can see along with a plan (in the trace).
We will be interested mainly on LAST_CR_BUFFER_GETS (number of pages from the RAM buffer) and LAST_DISK READS (number of pages read from disk).
Similarly, you can EXPLAIN the plan that Oracle intends to execute (before execution), by selecting with the mouse the query to be explained and then hitting the button below (or F10):
So the difference between AUTOTRACE and EXPLAIN is that the former runs the query and shows the executed plan and its main statistics, while the latter shows an estimated best plan and its estimated statistics without running the query.
Note: what is displayed in the trace of the execution plan, by pressing F6, is configurable in SQL Developer / Preferences / Database / Autotrace Explain Plan (see screenshot below).
Note: It is important to make sure that in SQL Developer -> Preferences->Database -> Autotrace Explain Plan the option Fetch All Rows is checked. This is an option that, if not checked, allows query processing to be faster. In some cases, it is easier to fetch only a certain number of results (for instance if no order or grouping is required, etc), and only later, if necessary (if the user scrolls down on the result), other results (all of them if necessary) will be fetched; we do not want this behavior.
Here are some operations and options in Oracle query execution plans (list by no means exhaustive).
OPERATORS |
OPTION |
MEANING |
AGGREGATE |
GROUP BY |
Computing a one-line result by grouping and aggregating |
AND-EQUAL |
|
An operation that has as input a set of rowIds and returns their intersection (used in accesses based on an index) |
COUNTING |
|
Counting the number of lines |
FILTER |
|
Applies a filter on a set of rows. |
INDEX |
UNIQUE SCAN |
Finding one rowId in an index. |
INDEX |
RANGE SCAN |
Finding one or several rowIds in an index. |
MERGE JOIN |
|
Doing a merge join. |
NESTED LOOPS |
|
Doing an index nested loops join. |
SORT |
UNIQUE |
Sorting for duplicate elimination |
SORT |
GROUP BY |
Sorting for grouping |
SORT |
JOIN |
Sorting for joining (merge-join) |
SORT |
ORDER BY |
Sorting for ORDER BY |
TABLE ACCESS |
FULL |
Getting all the rows of a table |
TABLE ACCESS |
CLUSTER |
Getting rows by a search key in a clustered index |
TABLE ACCESS |
BY ROW ID |
Getting rows by rowId |
Recall: rowId and recordId are synonyms; record, tuple, and row are synonyms.
Oracle implements 3 join algorithms: Index Nested Loops Join (when there is at least one index), Sort- Merge Join and Hash Join when there is no index.
Environment 一 Cache, page size, optimizers
We may choose to work with a buffer cache (RAM) of limited size (on purpose, in order to make Oracle’s optimizer more “creative”), for that, we may need for instance to execute the following command:
alter system set db_cache_size = 400M;
Note: the cache size by default is already 400M, so no need to do this systematically.
You can first test if this is indeed the case by
show parameter db_cache_size;
The page size is 8KB. You can obtain this value (8KB is the default) by
show parameter db_block_size ;
We may also execute the following command to choose how “smart” / adaptive we want the optimizer to be:
ALTER SYSTEM SET optimizer_features_enable = " 11.2.0.4";
(you are encouraged to figure out what this might achieve)
For a SQL query to be executed, we may also choose the optimizer, between RULE, CHOOSE, ALL_ROWS:
• RULE: Optimization mode without statistics over the tables ALTER SESSION SET OPTIMIZER_MODE=RULE;
• CHOOSE: optimization mode with statistics, allows to obtain a more efficient plan ALTER SESSION SET OPTIMIZER_MODE=CHOOSE;
• ALL_ROWS: optimization mode with statistics and most advanced, allows to obtain most efficient plan
ALTER SESSION SET OPTIMIZER_MODE=ALL_ROWS;
• CACHE: If we want to use auto-trace and see the chosen plan for a query, along with the execution cost, we may want to first empty the buffer cache, so that we start with a clean slate (an empty buffer cache). Each query execution will bring pages into this RAM cache, and they may stay there and be re-accessed “for free” otherwise for subsequent executions, so with access cost of 0 for any page that is already in cache.
ALTER SYSTEM FLUSH BUFFER_CACHE;
Note: You can check if the buffer cache flush works as expected by running the same (any) query twice with auto-trace, and in between doing a buffer flush. If at the second query run the number of disk reads is 0, it means the cache has not been flushed.