CSCI 3110语言编程辅导、讲解Java，Python程序

2021.07.03 - 首页 >> Java编程

Lecture Notes for CSCI 3110:
Design and Analysis of Algorithms
Travis Gagie
Faculty of Computer Science
Dalhousie University
Summer 2021
Contents
1 “Clink” versus “BOOM” 3
I Divide and Conquer 8
2 Colouring Graphs 9
Assignment 1 19
Solution 21
3 Euclid, Karatsuba, Strassen 24
4 Fast Fourier Transform 32
Assignment 2 37
Solutions 39
5 Asymptotic Bounds 44
6 Sorting 51
Assignment 3 59
Solutions 61
1
II Greedy Algorithms 62
7 Huffman Coding 63
8 Minimum Spanning Trees 74
9 More Greedy Algorithms 82
Assignment 4 91
Midterm 1 93
Solutions 95
III Dynamic Programming 98
10 Edit Distance 99
11 Subset Sum and Knapsack 107
12 Optimal Binary Search Trees and Longest Increasing Subsequences 113
Assignment 5 120
2
Chapter 1
“Clink” versus “BOOM”
Imagine I’m standing in front of you holding up two little pieces of silver-grey metal and between
us is a bucket of water. You can see the pieces of metal look pretty much the same; I tap them
on the desk so you can tell they sound about the same; I pass them around so you can check they
feel about the same, weigh about the same, smell the same (not really at all), although I warn you
not to taste them. I take them back and throw first one into the bucket, where it goes “clink”, and
then the other, where it goes “BOOM”.
Even if we had in-person classes I probably wouldn’t be allowed to explode things in front of
my students — that’s what tenure is for! — but I want you to think about how two things that
seem the same can behave so differently. Explaining that for those two little pieces of metal gets us
into discussions that get pretty deep pretty quickly, of chemical reactions and molecules and atoms.
The famous physicist Richard Feynman said that if some disaster were going to wipe out nearly all
scientific knowledge and he could choose once sentence to pass on to future generations, it would be
“all things are made of atoms”. Other scientists might choose deep statements from their own fields
that have changed humanity’s view of the universe and our place in it: for an astronomer, “the
earth goes around the sun”; for a biologist, “species arise through a process of natural selection”;
for a psychologist, “the mind is a function of the brain”. Statements about mathematics usually
aren’t quite so controversial, at least among non-mathematicians, but the story goes that the first
mathematician to prove that the square root of 2 is irrational was thrown overboard by his fellow
Pythagoreans.
So, what have we got? You’ve been studying computer science in university for a few years
now, and I’d like to know what you’ve learned about our field that can reasonably be considered
one of the great scientific truths. “The speed of processors doubles every 18 months”? “Adding
workers to a late software project makes it later”? “All your bases are belong to us”? “The cake
is a lie”? This is when I’d really like to be able to stop and let you think, and then hear what
you have to say. Under the circumstances, however, I’m just going to have to assume most of you
chose to study computer science because it’s fun, you can do cool things with it (including saving
people’s lives and making the world a better place), it pays pretty well and it’s indoor work with
no heavy lifting — even though it may not tell us deep things about the universe.
3
That’s rather a shame. If you check Scientific American’s list of “100 or so Books that Shaped a
Century of Science”, under “Physical Sciences” and mixed in with Stephen Hawking’s A Brief His-
tory of Time, Primo Levi’s The Periodic Table, Carl Sagan’s The Pale Blue Dot, Paul Dirac’s Quan-
tum Mechanics, The Collected Papers of Albert Einstein, Linus Pauling’s Nature of the Chemical
Bond and Feynman’s QED, you’ll find books about computer science (or things close to it): Benoit
Mandelbrot’s Fractals, Norbert Weiner’s Cybernetics and, last but definitely not least, Knuth’s Art
of Computer Programming. So apparently we are doing deep things, we’re just not teaching you
about them.
Actually, I think that in a very important sense, we are the deepest of the sciences. To under-
stand where the basic rules of psychology come from, for example, you must know something about
biology; to understand where the basic rules of biology come from, you must know something about
chemistry; to understand where the basic rules of chemistry come from, you must know something
about physics; to understand where the basic rules of physics come from, you must know some-
thing about mathematics; but who tells the mathematicians what they can and cannot do? We do.
Philosophers, theologians and politicians might claim to, but I think the mathematicians largely
ignore them. One field has really stood up to mathematics and laid down the law, and that field
was computer science.
At the International Congress of Mathematicians in 1900, a famous mathematician named
David Hilbert proposed ten problems that he thought mathematicians should work on in the 20th
century. He later expanded the list to 23 problems, which were published in 1902 in the Bulletin of
the American Mathematical Society. Let’s talk about Hilbert’s Tenth Problem (for the integers): “to
devise a process according to which it can be determined in a finite number of operations whether
[a given Diophantine equation with integer coefficients] is solvable in [the integers]”. Notice Hilbert
didn’t ask whether such a process existed; back then, people assumed that if a mathematical
statement were true, with enough effort they should be able to prove it.
I’ll assume you all know what a quadratic equation in one variable is, and that you can tell if
one has a solution in the reals using the quadratic formula. There are similar but more complicated
formulas for solving cubic and quartic equations in one variable, which fortunately I never learned.
If you’re given a quintic equation in one variable with integer coefficients, then you may be able
to use the rational-root theorem (which I actually did learn once upon a time!) to get it down to
a quartic, which you can then solve with the quartic formulas. (For anything more complicated,
maybe you can try Galois Theory, but I can’t help you there.) Anyway, the Diophantine equations
Hilbert was talking about are sort of like these equations but with a finite number of variables. He
wanted a way to solve any given Diophantine equation that has integer coefficients or, at least, a
way to tell if it even has an integer solution.
So, solving Diophantine equations over the integers was considered an important mathematics
problem by important mathematicians. Unfortunately for them, in 1970 Yuri Matiyasevich showed
in his doctoral thesis that we can take a certain general model of a computer, called a Turing
Machine, and encode it as a Diophantine equation with integer coefficients, in such a way that the
Turing Machine halts if and only if the Diophantine equation has an integer solution. It was already
known that determining whether a Turing Machine halts is generally incomputable, meaning there
is no algorithm for it (and that had already thrown a spanner in Hilbert’s plan for 20th-century
mathematics), so Matiyasevich’s Theorem (also known as a Matiyasevich-Robinson-Davis-Putnam
4
Figure 1.1: The seven bridges of Ko¨nigsberg in Euler’s time (https://commons.wikimedia.org/
wiki/File:Konigsberg_bridges.png).
Theorem, for some other mathematicians who laid the groundwork) says Hilbert’s Tenth Problem
is impossible (for the integers). So, computer science told mathematics that it couldn’t have
something it really wanted.
I guess 1970 seems like a long time ago and we don’t want to relive old victories so — although
we’ll talk a bit about Turing Machines and the Halting Problem and computability and Hilbert’s
Entscheidungsproblem — in this course mostly we’re not going to focus on the gap between possible
and impossible problems (and I won’t mention Diophantine equations again). Instead, we’re going
to focus on the nature of the gap between easy problems (we have a polynomial-time solution that
runs on a standard computer) and hard problems (we don’t) — which is the deepest open question
in computer science and has been for a long time, since 1971. Echoing Hilbert, in 2000 the Clay
Mathematics Institute proposed a list of seven problems they thought mathematicians should work
on in the 21st century, including this one of easy versus hard, and backed each with a million-
dollar prize. To quickly illustrate the gap between easy and hard, I’d like to talk about two classic
problems, determining whether a graph has an Eulerian cycle and whether it has a Hamiltonian
cycle. At first these problems seem almost the same, but it turns out that one goes “clink” and
the other goes “BOOM” (we think).
In the 1730s, the citizens of Ko¨nigsberg (then in Prussia, now the Russian city of Kaliningrad)
wrote to the famous mathematician Leonhard Euler asking whether it was possible to start some-
where in their city (shown as it existed then in Figure 1.1), walk around it in such a way as to
cross each of their seven bridges once and only once, and arrive back at the starting point. Euler
answered that it was not possible since, if we draw a graph with each shore and island as a vertex
and each bridge as an edge, all of the vertices have odd degree, where a vertex’s degree is the
number of edges incident to it. He observed that in such a walk — now called an Eulerian cycle of
a graph — every time we visit a vertex we reduce the number of uncrossed edges incident to it by
2 (we arrive across one and leave across one); therefore, in order for a graph to have an Eulerian
cycle, it is a necessary condition that it be connected (that is, it must be possible to get from any
vertex to any other vertex) and every vertex must have even degree.
Euler also showed that this is also a sufficient condition, which is less obvious. Suppose we
start at some vertex v in a graph G that satisfies the condition, and walk around until we get stuck
5
(that is, we arrive at a vertex all of whose incident edges we’ve already crossed). Where can we get
stuck? Well, when we’re at a vertex w 6= v then we’ll have “crossed off” an odd number of edges
incident to w (one for each time we’ve arrived there and one for each time we’ve left, and we’ve
arrived once more often than we’ve left); therefore, since an even number minus an odd number
is an odd number and 0 isn’t odd, there’s always at least one edge left for us to leave across. It
follows that we can only get stuck back at v, after tracing out a cycle C in G.
Let G′ be the graph resulting from deleting from G the edges in C. If v has positive degree in
G′ then we can start there again and find and delete another cycle; otherwise, if some other vertex
has positive degree in G′, we can start there and find and delete another cycle. If we keep doing
this until there are no edges left, we get a decomposition of G into edge-disjoint cycles (meaning
the cycles can share vertices but not edges). Take any of the cycles C1, choose a vertex v it shares
with any of the other cycles, and choose another cycle C2 containing v. (If C1 doesn’t share any
vertices with any other cycles, then its vertices aren’t connected to the rest of the graph, contrary
to our assumption.) Replace C1 and C2 by a single (non-simple) cycle C3 that makes a figure 8 in
some sense: if we start at v then we first cross all the edges in C1 and return to v, then cross all
the edges in C2 and return to v. Repeating this cycle-merging, we eventually obtain an Eulerian
cycle of the graph.
It’s not totally trivial to implement this algorithm cleanly but it’s not all that hard either (so
I gave it as a homework exercise last year) and it obviously takes only polynomial time in the size
of the graph. In other words, the problem of determining whether a graph is Eulerian (contains an
Eulerian cycle) goes “clink”. It’s actually an important problem that comes up in de novo genome
assembly: one of the most common ways to assemble DNA reads is by building what’s called a
de Bruijn graph of the k-mers they contain, for some appropriate value of k, and then trying to
find an Eulerian tour of that graph (“tour” instead of “cycle” because it need not end where it
started). It’s likely, for example, that this played a role in sequencing the Covid genome, which
was important for developing a test for whether people were infected. Now let’s consider a problem
which seems deceptively similar at first.
In 1857 William Rowan Hamilton showed how to walk along the edges of a dodecahedron,
visiting each vertex once and only once, until arriving back at the starting point. Such a walk in
a graph is now called a Hamiltonian cycle, and it’s an obvious counterpart to an Eulerian cycle
but focusing on visiting vertices instead crossing edges. Hamilton didn’t give a general result like
Euler’s, however, and in fact no one has ever found an efficient algorithm for determining whether a
given graph is Hamiltonian (that is, it has a Hamiltonian cycle). By “efficient”, in computer science
we mean that the algorithm takes time polynomial in the size of the graph (in this case, the sum of
the number of vertices and the number of edges). Obviously there are necessary conditions we can
check quickly (for example, the graph has to be connected), and there are sufficient conditions we
can check quickly (for example, if a graph is a clique then it’s Hamiltonian, meaning it contains a
Hamiltonian cycle), and there are some kinds of graphs that are easy (for example, cycles or trees),
and some graphs are small enough that we can deal with them by brute force. . . but nobody’s ever
found an algorithm that can handle all graphs in polynomial time. In other words, it seems this
problem goes “BOOM”.
To see why people would really like to be able to determine whether a graph is Hamiltonian,
suppose you’re in charge of logistics for a delivery company and you’re looking at a map trying to
6
plan the route of a delivery truck that has to visit n cities, in any order and then return home. You
can tell the distance between each pair of cities and you want to know if it’s possible for the truck
to make its deliveries while travelling at most a certain total distance. (This problem is called the
Travelling Salesperson Problem or TSP or, assuming the truck travels only in the plane,
Euclidian TSP; computer scientists often use Small Caps for tricky problems.) If every road
between two cities has distance 1, then the truck can travel distance n if and only if the graph with
cities as vertices and roads as edges, is Hamiltonian. It follows that if HamCycle (the problem of
determining whether a graph is Hamiltonian) goes “BOOM”, then TSP goes “BOOM” too. That
is, if HamCycle is hard, then so is TSP, because in some sense HamCycle can be turned into
TSP. Don’t worry, we’ll see a lot more about this later in the course.
How can two problems — determining if a graph is Eulerian and if it’s Hamiltonian — that
seem so similar be so different? Well, we hope investigating that will lead us to a deep truth, like
investigating the reactions of those pieces of metal to water could lead us to studying atoms. We
may not even really care as much about the problems themselves as we do about what studying
them can tell us: my chemistry teacher might forgive me if I forgot that sodium metal explodes in
water, but he’d be really annoyed if I forgot the atomic theory; similarly, you may pass this course
even if you forget Euler’s algorithm, but I really do want you to remember something about the
difference between easy and hard problems.
Another way of saying this is that you shouldn’t miss the forest for the trees. Think of Euler’s
Algorithm as a tree and HamCycle as a tree and yourself as a student of geography: you’re not
really interested in individual trees as much as you’re interested in the general layout of the forest
and, metaphorically, whether there’s a big river splitting it into two pieces. In the next class,
we’ll talk about another classic problem, colouring graphs, some versions of which (1-colourability,
2-colourability, planar k-colourability for k ≥ 4) are easy and some versions of which (planar 3-
colourability, general k-colourability for k ≥ 3) are thought to be hard. Does anyone really care
about 47-colourability in particular, say? Probably not, but by the end of this course you should
understand that it’s on the same side of the river as HamCycle and TSP and planar 3-colourability
and general k-colourability for any other k ≥ 3 (that is, the “BOOM” side, to mix metaphors), and
on the opposite side from finding Eulerian cycles and 1- or 2-colourability (the “clink” side).
Investigating the difference between the complexity of finding Eulerian and Hamiltonian cycles
and between 2-colouring and 3-colouring graphs also leads us to a shallow but important truth
about this course: it’s dangerous to answer questions just by pattern matching! That is, just
because a problem sounds similar to something you’ve seen before, don’t think you can just write
the answer to the old problem and get part marks.
Summing up, here are some things to remember while taking this course:
Computer science is the deepest of the sciences!
Some problems are possible, some aren’t — and that’s deep.
Some problems are easy, some probably aren’t — and that’s deep, too.
Don’t miss the forest for the trees.
It’s dangerous to answer questions just by pattern matching.
7
Part I
Divide and Conquer
8
Chapter 2
Colouring Graphs
When I was in primary school geography was mainly about colouring maps, which I was pretty
good at except that I kept loosing my pencil crayons, so my maps weren’t as colourful as the other
children’s. I was therefore really impressed when I learned that in 1852 Francis Guthrie found a
way to colour a map of the counties of England (there were 39 back then) using only four colours,
such that no two counties with the same colour share a border (although meeting at a point is
ok). This made him wonder if it was possible to 4-colour any (planar) map like this. He wrote to
his brother, who was studying with the mathematician De Morgan (who formulated De Morgan’s
Laws of propositional logic, which you may have heard about last year), and the conjecture was
published in The Athen?um magazine in 1854 and 1860. In 1890 Heawood proved any planar map
can be 5-coloured, and in 1976 Appel and Haken finally proved the original conjecture by checking
1834 cases with a computer.
(An important (and very long) part of the proof was checked manually by Haken’s daughter
Dorothea, who taught me third-year algorithms at Queen’s 22 years later; she’s also a coauthor of
the Master Theorem that we’ll discuss next week.)
Appel and Haken’s Four-Colour Theorem is considered part of graph theory, instead of geogra-
phy, because if someone gives us a planar map and we draw a vertex in each region and put an edge
between two vertices if and only if their regions share a border, then we get a planar graph that
can be coloured with k colours such that no two vertices with the same colour share an edge, if and
only if the original map can be coloured with k colours. For example, Figure 2.1 shows the graph
we get for the map of the counties of Nova Scotia. (Actually, if we put an edge for every shared
border — so one edge between France and Spain for their border west of Andorra and another for
their border east of it, for example — then the graph and the map are duals, meaning that from
the graph we can reconstruct something that’s topologically equivalent to the map, and from that
we can reconstruct the graph, and so on.) It’s easy to find planar graphs that require 4 colours,
such as a 4-clique (a graph on four vertices with edges between every pair), but it’s not so easy
to tell whether a given planar graph can be 3-coloured. In fact, as mentioned in the last lecture,
that’s a “BOOM” problem.