Week 10  MST & NP
MST: minimum spanning tree.
Notes
P and NP
 P problem: polynomial time solvable, n^c instead of c^n.
 NP problem: polynomial time verifiable.

NPhard problem: A is a NPhard problem if \forall B\in NP, B is polynomial time reducible to A, which is
B \leq _p A
NPComplete
 Circuit satisfiability problem (CSAT)

Boolean satisfiability problem (SAT)
SAT \leq _p CSAT

Conjunctive normal form (3CNF)
 Clique of graph
Assignment  Due 12/01/2018
Credit: source code of questions are from walkccc.
16.14
Suppose that we have a set of activities to schedule among a large number of lecture halls, where any activity can take place in any lecture hall. We wish to schedule all the activities using as few lecture halls as possible. Give an efficient greedy algorithm to determine which activity should use which lecture hall.
(This problem is also known as the intervalgraph coloring problem. We can create an interval graph whose vertices are the given activities and whose edges connect incompatible activities. The smallest number of colors required to color every vertex so that no two adjacent vertices have the same color corresponds to finding the fewest lecture halls needed to schedule all of the given activities.)
Because all the lecture halls are identical, suppose we have a set of activities, we can use the GREEDYACTIVITYSELECTOR in the book to select the maximum number of compatible activities and the first lecture hall as (S_1, H_1), and then select the next maximum number of compatible activities and the corresponding hall from the rest as \{(S_2, H_2) \ldots, (S_n, H_n)\} consecutively. The complete result is
The running time of which is O(n^2).
16.24
Professor Gekko has always dreamed of inline skating across North Dakota. He plans to cross the state on highway U.S. 2, which runs from Grand Forks, on the eastern border with Minnesota, to Williston, near the western border with Montana. The professor can carry two liters of water, and he can skate m miles before running out of water. (Because North Dakota is relatively flat, the professor does not have to worry about drinking water at a greater rate on uphill sections than on flat or downhill sections.) The professor will start in Grand Forks with two full liters of water. His official North Dakota state map shows all the places along U.S. 2 at which he can refill his water and the distances between these locations.
The professor's goal is to minimize the number of water stops along his route across the state. Give an efficient method by which he can determine which water stops he should make. Prove that your strategy yields an optimal solution, and give its running time.
Professor Gekko should use greedy strategy that he only stops when he would run out of water before reaching the next stop.
 If he didn't stop at a water refill site, it is trivial to be the optimum.
 Suppose all his stops before the current one is optimal. He stopped because he would running out of water in between of current refill stop and the next one, so there would be no other strategy that can exempt him from the current stop, which mean the current stop is also optimal.
 Thus, by induction we can say the greedy strategy is optimal, and it's that it's running time is O(n).
16.1
Consider the problem of making change for n cents using the fewest number of coins. Assume that each coin's value is an integer.
a. Describe a greedy algorithm to make change consisting of quarters, dimes, nickels, and pennies. Prove that your algorithm yields an optimal solution.
b. Suppose that the available coins are in the denominations that are powers of c, i.e., the denominations are c^0, c^1, \ldots, c^k for some integers c > 1 and k \ge 1. Show that the greedy algorithm always yields an optimal solution.
c. Give a set of coin denominations for which the greedy algorithm does not yield an optimal solution. Your set should include a penny so that there is a solution for every value of n.
d. Give an O(nk)time algorithm that makes change for any set of k different coin denominations, assuming that one of the coins is a penny.
a. The greedy algorithm is to always use the largest coin that does not exceeded the remaining change.
 If the change is exactly the amount within the four coins, it is trivial that use the coin with corresponding amount is optimal.
 Otherwise, suppose all the coins before current one is the optimal solution so far, we use the largest coin that does not exceeded the remaining change again. And with any other solution, the total amount of the change would be less. With same number of coins, the greedy strategy will be the first one to reach the total amount of change, which means it uses the fewest number of coins.
 Thus by induction we can say the greedy strategy is optimal.
b. This is exactly the same to the prior question:
 If the change is exactly the amount within the coins, it is trivial that use the coin with corresponding amount is optimal.
 Otherwise, suppose all the coins before current one is the optimal solution so far, we use the largest coin that does not exceeded the remaining change again. And with any other solution, the total amount of the change would be less. With same number of coins, the greedy strategy will be the first one to reach the total amount of change, which means it uses the fewest number of coins.
 Thus by induction we can say the greedy strategy is still optimal.
c. Say that we have coins set {1, 4, 6} for change of 8 cents, the greedy algorithm will give {6, 1, 1}, but it is actually {4, 4} that is the optimal solution.
d. The running time of greedy algorithm is O(nk).
16.2
Suppose you are given a set S = \{a_1, a_2, \ldots, a_n\} of tasks, where task a_i requires p_i units of processing time to complete, once it has started. You have one computer on which to run these tasks, and the computer can run only one task at a time. Let c_i be the completion time of task a_i , that is, the time at which task a_i completes processing. Your goal is to minimize the average completion time, that is, to minimize (1 / n) \sum_{i = 1}^n c_i. For example, suppose there are two tasks, a_1 and a_2, with p_1 = 3 and p_2 = 5, and consider the schedule in which a_2 runs first, followed by a_1. Then c_2 = 5, c_1 = 8, and the average completion time is (5 + 8) / 2 = 6.5. If task a_1 runs first, however, then c_1 = 3, c_2 = 8, and the average completion time is (3 + 8) / 2 = 5.5.
a. Give an algorithm that schedules the tasks so as to minimize the average completion time. Each task must run nonpreemptively, that is, once task a_i starts, it must run continuously for p_i units of time. Prove that your algorithm minimizes the average completion time, and state the running time of your algorithm.
b. Suppose now that the tasks are not all available at once. That is, each task cannot start until its release time r_i. Suppose also that we allow preemption, so that a task can be suspended and restarted at a later time. For example, a task a_i with processing time p_i = 6 and release time r_i = 1 might start running at time 1 and be preempted at time 4. It might then resume at time 10 but be preempted at time 11, and it might finally resume at time 13 and complete at time 15. Task a_i has run for a total of 6 time units, but its running time has been divided into three pieces. In this scenario, a_i's completion time is 15. Give an algorithm that schedules the tasks so as to minimize the average completion time in this new scenario. Prove that your algorithm minimizes the average completion time, and state the running time of your algorithm.
a. The greedy strategy is sorting tasks by processing time in ascending order, and then run them consecutively.
 Suppose we are processing the first task, it is trivial that the task with the shortest processing unit have the minimum average completion time.
 Suppose i tasks that have been processed have the minimum average completion time t_i. We we add the new task with time p_{i+1}, the average completion time would become (t_i\times i + t_i + p_{i+1}) / (i + 1), whose value is determined by p_{i+1}, and any other strategy other than choose the new task with the shortest processing unit task among the rest will increase this average time. That is also the task with the shortest processing unit will result in the minimum average completion time.
 Thus by induction we can say the greedy strategy is optimal.
 The running time of the greedy algorithm is sorting time plus linear walking time, which is O(n\lg n).
b. The greedy strategy is to sort residual of unfinished tasks and all unstarted released tasks whenever there's a new task released, and proceed the task with shortest time unit.
 When there's only one task, it is trivial to process it results in the minimum average completion time.

Suppose at time t the new task A_{new} released, and i tasks that have been completed have the minimum average completion time t_i. We sort all tasks and get the task with minimum unprocessed time p_m. The new average completion time is
t_{i + 1} = \dfrac{t_i\times i + t_i + p_m}{i + 1}which is optimal whenever p_m is the smallest one.

Thus by induction we can say the greedy strategy is optimal.
 The running time of the greedy algorithm is sorting time plus square walking time, which is O(n^2\lg n)
23.11
Let (u, v) be a minimumweight edge in a connected graph G. Show that (u, v) belongs to some minimum spanning tree of G.
Suppose A is a MST covers vertices u, v and (u, v) \notin A, then there will be other egde(s) cover u, v. And since u, v is a minimumweight edge, the other edge must be with greater weight: contradict to definition of MST. So (u, v) \in A.
23.14
Give a simple example of a connected graph such that the set of edges \{(u, v): there exists a cut (S, V  S) such that (u, v) is a light edge crossing (S, V  S)\} does not form a minimum spanning tree.
From walkccc:
A triangle whose edge weights are all equal is a graph in which every edge is a light edge crossing some cut. But the triangle is cyclic, so it is not a minimum spanning tree.
23.16
Show that a graph has a unique minimum spanning tree if, for every cut of the graph, there is a unique light edge crossing the cut. Show that the converse is not true by giving a counterexample.
 Suppose G has more than one MST by having multiple edges connecting two sides of a cut (S, VS), then there will be multiple light edges with same weight: contradict with that any light edge is unique. So for every cut of the graph, there is a unique light edge crossing the cut.
 Any unique MST where there exist two vertices both connected to a third one with edges of equal weight, a cut between two vertices and the rest of the graph will result in two identical light edges.
23.110
Given a graph G and a minimum spanning tree T, suppose that we decrease the weight of one of the edges in T. Show that T is still a minimum spanning tree for G. More formally, let T be a minimum spanning tree for G with edge weights given by weight function w. Choose one edge (x, y) \in T and a positive number k, and define the weight function w' by
w'(u, v) = \begin{cases} w(u, v) & \text{ if }(u, v) \ne (x, y), \\\\ w(x, y)  k & \text{ if }(u, v) = (x, y). \end{cases}Show that T is a minimum spanning tree for G with edge weights given by w'.
Because T is the MST, any other spanning tree would have greater accumulated weights. And by decrease the weights of edges within T, it's no harm to the \textbf{minimum}, so it is still the MST.
23.21
Kruskal's algorithm can return different spanning trees for the same input graph G, depending on how it breaks ties when the edges are sorted into order. Show that for each minimum spanning tree T of G, there is a way to sort the edges of G in Kruskal's algorithm so that the algorithm returns T.
For each of T, when facing a tie, we can add a small portion to the vertice that is not included in the tree, and this actually prioritized included vertices to exist in future reconstructions.
23.24
Suppose that all edge weights in a graph are integers in the range from 1 to V. How fast can you make Kruskal's algorithm run? What if the edge weights are integers in the range from 1 to W for some constant W?
By using counting sort, if edge weights in a graph are integers in the range from 1 to V, the running time would be O(V) + O(V + E) + O(E\alpha (V)) = O(E\alpha (V)) for sets initialization, counting sort and sets union separately.\
If edge weights in a graph are integers in the range from 1 to constant W, O(V + E) term will be replaced by O(W + E) term, the result remain the same O(E\alpha (V)).