CS 612 Exercises

CS 612 Practice exercises

Fall 2009

Amortized analysis and disjoint sets

Suppose our multistack data structure includes an operation multipush(m, x) that pushes m copies of x. Analyze the amortized running time using each of the aggregate, accounting, and potential methods. Is it possible to implement such a multistack so that it has O(1) amortized cost per operation?
Suppose our binary counter includes an operation decrement( ) that subtracts one from its value. Analyze the amortized running time using each of the aggregate, accounting, and potential methods. Is it possible to implement such a binary counter so that it has O(1) amortized cost per operation?
The dynamic table we discuss in class maintains a load factor between 1/4 and 1. Redefine the insert and remove operations to maintain a load factor between any arbitrary given load factors f₁ and f₂, where 0 < f₁ < f₂ <= 1. For example, consider f₁ = 1/3 and f₂ = 3/4. For which values of f₁ and f₂ is it possible to implement such a dynamic table so that it has O(1) amortized cost per operation?
A min-priority queue provides operations insert(x) and removeMin( ). The off-line-minimum problem has input a sequence of n insert(x) and p removeMin( ) operations, where each value in {1,2,...,n} is inserted exactly once. The goal is to determine the value returned by each removeMin( ), or equivalently, to construct an array R[1...p], where each R[k] is the value returned by the k^th removeMin( ). This problem is "off-line" because it can read the entire input sequence before determining any of the returned keys. Show how to efficiently solve the off-line-minimum problem by using a disjoint-sets data structure.

In class we show how to implement a disjoint-sets data structure on values {1,2,...,n} that performs a sequence of O(m) find(x) and union(S,T) operations in O(m lg* n) time. We use two heuristics, union-by-rank and path-compression, to achieve this running time. Show that the same running time can be obtained if union-by-rank is replaced by union-by-size, and/or if path-compression is replaced by either path-splitting or path-halving. What are some practical advantages/disadvantages of each heuristic?

union-by-rank	make the root of the shorter tree point to the root of the taller tree
union-by-size	make the root of the smaller tree point to the root of the larger tree
path-compression	make every encountered node point to the root of the tree
path-splitting	make every encountered node point to its grandparent
path-halving	make each alternate node point to its grandparent

Balanced search trees

For each kind of search tree below, trace a sequence of operations as follows: First insert n keys (for example, any permutation of 1,...,n). Next find these same n keys. Finally remove these same n keys. Choose n sufficiently large to achieve all the cases in each algorithm. Kinds of search trees include binary search tree, AVL or height-balanced tree, BB[alpha] or weight-balanced tree, (a,b)-tree or B-tree, red-black tree, and splay tree.
Show how to implement each of these operations for a binary search tree in O(h+m) time, where h = height of tree and m = number of objects found. findAll(k) returns a list of all objects with key k. removeAll(k) removes all objects with key k. findInterval(low,high) returns a list of all objects with key k where low <= k <= high. removeInterval(low,high) removes all objects with key k where low <= k <= high.
Show that each insertion into an AVL tree performs only O(1) rotations. However, also show that a removal from an AVL tree can require theta(lg n) rotations.
Verify that the rotations as specified in class (cases 1,2,3,4) will properly maintain the weight balance for a BB[alpha] tree when performed after each insert or remove operation. Use alpha = 0.29.
AA trees are simpler than red-black trees. They are obtained from (2,3) trees by replacing each 2-node with a black node, and by replacing each 3-node with a black node and a red right child. Therefore an AA tree does not have any red left child. Show how to perform insert and remove in O(lg n) time on AA trees.
Complete the amortized analysis for splay trees to show that every sequence of m find operations in a splay tree with n objects takes O((m+n) lg n) time. Hint: use the node tree model.

Heaps (priority queues)

For each kind of heap below, trace a sequence of operations as follows: First insert n keys (for example, any permutation of 1,...,n). Next call removeMin (or removeMax) n times to remove these same n keys. Choose n sufficiently large to achieve all the cases in each algorithm. Kinds of heaps include balanced search trees, array-based heap, leftist heap, skew heap, binomial heap, and Fibonacci heap.
Show how to modify a double-ended heap represented as a balanced search tree so that findMin and findMax run in O(1) time. Note that insert, removeMin, and removeMax must still run in O(lg n) time.
Show how to implement a double-ended heap by linking together a min-heap and a max-heap. Do this two ways: (i) insert every key into both the min-heap and the max-heap, or (ii) insert about half of the keys into the min-heap and about half of the keys into the max-heap. The goal is to achieve O(1) time for findMin and findMax, and O(lg n) time for insert, removeMin, and removeMax.
Design a data structure for a d-dimensional heap, as follows. Each item is a d-tuple. The operations are insert, findMin_j, and removeMin_j for 1 <= j <= d. Operations with subscript j use the j^th component of the tuple as the key. The goal is to achieve O(1) time for each findMin_j, and O(lg n) time for insert and each removeMin_j.
Design a data structure that supports push, pop, findMin, and findMax in O(1) time. Operations push and pop have the same meaning as for a stack.
Design a data structure that supports enqueue in O(1) amortized time, and dequeue, findMin, and findMax in O(1) time. Operations enqueue and dequeue have the same meaning as for a queue.
Show that insert runs in O(1) amortized time for a binomial heap.

Sets of intervals

Know how to build each of these data structures: interval tree, simple interval tree, segment tree, kd-tree, measure tree, and orthogonal range tree. Also know how to answer queries using each of these data structures.
In class we designed several query algorithms that write multiple intervals or multiple points to the output stream. Modify these query algorithms to create and return a list that contains those same intervals or points, rather than writing to the output stream.
Suppose each query just needs to return a count of how many of the n intervals contain a given point, or how many of the n points lie within a given interval. How efficiently can we build the necessary data structures? How efficient can the query algorithms be? Can we make these data structures dynamic by providing efficient insert and remove operations?
Design and analyze efficient query algorithms to find which of n intervals [a₁,b₁],...,[a_n,b_n] overlap a specified interval [low,high]. Use (i) an interval tree, (ii) a simple interval tree.
Design a dynamic data structure that represents a set of n numbers and that supports these operations in O(lg n) time: insert(x), remove(x), closestPair( ). The closestPair( ) query finds and returns two numbers currently in the set that have the smallest absolute difference. Hint: the n numbers define a partition of the interval (-infinity, +infinity) into disjoint intervals. What information should be stored at each node of a tree in order to answer queries and perform updates efficiently?
Design a dynamic data structure that represents a set of n intervals and that supports these operations in O(lg n) time: insert(a,b), remove(a,b), maxIncludedPoint( ). The maxIncludedPoint( ) query finds and returns a point that is included in the largest number of intervals, that is, where the maximum number of intervals overlap. Hint: consider each left endpoint as +1, and each right endpoint as -1. What information should be stored at each node of a tree in order to answer queries and perform updates efficiently?
Show how to build a k-dimensional kd-tree in O(n lg n) time for arbitrary k >= 2. Also show how to answer a query for a k-dimensional kd-tree in O(n^1-1/k + m) time, where m is the number of points contained in the query interval.

Hash tables, tries and string searching

Be able to trace universal hashing for any chosen values of p,m,a,b and for any given sequence of insert,find,remove operations with specified keys.
Determine whether or not the following set H forms a universal set of hash functions. Suppose the keys belong to U = {0,...,2^t-1}, that is, keys are t-bit numbers. Let m=2^r, with r<t. Let H = {h_A | A = (a₀,...,a_t-1), where 0 <= a_j < m for 0 <= j < t}. To compute h_A(key), first write key in binary as b_t-1...b₁b₀, that is, key = ∑_{0 <= j < t} b_j2^j, where each bit b_j = 0 or 1. Finally define h_A(key) = XOR_{0
<= j < t} (a_jb_j), that is, the bitwise exclusive-or of those a_j for which bit b_j = 1.
Given a particular set of keys and choice of hash functions, be able to construct a two-level perfect hashing scheme for the given keys.
Given the number of bits and the number of keys for a Bloom filter, determine the optimal number of hash functions, and also the minimum probability that the Bloom filter will return a false positive answer.
Consider a trie with each node implemented as an array. Write algorithms for the dictionary operations insert, find, and remove on a trie that each run in O(|S|) time for string S,
Consider a static trie with each node implemented as a linked list. Suppose we know the relative access frequencies for strings S₁,S₂,...,S_n. Design an algorithm to order each linked list so that the average running time of the find operation is as efficient as possible.
Design an efficient algorithm that uses a trie to lexicographically sort the strings S₁,S₂,...,S_n. Also analyze the running time of the sorting algorithm using each variation of a trie.
Determine the running time for each of find, insert, remove and also the space used by the tries that are obtained by combining the compressed (or Patricia) trie with each other variation of trie.

Graphs, dynamization and persistence

Additional problems later?