Wednesday, October 31, 2007

Suffix Tree

The suffix tree for a string S is a tree whose edges are labeled with strings, and such that each suffix of S corresponds to exactly one path from the tree's root to a leaf. It is thus a radix tree for the suffixes of S.

Constructing such a tree for the string S takes time and space linear in the length of S.

Applications:
1. Check whether a pattern P of length m is a substring of S in O(m) time
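The O(m) query works by walking P along edge labels from the root. For contrast, here is the naive O(n*m) version of the same idea: it tests whether P is a prefix of some suffix of S, which is exactly the property a suffix tree indexes. A minimal sketch; the function name is illustrative.

```cpp
#include <string>

// Naive substring test: P occurs in S iff P is a prefix of some suffix of S.
// This scans all suffixes in O(n*m); a suffix tree answers the same query
// in O(m) by following one root-to-leaf path.
bool IsSubstring(const std::string& S, const std::string& P)
{
    for (size_t i = 0; i + P.size() <= S.size(); ++i)
        if (S.compare(i, P.size(), P) == 0)   // P is a prefix of suffix S[i..]
            return true;
    return false;
}
```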


Subsequence, Substring, Suffix, Prefix
  • Subsequence is a generalization of substring, suffix and prefix. Finding the longest string which is equal to a subsequence of two or more strings is known as the longest common subsequence problem.
  • Substring of a string is a prefix of a suffix of the string, and equivalently a suffix of a prefix.
Lexicographical (dictionary, alphabetic) order: one may assume that all words have the same length by padding them with blank spaces at the end, treating the blank as a special character that comes before every other letter of the alphabet. This also allows ordering of phrases.

References:
http://en.wikipedia.org/wiki/Suffix_%28computer_science%29
http://en.wikipedia.org/wiki/Suffix_tree
http://en.wikipedia.org/wiki/Lexicographical_order

Tuesday, October 30, 2007

Binary Search Tree

Binary Search Tree: for each node, all elements in its left subtree are less than or equal to (<=) the node, and all elements in its right subtree are greater than the node. This ordering property is what distinguishes a binary search tree from a plain binary tree.

A Binary Search Tree is good at insertion and lookup: on average, it locates a node in O(log n) time. BSTs are also a good way to practice recursion.

Typical operations on BST:
1. Search
2. Delete min value
3. Delete max value
4. Remove node
We need to consider several cases: 1) a leaf node: simply delete it; 2) a node with only one child: delete the node and point its parent at the node's child; 3) a node with two children: find the minimum node in the node's right subtree, replace the node's value with that minimum, and then remove the min node.
5. Delete BST
You can implement this with an elegant recursive function.
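The three removal cases above can be sketched recursively. This is an illustrative version, not from the original notes: it repeats the node struct so the sketch is self-contained, and it assumes nodes were allocated with new.

```cpp
#include <cstddef>

struct node { int data; node* left; node* right; };

// Smallest node in a subtree: keep going left.
node* FindMin(node* n)
{
    while (n != NULL && n->left != NULL)
        n = n->left;
    return n;
}

// Remove `key` from the tree rooted at `root`, returning the new root.
// Covers the three cases above: leaf, one child, two children.
node* Remove(node* root, int key)
{
    if (root == NULL)
        return NULL;
    if (key < root->data)
        root->left = Remove(root->left, key);
    else if (key > root->data)
        root->right = Remove(root->right, key);
    else if (root->left != NULL && root->right != NULL)
    {
        // two children: copy the min of the right subtree, then remove it there
        root->data = FindMin(root->right)->data;
        root->right = Remove(root->right, root->data);
    }
    else
    {
        // leaf or one child: splice the node out
        node* child = (root->left != NULL) ? root->left : root->right;
        delete root;   // assumes nodes were allocated with new
        return child;
    }
    return root;
}
```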


struct node
{
    int data;
    struct node * left;
    struct node * right;
};

Lookup(): search for a target value by walking left or right according to the BST ordering.
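A lookup can be written as a short recursive function, a sketch along these lines:

```cpp
#include <cstddef>

struct node { int data; node* left; node* right; };

// Recursive BST lookup: smaller targets go left, larger targets go right.
bool Lookup(struct node* n, int target)
{
    if (n == NULL)
        return false;
    if (target == n->data)
        return true;
    if (target < n->data)
        return Lookup(n->left, target);
    return Lookup(n->right, target);
}
```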

BST Checking: Check every node to see whether it meets the BST definition.
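Note that comparing each node only with its direct children is not sufficient: every node must lie within a range imposed by all of its ancestors, so the range is narrowed on the way down. A sketch following the <= convention for left subtrees (ignoring the INT_MAX edge case of the +1):

```cpp
#include <climits>
#include <cstddef>

struct node { int data; node* left; node* right; };

// Each node must fall within [min, max]; the bounds tighten as we descend.
bool IsBSTRecur(struct node* n, int min, int max)
{
    if (n == NULL)
        return true;
    if (n->data < min || n->data > max)
        return false;
    return IsBSTRecur(n->left, min, n->data) &&      // left may equal the node
           IsBSTRecur(n->right, n->data + 1, max);   // right must be larger
}

bool IsBST(struct node* root)
{
    return IsBSTRecur(root, INT_MIN, INT_MAX);
}
```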

*****
Q1: Print each path from root to leaf

void PrintArr(int path[], int pathLen)
{
    int i;
    for(i = 0; i < pathLen; i++)
        printf("%d ", path[i]);
    printf("\n");
}

void RecurPrintPath(struct node * node, int path[], int pathLen)
{
    if(node == NULL)
        return;

    path[pathLen] = node->data;
    pathLen++;

    if(node->left == NULL && node->right == NULL)
        PrintArr(path, pathLen); // reached a leaf: print the path so far
    else
    {
        RecurPrintPath(node->left, path, pathLen);
        RecurPrintPath(node->right, path, pathLen);
    }
}

void PrintPaths(struct node * node)
{
    int path[1000];

    RecurPrintPath(node, path, 0);
}

NOTE: a good example of recursion with a helper function. Watch how path and pathLen change, and what their values are at each recursion level.

Q2: Print a binary tree level by level. Top down or Bottom up. How about one level per line for the output?

The top-down version is breadth-first search (BFS) using a QUEUE data structure. The bottom-up version can be obtained either recursively, by printing a node only after the recursive call returns, or by using a stack to reverse the top-down output.

code snippet:

Top-Down

while(!Q.empty())
{
    printf("%d ", Q.front()->data);
    if(Q.front()->left != NULL)
        Q.push(Q.front()->left);
    if(Q.front()->right != NULL)
        Q.push(Q.front()->right);
    Q.pop();
}

Bottom-up

while(!Q.empty())
{
    //printf("%d ", Q.front()->data);
    S.push(Q.front());
    if(Q.front()->right != NULL)
        Q.push(Q.front()->right);
    if(Q.front()->left != NULL)
        Q.push(Q.front()->left);
    Q.pop();
}

while(!S.empty())
{
    printf("%d ", S.top()->data);
    S.pop();
}
----------------Recursive Version-------------------

void PrintLevelRecur(queue<struct node *> Q)
{
    if(Q.empty())
        return;

    struct node * tmp = NULL;

    if(Q.front()->left != NULL)
        Q.push(Q.front()->left);
    if(Q.front()->right != NULL)
        Q.push(Q.front()->right);

    tmp = Q.front();
    Q.pop();

    PrintLevelRecur(Q);

    printf("%d ", tmp->data); // print after the recursive call: bottom-up order
}

void PrintLevelbyLevel(struct node * node) //wrapper function
{
    queue<struct node *> Q;

    if(node != NULL)
        Q.push(node);

    PrintLevelRecur(Q);
}
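For the "one level per line" variant of Q2, the trick is that at the start of each outer iteration the queue holds exactly one complete level, so we can drain Q.size() nodes before printing a newline. An illustrative sketch, not from the original notes; collecting into a vector of levels keeps the traversal separate from the printing.

```cpp
#include <cstdio>
#include <cstddef>
#include <queue>
#include <vector>

struct node { int data; node* left; node* right; };

// Level-order traversal grouped by level: the queue holds one full level
// at a time, so we drain Q.size() nodes per outer iteration.
std::vector<std::vector<int> > Levels(struct node* root)
{
    std::vector<std::vector<int> > levels;
    std::queue<struct node*> Q;
    if (root != NULL) Q.push(root);
    while (!Q.empty()) {
        int count = Q.size();
        levels.push_back(std::vector<int>());
        for (int i = 0; i < count; i++) {
            struct node* cur = Q.front(); Q.pop();
            levels.back().push_back(cur->data);
            if (cur->left)  Q.push(cur->left);
            if (cur->right) Q.push(cur->right);
        }
    }
    return levels;
}

void PrintLevelsOneLine(struct node* root) // one level per output line
{
    std::vector<std::vector<int> > levels = Levels(root);
    for (size_t i = 0; i < levels.size(); i++) {
        for (size_t j = 0; j < levels[i].size(); j++)
            printf("%d ", levels[i][j]);
        printf("\n");
    }
}
```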

Dynamic Programming

Hash Tables

Many applications require a dynamic set that supports only the dictionary operations INSERT, SEARCH, and DELETE; for example, a compiler maintains a symbol table.

Hash functions: a good hash function satisfies the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed to. It is usually straightforward in an application to devise some method for interpreting each key as a (possibly large) natural number.

The division method: map a key k into one of m slots by taking the remainder of k divided by m, h(k)= k mod m. m, a prime, should not be too close to an exact power of 2.

The multiplication method: h(k) = floor(m * (k*A mod 1)), where A is a constant in the range 0 < A < 1.
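The two methods can be sketched as follows. The particular m and A here are example choices; A near (sqrt(5)-1)/2 is Knuth's suggested constant for the multiplication method.

```cpp
#include <cmath>

// Division method: h(k) = k mod m. m should be a prime not too close
// to an exact power of 2.
unsigned DivisionHash(unsigned k, unsigned m)
{
    return k % m;
}

// Multiplication method: h(k) = floor(m * (k*A mod 1)), 0 < A < 1.
unsigned MultiplicationHash(unsigned k, unsigned m)
{
    const double A = 0.6180339887;              // (sqrt(5)-1)/2, Knuth's choice
    double frac = k * A - std::floor(k * A);    // k*A mod 1, in [0, 1)
    return (unsigned)(m * frac);
}
```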

Universal hashing: choose the hash function randomly in a way that is independent of the keys that are actually going to be stored.

Perfect hashing: the worst-case number of memory accesses required to perform a search is O(1). The set of keys must be static: once the keys are stored in the table, the set of keys never changes. The basic idea is to use a two-level hashing scheme with universal hashing at each level and there are no collisions at the second-level by choosing the hash function carefully.

Resolving collisions: chaining or open addressing.

Chaining: We put all the elements that hash to the same slot in a linked list.

Open-addressing:
All elements are stored in the hash table itself. The advantage of open addressing is that it avoids pointers altogether. To perform insertion, we successively probe the hash table until we find an empty slot in which to put the key. Deletion from an open-address hash table is difficult: simply emptying a slot would break the probe sequences used by later searches, so slots are usually marked deleted instead.

Three techniques are commonly used to compute the probe sequences: linear probing (only m distinct probe sequences, and it suffers from primary clustering, which increases the average search time), quadratic probing, and double hashing (h(k,i) = (h1(k) + i*h2(k)) mod m; the initial position, the offset, or both may vary).
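The double-hashing probe sequence can be sketched directly from the formula. The choice of h2 here is one common convention, not the only one: h2(k) must be nonzero and relatively prime to m so that all m slots are eventually visited, which 1 + (k mod (m-1)) guarantees when m is prime.

```cpp
// i-th probe position for key k under double hashing:
// h(k, i) = (h1(k) + i*h2(k)) mod m, with m assumed prime.
unsigned Probe(unsigned k, unsigned i, unsigned m)
{
    unsigned h1 = k % m;             // initial position
    unsigned h2 = 1 + k % (m - 1);   // offset, never 0
    return (h1 + i * h2) % m;
}
```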


Hash table Vs Binary Search Tree

A hash table can store and retrieve data in O(1), or constant, time on average, but its uses beyond the dictionary operations are limited. A binary search tree inserts and retrieves in O(log n) and additionally maintains its data in sorted order. For a modest data set such as 10,000 entries, a binary search tree's O(log n) is usually fast enough.

Monday, October 29, 2007

Recursion

In many cases, your recursive functions may need additional data structures or an argument that tracks the recursion level. Often the best solution in such cases is to move the data structure or argument initialization code into a separate function. This is a wrapper function.

Binary Search: running time O(log(n))

range = upper - lower;

Error conditions:
1. range < 0: invalid range
2. range == 0 && array[lower] != target: target does not exist
3. array[lower] > array[upper]: unsorted array

center = (upper - lower)/2 + lower; // avoids overflow of (upper + lower)/2
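The pieces above assemble into an iterative binary search. A sketch that simply reports failure with -1 rather than distinguishing the three error conditions:

```cpp
// Iterative binary search over a sorted array[lower..upper], inclusive.
// Returns the index of target, or -1 if it is not present.
int BinarySearch(int array[], int lower, int upper, int target)
{
    while (lower <= upper) {
        int center = (upper - lower) / 2 + lower;  // no (lower + upper) overflow
        if (array[center] == target)
            return center;
        else if (array[center] < target)
            lower = center + 1;                    // discard left half
        else
            upper = center - 1;                    // discard right half
    }
    return -1;                                     // target does not exist
}
```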

Sunday, October 28, 2007

Standard Templates

1. Stack

#include <stack>

stack<T> s;

s.push(t);
s.pop();
s.top();
s.empty();
s.size();

Sorting Algorithm

Recursion Vs Iteration

http://www.refactoring.com/catalog/replaceRecursionWithIteration.html


A recursive function is not a single function restarting itself; it is hundreds of function invocations, each unfinished, with the last one calling a new instance.

Recursion is implemented as a method that calls itself to solve subtasks. During the recursive call, the values of the method's local fields are placed on the method stack until the subtask performed by the recursive call is completed. Thus, whenever a recursive method is called, local fields are put on the method stack and used again after the recursive call completes. The general approach to refactoring is to implement an alternative stack that simulates the method stack where the local fields can be placed.

Sometimes a recursive solution doesn't preserve any local fields during the recursive call, which makes the refactoring much easier. This is the case with so-called tail recursions: recursions where the recursive call is the last statement in the method. Tail recursions are generally considered a bad practice and should be replaced with iteration. This technique is well known to people who work on compiler implementations; a good compiler usually performs this refactoring on the fly.

Refactoring Mechanics:

1. Determine the base case of the recursion. The base case, when reached, causes the recursion to end; every recursion must have a defined base case. In addition, each recursive call must make progress towards the base case (otherwise recursive calls would be performed infinitely). In our example the base case is n == 0.

2. Implement a loop that will iterate until the base case is reached.

3. Make progress towards the base case. Send the new arguments to the top of the loop instead of to the recursive method.

Examples and Practices:

1. Reversely output an input line

void wrt_it(void)
{
    int c;
    if((c = getchar()) != '\n')
        wrt_it();
    putchar(c);
}

void wrt_stack()
{
    stack<char> s;
    char single;

    while((single = getchar()) != '\n')
        s.push(single);

    while(!s.empty())
    {
        cout << s.top();
        s.pop();
    }
}

2. Tree Preorder Traversal

void PreorderTraversal(node *root)
{
    if(root != NULL)
    {
        printf("%d\n", root->value);
        PreorderTraversal(root->left);
        PreorderTraversal(root->right);
    }
}

3. Tree Inorder Traversal
void InorderTraversal(node *root)
{
    if(root != NULL)
    {
        InorderTraversal(root->left);
        printf("%d\n", root->value);
        InorderTraversal(root->right);
    }
}

4. Tree Postorder Traversal
void PostorderTraversal(node *root)
{
    if(root != NULL)
    {
        PostorderTraversal(root->left);
        PostorderTraversal(root->right);
        printf("%d\n", root->value);
    }
}

5. n! (pronounced "n factorial")

Recursion relation:
n! = n * (n-1)!
0! = 1! = 1

int Factorial(int n)
{
    if(n > 1) //recursive case
        return Factorial(n-1) * n;
    else //base case
        return 1;
}
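Factorial is nearly tail recursive, so the refactoring mechanics above turn it directly into a loop: iterate until the base case is reached, making progress on n each pass.

```cpp
// Iterative factorial, derived from the recursive version by the
// replace-recursion-with-iteration mechanics above.
int FactorialIter(int n)
{
    int result = 1;
    while (n > 1) {   // loop until the base case n <= 1 is reached
        result *= n;
        n--;          // progress towards the base case
    }
    return result;
}
```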




Summary: if you understand exactly how the recursive implementation implicitly stores data on the call stack, you can write an iterative implementation that explicitly stores data on a stack in the same fashion.
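As one concrete instance of this summary, preorder traversal can be made iterative with an explicit stack that holds the same nodes the recursive version keeps on the call stack. A sketch, collecting values into a vector instead of printing so the output is easy to inspect:

```cpp
#include <cstddef>
#include <stack>
#include <vector>

struct node { int value; node* left; node* right; };

// Iterative preorder: push the right child before the left one so the
// left subtree is popped, and therefore visited, first.
std::vector<int> PreorderIter(struct node* root)
{
    std::vector<int> out;
    std::stack<struct node*> s;
    if (root != NULL) s.push(root);
    while (!s.empty()) {
        struct node* cur = s.top(); s.pop();
        out.push_back(cur->value);            // visit before either subtree
        if (cur->right != NULL) s.push(cur->right);
        if (cur->left != NULL) s.push(cur->left);
    }
    return out;
}
```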