Tuesday, November 27, 2007

Measuring Program Execution Time

Two basic mechanisms to record the passage of time:
  • Interval Counting, which is based on a low-frequency timer that periodically interrupts the processor. The operating system uses the timer to record the cumulative time used by each process and maintains counts of the amount of user time and the amount of system time used by each process. There are library functions for this: times (Linux) and clock (time.h, ANSI C). We can compute the total time between two different points in a program execution by making two calls to times and computing the difference of the return values. Disadvantage: the interval accounting scheme makes no attempt to resolve time more finely than the timer interval. Interval counting is only useful for measuring relatively long computations -- 100,000,000 clock cycles or more -- and is too coarse-grained to use for any measurement having duration of less than 100 ms.
  • Cycle Counters, which is based on a counter that is incremented every clock cycle. The timer is a special register that gets incremented every single clock cycle. Special machine instructions (rdtsc for "read time stamp counter") can be used to read the value of the counter (see the sketch after this list). Cycle counters provide a very precise tool for measuring the time that elapses between two different points in the execution of a program. The disadvantage is that they do not keep track of which process uses those cycles or whether the processor is operating in kernel or user mode. Context switching and cache operations cause extreme variation in execution time. To overcome this, the K-best measurement scheme has been proposed; it is reasonably robust for measuring durations shorter than the timer interval.
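A minimal sketch of reading the cycle counter, assuming a GCC-style compiler on x86/x86-64 (the function name read_tsc is my own):

#include <stdint.h>
#include <stdio.h>

/* rdtsc leaves the low 32 bits of the counter in EAX and the high 32 bits in EDX */
static inline uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
    uint64_t start = read_tsc();
    /* code under measurement goes here */
    uint64_t end = read_tsc();
    printf("elapsed cycles: %llu\n", (unsigned long long)(end - start));
    return 0;
}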

Process Scheduling and Timer Interrupts: computers have an external timer that periodically generates an interrupt signal to the processor. The spacing between these interrupt signals is called the interval time. When a timer interrupt occurs, the operating system scheduler can choose to either resume the currently executing process or to switch to a different process. This interval time must be set short enough to ensure that the processor will switch between tasks often enough to provide the illusion of performing many tasks simultaneously. Typical timer intervals range between 1 and 10 milliseconds.

Kernel operation (in kernel mode) such as handling page faults, input, or output is considered part of each regular process rather than a separate process. When the scheduler switches from process A to process B, it must enter kernel mode to save the state of process A (still considered part of process A) and to restore the state of process B (considered part of process B).

Looking into the future:

  • Process-specific cycle timing. All that is required is to store the count as part of the process's state.
  • Variable Rate Clocks. Power consumption is directly proportional to the clock rate, and to reduce power consumption, future systems will vary the clock rate.

Sunday, November 25, 2007

Garbage Collection

A garbage collector is a dynamic storage allocator that automatically frees allocated blocks that are no longer needed by the program. Such blocks are known as garbage and hence the term garbage collector. The process of automatically reclaiming heap storage is known as garbage collection.

A garbage collector views memory as a directed reachability graph. The nodes of the graph are partitioned into a set of root nodes and a set of heap nodes. Each heap node corresponds to an allocated block in the heap. Root nodes correspond to locations not in the heap that contain pointers into the heap. Those locations can be registers, variables on the stack, or global variables in the read-write data area of virtual memory. The role of a garbage collector is to maintain some representation of the reachability graph and periodically reclaim the unreachable nodes by freeing them and returning them to the free list.

Mark & Sweep algorithm (by McCarthy): A Mark & Sweep garbage collector consists of a mark phase, which marks all reachable and allocated descendants of the root nodes, followed by a sweep phase, which iterates over every block in the heap and frees any unmarked allocated block it encounters. A sketch follows.
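
An illustrative sketch only, assuming a much-simplified block layout in which each block explicitly records its outgoing heap pointers (a real collector discovers pointers from registers, the stack, and block contents):

#include <stdlib.h>

typedef struct block {
    struct block *next;     /* all allocated blocks, linked together  */
    int marked;             /* mark bit for the mark phase            */
    size_t nptrs;           /* number of out-edges from this block    */
    struct block **ptrs;    /* heap pointers stored inside this block */
} block;

/* mark phase: mark this reachable node and all of its descendants */
static void mark(block *b)
{
    if (b == NULL || b->marked)
        return;
    b->marked = 1;
    for (size_t i = 0; i < b->nptrs; i++)
        mark(b->ptrs[i]);
}

/* sweep phase: walk every block, free the unmarked ones */
static void sweep(block **heap)
{
    block **p = heap;
    while (*p != NULL) {
        if ((*p)->marked) {
            (*p)->marked = 0;         /* clear mark for the next cycle */
            p = &(*p)->next;
        } else {
            block *garbage = *p;
            *p = garbage->next;       /* unlink the unreachable block  */
            free(garbage);            /* return it to the free list    */
        }
    }
}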

Wednesday, November 21, 2007

Dynamic memory allocation

Computer Systems

10.9.5 Implementation issues

A practical allocator that strikes a better balance between throughput and utilization must consider the following issues:
  • Free block organization: how do we keep track of free blocks?
  • Placement: How do we choose an appropriate free block in which to place a newly allocated block?
  • Splitting: After we place a newly allocated block in some free block, what do we do with the remainder of the free block?
  • Coalescing: What do we do with a block that has just been freed?
10.9.6 Implicit Free Lists

Any practical allocator needs some data structure that allows it to distinguish block boundaries and to distinguish between allocated and free blocks. Most allocators embed this information in the blocks themselves. An example follows.

A block consists of a one-word (4 bytes) header, the payload, and possibly some additional padding. The header encodes the block size as well as whether the block is allocated or free. If we impose a double-word alignment constraint, then the block size is always a multiple of eight and the three low-order bits of the block size are always zero.

This organization is called an implicit free list because the free blocks are linked implicitly by the size fields in the headers. The allocator can indirectly traverse the entire set of free blocks by traversing all of the blocks in the heap. We also need some kind of specially marked end block. The advantage of an implicit free list is simplicity. A significant disadvantage is that the cost of any operation that requires a search of the free list, such as placing an allocated block, is linear in the total number of allocated and free blocks in the heap. Header manipulation is usually packaged in a few macros, sketched below.
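
A sketch of the header macros in the style of the CS:APP example allocator (WSIZE and the unsigned-int header word are assumptions of this sketch):

#define WSIZE 4                              /* word/header size in bytes */

/* pack a block size and an allocated bit into one header word */
#define PACK(size, alloc)  ((size) | (alloc))

/* read and write a word at address p */
#define GET(p)       (*(unsigned int *)(p))
#define PUT(p, val)  (*(unsigned int *)(p) = (val))

/* the size is a multiple of 8, so the three low-order bits are free for flags */
#define GET_SIZE(p)   (GET(p) & ~0x7)
#define GET_ALLOC(p)  (GET(p) & 0x1)

/* given a payload pointer bp, find its header, and step to the next
   block using the size stored in the header */
#define HDRP(bp)       ((char *)(bp) - WSIZE)
#define NEXT_BLKP(bp)  ((char *)(bp) + GET_SIZE(HDRP(bp)))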

10.9.7 Placing Allocated Blocks

Three common placement policies are: first fit (search the list from the beginning and choose the first free block that fits), next fit (like first fit, but start each search where the previous one left off), and best fit (examine every free block and choose the smallest one that fits). Each trades off search time against memory utilization.

10.9.8 Splitting Free Blocks

If the fit is not good, then the allocator will usually opt to split the free block into two parts. The first part becomes the allocated block, and the remainder becomes a new free block.

10.9.9 Getting Additional Heap Memory

If the allocator is unable to find a fit for the requested block, it will ask the kernel for additional memory, either by calling the mmap or sbrk functions.

10.9.10 Coalescing Free Blocks

False fragmentation occurs when free blocks adjacent to a newly freed block are left unmerged, chopping free memory into small blocks that cannot satisfy larger requests. Coalescing rescues us from this phenomenon by merging adjacent free blocks. When should such merging be performed? Immediate coalescing vs. deferred coalescing: the former can introduce a form of thrashing, where a block is repeatedly coalesced and then split soon thereafter. Fast allocators often opt for some form of deferred coalescing.

10.9.11 Coalescing with Boundary Tags

Coalescing the next free block is straightforward and efficient. How would we coalesce the previous block? Knuth developed boundary tags, which allows for constant-time coalescing of the previous block. The idea is to add a footer (the boundary tag) at the end of each block, where the footer is a replica of the header. If each block includes such a footer, then the allocator can determine the starting location and status of the previous block by inspecting its footer, which is always one word away from the start of the current block.
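
A sketch of the four coalescing cases in the style of the CS:APP example allocator, reusing the macros above plus footer and previous-block macros; it assumes every block carries a footer:

#define FTRP(bp)       ((char *)(bp) + GET_SIZE(HDRP(bp)) - 2*WSIZE)
#define PREV_BLKP(bp)  ((char *)(bp) - GET_SIZE((char *)(bp) - 2*WSIZE))

static void *coalesce(void *bp)
{
    size_t prev_alloc = GET_ALLOC(FTRP(PREV_BLKP(bp)));
    size_t next_alloc = GET_ALLOC(HDRP(NEXT_BLKP(bp)));
    size_t size = GET_SIZE(HDRP(bp));

    if (prev_alloc && next_alloc) {            /* case 1: neighbors allocated */
        return bp;
    } else if (prev_alloc && !next_alloc) {    /* case 2: merge with next     */
        size += GET_SIZE(HDRP(NEXT_BLKP(bp)));
        PUT(HDRP(bp), PACK(size, 0));
        PUT(FTRP(bp), PACK(size, 0));
    } else if (!prev_alloc && next_alloc) {    /* case 3: merge with previous */
        size += GET_SIZE(HDRP(PREV_BLKP(bp)));
        bp = PREV_BLKP(bp);
        PUT(HDRP(bp), PACK(size, 0));
        PUT(FTRP(bp), PACK(size, 0));
    } else {                                   /* case 4: merge with both     */
        size += GET_SIZE(HDRP(PREV_BLKP(bp)))
              + GET_SIZE(HDRP(NEXT_BLKP(bp)));
        bp = PREV_BLKP(bp);
        PUT(HDRP(bp), PACK(size, 0));
        PUT(FTRP(bp), PACK(size, 0));
    }
    return bp;
}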

10.9.14 Segregated Free Lists (A popular choice with production-quality allocators such as GNU malloc)

Segregated storage: maintain multiple free lists, where each list holds blocks that are roughly the same size. Two basic approaches: simple segregated storage and segregated fits.

Simple segregated storage: the free list for each size class contains same-sized blocks, each the size of the largest element of the size class.

Segregated Fits: Each list contains potentially different-sized blocks whose sizes are members of the size class.

Thursday, November 15, 2007

Concurrent Programming with Threads

Each thread associated with a process has its own stack, registers, condition codes, stack pointer, program counter and thread ID (TID). A thread context switch is faster than a process context switch.

Reaping Terminated Threads

pthread_join (pthread_t tid, void **thread_return);
The pthread_join function blocks until thread tid terminates and then reaps any memory resources held by the terminated thread.
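
A minimal create/join example (the thread function and its argument are illustrative; compile with -pthread):

#include <pthread.h>
#include <stdio.h>

static void *hello(void *arg)
{
    printf("hello from thread %ld\n", (long)arg);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, hello, (void *)1L);
    pthread_join(tid, NULL);   /* block until the thread terminates, then reap it */
    return 0;
}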

Joinable vs. Detached Threads

A thread is either joinable or detached. A joinable thread can be reaped and killed by other threads. Its memory resources are NOT freed until it is reaped by another thread. A detached thread cannot be reaped or killed by other threads. Its memory resources are freed automatically by the system when it terminates. By default, threads are created joinable. A thread can detach itself by calling pthread_detach(pthread_self()).

Protecting Shared Variables with Semaphores

A semaphore, s, is a global variable with a nonnegative value that can be manipulated only by two special operations, called P (test) and V (increment):
  • P(s): while (s <= 0) ; s-- ;
  • V(s): s++;
A binary semaphore is often called a mutex, since it provides mutually exclusive access to the shared data.

Signaling with Semaphores

A classic example is producer-consumer interaction, which is common in real systems. For the producer-consumer interaction, we typically need two semaphores, called full and empty.

Producer:

P(empty);
write buffer;
V(full)

Consumer:

P(full)
read buffer;
V(empty)
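
A runnable sketch of this protocol using POSIX semaphores and a one-slot buffer (the buffer size, item type, and loop count are assumptions; compile with -pthread):

#include <semaphore.h>
#include <pthread.h>
#include <stdio.h>

static int buffer;            /* one-slot shared buffer           */
static sem_t empty, full;     /* counts of empty and full slots   */

static void *producer(void *arg)
{
    for (int i = 0; i < 5; i++) {
        sem_wait(&empty);     /* P(empty): wait for an empty slot */
        buffer = i;           /* write buffer                     */
        sem_post(&full);      /* V(full): announce a full slot    */
    }
    return NULL;
}

static void *consumer(void *arg)
{
    for (int i = 0; i < 5; i++) {
        sem_wait(&full);      /* P(full): wait for a full slot    */
        printf("consumed %d\n", buffer);   /* read buffer         */
        sem_post(&empty);     /* V(empty): announce an empty slot */
    }
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    sem_init(&empty, 0, 1);   /* one slot, initially empty        */
    sem_init(&full, 0, 0);
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}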

-------------------------------------------------------------
Synchronizing Threads with Mutex and Condition Variables

A mutex is used to protect shared data, much like a binary semaphore, while condition variables are used for signaling. To understand condition variables, we take Pthreads as an example.

pthread_cond_init(pthread_cond_t * cond, pthread_condattr_t *attr);
pthread_cond_wait(pthread_cond_t * cond, pthread_mutex_t *mutex);
pthread_cond_signal(pthread_cond_t * cond);

Pthreads require that a mutex variable be associated with the condition variable cond, and that a call to pthread_cond_wait must always be protected by that mutex:

pthread_mutex_lock(&mutex);
while (!condition)  /* re-check the predicate: waits can wake up spuriously */
    pthread_cond_wait(&cond, &mutex);
pthread_mutex_unlock(&mutex);

Some thread later indicates that the condition associated with cond has become true by calling pthread_cond_signal(&cond), which wakes up exactly one of the waiting threads.

Reentrant Functions: DO NOT reference any shared data when called by multiple threads, and thus require no synchronization operations.

Other synchronization errors

Races: A race occurs when the correctness of a program depends on one thread reaching point x in its control flow before another thread reaches point y. A golden rule is that threaded programs must work correctly for any feasible trajectory in the progress graph.

Deadlock: a collection of threads are blocked, waiting for a condition that will never be true.

Wednesday, November 14, 2007

TCP/IP

The TCP protocol runs only in the end systems and not in the intermediate network elements; the intermediate network elements do not maintain TCP connection state. A TCP connection is also always point-to-point, between a single sender and a single receiver.

Three-way handshake connection-establishment procedure: the client first sends a special TCP segment, the server responds with a second special TCP segment, and finally the client responds again with a third special segment. The first two segments carry no payload, that is, no application data; the third of these segments may carry a payload.

TCP segment structure: 1) source port #, 2) destination port #, 3) sequence number, 4) acknowledgment number, 5) receive window, 6) checksum, 7) flag bits (ACK, URG, PSH, RST, SYN, FIN), 8) header length, 9) options, 10) data, 11) urgent data pointer.

Sequence numbers: TCP views data as an unstructured, but ordered, stream of bytes. Sequence number for a segment is therefore the byte-stream number of the first byte in the segment.

Acknowledgment numbers: The acknowledgment number that Host A puts in its segment is the sequence number of the next byte Host A is expecting from Host B. Because TCP only acknowledges bytes up to the first missing byte in the stream, TCP is said to provide cumulative acknowledgments.

TCP Congestion Control: the TCP congestion-control mechanism has each side of a connection keep track of an additional variable, the congestion window, which imposes a constraint on the rate at which a TCP sender can send traffic into the network. 1) Additive increase, multiplicative decrease. 2) Slow start: the TCP sender begins by transmitting at a slow rate (hence the term slow start) but increases its sending rate exponentially. 3) Reaction to timeout events: after a timeout event, a TCP sender enters a slow-start phase. TCP manages a variable called Threshold, which determines the window size at which slow start will end and congestion avoidance will begin. Canceling the slow-start phase after a triple duplicate ACK is called fast recovery.

Priority Queue

Sunday, November 11, 2007

Graph Algorithms

Strongly connected components

String Algorithms

1. anagram (a word or phrase made by transposing the letters of another word or phrase)

Question: Find all the anagrams in a dictionary
Solution: 1) Sign each word in the dictionary so that words in the same anagram class have the same signature, and then 2) bring together words with equal signatures. A signature scheme based on sorting is as follows: order the letters within the word alphabetically. To bring the words together, sort them in the order of their signatures. A sketch of the signing step follows.
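
A minimal sketch of the signing step: sort the letters of each word so that anagrams share a signature (qsort usage and the buffer sizes are assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int cmp_char(const void *a, const void *b)
{
    return *(const char *)a - *(const char *)b;
}

/* write the signature of word into sig: its letters in alphabetical order */
static void sign(const char *word, char *sig)
{
    strcpy(sig, word);
    qsort(sig, strlen(sig), 1, cmp_char);
}

int main(void)
{
    char sig1[64], sig2[64];
    sign("pots", sig1);
    sign("stop", sig2);
    /* equal signatures ("opst") mean the two words are anagrams */
    printf("%s %s -> %s\n", sig1, sig2,
           strcmp(sig1, sig2) == 0 ? "anagrams" : "different");
    return 0;
}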

2. Palindrome ( a word, verse, or sentence (as *Able was I ere I saw Elba*) or a number (as 1881) that reads the same backward or forward)

3. String rotation without extra memory, used in text editor (move lines)

For example, abcdefg -> efgabcd. Step 1: reverse the two substrings separately: abcdefg -> dcbagfe. Step 2: reverse the whole string: dcbagfe -> efgabcd.
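
The three-reversal trick in C, in place and with no extra memory beyond one temporary character (function names are my own):

#include <stdio.h>
#include <string.h>

static void reverse(char *s, int i, int j)   /* reverse s[i..j] in place */
{
    while (i < j) {
        char t = s[i]; s[i] = s[j]; s[j] = t;
        i++, j--;
    }
}

/* rotate: move the first k characters of s to the end */
static void rotate(char *s, int k)
{
    int n = strlen(s);
    reverse(s, 0, k - 1);     /* abcdefg -> dcbaefg */
    reverse(s, k, n - 1);     /* dcbaefg -> dcbagfe */
    reverse(s, 0, n - 1);     /* dcbagfe -> efgabcd */
}

int main(void)
{
    char s[] = "abcdefg";
    rotate(s, 4);             /* k = 4 reproduces the example above */
    printf("%s\n", s);        /* prints efgabcd */
    return 0;
}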

Thursday, November 8, 2007

Sorting in Linear time

1. Counting Sort (input consists of integers in a small range)
2. Radix Sort
3. Bucket Sort (input is uniformly distributed)

Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case.

Counting Sort (Running time: Θ(n))

Counting sort assumes that each of the n input elements is an integer in the range 0 to k, for some integer k. The basic idea of counting sort is to determine, for each input element x, the number of elements less than x. This information can be used to place element x directly into its position in the output array. We need a slight modification to handle the situation in which several elements have the same value. In practice, we usually use counting sort when we have k = O(n), in which case the running time is Θ(n).

Counting sort is stable: numbers with the same value appear in the output array in the same order as they do in the input array, which is very important since counting sort is often used as a subroutine in radix sort.

Counting sort requires two other arrays: the array B[1...n] holds the sorted output, and the array C[0...k] provides temporary working storage.

The algorithm itself is not complicated, but watch out for boundary errors when you implement it; a 0-based sketch follows.
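
A sketch following the CLRS scheme, but with 0-based indices rather than the book's 1-based arrays (the demo values are CLRS's example input):

#include <stdio.h>
#include <string.h>

void counting_sort(const int *A, int *B, int n, int k)
{
    int C[k + 1];
    memset(C, 0, sizeof C);
    for (int j = 0; j < n; j++)        /* C[i] = number of elements == i */
        C[A[j]]++;
    for (int i = 1; i <= k; i++)       /* C[i] = number of elements <= i */
        C[i] += C[i - 1];
    for (int j = n - 1; j >= 0; j--) { /* backward scan keeps the sort stable */
        B[C[A[j]] - 1] = A[j];
        C[A[j]]--;
    }
}

int main(void)
{
    int A[] = {2, 5, 3, 0, 2, 3, 0, 3}, B[8];
    counting_sort(A, B, 8, 5);
    for (int i = 0; i < 8; i++)
        printf("%d ", B[i]);           /* 0 0 2 2 3 3 3 5 */
    printf("\n");
    return 0;
}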

Radix Sort

Radix-Sort(A, d)
1. for i <-- 1 to d
2. do use a stable sort to sort array A on digit i

Radix sort can be used to sort records of information that are keyed by multiple fields.

Given n d-digit numbers in which each digit can take on up to k possible values, Radix-Sort correctly sorts these numbers in theta(d(n+k)) time. When d is constant and k = O(n), radix sort runs in linear time.
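
A least-significant-digit sketch over decimal digits, using a stable counting pass per digit (base 10 and the digit extraction are assumptions; the demo values are CLRS's example):

#include <stdio.h>
#include <string.h>

static void sort_by_digit(int *A, int n, int exp)
{
    int out[n], C[10] = {0};
    for (int j = 0; j < n; j++)              /* count occurrences of each digit */
        C[(A[j] / exp) % 10]++;
    for (int i = 1; i < 10; i++)             /* prefix sums give final positions */
        C[i] += C[i - 1];
    for (int j = n - 1; j >= 0; j--)         /* backward scan keeps it stable */
        out[--C[(A[j] / exp) % 10]] = A[j];
    memcpy(A, out, n * sizeof(int));
}

void radix_sort(int *A, int n, int d)        /* d = number of decimal digits */
{
    for (int i = 0, exp = 1; i < d; i++, exp *= 10)
        sort_by_digit(A, n, exp);            /* least significant digit first */
}

int main(void)
{
    int A[] = {329, 457, 657, 839, 436, 720, 355};
    radix_sort(A, 7, 3);
    for (int i = 0; i < 7; i++)
        printf("%d ", A[i]);                 /* 329 355 436 457 657 720 839 */
    printf("\n");
    return 0;
}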

Bucket Sort

Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a finite number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm, or by recursively applying the bucket sorting algorithm. Bucket sort is a generalization of pigeonhole sort.

WinCVS

http://www.devdaily.com/wincvs/HowToUseWinCVS/

http://ikon.as/wincvs-howto/#startingnew

TightVNC

TightVNC is an enhanced version of VNC (an abbreviation for Virtual Network Computing), a client/server software package allowing remote network access to graphical desktops.

TightVNC can be used to perform remote control and administration tasks in Windows, Unix and mixed network environments.

http://www.tightvnc.com/

Deadlock

Network Flow

Terms:
  • A flow f in a graph G must satisfy 1) capacity conditions: for each edge e of G, 0 <= f(e) <= c_e; and 2) conservation conditions: for each node v other than s and t, the sum of flow values over all edges entering v equals the sum of flow values over all edges leaving v.
  • The capacity of a cut, denoted as c(A, B), is the sum of the capacities of all edges out of A.
Properties:
  • The residual graph Gf has at most twice as many edges as G.


The value of every flow is upper-bounded by the capacity of every cut: let f be any s-t flow and (A, B) any s-t cut; then v(f) <= c(A, B), the capacity of the cut (A, B).


Max-Flow Min-Cut Theorem

In every flow network, there is a maximum flow f and a minimum cut (A, B) so that v(f) = c(A,B).
There can be many minimum-capacity cuts in a graph G. In a minimum cut, all edges out of A are completely saturated with flow, while all edges into A are completely unused.

Ford-Fulkerson Algorithm

This algorithm will terminate when there is no s-t path in the residual graph. The algorithm is as follows.

Initially f(e) = 0 for all e in G
While there is an s-t path in the residual graph Gf
    Let P be a simple s-t path in Gf
    f' = augment(f, P)   // the running time of this helper function is O(m)
    Update f to f'
    Update the residual graph Gf to be Gf'
Endwhile
Return f

Helper function:
augment(f, P)
    Let b = bottleneck(P, f)
    For each edge (u,v) in P
        If e = (u,v) is a forward edge then
            increase f(e) in G by b
        Else ((u,v) is a backward edge; let e = (v,u))
            decrease f(e) in G by b
        Endif
    Endfor
    Return f
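
A sketch of the method in C with the augmenting path chosen by BFS (i.e., Edmonds-Karp). The adjacency-matrix residual graph, MAXN, and the toy network are assumptions for brevity:

#include <stdio.h>
#include <string.h>
#include <limits.h>

#define MAXN 64

int cap[MAXN][MAXN];    /* residual capacities: cap[u][v] = c_e - f(e) */
int parent[MAXN];

/* find an s-t path in the residual graph, recording it via parent[] */
static int bfs(int n, int s, int t)
{
    int queue[MAXN], head = 0, tail = 0;
    memset(parent, -1, sizeof parent);
    parent[s] = s;
    queue[tail++] = s;
    while (head < tail) {
        int u = queue[head++];
        for (int v = 0; v < n; v++)
            if (parent[v] == -1 && cap[u][v] > 0) {
                parent[v] = u;
                queue[tail++] = v;
            }
    }
    return parent[t] != -1;
}

int max_flow(int n, int s, int t)
{
    int flow = 0;
    while (bfs(n, s, t)) {                       /* while an s-t path exists */
        int b = INT_MAX;
        for (int v = t; v != s; v = parent[v])   /* b = bottleneck(P, f)     */
            b = cap[parent[v]][v] < b ? cap[parent[v]][v] : b;
        for (int v = t; v != s; v = parent[v]) {
            cap[parent[v]][v] -= b;              /* push flow forward        */
            cap[v][parent[v]] += b;              /* add residual back-edge   */
        }
        flow += b;
    }
    return flow;
}

int main(void)
{
    /* toy network: s = 0, t = 3 */
    cap[0][1] = 3; cap[0][2] = 2;
    cap[1][2] = 1; cap[1][3] = 2; cap[2][3] = 3;
    printf("max flow = %d\n", max_flow(4, 0, 3));  /* prints 5 */
    return 0;
}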

Edmonds-Karp Algorithm

Edmonds-Karp is the Ford-Fulkerson method with each augmenting path chosen by breadth-first search, i.e., a shortest s-t path in the residual graph; this choice bounds the number of augmentations and yields an O(VE^2) running time. The BFS-based sketch above implements exactly this rule.

Tuesday, November 6, 2007

LaTeX

A very powerful formatter; you should not miss it.

Good tutorial on the website:

http://www.maths.tcd.ie/~dwilkins/LaTeXPrimer/

Monday, November 5, 2007

Associative array: Hash Table vs BST

An associative array (also map, mapping, hash, dictionary, finite map, lookup table, and in query-processing an index or index file) is an abstract data type composed of a collection of keys and a collection of values, where each key is associated with one value. The operation of finding the value associated with a key is called a lookup or indexing, and this is the most important operation supported by an associative array. The relationship between a key and its value is sometimes called a mapping or binding.

Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures.

Representations: Hash Table vs Binary Search Tree (self-balancing)

There are two main efficient data structures used to represent associative arrays, the hash table and the self-balancing binary search tree. Skip lists are also an alternative, though relatively new and not as widely used. Relative advantages and disadvantages include:

  • Hash tables have faster average lookup and insertion time (O(1)), while some kinds of binary search tree have faster worst-case lookup and insertion time (O(log n) instead of O(n)). Hash tables have seen extensive use in real time systems, but trees can be useful in high-security real time systems where untrusted users may deliberately supply information that triggers worst-case performance in a hash table, although careful design can remove that issue. Hash tables shine in very large arrays, where O(1) performance is important. Skip lists have worst-case operation time of O(n), but average-case of O(log n), with much less insertion and deletion overhead than balanced binary trees.
  • Hash tables can have more compact storage for small value types, especially when the values are bits.
  • There are simple persistent versions of balanced binary trees, which are especially prominent in functional languages.
  • Building a hash table requires a reasonable hash function for the key type, which can be difficult to write well, while balanced binary trees and skip lists only require a total ordering on the keys. On the other hand, with hash tables the data may be cyclically or partially ordered without any problems.
  • Balanced binary trees and skip lists preserve ordering allowing one to efficiently iterate over the keys in order or to efficiently locate an association whose key is nearest to a given value. Hash tables do not preserve ordering and therefore cannot perform these operations as efficiently.
  • Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys, or to count the number of keys in an ordered range.
http://en.wikipedia.org/wiki/Associative_array

Sunday, November 4, 2007

Splay Tree

A splay tree is a self-balancing binary search tree. A splay tree guarantees that any M consecutive tree operations starting from an empty tree take at most O(M log N) time. The net effect is that there are no bad input sequences.

Good performance for a splay tree depends on the fact that it is self-balancing, and indeed self optimizing, in that frequently accessed nodes will move nearer to the root where they can be accessed more quickly. This is an advantage for nearly all practical applications, and is particularly useful for implementing caches; however it is important to note that for uniform access, a splay tree's performance will be considerably (although not asymptotically) worse than a somewhat balanced simple binary search tree.

The basic idea of the splay tree is that after a node is accessed, it is pushed to the root by a series of AVL-style tree rotations. The intuition behind this: when a node is accessed, it is likely to be accessed again in the near future in many real applications. The side effect is that the tree becomes balanced to some extent.

The splay tree does NOT require the maintenance of height or balance information, thus saving space and simplifying the code to some extent. Splay trees are much simpler to program than AVL trees.

Assume X is a node on the access path at which we are rotating. X has both a parent (P) and a grandparent (G).
Splay operations: 1) zig-zag case, where X is a right child and P is a left child (or vice versa): we perform a double rotation, exactly like an AVL double rotation. 2) zig-zig case: X and P are both left children. 3) zig case: X's parent P is the root.

http://en.wikipedia.org/wiki/Splay_tree

Saturday, November 3, 2007

AVL Tree: balanced binary search tree

An AVL (Adelson-Velskii and Landis) tree is a binary search tree with a balance condition: the heights of the left and right subtrees of every node can differ by at most 1. The depth of the tree is O(log n).

After an insertion, only nodes that are on the path from the insertion point to the root might have their balance altered because only those nodes have their subtrees altered.

Possible violation might occur in four cases:

1. An insertion into the left subtree of the left child
2. An insertion into the right subtree of the left child
3. An insertion into the left subtree of the right child
4. An insertion into the right subtree of the right child

Cases 1 and 4 can be fixed by a single rotation; cases 2 and 3 require a double rotation (both sketched after the next paragraph).

Rotation operations require only a few pointer changes. Each node records its height information within it.
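
A sketch of the single and double rotations in the style of Weiss's book; the node layout and helper names are assumptions:

typedef struct node {
    int key, height;
    struct node *left, *right;
} node;

static int height(node *n) { return n ? n->height : -1; }
static int max(int a, int b) { return a > b ? a : b; }

/* case 1: insertion into the left subtree of the left child */
static node *rotate_with_left(node *k2)
{
    node *k1 = k2->left;
    k2->left = k1->right;             /* move the middle subtree across */
    k1->right = k2;                   /* old root becomes right child   */
    k2->height = max(height(k2->left), height(k2->right)) + 1;
    k1->height = max(height(k1->left), k2->height) + 1;
    return k1;                        /* k1 is the new subtree root     */
}

/* mirror image handles case 4 */
static node *rotate_with_right(node *k1)
{
    node *k2 = k1->right;
    k1->right = k2->left;
    k2->left = k1;
    k1->height = max(height(k1->left), height(k1->right)) + 1;
    k2->height = max(k1->height, height(k2->right)) + 1;
    return k2;
}

/* case 2 (double): first rotate the left child, then the root */
static node *double_with_left(node *k3)
{
    k3->left = rotate_with_right(k3->left);
    return rotate_with_left(k3);
}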

Deletion in an AVL tree is somewhat more complicated than insertion, and lazy deletion is probably the best strategy.


Reference: Data Structures and Algorithm Analysis in C++, page 136

B-Trees

A B-tree is a tree data structure that keeps data sorted and allows searches, insertions, and deletions in logarithmic amortized time. It is most commonly used in databases and filesystems.

A B-tree is kept balanced by requiring that all leaf nodes are at the same depth. This depth will increase slowly as elements are added to the tree, but an increase in the overall depth is infrequent, and results in all leaf nodes being one more hop further removed from the root.

B-trees have substantial advantages over alternative implementations when node access times far exceed access times within nodes. This usually occurs when most nodes are in secondary storage such as hard drives. By maximizing the number of child nodes within each internal node, the height of the tree decreases, balancing occurs less often, and efficiency increases. Usually this value is set such that each node takes up a full disk block or an analogous size in secondary storage. While 2-3 B-trees might be useful in main memory, and are certainly easier to explain, if the node sizes are tuned to the size of a disk block, the result might be a 129-513 B-tree.

Search: Search is performed in the typical manner, analogous to that in a binary search tree. Starting at the root, the tree is traversed top to bottom, choosing the child pointer whose separation values are on either side of the value that is being searched. Binary search is typically (but not necessarily) used within nodes to find the separation values and child tree of interest.

---------------------------------------
2-3 Tree

* Every non-leaf node has 2 or 3 children
* All data is kept in the leaves
* All leaves are at the same level (the bottom level)
* All data is kept in sorted order
* Every non-leaf node will contain 1 or 2 fields.

These fields indicate the range of values in the node's subtrees. If a node has two children, it will have one field; if it has three children, it will have two fields. Each non-leaf node contains a value in field 1 which is greater than the largest item in its left sub-tree, but less than or equal to the smallest item in its right sub-tree (or center sub-tree, if it has three children). If that node has three children, field 2 contains a value which is greater than the largest value in the center sub-tree, but less than or equal to the smallest item in its right sub-tree. The purpose of these values is to direct a search function to the correct sub-tree and eventually to the correct data node.

The biggest drawback with the 2-3 tree is that it requires more storage space than the normal binary search tree.
---------------------------------------
2-3-4 Trees

A 2-3-4 tree is a B-tree of order 4.

2-3-4 trees are relatively difficult to implement in most programming languages because of the large number of special cases involved in operations on the tree. Red-black trees (see below) are simpler to implement, so they tend to be used instead.
---------------------------------------

http://en.wikipedia.org/wiki/B-tree
http://en.wikipedia.org/wiki/2-3_tree

Red-Black Tree

A binary search tree is a red-black tree if:

1. Every node is either red or black
2. Every leaf (nil) is black
3. If a node is red, then both its children are black
4. Every simple path from a node to a descendant leaf contains the same number of black nodes

Black-height of a node x, bh(x), is the number of black nodes on any path from x to a leaf, not counting x. A red-black tree with n internal nodes has height at most 2lg(n+1).
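
A sketch of checking rules 3 and 4 recursively: the function returns the black-height of a subtree, or -1 on a violation (the node layout is an assumption):

#include <stddef.h>

typedef enum { RED, BLACK } color_t;
typedef struct rbnode {
    color_t color;
    struct rbnode *left, *right;   /* NULL stands for the black nil leaf */
} rbnode;

static int black_height(const rbnode *x)
{
    if (x == NULL)
        return 0;                          /* nil leaves are black (rule 2) */
    int lh = black_height(x->left);
    int rh = black_height(x->right);
    if (lh < 0 || rh < 0 || lh != rh)
        return -1;                         /* rule 4 violated below x       */
    if (x->color == RED &&
        ((x->left && x->left->color == RED) ||
         (x->right && x->right->color == RED)))
        return -1;                         /* rule 3: red node needs black children */
    return lh + (x->color == BLACK);       /* count x itself if it is black */
}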

Thursday, November 1, 2007

Suffix Array

Suffix array: an array of pointers to characters. The pointers in the array point to every suffix in the string, hence the name "suffix array". A suffix array can be used to efficiently search a large text.

Applications:

1. Find the longest duplicated substring of characters in it.
The duplicated substring must appear in two different suffixes. We sort the suffix array to bring suffixes with equal prefixes together, and then scan through the array comparing adjacent elements to find the longest repeated string. The running time is O(n log n) due to sorting. (A sketch appears after the questions below.)

1.1. Given a new input string, How would you search a suffix array to find the longest match in the stored text?
1.2. Given two input texts, find the longest string that occurs in both.
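
A sketch of application 1 in the Programming Pearls style; comlen() and the sample text are illustrative:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int pstrcmp(const void *a, const void *b)
{
    return strcmp(*(char *const *)a, *(char *const *)b);
}

/* length of the common prefix of two strings */
static int comlen(const char *p, const char *q)
{
    int i = 0;
    while (*p && *p == *q)
        p++, q++, i++;
    return i;
}

int main(void)
{
    char text[] = "banana";
    int n = strlen(text);
    char *suffix[sizeof text];

    for (int i = 0; i < n; i++)        /* one pointer per suffix */
        suffix[i] = &text[i];
    qsort(suffix, n, sizeof(char *), pstrcmp);

    int best = 0, where = 0;
    for (int i = 0; i < n - 1; i++) {  /* equal prefixes are now adjacent */
        int len = comlen(suffix[i], suffix[i + 1]);
        if (len > best) { best = len; where = i; }
    }
    printf("longest duplicated substring: %.*s\n", best, suffix[where]);
    return 0;                          /* prints "ana" for "banana" */
}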

2. Locate every occurrence of a substring within the string.
Finding every occurrence of the substring is equivalent to finding every suffix that begins with the substring. Thanks to the lexicographical ordering, these suffixes will be grouped together in the suffix array and can be found efficiently with a binary search. If implemented straightforwardly, this binary search takes O(m log n) time, where m is the length of the substring. To avoid redoing comparisons, extra data structures giving information about the longest common prefixes (LCPs) of suffixes are constructed, giving O(m + log n) search time.

http://en.wikipedia.org/wiki/Suffix_array