# PageRank: A Case Study for Discrete-time Markov Chains and Graph Theory

Def: Let a webpage be a vertex in a directed graph such that

1. the weight of an outgoing edge connected to the vertex can only take values in the interval $(0, 1]$
2. the sum of the weights of all outgoing edge connected to the vertex equals 1
3. the weight of an outgoing edge connected to the vertex is $\frac{1}{k}$ where k is the total number of outgoing edges connected to the vertex.

Def: Let a hyperlink be any outgoing edge connected to a webpage.

Def: Let an internet be a graph $I = (V, E)$ such that all vertices $v$ in $V$ are webpages and all edges $e$ in $E$ are hyperlinks. The following definition is taken from [1]:

Seem familiar? If not, I will write it in an equivalent but more recognizable form [3]:
These are the global balance equations of a Discrete-time Markov Chain!
Let $latex I$ be an internet represented by the following diagram. [2]
The above equations can be used to calculate the ranks of the four webpages in internet I. The ranks of webpage 1, 2, 3 and 4 are 0.38, 0.12, 0.29 and 0.19 respectively. [2]
Algorithm [3] (slightly modified):
1. For an internet, determine all webpages and links between them.
2. Create a directed graph.
3. If page i has k > 0 outgoing links, then the probability of each outgoing link is 1/k.
4. The vertices of the graph are states and its links are transitions of a Markov Chain. Solve the Markov Chain to get its limiting probabilities. The limiting probabilities are the pageranks of the webpages.
It is important to note that not every internet is conveniently ergodic (i.e aperiodic and irreducible. All internets are finite so positive recurrence is not a problem). See page 4 and 5 of [1] to see how Brin and Page solve this problem by creating artificial rank sources in rank sinks. A better explanation is provided in [2] with the title “The solution of Page and Brin”. Of course, to test the pagerank algorithm, Brin and Page created the Google Search Engine initially hosted at google.stanford.edu and then self-hosted at google.com. You might have heard of it.
References:
[1] Paper: “The PageRank Citation Ranking: Bringing Order to the Web” by Brin, Page.
[2] “The Mathematics of Web Search” lecture notes (Winter, 2009) Cornell University
[3] Notes from Dr. Zartash’s lectures on DTMCs and Queueing theory and Notes from Dr. Ihsan’s lecture on the PageRank Algorithm.