PageRank: A Case Study for Discrete-time Markov Chains and Graph Theory

Def: Let a webpage be a vertex in a directed graph such that

  1. the weight of an outgoing edge connected to the vertex can only take values in the interval (0, 1]
  2. the sum of the weights of all outgoing edge connected to the vertex equals 1
  3. the weight of an outgoing edge connected to the vertex is \frac{1}{k} where k is the total number of outgoing edges connected to the vertex.

Def: Let a hyperlink be any outgoing edge connected to a webpage.

Def: Let an internet be a graph I = (V, E) such that all vertices v in V are webpages and all edges e in E are hyperlinks. The following definition is taken from [1]:

15137410_1571513429529300_6712046621396938499_o

Seem familiar? If not, I will write it in an equivalent but more recognizable form [3]:
15178106_1571530519527591_8744008906724142423_n
These are the global balance equations of a Discrete-time Markov Chain!
Let $latex I$ be an internet represented by the following diagram. [2]
The above equations can be used to calculate the ranks of the four webpages in internet I. The ranks of webpage 1, 2, 3 and 4 are 0.38, 0.12, 0.29 and 0.19 respectively. [2]
15181565_1571553169525326_3355518001003050542_n
Algorithm [3] (slightly modified):
  1. For an internet, determine all webpages and links between them.
  2. Create a directed graph.
  3. If page i has k > 0 outgoing links, then the probability of each outgoing link is 1/k.
  4. The vertices of the graph are states and its links are transitions of a Markov Chain. Solve the Markov Chain to get its limiting probabilities. The limiting probabilities are the pageranks of the webpages.
It is important to note that not every internet is conveniently ergodic (i.e aperiodic and irreducible. All internets are finite so positive recurrence is not a problem). See page 4 and 5 of [1] to see how Brin and Page solve this problem by creating artificial rank sources in rank sinks. A better explanation is provided in [2] with the title “The solution of Page and Brin”. Of course, to test the pagerank algorithm, Brin and Page created the Google Search Engine initially hosted at google.stanford.edu and then self-hosted at google.com. You might have heard of it.
References:
[1] Paper: “The PageRank Citation Ranking: Bringing Order to the Web” by Brin, Page.
[2] “The Mathematics of Web Search” lecture notes (Winter, 2009) Cornell University
[3] Notes from Dr. Zartash’s lectures on DTMCs and Queueing theory and Notes from Dr. Ihsan’s lecture on the PageRank Algorithm.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s