Development of MapReduced Topic Sensitive PageRank

Khadanga, Swaraj (2015) Development of MapReduced Topic Sensitive PageRank. BTech thesis.

Abstract

Today's search engines not only match content but also rank the fetched pages in a useful manner. When a user searches for content, the search engine fetches the web pages from the database server and shows the results ordered by the importance of each website or web page. The importance of a page can be quantified by its PageRank value, which reflects the number and importance of the pages that point to it. The web forms a sparse directed graph in which each node represents a web page and each edge represents one hyperlink. Hence the web can be represented as a matrix, PageRank can be formulated as a recursive linear equation, and the PageRank values can be calculated as an eigenvector of that matrix. The spider trap and dead end problems have been studied; both can be solved by reformulating the web matrix with a random-surfing probability, also known as the damping factor. Considering these factors, a MapReduce model can be developed and easily implemented in any Hadoop-like environment. MapReduce takes advantage of parallel processing in a cluster, and the sparseness of the web matrix also favors the choice of the MapReduce programming model. Topic-sensitive PageRank is studied, and a MapReduce version of this modified PageRank is designed and implemented.
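The iteration described in the abstract can be illustrated with a small single-process sketch in Python. It is not the thesis's implementation and does not use Hadoop: the function names (map_phase, reduce_phase, topic_sensitive_pagerank), the four-page graph, and the damping factor value 0.85 are assumptions chosen for illustration. The map step emits rank contributions along the edges of the sparse graph, and the reduce step sums them per page. Rank lost to teleportation and dead ends is then returned to the topic set, which is one common way to handle both problems.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: each page splits its current rank evenly over its out-links."""
    for page, out_links in graph.items():
        for dest in out_links:
            yield dest, ranks[page] / len(out_links)


def reduce_phase(pairs, pages, topic_pages, beta):
    """Reduce step: sum contributions per page, then add the teleport share."""
    sums = defaultdict(float)
    for dest, contribution in pairs:
        sums[dest] += contribution
    new_ranks = {p: beta * sums[p] for p in pages}
    # Assumption: rank lost to teleportation and to dead ends is given back
    # to the topic set only, which biases the result towards that topic.
    leaked = 1.0 - sum(new_ranks.values())
    for p in topic_pages:
        new_ranks[p] += leaked / len(topic_pages)
    return new_ranks


def topic_sensitive_pagerank(graph, topic_pages, beta=0.85, iterations=50):
    """Repeat the map and reduce steps for a fixed number of iterations."""
    pages = sorted(graph)
    ranks = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(graph, ranks), pages, topic_pages, beta)
    return ranks


# Hypothetical four-page web; page "D" has no out-links (a dead end).
web = {"A": ["B", "C"], "B": ["C", "D"], "C": ["A"], "D": []}

biased = topic_sensitive_pagerank(web, topic_pages=["A"])
uniform = topic_sensitive_pagerank(web, topic_pages=sorted(web))
```

With every page in the topic set, the sketch reduces to ordinary PageRank with uniform teleportation. Restricting the topic set to page "A" shifts rank towards "A" and the pages reachable from it. In a Hadoop-like environment the map and reduce functions would run as distributed tasks over adjacency lists rather than over an in-memory dictionary.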

Item Type:Thesis (BTech)
Uncontrolled Keywords:PageRank MapReduce Hadoop Crawler Linear-Equation SCRAPY
Subjects:Engineering and Technology > Computer and Information Science > Data Mining
Divisions: Engineering and Technology > Department of Computer Science
ID Code:7702
Deposited By:Mr. Sanat Kumar Behera
Deposited On:26 May 2016 11:45
Last Modified:26 May 2016 11:45
Supervisor(s):Sahoo, M N
