A Study Regarding Inter Domain Linked Documents Similarity and their Consequent Bounce Rate

published in Studia Universitatis Babeş-Bolyai, Seria Informatica, Vol. LIX, No. 1, pp. 83-91, 2014.

Full paper

A Study Regarding Inter Domain Linked Documents Similarity and their Consequent Bounce Rate

Authors

Diana Halita, Darius Bufnea
Department of Computer Science, Faculty of Mathematics and Computer Science,
Babeş-Bolyai University of Cluj-Napoca

Abstract

The main objective of linking documents across domains is to give the reader access to supplementary, semantically related information. However, inter-domain links are sometimes used artificially, especially when the goal is to abusively increase the page rank of the destination domain. This paper presents a study of the similarity between inter-domain linked documents and their consequent bounce rate. To that end, we conducted a series of experiments that outline how the behavior of similarity functions correlates with a website’s bounce rate. The method presented here could be used to identify improperly placed outgoing links within a web site, such as ads or spam links. Based on that, a search engine could refine the results in the SERP by downgrading any website that falls into the above category.
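As a rough illustration of the two quantities the study relates, the sketch below computes a similarity score between a source document and the document it links to, together with a bounce rate. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which similarity functions were used, so cosine similarity over raw term frequencies is assumed here, and bounce rate is assumed to mean the share of sessions that end after a single page view. The function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-frequency vectors, in [0, 1].

    Assumption: cosine similarity stands in for the (unspecified)
    similarity functions studied in the paper.
    """
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def bounce_rate(single_page_sessions, total_sessions):
    """Share of sessions that ended after a single page view (assumed definition)."""
    return single_page_sessions / total_sessions if total_sessions else 0.0


# Hypothetical source page and two hypothetical link targets.
source = "link analysis techniques for detecting web spam"
related_target = "detecting web spam with link analysis"
unrelated_target = "cheap holiday offers and discount flights"

print(cosine_similarity(source, related_target))
print(cosine_similarity(source, unrelated_target))
print(bounce_rate(single_page_sessions=30, total_sessions=120))
```

In this toy setting the related target scores well above the unrelated one, which shares no terms with the source. The study's premise is that such a low score on an outgoing link may go together with a higher bounce rate on the destination.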

Key words

bounce rate, document similarity, page ranking, identifying improperly placed links

BibTeX bib file

halita-bufnea-2014.bib

EndNote enw file

halita-bufnea-2014.enw

References

  1. L. Becchetti, C. Castillo, D. Donato, Link-Based Characterization and Detection of Web Spam, 2nd International Workshop on Adversarial Information Retrieval on the Web, AIRWeb, Seattle, USA, August 2006, pp. 1-8.
  2. L. Becchetti, C. Castillo, D. Donato, S. Leonardi, R. Baeza-Yates, Link Analysis for Web Spam Detection: Link-based and Content Based Techniques, ACM Transactions on the Web (TWEB), Volume 2, Issue 1, New York, USA, February 2008, pp. 1-41.
  3. A. Farahat, M. Bailey, How Effective is Targeted Advertising?, Proceedings of the 21st World Wide Web Conference 2012, Lyon, France, April 16-20, 2012, pp. 111-120.
  4. Z. Gyöngyi, H. Garcia-Molina, P. Berkhin, J. Pedersen, Link Spam Detection Based on Mass Estimation, 32nd International Conference on Very Large Data Bases (VLDB), Seoul, Korea, 2006, pp. 439-450.
  5. A. Huang, Similarity Measures for Text Document Clustering, Proceedings of the New Zealand Computer Science Research Student Conference, Hamilton, New Zealand, 2008, pp. 49-56.
  6. J. Leskovec, A. Rajaraman, J. Ullman, Mining of Massive Datasets, Cambridge University Press, 2010.
  7. M. Najork, Detecting Spam Web Pages through Content Analysis, International World Wide Web Conference Committee, Edinburgh, Scotland, 2006, pp. 83-92.
  8. N. Spirin, J. Han, Survey on Web Spam Detection: Principles and Algorithms, ACM SIGKDD Explorations Newsletter, Volume 13, Issue 2, December 2011, pp. 50-64.
  9. D. Zhou, C. Burges, T. Tao, Transductive Link Spam Detection, Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web, AIRWeb, New York, USA, ACM Press, 2007, pp. 21-28.
