Computer Science and Information Technology Vol. 1(1), pp. 33 - 40
DOI: 10.13189/csit.2013.010104


Detecting Cloaking Web Spam Using Hash Function


Shekoofeh Ghiam1,*, Alireza Nemaney Pour2
1 Faculty of Computer Engineering, Sharif University of Technology, International Campus, Kish Island, Hormozgan, Iran
2 Faculty of Computer Engineering, Abhar Islamic Azad University, Abhar, Iran

ABSTRACT

Web spam is an attempt to boost the ranking of particular pages in search engine results. Cloaking is one such spamming technique. Previous cloaking detection methods, which rely on differences in terms or links between the crawler's and the browser's copies of a page, are not accurate enough. The most recent technique is a tag-based method, which finds cloaked pages better than earlier algorithms. However, examining the content of web pages provides more accurate results. This paper proposes an algorithm based on term differences between the crawler's and the browser's copies. It also addresses dynamic cloaking, a new and more complicated kind of cloaking. To speed up the comparison, we introduce a hash value computed by a hash function. The proposed algorithm has been tested on a data set of URLs. Experimental results indicate that our algorithm outperforms previous methods in both precision and recall. We estimate that about 9% of all URLs in the data set use static cloaking and about 2% use dynamic cloaking.
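
The idea of comparing hash values instead of full term lists can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details are not given in this abstract. The function names (terms, term_hash, looks_cloaked), the crude tag stripping, and the choice of SHA-256 are all assumptions made for illustration only.

```python
import hashlib
import re


def terms(html: str) -> list[str]:
    """Crudely strip tags and return lowercase word tokens (illustrative only)."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())


def term_hash(html: str) -> str:
    """Hash the sorted set of distinct terms, so equal term sets give equal digests."""
    unique_terms = sorted(set(terms(html)))
    return hashlib.sha256(" ".join(unique_terms).encode("utf-8")).hexdigest()


def looks_cloaked(crawler_copy: str, browser_copy: str) -> bool:
    """Flag a page when the crawler's and the browser's copies differ in their terms.

    Assumption: a single pair of copies is compared. A practical detector would
    also have to tolerate legitimately dynamic content, which is not handled here.
    """
    return term_hash(crawler_copy) != term_hash(browser_copy)
```

Comparing two fixed-length digests costs the same regardless of page size, which is presumably where the speed-up in the comparison step comes from; the term lists themselves only need to be inspected when the digests differ.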

KEYWORDS
Cloaking, Hash Function, Precision, Recall, Web Spam

Cite This Paper in IEEE or APA Citation Styles
(a). IEEE Format:
[1] Shekoofeh Ghiam, Alireza Nemaney Pour, "Detecting Cloaking Web Spam Using Hash Function," Computer Science and Information Technology, Vol. 1, No. 1, pp. 33 - 40, 2013. DOI: 10.13189/csit.2013.010104.

(b). APA Format:
Shekoofeh Ghiam, Alireza Nemaney Pour (2013). Detecting Cloaking Web Spam Using Hash Function. Computer Science and Information Technology, 1(1), 33 - 40. DOI: 10.13189/csit.2013.010104.