IMPROVED FOCUSED CRAWLING USING BAYESIAN OBJECT BASED APPROACH

Article

Last updated: 25 Dec 2024

Subjects

-

Tags

-

Abstract

The rapid growth of the World Wide Web has made it difficult for general-purpose search engines, e.g. Google and Yahoo, to retrieve most of the relevant results in response to user queries. Vertical search engines, each specialized in a specific topic, have therefore become vital. A vertical search engine is built with the help of a focused crawler, which traverses the Web, selecting pages relevant to a predefined topic and ignoring those that are off-topic. The focused crawler is guided toward the relevant pages by a crawling strategy. In this paper, a new crawling strategy is presented that helps in building a vertical search engine. With this strategy, the crawler stays focused on the user's interests within the topic. We build a model of the Web page features that distinguish relevant Web documents from irrelevant ones. This is accomplished through a supervised learning process: the Web page is treated as an object with a set of features, and the values of those features determine the relevance of the page through a Bayesian model. Results from practical experiments demonstrate the efficiency of the proposed crawling strategy.
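The abstract does not specify the feature set or the exact form of the Bayesian model. As a rough illustration only, the idea of treating a page as an object whose feature values determine relevance could be sketched as a naive Bayes classifier with Laplace smoothing. Everything in the sketch is an assumption made for illustration and is not taken from the paper: the feature names (title_hit, anchor_hit, url_hit), the toy training set, the class name, and the naive independence assumption itself.

```python
import math
from collections import defaultdict


class NaiveBayesPageClassifier:
    """Naive Bayes over categorical page features (illustrative sketch only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.class_counts = defaultdict(int)
        # feature_counts[label][feature_name][value] -> number of training pages
        self.feature_counts = defaultdict(
            lambda: defaultdict(lambda: defaultdict(int))
        )
        self.feature_values = defaultdict(set)  # values seen for each feature

    def fit(self, pages, labels):
        """Count feature values per class from labeled example pages."""
        for features, label in zip(pages, labels):
            self.class_counts[label] += 1
            for name, value in features.items():
                self.feature_counts[label][name][value] += 1
                self.feature_values[name].add(value)
        return self

    def _log_score(self, features, label):
        """log P(label) + sum of log P(feature value | label)."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        for name, value in features.items():
            count = self.feature_counts[label][name][value]
            n_values = len(self.feature_values[name]) or 1
            score += math.log(
                (count + self.alpha)
                / (self.class_counts[label] + self.alpha * n_values)
            )
        return score

    def relevance(self, features):
        """Posterior probability that a page with these features is relevant."""
        scores = {lbl: self._log_score(features, lbl) for lbl in self.class_counts}
        peak = max(scores.values())  # subtract the max for numerical stability
        weights = {lbl: math.exp(s - peak) for lbl, s in scores.items()}
        return weights.get(True, 0.0) / sum(weights.values())


# Hypothetical training set: True = relevant to the topic, False = irrelevant.
training_pages = [
    {"title_hit": True, "anchor_hit": True, "url_hit": True},
    {"title_hit": True, "anchor_hit": False, "url_hit": True},
    {"title_hit": True, "anchor_hit": True, "url_hit": False},
    {"title_hit": False, "anchor_hit": False, "url_hit": False},
    {"title_hit": False, "anchor_hit": True, "url_hit": False},
    {"title_hit": False, "anchor_hit": False, "url_hit": True},
]
training_labels = [True, True, True, False, False, False]

model = NaiveBayesPageClassifier().fit(training_pages, training_labels)

# Hypothetical candidate links, ordered so the crawler visits the most
# promising page first.
candidates = {
    "page_a": {"title_hit": True, "anchor_hit": True, "url_hit": True},
    "page_b": {"title_hit": False, "anchor_hit": False, "url_hit": False},
}
crawl_order = sorted(candidates, key=lambda p: model.relevance(candidates[p]), reverse=True)
```

In a crawler built along these lines, the posterior returned by relevance would serve as the priority of each unvisited link, so that pages judged likely to be on-topic are fetched before the rest. How the paper actually scores and orders links is not stated in the abstract.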

DOI

10.21608/mjeer.2008.65223

Authors

First Name

Ahmed

Last Name

Ghozia

MiddleName

-

Affiliation

Dept. of Computer Science and Eng., Faculty of Electronic Engineering, Menoufya University, EGYPT

Email

-

City

-

Orcid

-

First Name

Hoda

Last Name

Sorour

MiddleName

-

Affiliation

Dept. of Computer Science and Eng., Faculty of Electronic Engineering, Menoufya University, EGYPT

Email

-

City

-

Orcid

-

First Name

Ashraf

Last Name

Aboshosh

MiddleName

-

Affiliation

Eng. Dept., NCRRT, Atomic Energy Authority, EGYPT

Email

-

City

-

Orcid

-

Volume

18

Article Issue

1

Related Issue

9682

Issue Date

2008-01-01

Receive Date

2019-12-15

Publish Date

2008-01-01

Page Start

49

Page End

60

Print ISSN

1687-1189

Online ISSN

2682-3535

Link

https://mjeer.journals.ekb.eg/article_65223.html

Detail API

https://mjeer.journals.ekb.eg/service?article_code=65223

Order

4

Type

Original Article

Type Code

1,088

Publication Type

Journal

Publication Title

Menoufia Journal of Electronic Engineering Research

Publication Link

https://mjeer.journals.ekb.eg/

Details

Type

Article

Created At

22 Jan 2023