147759

AN INTERACTIVE TOOL FOR EXTRACTING LOW-QUALITY SPREADSHEET TABLES AND CONVERTING INTO RELATIONAL DATABASE

Article

Last updated: 22 Jan 2023

Subjects

-

Tags

-

Abstract

Spreadsheets contain critical information on a wide range of topics and are widely used in many domains, with a very large number of users around the world. They are popular because they are convenient and simple to use, support reporting and visualization through diagrams and graphs, and give their authors a great deal of freedom in how they encode data. Tables account for a large share of spreadsheet data. The growth in the volume and complexity of these tables has increased the need to preserve and reuse this data. However, spreadsheets are hard to integrate with other data sources, and as a result the data stored in them is often of low quality.
We present an automated extraction tool that lets ordinary users obtain high-quality relational tables from spreadsheets without experience in any programming language. The paper implements novel algorithms based on a heuristic approach for extracting tables from a spreadsheet, together with data improvement and quality rules that use a domain ontology, to convert low-quality semi-structured data into high-quality relational data for reuse and integration. The tool is implemented as a Java program that interfaces with a SQL Server database. Experiments were carried out on two real public datasets. On both datasets, the proposed approach achieved 100% for extracting duplicated records, and the percentages of successfully identified tables were 100% and 85%, respectively.
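As a rough illustration of the kind of processing the abstract describes, the sketch below splits a sheet into candidate tables and removes duplicated records. It is not the authors' algorithm: the abstract does not state which heuristics the tool uses, so the blank-row boundary rule, the first-row-as-header rule, and the names `split_tables` and `to_relation` are all assumptions made for this example, and it is written in Python rather than the Java of the original tool. The ontology-based quality rules are not covered.

```python
from typing import Dict, List, Optional, Tuple

Row = List[Optional[str]]


def _norm(cell: Optional[str]) -> str:
    """Normalize a cell to a stripped string; None becomes an empty string."""
    return "" if cell is None else str(cell).strip()


def _is_blank(row: Row) -> bool:
    """A row is blank when every cell is empty after normalization."""
    return all(_norm(cell) == "" for cell in row)


def split_tables(grid: List[Row]) -> List[List[Row]]:
    """Assumed heuristic: fully blank rows separate candidate tables."""
    tables: List[List[Row]] = []
    current: List[Row] = []
    for row in grid:
        if _is_blank(row):
            if current:  # close the table collected so far
                tables.append(current)
                current = []
        else:
            current.append(row)
    if current:
        tables.append(current)
    return tables


def to_relation(table: List[Row]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Assumed heuristic: the first row is the header.

    Exact duplicate records are dropped, keeping the first occurrence.
    """
    header = [_norm(cell) for cell in table[0]]
    seen = set()
    records: List[Dict[str, str]] = []
    for row in table[1:]:
        key = tuple(_norm(cell) for cell in row)
        if key not in seen:
            seen.add(key)
            records.append(dict(zip(header, key)))
    return header, records


# Hypothetical sheet: two tables separated by a blank row, one duplicate record.
sample_grid: List[Row] = [
    ["id", "name"],
    ["1", "Ann"],
    ["1", "Ann"],
    ["2", "Bob"],
    [None, ""],
    ["city", "zip"],
    ["Cairo", "11511"],
]

for candidate in split_tables(sample_grid):
    print(to_relation(candidate))
```

Each returned header and record list could then be mapped to a table definition and rows in a relational database. Real spreadsheets usually need richer heuristics than this (merged cells, multi-row headers, side-by-side tables).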

DOI

10.21608/ijicis.2021.51197.1045

Keywords

Spreadsheet Low-Quality, Data Cleaning, Domain Ontology, Relational Database, Spreadsheet Conversion

Authors

First Name

Arwa

Last Name

Awad

MiddleName

-

Affiliation

Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

arwaawad91@yahoo.com

City

Cairo

Orcid

-

First Name

Mohamed

Last Name

Roushdy

MiddleName

Ismail

Affiliation

Faculty of Computer and Information Technology, Future University in Egypt, Cairo, Egypt

Email

mohamed.roushdy@fue.edu.eg

City

Cairo

Orcid

0000-0002-9655-3229

First Name

Rania

Last Name

ElGohary

MiddleName

Abd ElRahman

Affiliation

Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

rania.elgohary@cis.asu.edu.eg

City

Cairo

Orcid

-

First Name

Ibrahim

Last Name

Moawad

MiddleName

Fathy

Affiliation

Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

ibrahim_moawad@cis.asu.edu.eg

City

Cairo

Orcid

-

Volume

21

Article Issue

1

Related Issue

21725

Issue Date

2021-02-01

Receive Date

2020-11-26

Publish Date

2021-02-01

Page Start

1

Page End

18

Print ISSN

1687-109X

Online ISSN

2535-1710

Link

https://ijicis.journals.ekb.eg/article_147759.html

Detail API

https://ijicis.journals.ekb.eg/service?article_code=147759

Order

1

Type

Original Article

Type Code

494

Publication Type

Journal

Publication Title

International Journal of Intelligent Computing and Information Sciences

Publication Link

https://ijicis.journals.ekb.eg/

MainTitle

-

Details

Type

Article

Created At

22 Jan 2023