Subjects
-
Tags
-
Abstract
Spreadsheets contain critical information on a wide range of topics and are widely used in many domains, with a huge number of users around the world. They are popular because they are convenient and easy to use, support reporting and visualization through diagrams and graphs, and give their creators a great deal of freedom in how they encode data. Tables account for a large amount of spreadsheet data. The growth in the volume and complexity of these tables has increased the need to preserve this data and reuse it. However, spreadsheets are hard to integrate with other data sources, and as a result the data stored in them is often of low quality.
We present an automated extraction tool that lets ordinary users extract high-quality relational tables from spreadsheets without experience in any programming language. The paper implements novel heuristic-based algorithms for table extraction from spreadsheets, and applies data improvement and quality rules based on a domain ontology to convert low-quality semi-structured data into high-quality relational data for reuse and integration. The tool is implemented as a Java program that interfaces with a SQL Server database. Experiments were conducted on two real public datasets. The reported performance improvement from the proposed approach on the two datasets is 100% for extracting duplicated records, and the percentage of tables successfully identified is 100% and 85%, respectively.
DOI
10.21608/ijicis.2021.51197.1045
Keywords
Spreadsheet Low-Quality, Data Cleaning, Domain Ontology, Relational Database, Spreadsheet Conversion
Authors
MiddleName
-
Affiliation
Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
Email
arwaawad91@yahoo.com
Orcid
-
Affiliation
Faculty of Computer and Information Technology, Future University in Egypt, Cairo, Egypt
Email
mohamed.roushdy@fue.edu.eg
Affiliation
Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
Email
rania.elgohary@cis.asu.edu.eg
Orcid
-
Affiliation
Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
Email
ibrahim_moawad@cis.asu.edu.eg
Orcid
-
Link
https://ijicis.journals.ekb.eg/article_147759.html
Detail API
https://ijicis.journals.ekb.eg/service?article_code=147759
Publication Title
International Journal of Intelligent Computing and Information Sciences
Publication Link
https://ijicis.journals.ekb.eg/
MainTitle
-