Prediction Of O-Glycosylation Site Using Pre-Trained Language Model And Machine Learning

Article

Last updated: 03 Jan 2025

Subjects

-

Tags

-

Abstract

O-glycosylation is a typical type of protein post-translational modification (PTM) that is linked to several diseases and plays significant roles in many biological processes. Identifying O-glycosylation sites is important for understanding the mechanism of the O-glycosylation process. However, identifying PTM sites with laboratory experiments is time-consuming and costly. Thus, computational and artificial intelligence methods are becoming essential for predicting O-glycosylation sites. In this paper, we propose a new model to improve O-glycosylation site prediction using a transformer-based protein language model and machine learning. The dataset was collected and prepared from a recent data source called OGP (O-glycoprotein repository). The TAPE (Tasks Assessing Protein Embeddings) protein language model was used to extract features from the peptide sequences using the embedding strategy. Then, feature selection was performed using a linear support vector machine (SVM) to select informative features. The XGBoost ensemble-based machine learning method was used for classification and prediction. The proposed model achieved 0.7761 accuracy, 0.7391 sensitivity, 0.8130 specificity, 0.8295 AUC, and 0.5537 MCC, results that compare favorably with traditional machine learning methods. On an independent dataset, the proposed method performed better than the latest available methods for predicting O-glycosylation sites.
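
The threshold-based metrics reported in the abstract (accuracy, sensitivity, specificity, and MCC) can all be derived from the four confusion-matrix counts of a binary site/non-site classifier. The following is a minimal sketch using the standard definitions of those metrics; the function name `binary_metrics` and the toy labels are hypothetical illustrations, not the authors' code or data. AUC is omitted because it requires predicted scores rather than hard labels.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and MCC for 0/1 labels.

    Hypothetical helper for illustration; 1 = glycosylated site, 0 = non-site.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives

    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy labels (assumed for illustration only, not taken from the paper).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

MCC is worth reporting alongside accuracy because it uses all four counts and stays informative when the positive and negative classes are imbalanced, which is a common situation in PTM site datasets.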

DOI

10.21608/ijicis.2023.160986.1218

Keywords

Protein language model, machine learning, XGBoost, bioinformatics, O-glycosylation site prediction

Authors

First Name

Alhasan

Last Name

Alkuhlani

MiddleName

-

Affiliation

Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

alhasan.alkuhlani@gmail.com

City

-

Orcid

0000-0003-0725-9923

First Name

Walaa

Last Name

Gad

MiddleName

-

Affiliation

Information Systems Department, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt

Email

walaagad@cis.asu.edu.eg

City

Cairo

Orcid

0000-0002-7816-3518

First Name

Mohamed

Last Name

Roushdy

MiddleName

Ismail

Affiliation

Faculty of Computer and Information Technology, Future University in Egypt, Cairo, Egypt

Email

mohamed.roushdy@fue.edu.eg

City

Cairo

Orcid

0000-0002-9655-3229

First Name

Abdel-Badeeh

Last Name

Salem

MiddleName

M.

Affiliation

Computer Science Department, Faculty of Computer and Information Sciences, Ain Shams University

Email

absalem@cis.asu.edu.eg

City

Cairo

Orcid

0000-0001-5013-4339

Volume

23

Article Issue

1

Related Issue

40411

Issue Date

2023-03-01

Receive Date

2022-09-05

Publish Date

2023-03-01

Page Start

41

Page End

52

Print ISSN

1687-109X

Online ISSN

2535-1710

Link

https://ijicis.journals.ekb.eg/article_292028.html

Detail API

https://ijicis.journals.ekb.eg/service?article_code=292028

Order

292,028

Type

Original Article

Type Code

494

Publication Type

Journal

Publication Title

International Journal of Intelligent Computing and Information Sciences

Publication Link

https://ijicis.journals.ekb.eg/

Details

Type

Article

Created At

23 Dec 2024