E-PROBCONS: Enhanced PROBCONS for Multiple Sequence Alignment

Document Type: Original Article

Authors

1 Computer Science Dept, Faculty of Computers and Information, Menoufia University, Egypt.

2 Faculty of Computers and Information, Menoufia University

3 Faculty of Computers and Information, Menoufia University, Egypt

Abstract

Accurately aligning three or more protein, RNA, or DNA sequences is a very difficult task in bioinformatics. Many techniques exist for multiple sequence alignment. Some maximize speed with little concern for the accuracy of the resulting alignment, whereas others maximize accuracy with little concern for speed. The key goals of any technique are (a) reducing memory and execution time requirements and (b) increasing the accuracy of multiple sequence alignment on large-scale datasets. PROBCONS is a multiple protein sequence alignment (MPSA) tool that achieves the highest expected accuracy, but it is time-consuming. To address this problem and increase the accuracy of MPSA, E-PROBCONS is proposed as an enhancement of the PROBCONS tool. E-PROBCONS clusters a large set of protein sequences into groups of structurally similar sequences. The PROBCONS MPSA tool is then run in parallel on the Amazon Elastic Compute Cloud (EC2). The proposed approaches are best suited to large-scale datasets and short sequences. Compared with existing algorithms (e.g., PROBCONS, KALIGN, and HALIGN I), they provide more than 50% improvement in the average sum-of-pairs alignment score (SP score) and reduce the execution time needed to produce the alignment. The proposed approaches are implemented on the Hadoop MapReduce big data framework to improve scalability across different protein datasets.

Keywords