Evaluating Parallel Ward Algorithm for Drug Discovery

Document Type: Original Article


1 Computer Science Dept., Faculty of Computers and Information, Menoufia University, Egypt

2 Faculty of Computer and Information, Menoufia University


Millions of compounds are now available in chemical libraries, and scientists must test these compounds against biological targets in order to identify lead compounds. The identification of lead compounds is a key step in the drug discovery process, and many hierarchical clustering algorithms have been developed and modified for that purpose. The Ward algorithm is one of the most popular hierarchical clustering algorithms and is used in many applications in the drug discovery process because of its accuracy. However, it cannot handle large data sets within reasonable time and memory limits. In this paper, we evaluate and compare two parallel approaches to running the Ward algorithm: a parallel for loop and the MapReduce framework. The results show that the parallel for loop failed to reduce the computational time of the Ward algorithm because of the overhead of data communication, whereas the MapReduce framework yielded a considerable reduction in computational time. Using MapReduce, the parallel Ward algorithm saves 17% of the time with three nodes and 58% of the time with six nodes.
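
For readers unfamiliar with Ward's criterion, the following is a minimal sequential sketch in Python, not the implementation evaluated in this paper. It assumes compounds are represented as numeric descriptor vectors, and the names `centroid`, `ward_cost`, and `ward_cluster` are hypothetical helpers introduced only for illustration. At each step it merges the pair of clusters A and B with the smallest increase in within-cluster variance, |A||B| / (|A| + |B|) times the squared distance between their centroids. The exhaustive pairwise search in every step illustrates why a naive run becomes expensive on large data sets.

```python
from itertools import combinations


def centroid(points):
    """Mean vector of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_cluster(points, k):
    """Naive agglomerative clustering with Ward's criterion, down to k clusters."""
    # Start with every point in its own cluster.
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        # Exhaustively search all cluster pairs for the cheapest merge.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    # Toy 2-D data (assumed for illustration, not taken from the paper).
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(ward_cluster(data, 2))
```

The pairwise cost evaluations inside each merge step are independent of one another, which is the kind of work a parallel for loop or a map phase could distribute; how the paper partitions the work is not stated in this abstract.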