A Distributed Q-Learning Approach to Fragment Assembly
Maria-Iuliana BOCICOR, Gabriela CZIBULA, István Gergely CZIBULA
1, M. Kogălniceanu street, Cluj-Napoca, 400084, Romania
Abstract: The process of DNA sequencing has nowadays become of great importance in basic biology research, as well as in various fields such as medicine, biotechnology or forensic biology. The fragment assembly problem is a very complex optimization problem that deals with sequencing of DNA, and many computational techniques including computational intelligence algorithms were designed for finding good solutions for this problem. Since DNA fragment assembly is a crucial part of any sequencing project, researchers are still focusing on developing better assemblers. We are introducing in this paper a distributed reinforcement learning based approach for solving the fragment assembly problem, an NP-complete optimization problem that attempts to reconstruct the original DNA sequence from a large number of fragments, each several hundred base-pairs long. Our model is based on a distributed Q-learning approach. The experimental evaluation of the proposed system has provided encouraging results, indicating the potential of our proposal. The advantages and drawbacks of the proposed approach are also emphasized.
Keywords: Bioinformatics, distributed reinforcement learning, DNA fragment assembly.
CITE THIS PAPER AS:
Maria-Iuliana BOCICOR, Gabriela CZIBULA, István Gergely CZIBULA, A Distributed Q-Learning Approach to Fragment Assembly, Studies in Informatics and Control, ISSN 1220-1766, vol. 20 (3), pp. 221-232, 2011.
The process of DNA sequencing has nowadays become of great importance in basic biology research, as well as in various fields such as medicine, biotechnology or forensic biology. Several techniques have been developed to achieve the DNA sequencing, but the main problem with the current technology is that it cannot read an entire genome at once, not even more than 1000 bases.
The DNA fragment assembly (FA) is a technique that attempts to reconstruct the original DNA sequence from a large number of fragments, each several hundred base-pairs long . It is an NP-hard combinatorial optimization problem  which is growing in importance and complexity as more research centers become involved on sequencing new genomes . Various heuristics, including computational intelligence algorithms, have been designed for solving the fragment assembly problem, but since this problem is a crucial part of any sequencing project, better assemblers are needed .
In this paper we aim at proposing a distributed reinforcement learning based model for solving the DNA Fragment Assemby problem. Reinforcement Learning (RL)  is an approach to machine intelligence in which an agent  can learn to behave in a certain way by receiving punishments or rewards on its chosen actions.
The model proposed in this paper extends toward a distributed approach the reinforcement learning based model that we have previously introduced in  for solving the FA problem. To our knowledge, except for the ant  based approaches, the DNA Fragment Assembly problem has not been addressed in the literature using distributed reinforcement learning, so far.
The rest of the paper is organized as follows. Section 2 presents the DNA fragment assembly problem and Section 3 briefly describes existing approaches in solving the considered problem. The fundamentals of distributed reinforcement learning are given in Section 4. Section 5 introduces the distributed reinforcement learning model that we propose for solving the fragment assembly problem. An experimental evaluation of the proposed approach is given in Section 6, and Section 7 provides an analysis of the introduced distributed model, emphasizing its advantages and drawbacks. Section 8 contains some conclusions of the paper and future development of our work.
- LI, L., S. KHURI, A Comparison of DNA Fragment Assembly Algorithms, in Proc. of the Intl. Conf. on Mathematics and Engineering Techniques in Medicine and Biological Sciences. CSREA Press, 2004, pp. 329-335.
- TALMACIU, M., E. NECHITA, Some Combinatorial Optimization Problems for Weak-bisplit Graphs, Studies in Informatics and Control, vol. 19, no. 4, 2010, pp. 427-434.
- HASSANIEN, A. E., M. G. MILANOVA, T. G. SMOLINSKI, A. ABRAHAM, Computational Intelligence in Solving Bioinformatics Problems: Reviews, Perspectives, and Challenges, in Computational Intelligence in Biomedicine and Bioinformatics, 2008, pp. 3-47.
- SUTTON, R. S., A. G. BARTO, Reinforcement Learning: An Introduction, MIT Press, 1998.
- SUSNEA, I., G. VASILIU, A. FILIPESCU, A. RADASCHIN, Virtual Pheromones for Real-time Control of Autonomous Mobile Robots, Studies in Informatics and Control, vol. 18, no. 3, 2009, pp. 233-240.
- BOCICOR, M., G. CZIBULA, I. G. CZIBULA, A Reinforcement Learning Approach for Solving the Fragment Assembly Problem, SYNASC 2011 – 13th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. IEEE Computer Society, 2011, submitted.
- DORIGO, M., T. STŰTZLE, Ant Colony Optimization, Scituate, MA, USA: Bradford Company, 2004.
- SANGER, F., A. COULSON, G. HONG, D. F. HILL, G. PETERSEN, Nucleotide Sequence of Bacteriophage Lambda DNA, J. Molecular Biology, vol. 162, no. 4, 1982, pp. 729-773.
- KOSTERS, W., Bioinformatics: Fragment Assembly, IPA-Algorithms and Complexity – course, 2007. [Online]. Available: http://www.liacs.nl/kosters/bio/bio.pdf.
- PEVZNER, P. A., Computational Molecular Biology: An Algorithmic Approach, MIT Press, 2000.
- KIKUCHI, S., G. CHAKRABORTY, Heuristically Tuned GA to Solve Genome Fragment Assembly Problem, IEEE CEC, 2006, pp. 1491-1498.
- KECECIOGLU, J. D., E. W. MYERS, Combinatiorial Algorithms for DNA Sequence Assembly, Algorithmica, vol. 13, no. 1/2, 1995, pp. 7-51.
- PARSONS, R. J., S. FORREST, C. BURKS, Genetic Algorithms, Operators, and DNA Fragment Assembly, in Machine Learning. Kluwer Academic Publishers, 1995, pp. 11-33.
- ANGELERI, E., B. APOLLONI, D. DE FALCO, L. GRANDI, DNA Fragment Assembly using Neural Prediction Techniques, Intl. Journal Neural Systems, vol. 9, no. 6, 1999, pp. 523-544.
- LUQUE, G., E. ALBA TORRES, S. KHURI, Assembling DNA Fragments with a Distributed Genetic Algorithm, Parallel Computing for Bioinformatics and Computational Biology, 2006, pp. 285-302.
- MEKSANGSOUY, P., N. CHAIYARATANA, DNA Fragment Assembly using an Ant Colony System Algorithm, in Proceedings of CEC’03 – vol.3. IEEE Press, 2003, pp. 1756-1763.
- WETCHARAPORN, W., N. CHAIYARATANA, S. TONGSIMA, DNA Fragment Assembly by Ant Colony and Nearest Neighbour Heuristics, Artificial Intelligence and Soft Computing ICAISC 2006, pp. 1008-1017.
- PEREZ-URIBE, A., Introduction to Reinforcement Learning, 1998, http://lslwww.epfl.ch/~anperez/RL/RL.html.
- RUSSELL, S., P. NORVIG, Artificial Intelligence – A Modern Approach, ser. Prentice Hall International Series in Artificial Intelligence. Prentice Hall, 2003.
- CLAUS, C., C. BOUTILIER, The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems, In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998, pp. 746-752.
- TAN, M., Multi-agent Reinforcement Learning: Independent vs. Cooperative Agents, 1997, pp. 487-494.
- LAUER, M., M. RIEDMILLER, An Algorithm for Distributed Reinforcement Learning in Cooperative Multi-agent Systems, Proceedings of International Conference on Machine Learning (ICML), 2000, pp. 535-542.
- MARIANO, C. E., E. MORALES, A New Distributed Reinforcement Learning Algorithm for Multiple Objective Optimization Problems, Lecture Notes In Artificial Intelligence, November 2000, pp. 290-299.
- DAYAN, P., T. SEJNOWSKI, TD(Lambda) Converges with Probability 1, Machine Learning., vol. 14, 1994, pp. 295-301.
- WATKINS, C. J. C. H., P. DAYAN, Q-learning, Machine Learning, vol. 8, no. 3-4, 1992, pp. 279-292.
- CZIBULA, G., M. BOCICOR, I. CZIBULA, A Reinforcement Learning Model for Solving the Folding Problem, International Journal of Computer Technology and Applications, vol. 2, 2011, pp. 171-182.
- GONZAGA, T., C. BENTES, R. FARIAS, M. C. CASTRO, A. C. GARCIA, Using Distributed-shared Memory Mechanisms for Agents Communication in a Distributed System, in Proc. of the 7th Intl. Conf. on Intelligent Systems Design and Applications. USA, IEEE Comp. Society, 2007, pp. 39-46.
- SMITH, T., M. WATERMAN, Identification of Common Molecular Subsequences, Journal of Molecular Biology, vol. 147, no. 1, 1981, pp. 195-197.