arXiv:1508.03000v1 [cs.RO] 12 Aug 2015

Few common failure cases in mobile robots

Ramviyas Parasuraman 1, CVAP, Royal Institute of Technology (KTH), Stockholm, Sweden.

Abstract

A mobile robot deployed for remote inspection, surveying or rescue missions can fail for various reasons, which can be hardware or software related. These failure scenarios necessitate manual recovery (self-rescue) of the robot from the environment. Recovering the mobile robot brings unforeseen challenges if the environment where it was deployed has hazardous or harmful conditions (e.g. ionizing radiation). While it is not possible to predict all the failures of a robot, failures can be reduced by employing certain design and usage considerations. A few example failure cases based on real experiences are presented in this short article, along with generic suggestions for overcoming the illustrated failure situations.

1 Introduction

Mobile robots are increasingly considered for applications in hostile or hazardous environments where humans cannot perform some tasks due to safety issues such as high radiation levels or because of challenges in the environments [1,2]. Furthermore, there exists a special category called rescue robots, which are meant to help rescuers during disaster situations and in conditions where humans may not be fully able to perform the tasks due to harmful or hazardous conditions. Robots failing in such environments bring unfortunate situations such as the robot being abandoned at the site [1]. In this article, some possible failure scenarios are presented, based on experience gained with the robots at CERN as well as on meetings with other researchers during the IEEE SSRR summer school 2012.
The experiences are based on (at least) the following robots: TIM [3], a robotic train used for remote inspection and radiation survey for the Large Hadron Collider (LHC); Telemax, an Explosive Ordnance Disposal (EOD) robot used for remote inspection at CERN; and a KUKA youBot mobile platform used for research in robotics and manipulation applications. Limited suggestions for design considerations aiming to reduce the chances of failures are also discussed from a simplistic perspective. For a detailed analysis of failure scenarios (how and when a mobile robot could fail), refer to [4]. Additionally, observing the rescue robot design standards (in progress) from NIST and ASTM [5] could also be beneficial in avoiding the failure scenarios.

2 Hardware and software failures

Mechanical or electronic component failures may occur at any time during the usage of a mobile robot. In this section we describe several cases of failure or unfortunate scenarios and offer possible pointers to avoid these problems.

1. Chronic: When a robotic device is left switched on and is not used for a long time, it might have a tendency to become mobile due to friction loss or inertia effects. (E.g. a forklift carrying a heavy load for quite a while could lose pressure on its fork.) Therefore it should be ensured that the robot is immobile (by using a normally closed mechanical brake or the motion power) when the robot is static.
2. Calibration: In the TIM robot [3], we observed that the robot consumed twice as much power when moving forward as when moving backward. This was later discovered to be caused by a motor failure. The lesson here is that, before using a robot, it should be calibrated or verified for proper functioning of movements in all directions. This also applies to other important functional elements, e.g. sensing.
3. Thermal: There is a high chance of minor failures in robots due to thermal influences.
For instance, once when we were about to give a demonstration with the youBot to a large audience, its onboard computer unexpectedly shut down due to overheating in a warm conference room. This bad experience could have been avoided if we had designed for proper heat dissipation and ventilation in the robot.
4. Over-current: Even though short circuit problems are usually negligible in most robots, as they employ professional electronic components, over-current and circuit related problems may occur at any time. For instance, a USB powered WiFi card drew too much power from the USB port (exceeding its limit), resulting in damage to the computer motherboard and eventual failure of the robot operation. Thus, ensuring proper power availability (by using a current limiter or a relay) for every device and sensor connected to the robot could help avoid such issues.
5. Memory: In one experience, a sudden crash of the hard disk in the robot's computer resulted in instant robot failure and prevented its mission of autonomous localization and mapping of the environment. This brings up the need to store (and execute) the robot programs in a redundant and efficient manner, to use solid state drives, and to include a recovery routine in RAM that can switch away from a defective hard disk containing the main routine, or at least shut down the robot safely.
6. Frequency: Over-clocking of a robot's computer could result in serious damage to the robot and its performance. It could also trigger a spark or fire (there was a reported case in which a high frequency switch caught fire, but the system was immediately shut down to avoid further damage). Therefore the robot systems and functions should be carefully designed not to be overdriven or over-clocked.
7. Faults: If the robot is used in high energy particle physics or ionizing radiation facilities, the probability of the electronics being damaged by single event upsets or cumulative effects of radiation is high (there were several reported incidents of such cases and, for instance, there exists a special group at CERN to study the effects of radiation on electronics). Hence, for such applications, we emphasize the importance of having redundant fault-tolerant hardware and software modules for important functionality, as well as fail-safe algorithms on board the robot.
8. Data verification and validation: Autonomous robots fully depend on the data obtained from their sensors or sent to their actuators. Data related issues might occur if algorithms are not devised properly. In one instance, the laser range scanner returned false values on unsuitable surfaces (black-painted or water surfaces). This behavior is expected but was not accounted for in the algorithm. The data was used for autonomous path planning and resulted in undesired actions and behaviors. Therefore, sensor data should always be validated for proper values, even if one uses a safety sensing device.
In another incident, a faulty joystick provided an error (or fault) value to the algorithm which coincided with the reverse motor actuation value. As a result, the robot behaved strangely, moving opposite to the intended direction when the joystick abruptly failed (possibly due to its low battery level). A lesson here is that the range of sensor error values and the range of actuator input values should be strictly verified for non-coincidence, properly tuned, and actually put to use.

1 The author was affiliated with the European Organization for Nuclear Research (CERN), Switzerland, during this work. Contact: ramviyas@kth.se

3 Energy and Communication

Managing a mobile robot's energy and communication (wired or wireless) systems is the key priority in predicting and avoiding most of the failure situations.
We have observed several energy and communication failure situations with the TIM robot, either when the robot ran out of energy or when it was not possible to communicate with the ground staff operating the robot. Thus, ensuring that the energy and communication requirements of a robotic mission can be fulfilled before starting the mission is highly important.

A basic energy management system that includes an energy autonomy prediction system (estimating the time and distance autonomy) will be helpful in making sure that the robot meets energy expectations. This can be done by simulating the energy consumption behavior before a mission or by online prediction of energy autonomy during a mission (e.g. [6]).

Similarly, a (wireless) communication autonomy prediction system could be employed to estimate the achievable distance (range), message latency, data transfer rate, etc. in a given environment. This can be included in the mission planning stage or done in offline mode (not during the mission), as exemplified in [7], where offline prediction of wireless capabilities was examined. For example, we faced several challenges in wireless communication with robots (such as low operational ranges, frequent disconnections, abrupt quality changes, etc.) in the CERN underground tunnels. This is because the nature of the environment itself is challenging for radio signal propagation, due to the effects of reflections from large metallic objects or deep multipath fading. Therefore, for a successful robotic operation, monitoring the connectivity (connection quality) and deciding what action to take in the event of loss of communication is certainly required if cases such as the loss of robots in high radiation zones (e.g. the abandoned Quince robot in the Fukushima nuclear reactor building [1]) are not to be repeated in the future.
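
The energy and connectivity checks described above can be sketched in a few lines. The following Python fragment is only a minimal illustration under simple assumptions, not the prediction models of [6] or [7]. It assumes a constant average power draw and travel speed, and a link quality reading normalized to the range 0 to 1. All names, action labels and thresholds (`RobotState`, `estimate_autonomy`, `choose_action`, `low_link`, `reserve`) are hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class RobotState:
    battery_wh: float      # remaining battery energy in watt-hours
    avg_power_w: float     # assumed constant average power draw in watts
    avg_speed_mps: float   # assumed constant travel speed in metres per second
    dist_to_home_m: float  # path length back to the home station in metres
    link_quality: float    # latest link quality, normalized to 0..1 (assumption)


def estimate_autonomy(state):
    """Return (time_s, distance_m) the robot can still operate for."""
    if state.avg_power_w <= 0:
        raise ValueError("average power must be positive")
    time_s = state.battery_wh * 3600.0 / state.avg_power_w
    return time_s, time_s * state.avg_speed_mps


def choose_action(state, low_link=0.3, reserve=1.5):
    """Pick an action label from the energy and connectivity margins.

    low_link and reserve are illustrative thresholds, and the returned
    labels stand in for whatever policy a real mission would define.
    """
    _, range_m = estimate_autonomy(state)
    if state.link_quality <= 0.0:
        return "LINK_LOST_FALLBACK"   # connection lost: run the fallback behavior
    if range_m < state.dist_to_home_m:
        return "ALERT_CANNOT_RETURN"  # not enough energy to reach the home station
    if range_m < reserve * state.dist_to_home_m:
        return "RETURN_HOME"          # energy margin is getting thin
    if state.link_quality < low_link:
        return "ALERT_WEAK_LINK"      # warn the operator before the link drops
    return "CONTINUE"


state = RobotState(battery_wh=100.0, avg_power_w=200.0,
                   avg_speed_mps=0.5, dist_to_home_m=400.0, link_quality=0.8)
print(estimate_autonomy(state))  # (1800.0, 900.0)
print(choose_action(state))      # CONTINUE
```

The checks are ordered so that the most severe condition wins. A lost link or an unreachable home station is reported before softer warnings. In practice the constant-power assumption would be replaced by a measured or model-based consumption estimate.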
Finally, a vital consideration for robot practitioners is to implement functionality in robots that alerts the operator before the robot runs out of energy or loses wireless connectivity. This could also be extended to include situations such as when the remaining energy in the robot is not enough to get the robot back to its home station, or when the predicted communication capability or reach along the robot's path is not adequate.

References

[1] K. Nagatani, S. Kiribayashi, Y. Okada, S. Tadokoro, T. Nishimura, T. Yoshida, E. Koyanagi, and Y. Hada, “Redesign of rescue mobile robot Quince,” in IEEE International Symposium on Safety Security and Rescue Robotics, 2011, pp. 13–18.
[2] R. Murphy, J. Kravitz, S. Stover, and R. Shoureshi, “Mobile robots in mine rescue and recovery,” Robotics Automation Magazine, IEEE, vol. 16, no. 2, pp. 91–103, June 2009.
[3] K. Kershaw, F. Chapron, A. Coin, F. Delsaux, T. Feniet, J.-L. Grenard, and R. Valbuena, “Remote inspection, measurement and handling for LHC,” in Particle Accelerator Conference, 2007. PAC. IEEE, June 2007, pp. 332–334.
[4] J. Carlson and R. Murphy, “How UGVs physically fail in the field,” IEEE Transactions on Robotics, vol. 21, no. 3, pp. 423–437, 2005.
[5] R. R. Murphy, “Proposals for New UGV, UMV, UAV, and HRI Standards for Rescue Robots,” in Proceedings of the 10th Performance Metrics for Intelligent Systems Workshop. ACM, 2010, pp. 9–13.
[6] R. Parasuraman, K. Kershaw, P. Pagala, and M. Ferre, “Model based on-line energy prediction system for semi-autonomous mobile robots,” in 5th International Conference on Intelligent Systems, Modelling and Simulation (ISMS), 2014.
[7] R. Parasuraman, K. Kershaw, M. Ferre Perez, et al., “Experimental investigation of radio signal propagation in scientific facilities for telerobotic applications,” International Journal of Advanced Robotic Systems, vol. 10, pp. 1–11, 2013.