Transparent fault-tolerance in parallel Orca programs

With the advent of large-scale parallel computing systems, making parallel programs fault-tolerant becomes an important problem, because the probability of a failure increases with the number of processors. In this paper, we describe a very simple scheme for rendering a class of parallel Orca progra...


Bibliographic Details
Main Authors: M. Frans Kaashoek, Raymond Michiels, Henri E. Bal, Andrew S. Tanenbaum
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 1992
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.98.7539
http://www.cs.vu.nl/~ast/publications/sedms-1992.pdf
id ftciteseerx:oai:CiteSeerX.psu:10.1.1.98.7539
record_format openpolar
institution Open Polar
collection Unknown
op_collection_id ftciteseerx
language English
description With the advent of large-scale parallel computing systems, making parallel programs fault-tolerant becomes an important problem, because the probability of a failure increases with the number of processors. In this paper, we describe a very simple scheme for rendering a class of parallel Orca programs fault-tolerant. Also, we discuss our experience with implementing this scheme on Amoeba. Our approach works for parallel applications that are not interactive. The approach is based on making a globally consistent checkpoint from time to time and rolling back to the last checkpoint when a processor fails. Making a consistent global checkpoint is easy in Orca, because its implementation is based on reliable broadcast. The advantages of our approach are its simplicity, ease of implementation, low overhead, and transparency to the Orca programmer.
author2 The Pennsylvania State University CiteSeerX Archives
format Text
author M. Frans Kaashoek
Raymond Michiels
Henri E. Bal
Andrew S. Tanenbaum
title Transparent fault-tolerance in parallel Orca programs
publishDate 1992
url http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.98.7539
http://www.cs.vu.nl/~ast/publications/sedms-1992.pdf
genre Orca
genre_facet Orca
op_source http://www.cs.vu.nl/~ast/publications/sedms-1992.pdf
op_relation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.98.7539
http://www.cs.vu.nl/~ast/publications/sedms-1992.pdf
op_rights Metadata may be used without restrictions as long as the oai identifier remains attached to it.
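The checkpoint-and-rollback approach summarized in the description can be illustrated with a small Python simulation. This is only a sketch under stated assumptions, not the authors' implementation: it assumes that every process receives the same broadcast messages in the same order, and that a checkpoint is requested by a marker placed in that stream. The names `Process`, `CHECKPOINT`, `deliver`, `rollback`, and `run_demo` are hypothetical and do not come from the record or the paper.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional

CHECKPOINT = "CHECKPOINT"  # hypothetical marker message, an assumption of this sketch


@dataclass
class Process:
    pid: int
    state: dict = field(default_factory=dict)  # stand-in for replicated program state
    applied: int = 0                           # broadcast messages delivered so far
    saved: Optional[tuple] = None              # (state copy, stream position) at last checkpoint

    def deliver(self, msg):
        """Apply the next message of the (assumed) totally ordered broadcast stream."""
        self.applied += 1
        if msg == CHECKPOINT:
            # Every process saves its state at the same position in the stream,
            # so the saved states together form a consistent global checkpoint.
            self.saved = (copy.deepcopy(self.state), self.applied)
        else:
            key, delta = msg
            self.state[key] = self.state.get(key, 0) + delta

    def rollback(self):
        """Restore the state saved at the last checkpoint."""
        if self.saved is None:
            raise RuntimeError("no checkpoint taken yet")
        state, applied = self.saved
        self.state, self.applied = copy.deepcopy(state), applied


def run_demo():
    stream = [("x", 1), ("y", 2), CHECKPOINT, ("x", 10), ("y", 20)]
    fast, slow = Process(0), Process(1)
    for msg in stream:       # the fast process has delivered everything
        fast.deliver(msg)
    for msg in stream[:4]:   # the slow process lags one message behind
        slow.deliver(msg)
    # Simulated processor failure: all surviving processes roll back.
    for p in (fast, slow):
        p.rollback()
    return fast, slow
```

Although the two processes have progressed to different points when the failure occurs, both return to the same state after rollback, because each took its checkpoint at the same position in the message order. In this sketch that shared ordering is what keeps the checkpoint simple: no extra coordination between processes is modeled.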