Transparent fault-tolerance in parallel Orca programs
With the advent of large-scale parallel computing systems, making parallel programs fault-tolerant becomes an important problem, because the probability of a failure increases with the number of processors. In this paper, we describe a very simple scheme for rendering a class of parallel Orca progra...
Main Authors: | M. Frans Kaashoek, Raymond Michiels, Henri E. Bal, Andrew S. Tanenbaum |
---|---|
Other Authors: | The Pennsylvania State University CiteSeerX Archives |
Format: | Text |
Language: | English |
Published: | 1992 |
Subjects: | |
Online Access: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.3356 http://dare.ubvu.vu.nl/bitstream/1871/2577/1/10972.pdf |
id | ftciteseerx:oai:CiteSeerX.psu:10.1.1.104.3356 |
---|---|
record_format | openpolar |
institution | Open Polar |
collection | Unknown |
op_collection_id | ftciteseerx |
language | English |
description | With the advent of large-scale parallel computing systems, making parallel programs fault-tolerant becomes an important problem, because the probability of a failure increases with the number of processors. In this paper, we describe a very simple scheme for rendering a class of parallel Orca programs fault-tolerant. Also, we discuss our experience with implementing this scheme on Amoeba. Our approach works for parallel applications that are not interactive. The approach is based on making a globally consistent checkpoint from time to time and rolling back to the last checkpoint when a processor fails. Making a consistent global checkpoint is easy in Orca, because its implementation is based on reliable broadcast. The advantages of our approach are its simplicity, ease of implementation, low overhead, and transparency to the Orca programmer. |
author2 | The Pennsylvania State University CiteSeerX Archives |
format | Text |
author | M. Frans Kaashoek, Raymond Michiels, Henri E. Bal, Andrew S. Tanenbaum |
title | Transparent fault-tolerance in parallel Orca programs |
publishDate | 1992 |
url | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.3356 http://dare.ubvu.vu.nl/bitstream/1871/2577/1/10972.pdf |
genre | Orca |
op_source | http://dare.ubvu.vu.nl/bitstream/1871/2577/1/10972.pdf |
op_relation | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.3356 http://dare.ubvu.vu.nl/bitstream/1871/2577/1/10972.pdf |
op_rights | Metadata may be used without restrictions as long as the oai identifier remains attached to it. |
_version_ | 1766160803269967872 |
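
The description field says the approach takes a globally consistent checkpoint from time to time and rolls back to the last checkpoint when a processor fails, and that reliable broadcast makes the checkpoint easy. The sketch below is a toy Python model of that idea, not the authors' Orca or Amoeba implementation. The class and function names (`Process`, `TotalOrderBroadcast`, `recover`, `demo`) are hypothetical, and the sketch assumes the broadcast delivers every message to every process in the same order, which the record itself does not state.

```python
import copy


class Process:
    """One worker holding a replica of the shared state (hypothetical model)."""

    def __init__(self, pid):
        self.pid = pid
        self.state = {"counter": 0}
        # The initial state doubles as the first checkpoint.
        self.checkpoint = copy.deepcopy(self.state)

    def deliver(self, msg):
        kind, value = msg
        if kind == "update":
            self.state["counter"] += value
        elif kind == "checkpoint":
            # Every process sees the marker at the same point in the
            # message order, so the saved states are mutually consistent.
            self.checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Discard everything done since the last checkpoint.
        self.state = copy.deepcopy(self.checkpoint)


class TotalOrderBroadcast:
    """Stand-in for a reliable broadcast with one global delivery order (assumed)."""

    def __init__(self, processes):
        self.processes = processes

    def send(self, msg):
        for p in self.processes:
            p.deliver(msg)


def recover(processes):
    """On a processor failure, every process returns to the last checkpoint."""
    for p in processes:
        p.rollback()


def demo():
    procs = [Process(i) for i in range(3)]
    bus = TotalOrderBroadcast(procs)
    bus.send(("update", 5))
    bus.send(("checkpoint", None))  # consistent global checkpoint
    bus.send(("update", 7))         # work that will be lost
    recover(procs)                  # simulated failure and rollback
    return [p.state["counter"] for p in procs]
```

Calling `demo()` returns the counter of each process after the rollback; all three hold the value saved at the checkpoint, and the update broadcast after the marker is discarded.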