We introduce the multitask failure detection problem for vision-language-action (VLA) models and propose SAFE, a failure detector that detects failures on unseen tasks zero-shot and achieves state-of-the-art performance.