Introduction and Motivation

In the future, robots will work closely with humans in the home to aid in various domestic tasks, foremost among them tasks in the kitchen. For a robot to help out and learn new abilities in the kitchen domain, it must be able to understand a human’s instructions. This work focuses in particular on how a robot may represent concrete verbs in the kitchen domain. Concrete action verbs denote concrete activities performed by an agent in the world. These actions are visually perceivable events that can potentially be recognized by computer vision algorithms. Concrete action verbs can be categorized into two classes based on their semantics: Result verbs and Manner verbs. Manner verbs typically denote the form or manner in which the action is performed, whereas Result verbs (the focus of this work) denote the Change of State (CoS) that the object of the verb undergoes as a result of the action. In order to ground these verbs in the environment, the robot must have a rich representation of the changes of state associated with each verb. Existing verb resources such as VerbNet do not contain this rich information. Although VerbNet’s semantic representation for a verb may indicate that a change of state is involved, it does not always provide the specifics associated with the verb’s meaning, such as which attribute changes. For example, the change may occur to some attribute of the verb’s direct object such as color, number of pieces, or speed.

Change of State in Verb Semantics

Lexical semantics is important to designing methods for robots to learn verbs because it indicates what must be learned as part of the verb representation. Verbs can be divided into two broad categories: stative verbs, which denote states (such as know, depend, loathe), and action verbs, which denote actions (such as run, throw, cook). In this work we are primarily interested in the latter. A concrete action verb is one that, in combination with its arguments and modifiers, denotes an action in the world (as opposed to denoting a state or an abstract action not visible in the world). Hovav and Levin further divide action verbs into Manner verbs, which “specify as part of their meaning a manner of carrying out an action”, and Result verbs, which “specify the coming about of a result state”. For example:

1) Manner verbs: nibble, rub, scribble, sweep, utter, laugh, run, swim, …
2) Result verbs: clean, cover, empty, fill, freeze, kill, melt, open, arrive, die, enter, faint, …

In this work we focus specifically on result verbs, i.e. verbs of Change of State (CoS). A set of “canonical realization rules” specifies how a particular change of state is incorporated into a verb’s semantics. Semantics are determined by the combination of a “root”, which is particular to the verb (e.g., a result-state), and an “event schema” template, as shown in Figure 3.

Figure 3 : Event schema for verbs that denote externally caused state changes
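The combination of a verb-specific root with a shared event schema can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the figure is not reproduced in this text, so the template string below follows the bracket notation commonly used in the lexical-semantics literature for externally caused state changes, and the names `EXTERNAL_CAUSE_SCHEMA` and `ResultVerb` are hypothetical rather than part of this work.

```python
from dataclasses import dataclass

# Assumed template for an externally caused change of state. The exact
# form shown in Figure 3 is not available here, so this is a stand-in.
EXTERNAL_CAUSE_SCHEMA = "[[{agent} ACT] CAUSE [{patient} BECOME <{root}>]]"


@dataclass(frozen=True)
class ResultVerb:
    """A result verb: a lemma plus the result-state root it contributes."""

    lemma: str
    root: str

    def instantiate(self, agent: str, patient: str) -> str:
        # Fill the shared schema with this verb's root and its arguments.
        return EXTERNAL_CAUSE_SCHEMA.format(
            agent=agent, patient=patient, root=self.root.upper()
        )


melt = ResultVerb(lemma="melt", root="melted")
print(melt.instantiate("x", "y"))
```

Under this sketch, every externally caused result verb shares the same template, and only the root differs from verb to verb.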


Previous work has further classified result verbs into three categories: Change of State verbs, which denote a change of state to a property of the verb’s object (e.g. ‘to warm’), Inherently Directed Motion verbs, which denote movement along a path in relation to a landmark object (e.g. ‘to arrive’), and Incremental Theme verbs, which denote the incremental change of volume or area of the object (e.g. ‘to eat’) [8]. In our work we propose a specific set of result-states that may be used to define the semantics of most concrete action verbs in the kitchen domain. Note that we use the term Change of State in a more general way throughout this paper, such that the location and volume or area of an object are part of its state.
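One way to picture this broader notion of state is as a set of attribute-value pairs for an object, where a change of state is any difference between the values before and after the action. The following is a minimal sketch, not the representation proposed in this work; the attribute names (`temperature`, `location`, `volume`) and the helper `changed_attributes` are hypothetical and chosen only to mirror the three verb categories above.

```python
def changed_attributes(before, after):
    """Return {attribute: (old, new)} for every attribute whose value differs."""
    keys = sorted(set(before) | set(after))
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }


# Hypothetical object state before and after a 'to warm' event.
soup_before = {"temperature": "cold", "location": "stove", "volume": 1.0}
soup_after = {"temperature": "warm", "location": "stove", "volume": 1.0}

print(changed_attributes(soup_before, soup_after))
```

Treating location and volume as ordinary attributes lets the same comparison cover verbs such as ‘to arrive’ and ‘to eat’ without a separate mechanism.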

A Pilot Study and Ontology of Change of State

Crowdsource Setup

In order to build an ontology of the types of changes of state that verbs in the cooking domain may denote, we conducted a pilot study. Verb-object pairs were presented to workers (turkers) on Amazon Mechanical Turk (AMT), who were asked to describe the changes of state that occur to the object as a result of the verb. The turkers’ open-ended descriptions were then analyzed and categorized. Verbs and objects for this crowdsourcing study were chosen from the TACoS corpus, a collection of natural language descriptions of the actions that occur in a set of cooking videos. It contains 18,227 sentences collected via AMT that describe various cooking events (preparing a cucumber, scrambling eggs, etc.). This is an ideal corpus for exploring the types of changes of state and manners that verbs denote, since it contains mainly descriptions of concrete actions. Moreover, possibly because most actions in the cooking domain are goal-directed, a majority of the verbs in the descriptions denote results of action (changes of state). The ten verbs (shown in Table 4) were chosen based on the criteria that they take an agent argument and that they occur relatively frequently in the corpus with a variety of different direct objects. Furthermore, they must be concrete, meaning that they denote some observable event in the world. Verbs of this type are the most relevant for a kitchen robot. Lastly, some of the verbs were chosen because they denote only a change of state, and the others were chosen because they denote some manner of action (possibly in addition to a change of state).

Table 4: Verbs and objects used for data collection (pilot)
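The frequency and object-variety criteria lend themselves to a simple corpus filter. The sketch below assumes the corpus has already been reduced to (verb, direct object) pairs; the function name `select_verbs`, the thresholds, and the toy pairs are all hypothetical and are not taken from TACoS or from the selection procedure actually used. The remaining criteria (taking an agent argument, being concrete) call for linguistic judgment and are not modeled here.

```python
from collections import Counter, defaultdict


def select_verbs(pairs, min_count, min_objects, top_k):
    """Pick frequent verbs that occur with a variety of direct objects."""
    counts = Counter(verb for verb, _ in pairs)
    objects = defaultdict(set)
    for verb, obj in pairs:
        objects[verb].add(obj)

    eligible = [
        verb
        for verb in counts
        if counts[verb] >= min_count and len(objects[verb]) >= min_objects
    ]
    # Most frequent first; ties broken alphabetically for determinism.
    eligible.sort(key=lambda verb: (-counts[verb], verb))
    return eligible[:top_k]


# Toy (verb, object) pairs invented for illustration only.
pairs = [
    ("cut", "cucumber"), ("cut", "carrot"), ("cut", "bread"),
    ("wash", "cucumber"), ("wash", "carrot"),
    ("peel", "carrot"),
]

print(select_verbs(pairs, min_count=2, min_objects=2, top_k=10))
```

A filter like this only narrows the candidate list; the final choice between result-only verbs and verbs that also encode manner would still be made by hand.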