Crowdsource Setup

To build an ontology of the types of changes of state (CoS) that verbs in the cooking domain may denote, we conducted a pilot study. Verb-object pairs were presented to turkers via Amazon Mechanical Turk (AMT), and turkers were asked to describe the changes of state that the object undergoes as a result of the action denoted by the verb. The turkers’ open-ended descriptions were then analyzed and categorized.
The verbs and objects for this crowdsourcing study were chosen from the TACoS corpus, a collection of natural language descriptions of the actions that occur in a set of cooking videos. It contains 18227 sentences, collected via AMT, that describe various cooking events (e.g., preparing a cucumber, scrambling eggs). This makes it an ideal corpus for exploring the types of changes of state and manners that verbs denote, since it consists mainly of descriptions of concrete actions. Moreover, a majority of the verbs in the descriptions denote results of actions (changes of state), possibly because most actions in the cooking domain are goal-directed.
The ten verbs (shown in Table 1) were chosen based on the criteria that they take an agent argument and that they occur relatively frequently in the corpus and with a variety of different direct objects. Furthermore, they must be concrete, meaning that they denote some observable event in the world; verbs of this type are the most relevant for a kitchen robot. Lastly, five of the verbs were chosen because they denote only a change of state, and the other five because they denote some manner of action (possibly in addition to a change of state).

Table 1: Verbs and objects used for data collection (pilot)
To examine how CoS depends on context, we paired each verb with different objects (3 objects per verb, shown in Table 1) and presented the verb-object pairs to turkers with and without a video of the described action (+/-scene). Objects were chosen to be dissimilar to each other, since we hypothesize that the change of state indicated by the verb will differ depending on the object’s features. For example, broccoli and bowl were chosen as objects for the verb shake because one is a vegetable and the other a kitchen utensil, with very different features. Thus, there are 10 × 3 × 2 = 60 conditions.
For each condition we collected 30 turker responses. In addition to the turkers’ responses about which changes of state the verbs indicated, we also collected responses about the manner of the action.
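The size of the design can be checked with a short Python sketch. Because the verb and object lists of Table 1 are not reproduced in this text, the verb and object names below are hypothetical placeholders; only the counts (10 verbs, 3 objects per verb, +/-scene, 30 responses per condition) come from the description above. Under these numbers the design yields 60 conditions and 60 × 30 = 1,800 change-of-state responses.

```python
from itertools import product

# Hypothetical placeholder verbs: Table 1's actual verb list is not reproduced here.
VERBS = [f"verb{i}" for i in range(1, 11)]   # 10 verbs
OBJECTS_PER_VERB = 3
SCENE_CONDITIONS = ("+scene", "-scene")      # presented with / without the video
RESPONSES_PER_CONDITION = 30


def build_conditions(verbs, objects_per_verb, scenes):
    """Enumerate (verb, object, scene) conditions; object names are placeholders."""
    conditions = []
    for verb in verbs:
        objects = [f"{verb}-object{j}" for j in range(1, objects_per_verb + 1)]
        # Each verb-object pair is shown both with and without the scene.
        for obj, scene in product(objects, scenes):
            conditions.append((verb, obj, scene))
    return conditions


conditions = build_conditions(VERBS, OBJECTS_PER_VERB, SCENE_CONDITIONS)
total_responses = len(conditions) * RESPONSES_PER_CONDITION
print(len(conditions), total_responses)  # 60 1800
```

The manner responses mentioned above were collected in addition to these, so they are not included in the count.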

Ontology Design and Annotation

It is worth noting that some descriptions actually describe multiple changes of state, as shown in Figure 1.
Given that both adjectives and CoS verbs have their semantics defined in terms of a scale structure (for gradable verbs and adjectives), some of the above attributes are motivated by the semantic types of adjectives in Dixon and Aikhenvald’s categorization.
These adjective categories include Dimension, Color, Physical Property, Quantification, and Position.
Although the instructions given to the turkers specifically requested a description of the change of state undergone by the object, several responses described changes to other objects. Often, a part of the direct object was described rather than the whole object, and sometimes a completely different object that was still involved in the action was described. Thus, CoS descriptions can be categorized as describing a change to the DirectObject, PartOfObject, or AssociatedObject. Some examples are shown below.


Cut-cucumber: “The size of the cucumber changes”

Wipe-knife: “The knife gets cleaner. More metal is showing”

Clean-dishes: “Debris and residue fall away from the dishes”

Figure 1: Example of CoS frame applied to a description of change of state

In addition to the attribute that changes and the object undergoing the change, the turkers’ descriptions often contained a third important aspect of a change of state: the result value, i.e., the value of the attribute after it changes. These values can be categorized in several different ways depending on the attribute, but generally there are two polar values. For example, attributes such as SizeLengthVolumeThickness, Wetness, and NumberOfPieces may Increase or Decrease in value. However, not all result values can be categorized in this way. For example, the Shape attribute is usually described either as having changed in some vague way or as having undergone a specific change. Thus, Specific and Change are two more general result values.
These three aspects of a change of state, the Attribute, Object, and result Value, make up the CoS frame, which can be used to label a verb-object pair, as shown in Figure 1 for a sentence that describes three changes of state. Thus, the CoS ontology presented here consists of a CoS frame and the options used to fill the frame slots.
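As an illustration, the CoS frame could be represented as a small Python data structure. This is a minimal sketch, not the annotation format used in the study: the class, enum, and function names are hypothetical, only the attributes named in this section are included (the full attribute inventory is not listed here), and the slot constraint, that Increase/Decrease apply only to scalar attributes while Specific and Change apply to any attribute, is an assumption based on the description of result values above.

```python
from dataclasses import dataclass
from enum import Enum


class ChangedObject(Enum):
    """Which entity the described change applies to."""
    DIRECT_OBJECT = "DirectObject"
    PART_OF_OBJECT = "PartOfObject"
    ASSOCIATED_OBJECT = "AssociatedObject"


class ResultValue(Enum):
    """Value of the attribute after the change."""
    INCREASE = "Increase"
    DECREASE = "Decrease"
    SPECIFIC = "Specific"
    CHANGE = "Change"


# Only attributes named in this section; the full inventory is not reproduced.
# Assumption: these three lie on a polar scale (Increase/Decrease).
SCALAR_ATTRIBUTES = {"SizeLengthVolumeThickness", "Wetness", "NumberOfPieces"}
NON_SCALAR_ATTRIBUTES = {"Shape"}
POLAR_VALUES = {ResultValue.INCREASE, ResultValue.DECREASE}


def allows(attribute: str, value: ResultValue) -> bool:
    """Assumed slot constraint: unknown attributes are rejected; Specific and
    Change fit any known attribute; Increase/Decrease fit only scalar ones."""
    if attribute not in SCALAR_ATTRIBUTES | NON_SCALAR_ATTRIBUTES:
        return False
    return value not in POLAR_VALUES or attribute in SCALAR_ATTRIBUTES


@dataclass(frozen=True)
class CoSFrame:
    """One change of state: the Attribute, Object, and result Value slots."""
    attribute: str
    obj: ChangedObject
    value: ResultValue

    def __post_init__(self) -> None:
        # Reject frames whose value does not fit the attribute's scale.
        if not allows(self.attribute, self.value):
            raise ValueError(f"{self.value.value} is not allowed for {self.attribute}")


# A verb-object pair maps to a list of frames, since one description
# may report more than one change of state.
labels: dict[tuple[str, str], list[CoSFrame]] = {
    ("cut", "cucumber"): [
        # One plausible labeling of "The size of the cucumber changes".
        CoSFrame("SizeLengthVolumeThickness", ChangedObject.DIRECT_OBJECT, ResultValue.CHANGE),
    ],
}
```

Keeping the frames in a list per verb-object pair lets a description that reports several changes of state, like the one in Figure 1, carry one frame per change; constructing a frame such as `CoSFrame("Shape", ChangedObject.DIRECT_OBJECT, ResultValue.INCREASE)` raises `ValueError` under the assumed constraint.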