This VAEC_readme.txt file was generated on 2020-06-24 by Taylor Webb

GENERAL INFORMATION

1. Title of Dataset: Visual Analogy Extrapolation Challenge (VAEC)

2. Author Information
	A. Principal Investigator Contact Information
		Name: Taylor Webb
		Institution: UCLA Psychology Department
		Email: taylor.w.webb@gmail.com

SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data: none

2. Links to publications that cite or use the data:

3. Links to other publicly accessible locations of the data: none

4. Links/relationships to ancillary data sets: none

5. Was data derived from another source? no

6. Recommended citation for this dataset:

DATA & FILE OVERVIEW

1. File List:

	VAEC Translation Extrapolation Regime consists of the following files:
		analogy_train.hy
		analogy_test1.hy
		analogy_test2.hy
		analogy_test3.hy
		analogy_test4.hy
		analogy_test5.hy

	VAEC Scale Extrapolation Regime consists of the following files:
		analogy_scale_train.hy
		analogy_scale_test1.hy
		analogy_scale_test2.hy
		analogy_scale_test3.hy
		analogy_scale_test4.hy
		analogy_scale_test5.hy

2. Relationship between files, if important: For the Translation Extrapolation Regime, each file contains data for a separate 'region' (i.e. Regions 1-6 in the associated publication). For the Scale Extrapolation Regime, each file contains data for a separate 'scale' (i.e. Scales 1-6).

3. Additional related data collected that was not included in the current data package: none

4. Are there multiple versions of the dataset? no

METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data: The VAEC dataset was generated using code available at https://github.com/taylorwwebb/learning_representations_that_support_extrapolation

2. Methods for processing the data: none

3. Instrument- or software-specific information needed to interpret the data:

	To load the data, use the following software/packages:
		python
		h5py

	Additionally, it may be useful to have the following package:
		NumPy

INSTRUCTIONS FOR LOADING DATA

Use h5py.File to load a file, and make sure to use read mode ('r'). Here is an example, loading data for Region 1 of the Translation Extrapolation Regime:

	trans_region1 = h5py.File('./analogy_train.hy', 'r')

Each file contains 19,040 separate analogy problems. To select a specific problem, use the index for that problem as a key:

	trans_region1_problem0 = trans_region1['0']
	trans_region1_problem1 = trans_region1['1']
	...

To see the indices for all problems in a file:

	trans_region1.keys()

Each analogy problem contains 6 variables. To see the names of these variables:

	trans_region1_problem0.keys()

The variables are:

	'ABCD'
	'analogy_dim'
	'dist'
	'imgs'
	'latent_class'
	'not_D'

To select these variables, use the variable name as a key:

	trans_region1_problem0_imgs = trans_region1_problem0['imgs']

It may also be useful to convert the data to NumPy array format:

	trans_region1_problem0_imgs = np.array(trans_region1_problem0['imgs'])

Below is a complete description of all variables:

'imgs': This variable contains all images associated with an analogy problem. There are 7 images in total, which constitute the 7 multiple-choice options for the analogy problem. The analogy problem itself is made up of 4 of these images (A, B, C, and D). The first dimension of the image array indexes the separate images. To select a specific image (in this example, image 0):

	trans_region1_problem0_imgs[0,:,:,:]

Each image is 128 x 128 x 3, and is encoded as a uint8 RGB image.
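
The snippet below is a small, illustrative sketch (not part of the dataset) that ties the loading steps above together and checks the expected shape of 'imgs'. It assumes the file is in the current working directory, and the variable names are only examples:

	import h5py
	import numpy as np

	# Open Region 1 of the Translation Extrapolation Regime in read mode
	trans_region1 = h5py.File('./analogy_train.hy', 'r')

	# Select problem 0 and convert its images to a NumPy array
	problem0 = trans_region1['0']
	imgs = np.array(problem0['imgs'])

	print(imgs.shape)    # expected: (7, 128, 128, 3)
	print(imgs.dtype)    # expected: uint8

	# Select image 0 of this problem
	img0 = imgs[0, :, :, :]

	trans_region1.close()
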
'ABCD': This variable contains the indices of the images corresponding to the 4 terms of the analogy problem (A, B, C, D). To get the indices for these terms:

	A_ind = trans_region1_problem0['ABCD'][0]
	B_ind = trans_region1_problem0['ABCD'][1]
	C_ind = trans_region1_problem0['ABCD'][2]
	D_ind = trans_region1_problem0['ABCD'][3]

These indices can then be used to select the corresponding images from 'imgs', as described above.

'not_D': This variable contains the indices of all images that are *not* item D. When all 7 images are presented as multiple-choice options, any of these images would constitute an incorrect answer. There are 6 indices in this variable, and they can be accessed in the same way as the indices in 'ABCD'.

'latent_class': This variable encodes the values along the underlying dimensions of the object space, for all images in 'imgs'. The first dimension of this variable indexes a specific image (out of 7), and the second dimension indexes a specific dimension in the object space (out of 4), in the following order:

	X location
	Y location
	size
	brightness

To select the value for a particular image along a particular dimension in object space (in this example, the X location of the first image in 'imgs'):

	img0_X = trans_region1_problem0['latent_class'][0,0]

These values are integers ranging between 0 and 41.

'analogy_dim': This variable encodes the dimension (in terms of the underlying object space) along which the images in a given analogy problem vary. To interpret the value, refer to the following key:

	0 = X location
	1 = Y location
	2 = size
	3 = brightness

For example, if trans_region1_problem0['analogy_dim'] is equal to 2, the objects in this problem vary only in terms of their size.

'dist': This variable encodes the distance, in terms of values in the underlying object space, between objects A and B (the value along the dimension 'analogy_dim' for object B minus the value for object A). This will be the same as the distance between objects C and D.
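
To illustrate how these variables fit together, the following sketch (illustrative only, and assuming the file is in the current working directory) reconstructs a single multiple-choice problem and checks that 'analogy_dim' and 'dist' agree with 'latent_class', as described above. The variable names are only examples:

	import h5py
	import numpy as np

	# Load Region 1 of the Translation Extrapolation Regime and select problem 0
	trans_region1 = h5py.File('./analogy_train.hy', 'r')
	problem0 = trans_region1['0']

	imgs = np.array(problem0['imgs'])               # all 7 candidate images
	A_ind, B_ind, C_ind, D_ind = np.array(problem0['ABCD'])
	distractor_inds = np.array(problem0['not_D'])   # indices of the 6 incorrect choices

	# Images for the four terms of the analogy A : B :: C : D
	A_img, B_img, C_img = imgs[A_ind], imgs[B_ind], imgs[C_ind]
	D_img = imgs[D_ind]                             # the correct answer

	# Dimension of variation (0 = X, 1 = Y, 2 = size, 3 = brightness)
	# and the A-to-B distance along that dimension
	dim = int(np.array(problem0['analogy_dim']))
	dist = int(np.array(problem0['dist']))

	# The latent values should be consistent with 'analogy_dim' and 'dist'
	latents = np.array(problem0['latent_class']).astype(int)    # shape (7, 4)
	assert latents[B_ind, dim] - latents[A_ind, dim] == dist
	assert latents[D_ind, dim] - latents[C_ind, dim] == dist

	trans_region1.close()
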