A system for determining an action performed within an input image includes a memory to store one or more instructions, and a processor communicatively coupled to the memory, and configured to execute the one or more instructions in the memory. The processor employs a convolutional neural network (CNN) that includes...