Colloquium: Human and Video: Big Data and Small Details


Thursday, April 10, 2014 - 11:00am


Jianbo Shi


Ryan Farrell

We study close interactions among people in game playing or on crowded urban streets. Our goal is to capture all levels of action, from body motions and human poses to intentions.
For human motion, we model it with multiple perceptual quanta: from Recognizable Granularities (human body) to Segmentable Granularities (moving pixels/parts). We developed a computational process called 'graph steering' for resolving perplexity across the different granularity levels. Two instances are presented: 1) detection and tracking of pedestrians under persistent occlusion in urban street videos, and 2) motion and pose estimation of actors in movies.
For human intention prediction, we explicitly model people's trajectories/actions as goal-directed, obstacle-avoiding paths with multiple hypotheses. We create flexible and realistic hypotheses for plausible pedestrian/basketball player trajectories using concepts from Homotopy Class of Planning. We show examples of tracking people in crowded cities and basketball players in games.
This is joint work with Katerina Fragkiadaki.


Jianbo Shi is an associate professor of Computer and Information Science at the University of Pennsylvania. He studied Computer Science and Mathematics as an undergraduate at Cornell University, where he received his B.A. in 1994. He received his Ph.D. degree in Computer Science from the University of California at Berkeley in 1998. From 1999 to 2002, he was a research faculty member at the Robotics Institute at Carnegie Mellon University. In 2003 he joined the University of Pennsylvania. In 2007 he was awarded the Longuet-Higgins Prize for a contribution that has stood the test of time.

His primary research interests are in computer vision, including 1) Human recognition, with the ultimate goal of developing computational algorithms to understand human behavior in video; 2) Image Segmentation and Object Recognition, with the goal of extracting “interesting” patterns from data and guiding the grouping process to achieve specific vision tasks, such as recognizing familiar object shapes.