We propose a framework for real-time spatio-temporal analysis of dynamic scenes. It is designed to improve the grounding of autonomous agents in (simulated) physical domains. We introduce a knowledge processing pipeline that ranges from the relevance-driven compilation of a qualitative scene description to the knowledge-based detection of complex event and action sequences, which we treat as a spatio-temporal pattern matching problem. We also introduce a methodology for formalizing motion patterns and their internal composition, and apply it to capture human expertise about domain-specific motion situations. We present extensive experimental results from the 3D soccer simulation that substantiate the online applicability of our approach under tournament conditions, based on perception at 5 Hz that is either (a) precise or (b) noisy/incomplete.