Please use this identifier to cite or link to this item: http://cimat.repositorioinstitucional.mx/jspui/handle/1008/1166
TRAJECTORY FORECASTING FOR AUTONOMOUS VEHICLES
Juan Luis Baldelomar
Open Access
Attribution-NonCommercial
Bitmaps
COMPUTER SCIENCE
INDUSTRIAL MATHEMATICS
This work focuses on trajectory forecasting for autonomous driving vehicles. The problem has been tackled from different perspectives and in different contexts; here, we focus on predicting the trajectories of driving agents. Two dimensions of the problem must be modeled to capture it accurately and obtain good results: the temporal dimension and the spatial dimension. Sequence-to-sequence models, such as those based on LSTM networks, have been widely used to model the temporal dimension. More recent approaches have explored Transformer attention mechanisms, following their remarkable results in the NLP field. To model the spatial relationships among agents, graph neural networks have been proposed in several works. However, attention mechanisms like the Transformer's have not been widely explored for the spatial dimension of the problem, and addressing that gap is the main contribution of our work. We propose a model based on the Transformer architecture that tackles both the temporal and the spatial dimensions of the problem, using two Transformer encoders, one for each dimension. The model receives as input a scene with all the neighbors present at specific time steps and outputs the predicted trajectories of every agent in the scene; that is, it performs a joint prediction over all agents rather than predicting each agent's trajectory in isolation, which allows it to take the spatial relations in the sequence into account. The model works as follows. The first Transformer encoder, together with handcrafted CNN modules, extracts spatial features: the input is constructed so that this encoder processes the spatial relations among the agents present in a scene. These spatial features are then fed to what we call the temporal Transformer, which operates on the temporal dimension of the problem; this is achieved by transposing the temporal and spatial dimensions of the first encoder's output. The decoder then receives the output of the second encoder, as in a traditional Transformer model. The model is trained in an auto-regressive manner, as in the AgentFormer model~\cite{agentFormer}, because this showed significant improvements over the more classical teacher forcing approach. We worked with two datasets to train and test
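The sketch below illustrates the spatial-then-temporal encoding idea described above, assuming PyTorch; all layer sizes, tensor shapes, and names (e.g. SpatialTemporalForecaster, the regression head standing in for the full Transformer decoder) are illustrative assumptions, not the thesis implementation.

import torch
import torch.nn as nn


class SpatialTemporalForecaster(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=2, horizon=12):
        super().__init__()
        self.horizon = horizon
        # Embed each agent's (x, y) position at a given time step.
        self.embed = nn.Linear(2, d_model)
        # Spatial encoder: attention runs across agents within one time step.
        spatial_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.spatial_encoder = nn.TransformerEncoder(spatial_layer, num_layers)
        # Temporal encoder: attention runs across time steps for each agent.
        temporal_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(temporal_layer, num_layers)
        # Simple regression head used as a stand-in for the Transformer decoder,
        # applied auto-regressively one future step at a time.
        self.head = nn.Linear(d_model, 2)

    def forward(self, obs):
        # obs: (batch, time, agents, 2) observed positions for the whole scene.
        b, t, a, _ = obs.shape
        x = self.embed(obs)                               # (b, t, a, d)
        # Spatial pass: fold time into the batch so attention is over agents.
        x = self.spatial_encoder(x.reshape(b * t, a, -1)).reshape(b, t, a, -1)
        # Transpose the temporal and spatial dimensions, then attend over time.
        x = x.transpose(1, 2).reshape(b * a, t, -1)       # (b*a, t, d)
        x = self.temporal_encoder(x)
        # Auto-regressive decoding sketch: predict each future step from the
        # latest encoded state and feed the prediction back in.
        state = x[:, -1]
        preds = []
        for _ in range(self.horizon):
            step = self.head(state)                       # (b*a, 2)
            preds.append(step)
            state = state + self.embed(step)
        out = torch.stack(preds, dim=1).reshape(b, a, self.horizon, 2)
        return out                                        # joint prediction per agent


# Example: 8 observed steps, 5 agents in the scene, 12 predicted future steps.
model = SpatialTemporalForecaster()
scene = torch.randn(1, 8, 5, 2)
future = model(scene)
print(future.shape)  # torch.Size([1, 5, 12, 2])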
22-02-2023
Master's thesis
OTHER
Accepted version
acceptedVersion - Accepted version
Appears in collections: Tesis del CIMAT

File: TE 867.pdf (1.94 MB, Adobe PDF)