Please use this identifier to cite or link to this item:
http://cimat.repositorioinstitucional.mx/jspui/handle/1008/1109
HIERARCHICAL ATTENTION AND TRANSFORMERS FOR AUTOMATIC MOVIE RATING | |
LUIS FERNANDO PARDO SIXTOS | |
Open Access | |
Attribution-NonCommercial | |
Computer science | |
The Motion Picture Association of America (MPAA), through CARA, assigns each motion picture a rating intended to help parents decide whether a movie is suitable for their children. Currently there are five possible categories: G, PG, PG-13, R, and NC-17, where G is for the general public and NC-17 is adults only. These ratings also give the general public insight into a movie's target audience, and movie theaters use them to determine who is admitted to screenings. Hence, it is important for moviemakers to know a movie's rating as early in the production process as possible. However, the rating is usually assigned in post-production, when changes to the movie can be very expensive. Predicting the rating from the movie script would allow these changes to be made even before filming starts. Furthermore, advances in this direction would also favor cheaper large-scale video classification for other sources, for example, social media and YouTube. MPAA rating prediction can be stated as a classification problem. Research so far has focused on directly applying deep learning models (e.g., LSTM) that are agnostic to particularities of the problem, for example, very long text sequences (movie scripts) and content and style words with semantic and syntactic dependencies among them. This thesis proposes novel and effective strategies for MPAA rating prediction. Our first proposal adapts hierarchical networks, which are useful for modeling long sequences and exploit the natural structure of the documents: words, sentences, and scenes (i.e., chunks of sentences). Furthermore, we combine state-of-the-art transformers and RNN-based attention models in our hierarchical framework, which lets us exploit the benefits of transfer learning and capture longer dependencies among words by means of self-attention.
The proposed approaches have multiple benefits. On the one hand, the hierarchical RNN models have a lower computational cost and proven discriminative power, and can be used to analyze the movie at the scene level. On the other hand, the transformer-based models perform better but are more difficult to interpret. To address this problem, we devise a simple but effective visualization technique that extracts the most important words and sequences from the attention layers of the transformer; this is our third contribution. Results include empirical evidence on the usefulness of the proposed | |
11-12-2020 | |
Master's thesis | |
OTHER | |
Accepted version | |
acceptedVersion - Accepted version | |
Appears in collections: | Tesis del CIMAT |
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
TE 818.pdf | 3.69 MB | Adobe PDF | View/Open |
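
The abstract describes a hierarchy of words, sentences, and scenes combined through attention. The following is a minimal sketch of that idea in plain Python, not the thesis implementation: it omits the RNN and transformer encoders entirely and only shows attention pooling applied level by level. The function names (`attention_pool`, `encode_script`) and the context vectors (`word_ctx`, `sent_ctx`, `scene_ctx`) are hypothetical; in a trained model the context vectors would be learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Return (pooled_vector, weights).

    Each vector is scored by its dot product with `context` (an assumed,
    normally learned, query vector); the softmax of the scores gives the
    weights of a weighted average.
    """
    scores = [sum(v_i * c_i for v_i, c_i in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights


def encode_script(script, word_ctx, sent_ctx, scene_ctx):
    """Pool words into sentences, sentences into scenes, scenes into one vector.

    `script` is a list of scenes, a scene is a list of sentences, and a
    sentence is a list of word vectors (lists of floats of equal length).
    Returns the document vector and the attention weights over scenes.
    """
    scene_vecs = []
    for scene in script:
        sent_vecs = [attention_pool(sentence, word_ctx)[0] for sentence in scene]
        scene_vecs.append(attention_pool(sent_vecs, sent_ctx)[0])
    return attention_pool(scene_vecs, scene_ctx)


# Toy input: two scenes with 2-dimensional word vectors (made-up values).
toy_script = [
    [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]],
    [[[0.0, 2.0]]],
]
doc_vec, scene_weights = encode_script(toy_script, [1.0, 0.0], [0.0, 1.0], [0.5, 0.5])
```

In this sketch the scene-level weights returned by `encode_script` indicate how much each scene contributes to the document vector, which is one simple way a hierarchical model can be inspected at the scene level. A classifier over the five rating categories would take `doc_vec` as input.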