Multimodal understanding of long documents: from topic modeling to question answering