MMML - 1.1 Introduction

Published under a Creative Commons Licence

What is Multimodal?

Modality: The way in which something happens or is experienced.

  • Modality refers to a certain type of information and/or the representation format in which information is stored.
  • Sensory modality: one of the primary forms of sensation, such as vision or touch; a channel of communication.

Medium: A means or instrumentality for storing or communicating information; system of communication/transmission.

  • Medium is the means whereby this information is delivered to the senses of the interpreter.

Prior Research on "Multimodal"

Four eras of multimodal research:

  1. The "behavior" era
  2. The "computational" era
  3. The "interaction" era
  4. The "deep learning" era

Core Technical Challenges

1) Representation: Learning how to represent and summarize multimodal data in a way that exploits the complementarity and redundancy of the modalities.

2) Alignment: Identifying the direct relations between (sub)elements from two or more different modalities.
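
As a rough illustration of alignment, the sketch below matches each element of one modality to its most similar element in another. It assumes both modalities have already been embedded in a shared vector space, which the notes do not specify. The vectors, the names `cosine` and `align`, and the choice of cosine similarity are all hypothetical, chosen only to make the idea concrete.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def align(elements_a, elements_b):
    """For each element of modality A, return the index of the
    most similar element of modality B (a hard, one-way alignment)."""
    alignment = []
    for u in elements_a:
        scores = [cosine(u, v) for v in elements_b]
        best = max(range(len(scores)), key=scores.__getitem__)
        alignment.append(best)
    return alignment

# Hypothetical toy embeddings, assumed to live in a shared 3-d space.
word_vectors = [[1.0, 0.1, 0.0],    # e.g. the word "dog"
                [0.0, 0.2, 1.0]]    # e.g. the word "ball"
region_vectors = [[0.1, 0.0, 0.9],  # image region 0
                  [0.9, 0.2, 0.1],  # image region 1
                  [0.0, 1.0, 0.0]]  # image region 2

print(align(word_vectors, region_vectors))  # → [1, 0]
```

Here the first word is matched to region 1 and the second word to region 0. Real systems typically learn the embeddings and may use soft (weighted) alignments rather than a single best match.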

3) Translation: Changing data from one modality to another, where the translation relationship can often be open-ended or subjective.

4) Fusion: Joining information from two or more modalities to perform a prediction task.
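
Two common ways to join modalities for prediction are often called early fusion (combine features, then predict) and late fusion (predict per modality, then combine). The notes do not name these variants, so the minimal sketch below is only one possible reading. The feature values, weights, and function names are hypothetical and not learned from data.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_score(features, weights, bias):
    """Weighted sum of features plus a bias term."""
    return sum(x * w for x, w in zip(features, weights)) + bias

def early_fusion(audio, visual, weights, bias):
    """Concatenate the modality features, then apply a single model."""
    joint = audio + visual  # list concatenation
    return sigmoid(linear_score(joint, weights, bias))

def late_fusion(audio, visual, audio_model, visual_model):
    """Let each modality predict on its own, then average the probabilities."""
    p_audio = sigmoid(linear_score(audio, *audio_model))
    p_visual = sigmoid(linear_score(visual, *visual_model))
    return 0.5 * (p_audio + p_visual)

# Hypothetical features and hand-picked (not learned) parameters.
audio_features = [0.5, -1.0]
visual_features = [2.0]

p_early = early_fusion(audio_features, visual_features,
                       weights=[1.0, 0.5, 0.25], bias=0.0)
p_late = late_fusion(audio_features, visual_features,
                     audio_model=([1.0, 0.5], 0.0),
                     visual_model=([0.25], 0.0))

print(round(p_early, 4), round(p_late, 4))
```

Early fusion lets the model use interactions between modalities at the feature level, while late fusion keeps the per-modality models independent and is easier to apply when one modality is missing.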

5) Co-Learning: Transferring knowledge between modalities, including their representations and predictive models.