MMML - 1.1 Introduction

Published: January 29, 2022 by Authur Lee

What is Multimodal?

Modality: The way in which something happens or is experienced.

Modality refers to a certain type of information and/or the representation format in which information is stored.
Sensory modality: one of the primary forms of sensation, such as vision or touch; a channel of communication.

Medium: A means or instrumentality for storing or communicating information; system of communication/transmission.

Medium is the means whereby this information is delivered to the senses of the interpreter.

Core technical challenges of multimodal research:

1) Representation: Learning how to represent and summarize multimodal data in a way that exploits the complementarity and redundancy of the modalities.

2) Alignment: Identifying the direct relations between (sub)elements from two or more different modalities.

3) Translation: Process of changing data from one modality to another, where the translation relationship can often be open-ended or subjective.

4) Fusion: Joining information from two or more modalities to perform a prediction task.

5) Co-Learning: Transferring knowledge between modalities, including their representations and predictive models.
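
To make the fusion item more concrete, here is a minimal Python sketch of two common ways to join modalities for a prediction task. The terms "early fusion" (concatenating features) and "late fusion" (averaging per-modality predictions) are not from the notes above; they are standard names used here for illustration. The feature vectors, scores, and function names are all made-up assumptions, not real data or a real library API.

```python
from typing import List, Optional, Sequence


def early_fusion(audio_feats: Sequence[float],
                 visual_feats: Sequence[float]) -> List[float]:
    """Feature-level fusion: concatenate the per-modality feature vectors.

    A single downstream model would then be trained on the joint vector.
    """
    return list(audio_feats) + list(visual_feats)


def late_fusion(scores_per_modality: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Decision-level fusion: weighted average of per-modality class scores.

    Each modality is assumed to have its own model that already produced
    one score per class. Uniform weights are used when none are given.
    """
    n_modalities = len(scores_per_modality)
    if weights is None:
        weights = [1.0 / n_modalities] * n_modalities
    n_classes = len(scores_per_modality[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, scores_per_modality))
        for c in range(n_classes)
    ]


# Hypothetical toy inputs (assumed values, for illustration only).
audio_feats = [0.2, 0.7]          # e.g. two acoustic features
visual_feats = [0.9, 0.1, 0.4]    # e.g. three visual features
fused_features = early_fusion(audio_feats, visual_feats)

audio_scores = [0.6, 0.4]         # audio-only model: scores for classes 0 and 1
visual_scores = [0.2, 0.8]        # visual-only model: scores for classes 0 and 1
fused_scores = late_fusion([audio_scores, visual_scores])

# Pick the class with the highest fused score.
predicted_class = max(range(len(fused_scores)), key=fused_scores.__getitem__)
```

In this toy case the audio model prefers class 0 and the visual model prefers class 1; averaging the scores lets the more confident visual model decide. Early fusion instead leaves that trade-off to whatever model consumes the concatenated vector.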