MMML - 1.1 Introduction
What is Multimodal?
Modality: The way in which something happens or is experienced.
- Modality refers to a certain type of information and/or the representation format in which information is stored.
- Sensory modality: one of the primary forms of sensation, such as vision or touch; a channel of communication.
Medium: A means or instrumentality for storing or communicating information; system of communication/transmission.
- Medium is the means whereby this information is delivered to the senses of the interpreter.
Prior Research on "Multimodal"
Four eras of multimodal research:
- The "behavior" era
- The "computational" era
- The "interaction" era
- The "deep learning" era
Core Technical Challenges
1) Representation: Learning how to represent and summarize multimodal data in a way that exploits the complementarity and redundancy of the modalities.

2) Alignment: Identifying the direct relations between (sub)elements from two or more different modalities.

3) Translation: Process of changing data from one modality to another, where the translation relationship can often be open-ended or subjective.

4) Fusion: Joining information from two or more modalities to perform a prediction task.

5) Co-Learning: Transferring knowledge between modalities, including their representations and predictive models.
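
As a rough illustration of two of the challenges above, alignment and fusion, the sketch below uses plain Python lists as stand-in feature vectors. All function names (cosine, align, early_fusion, late_fusion) and the toy vectors are hypothetical. The nearest-neighbour matching rule and the split into early and late fusion are common simplifications assumed here, not methods prescribed by these notes. The sketch also assumes both modalities have already been embedded in a shared vector space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(elements_a, elements_b):
    """Alignment (assumed rule): for each element of modality A, return the
    index of the most similar element of modality B."""
    return [
        max(range(len(elements_b)), key=lambda j: cosine(a, elements_b[j]))
        for a in elements_a
    ]


def early_fusion(features_a, features_b):
    """Fusion at the feature level (assumed variant): concatenate the two
    feature vectors so a single predictor can consume both."""
    return list(features_a) + list(features_b)


def late_fusion(score_a, score_b, weight_a=0.5):
    """Fusion at the decision level (assumed variant): weighted average of
    the scores produced by two separate per-modality predictors."""
    return weight_a * score_a + (1.0 - weight_a) * score_b


# Toy data: two "word" vectors (text) and two "region" vectors (image),
# hypothetically living in the same 2-D embedding space.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.1, 0.9], [0.8, 0.2]]

matches = align(words, regions)             # index of best region per word
joint = early_fusion([0.2, 0.7], [0.5])     # one joint feature vector
combined = late_fusion(0.8, 0.4)            # one combined prediction score
```

With this toy data the first word vector is closest to the second region and the second word to the first region, so align returns [1, 0]. Real systems typically learn both the embeddings and the fusion function rather than fixing them by hand as done here.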
