🎠 MELD — Multimodal Emotion & Sentiment Recognition
Upload a video to analyze emotion and sentiment using fused text, audio, and visual features.
1.99M params | 7 emotions + 3 sentiments | LayerNorm + MultiheadAttention fusion
For best results, upload a video with clear speech and visible face.
Architecture: Audio (MFCC) + Text (BERT) + Visual (frame features) → LayerNorm projections → MultiheadAttention(256, 4 heads) → Fusion → Dual classifiers
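The fusion pipeline above can be sketched in PyTorch. This is a minimal illustration, not the app's actual implementation: the per-modality input dimensions (40-d MFCC, 768-d BERT, 512-d visual), the mean-pooling step, and all layer names are assumptions; only the 256-d / 4-head attention, the LayerNorm projections, and the dual 7-emotion / 3-sentiment heads come from the description.

```python
import torch
import torch.nn as nn

class FusionModel(nn.Module):
    """Sketch of the described fusion: LayerNorm projections ->
    MultiheadAttention(256, 4 heads) -> fusion -> dual classifiers.
    Input dims are hypothetical placeholders."""

    def __init__(self, audio_dim=40, text_dim=768, visual_dim=512,
                 d=256, n_heads=4):
        super().__init__()
        # Per-modality LayerNorm + linear projection into a shared 256-d space
        self.audio_proj = nn.Sequential(nn.LayerNorm(audio_dim), nn.Linear(audio_dim, d))
        self.text_proj = nn.Sequential(nn.LayerNorm(text_dim), nn.Linear(text_dim, d))
        self.visual_proj = nn.Sequential(nn.LayerNorm(visual_dim), nn.Linear(visual_dim, d))
        # Cross-modal attention over the three modality tokens
        self.attn = nn.MultiheadAttention(embed_dim=d, num_heads=n_heads, batch_first=True)
        # Dual heads: 7 MELD emotions and 3 sentiments
        self.emotion_head = nn.Linear(d, 7)
        self.sentiment_head = nn.Linear(d, 3)

    def forward(self, audio, text, visual):
        # One token per modality: (batch, 3, d)
        tokens = torch.stack([self.audio_proj(audio),
                              self.text_proj(text),
                              self.visual_proj(visual)], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        pooled = fused.mean(dim=1)  # simple mean-pool fusion (assumed)
        return self.emotion_head(pooled), self.sentiment_head(pooled)

model = FusionModel()
a, t, v = torch.randn(2, 40), torch.randn(2, 768), torch.randn(2, 512)
emo, sent = model(a, t, v)
print(emo.shape, sent.shape)  # one logit vector per class set
```

With these placeholder dimensions the parameter count will differ from the app's stated 1.99M; the structure, not the size, is the point of the sketch.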
Built by Kareem Waly · GitHub · Google Scholar