multi-modal

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

vqa awesome-list multi-modal multi-modal-learning attention-networks

Updated Jul 6, 2023

microsoft / farmvibes-ai

FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability

weather sustainability ai agriculture geospatial remote-sensing multi-modal geospatial-analytics stac

Updated May 18, 2023
Jupyter Notebook

salesforce / UniControl

Unified Controllable Visual Generation Model

generation multi-modal aigc

Updated Aug 3, 2023
Python

tangxyw / RecSysPapers

A collection of classic and cutting-edge industry papers in the fields of recommendation, advertising, and search.

Updated Aug 9, 2023
Python

dvlab-research / LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

segmentation multi-modal llm large-language-model

Updated Aug 9, 2023
Python

boschresearch / OASIS

Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)

machine-learning computer-vision deep-learning pytorch gan image-generation multi-modal generative-adversarial-networks oasis image-to-image-translation bcai semantic-image-synthesis iclr2021 label-to-image-translation

Updated Nov 8, 2022
Python

IntelLabs / fastRAG

Efficient Retrieval Augmentation and Generation Framework

nlp benchmark information-retrieval transformers knowledge-graph question-answering summarization multi-modal semantic-search diffusion sentence-transformers colbert

Updated Jul 27, 2023
Python

v-iashin / SpecVQGAN

Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

audio video pytorch transformer gan multi-modal evaluation-metrics video-understanding vas video-features vqvae bmvc melgan audio-generation vggsound

Updated Jun 6, 2023
Jupyter Notebook

EndlessSora / TSIT

[ECCV 2020 Spotlight] A Simple and Versatile Framework for Image-to-Image Translation

generative-adversarial-network gan style-transfer image-manipulation image-generation versatile multi-modal feature-transformation image-to-image-translation multi-scale two-stream-networks semantic-image-synthesis eccv2020

Updated Nov 28, 2021
Python

wangsuzhen / Audio2Head

Code for the paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion" (IJCAI 2021)

paper codes multi-modal talking-head talking-face ijcai2021

Updated Jul 23, 2022
Python
