Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in PyTorch
Updated May 24, 2023 - Python
Vector search for humans.
ModelScope: bring the notion of Model-as-a-Service to life.
Chinese and English multimodal conversational language model
Open Source Routing Engine for OpenStreetMap
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
An Open Toolkit for Knowledge Graph Extraction and Construction, published at EMNLP 2022 System Demonstrations.
Recent Transformer-based CV and related works.
[pip install medmnist] 18 MNIST-like Datasets for 2D and 3D Biomedical Image Classification
A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
FarmVibes.AI: Multi-Modal GeoSpatial ML Models for Agriculture and Sustainability
A collection of industry classics and cutting-edge papers in the field of recommendation/advertising/search.
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Official implementation of the paper "You Only Need Adversarial Supervision for Semantic Image Synthesis" (ICLR 2021)
Efficient Retrieval Augmentation and Generation Framework
Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)
[ECCV 2020 Spotlight] A Simple and Versatile Framework for Image-to-Image Translation
Code for the paper "Audio2Head: Audio-driven One-shot Talking-head Generation with Natural Head Motion" (IJCAI 2021)