multi-modal-learning

I've been chatting with some others interested in training CLIP for different domain tasks. They expressed interest in a simple way to use a pre-trained text transformer.

Basic support for Hugging Face models, or for generic transformer classes, shouldn't be too big an extension of what is already fleshed out.



Here are 31 public repositories matching this topic...

mlfoundations / open_clip

Generalizable Text Transformer Usage

error when running training/main.py

jokieleung / awesome-visual-question-answering

lucidrains / x-clip

OFA-Sys / Chinese-CLIP

moabarar / nemar

josedolz / HyperDenseNet_pytorch

rentainhe / TRAR-VQA

liyichen-cly / MMEA

ivclab / NeuralMerger

likyoo / Multimodal-Remote-Sensing-Toolkit

depshad / Deep-Learning-Framework-for-Multi-modal-Product-Classification

peymanbateni / multimodal-emotion-analysis-in-conversations

gaurav104 / WSS-CMER

Boreas-pxl / M2HSE

v-iashin / CORSMAL

raphaelmemmesheimer / gimme_signals_action_recognition

sayakpaul / Multimodal-Entailment-Baseline

liveseongho / DramaQA

jackyjsy / SAM-SLR-v2

iCVTEAM / M3TR

zhjohnchan / awesome-vision-and-language-pretraining

murali1996 / nlp-notes

itsShnik / allForOne

talipucar / DomainTranslation

kjanjua26 / Do_Cross_Modal_Systems_Leverage_Semantic_Relationships

Karami-m / Deep-Probabilistic-Multi-View

fpsluozi / tofindwaldo

projectaligned / projectaligned.github.io

rinnakk / japanese-clip

Cbhihe / Lexical_embedding
