The Wayback Machine - https://web.archive.org/web/20200716174624/https://github.com/thomwolf

Thomas Wolf thomwolf

Science Lead @ Huggingface Inc.

1.1k followers · 106 following · 992 stars

Highlights

Arctic Code Vault Contributor

Pinned

huggingface/transformers

🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

Python · 30.8k stars · 7.3k forks

huggingface/neuralcoref

✨Fast Coreference Resolution in spaCy with Neural Networks

Python · 2k stars · 350 forks

huggingface/naacl_transfer_learning_tutorial

Repository of code for the tutorial on Transfer Learning in NLP held at NAACL 2019 in Minneapolis, MN, USA

Python · 649 stars · 97 forks

huggingface/transfer-learning-conv-ai

🦄 State-of-the-Art Conversational AI with Transfer Learning

Python · 928 stars · 242 forks

huggingface/pytorch-pretrained-BigGAN

🦋A PyTorch implementation of BigGAN with pretrained weights and conversion scripts.

Python · 697 stars · 109 forks

Magic-Sand

Magic-Sand is a software for operating an augmented reality sandbox

C++ · 456 stars · 94 forks

1,156 contributions in the last year

Contribution activity

July 2020

thomwolf/nlp Python Jul 13

Created a pull request in huggingface/transformers that received 2 comments

[pipelines] Update fill mask pipeline to remove special tokens in the output

Small fix to remove the special tokens from the output of the fill mask pipeline.

+4 −2 • 2 comments

Update dataset loading and features - Add TREC dataset Jul 13
Make the json script more flexible Jul 10
Starting to add some real doc Jul 8

Search qa Jul 16
Concatenate datasets Jul 15
Fix extracted files directory for the DownloadManager Jul 15
Add contiguous sharding Jul 15
Remove remaining nested dict Jul 15
Fix memory issue when doing select Jul 15
Add dataset post processing for faiss indexes Jul 11
Fix cached file path for metrics with different config names Jul 10
Allow indexing Dataset via np.ndarray Jul 9
🐛[BugFix]fix seqeval Jul 8
More faiss control Jul 8
add pandas dataset Jul 8
add from_pandas and from_dict Jul 8

[AutoModels] Fix config params handling of all PT and TF AutoModels Jul 15
Fix Trainer in DataParallel setting Jul 13
More explicit error when failing to tensorize overflowing tokens Jul 9
Fix #5507 Jul 6
Various tokenizers fixes Jul 6
GPT2 tokenizer should not output token type IDs Jul 6
The `add_space_before_punct_symbol` is only for TransfoXL Jul 6
Gradient checkpointing BERT & ALBERT poc Jul 3
Exposing prepare_for_model for both slow & fast tokenizers Jul 3
Change model outputs types to self-document outputs Jul 2

Created an issue in huggingface/nlp that received 3 comments

[Dataset requests] New datasets for Text Classification

We are missing a few datasets for Text Classification which is an important field. Namely, it would be really nice to add: TREC-6 dataset (see her…

3 comments

[Quick poll] Give your opinion on the future of 🤗 transformers Jul 1
