Highlights
- Arctic Code Vault Contributor
1,156 contributions in the last year
Contribution activity
July 2020
- thomwolf/nlp Python
Created a pull request in huggingface/transformers that received 2 comments
[pipelines] Update fill mask pipeline to remove special tokens in the output
Small fix to remove the special tokens from the output of the fill mask pipeline.
+4 −2 • 2 comments
- Search qa
- Concatenate datasets
- Fix extracted files directory for the DownloadManager
- Add contiguous sharding
- Remove remaining nested dict
- Fix memory issue when doing select
- Add dataset post processing for faiss indexes
- Fix cached file path for metrics with different config names
- Allow indexing Dataset via np.ndarray
- 🐛[BugFix]fix seqeval
- More faiss control
- add pandas dataset
- add from_pandas and from_dict
- [AutoModels] Fix config params handling of all PT and TF AutoModels
- Fix Trainer in DataParallel setting
- More explicit error when failing to tensorize overflowing tokens
- Fix #5507
- Various tokenizers fixes
- GPT2 tokenizer should not output token type IDs
- The `add_space_before_punct_symbol` is only for TransfoXL
- Gradient checkpointing BERT & ALBERT poc
- Exposing prepare_for_model for both slow & fast tokenizers
- Change model outputs types to self-document outputs
Created an issue in huggingface/nlp that received 3 comments
[Dataset requests] New datasets for Text Classification
We are missing a few datasets for Text Classification which is an important field. Namely, it would be really nice to add: TREC-6 dataset (see her…
3 comments
- Conversion through to_pandas output numpy arrays for lists instead of python objects
- [dataset] Structure of MLQA seems unnecessarily nested
- to_pandas conversion doesn't always work and output numpy arrays instead of lists
- Features should be updated when `map()` changes schema
- [Dataset requests] New datasets for Open Question Answering

