Quick fix :) #606

Merged
merged 8 commits on Sep 10, 2020

@thomwolf (Member) commented Sep 10, 2020

nlp => datasets

thomwolf added 3 commits Sep 10, 2020

@github-actions (bot) commented on 4c79292 Sep 10, 2020

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.027638 / 0.011353 (0.016286) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.016214 / 0.011008 (0.005206) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.055018 / 0.038508 (0.016510) |
| read_batch_unformated after write_array2d | 0.034608 / 0.023109 (0.011498) |
| read_batch_unformated after write_flattened_sequence | 0.223489 / 0.275898 (-0.052409) |
| read_batch_unformated after write_nested_sequence | 0.260111 / 0.323480 (-0.063369) |
| read_col_formatted_as_numpy after write_array2d | 0.010890 / 0.007986 (0.002904) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004676 / 0.004328 (0.000348) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.009979 / 0.004250 (0.005729) |
| read_col_unformated after write_array2d | 0.058479 / 0.037052 (0.021427) |
| read_col_unformated after write_flattened_sequence | 0.261165 / 0.258489 (0.002676) |
| read_col_unformated after write_nested_sequence | 0.284013 / 0.293841 (-0.009828) |
| read_formatted_as_numpy after write_array2d | 0.156151 / 0.128546 (0.027605) |
| read_formatted_as_numpy after write_flattened_sequence | 0.134351 / 0.075646 (0.058704) |
| read_formatted_as_numpy after write_nested_sequence | 0.557449 / 0.419271 (0.138177) |
| read_unformated after write_array2d | 0.653178 / 0.043533 (0.609645) |
| read_unformated after write_flattened_sequence | 0.229797 / 0.255139 (-0.025342) |
| read_unformated after write_nested_sequence | 0.269457 / 0.283200 (-0.013743) |
| write_array2d | 0.097134 / 0.141683 (-0.044549) |
| write_flattened_sequence | 1.939439 / 1.452155 (0.487285) |
| write_nested_sequence | 2.160402 / 1.492716 (0.667685) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.041702 / 0.037411 (0.004291) |
| shard | 0.020076 / 0.014526 (0.005550) |
| shuffle | 0.073045 / 0.176557 (-0.103511) |
| sort | 0.098905 / 0.737135 (-0.638230) |
| train_test_split | 0.029397 / 0.296338 (-0.266942) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.195881 / 0.215209 (-0.019328) |
| read 50000 | 1.983604 / 2.077655 (-0.094051) |
| read_batch 50000 10 | 1.236268 / 1.504120 (-0.267852) |
| read_batch 50000 100 | 1.181048 / 1.541195 (-0.360146) |
| read_batch 50000 1000 | 1.241471 / 1.468490 (-0.227019) |
| read_formatted numpy 5000 | 5.791277 / 4.584777 (1.206500) |
| read_formatted pandas 5000 | 4.909395 / 3.745712 (1.163683) |
| read_formatted tensorflow 5000 | 8.137659 / 5.269862 (2.867797) |
| read_formatted torch 5000 | 6.745335 / 4.565676 (2.179659) |
| read_formatted_batch numpy 5000 10 | 0.710881 / 0.424275 (0.286605) |
| read_formatted_batch numpy 5000 1000 | 0.011655 / 0.007607 (0.004048) |
| shuffled read 5000 | 0.261898 / 0.226044 (0.035853) |
| shuffled read 50000 | 2.610257 / 2.268929 (0.341329) |
| shuffled read_batch 50000 10 | 1.833505 / 55.444624 (-53.611119) |
| shuffled read_batch 50000 100 | 1.603234 / 6.876477 (-5.273243) |
| shuffled read_batch 50000 1000 | 1.672414 / 2.142072 (-0.469658) |
| shuffled read_formatted numpy 5000 | 6.181796 / 4.805227 (1.376568) |
| shuffled read_formatted_batch numpy 5000 10 | 7.731389 / 6.500664 (1.230725) |
| shuffled read_formatted_batch numpy 5000 1000 | 12.153267 / 0.075469 (12.077798) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 93.975209 / 1.841788 (92.133422) |
| map fast-tokenizer batched | 16.586325 / 8.074308 (8.512017) |
| map identity | 14.601410 / 10.191392 (4.410018) |
| map identity batched | 0.900286 / 0.680424 (0.219862) |
| map no-op batched | 0.296409 / 0.534201 (-0.237792) |
| map no-op batched numpy | 0.808980 / 0.579283 (0.229697) |
| map no-op batched pandas | 0.578803 / 0.434364 (0.144440) |
| map no-op batched pytorch | 0.806875 / 0.540337 (0.266537) |
| map no-op batched tensorflow | 1.743010 / 1.386936 (0.356074) |

PyArrow==1.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.018024 / 0.011353 (0.006671) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.014867 / 0.011008 (0.003859) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.057146 / 0.038508 (0.018638) |
| read_batch_unformated after write_array2d | 0.035096 / 0.023109 (0.011987) |
| read_batch_unformated after write_flattened_sequence | 0.422941 / 0.275898 (0.147043) |
| read_batch_unformated after write_nested_sequence | 0.433483 / 0.323480 (0.110003) |
| read_col_formatted_as_numpy after write_array2d | 0.010246 / 0.007986 (0.002261) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004861 / 0.004328 (0.000533) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007399 / 0.004250 (0.003148) |
| read_col_unformated after write_array2d | 0.051161 / 0.037052 (0.014109) |
| read_col_unformated after write_flattened_sequence | 0.403236 / 0.258489 (0.144747) |
| read_col_unformated after write_nested_sequence | 0.445144 / 0.293841 (0.151303) |
| read_formatted_as_numpy after write_array2d | 0.142694 / 0.128546 (0.014148) |
| read_formatted_as_numpy after write_flattened_sequence | 0.113621 / 0.075646 (0.037975) |
| read_formatted_as_numpy after write_nested_sequence | 0.534672 / 0.419271 (0.115401) |
| read_unformated after write_array2d | 0.455303 / 0.043533 (0.411770) |
| read_unformated after write_flattened_sequence | 0.394906 / 0.255139 (0.139767) |
| read_unformated after write_nested_sequence | 0.410977 / 0.283200 (0.127777) |
| write_array2d | 0.105491 / 0.141683 (-0.036192) |
| write_flattened_sequence | 2.007552 / 1.452155 (0.555397) |
| write_nested_sequence | 1.921263 / 1.492716 (0.428547) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.044594 / 0.037411 (0.007183) |
| shard | 0.021603 / 0.014526 (0.007077) |
| shuffle | 0.028749 / 0.176557 (-0.147807) |
| sort | 0.096436 / 0.737135 (-0.640699) |
| train_test_split | 0.030064 / 0.296338 (-0.266275) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.275037 / 0.215209 (0.059828) |
| read 50000 | 2.727625 / 2.077655 (0.649970) |
| read_batch 50000 10 | 2.044187 / 1.504120 (0.540067) |
| read_batch 50000 100 | 1.968737 / 1.541195 (0.427542) |
| read_batch 50000 1000 | 2.051824 / 1.468490 (0.583334) |
| read_formatted numpy 5000 | 5.874491 / 4.584777 (1.289715) |
| read_formatted pandas 5000 | 5.450425 / 3.745712 (1.704713) |
| read_formatted tensorflow 5000 | 7.736012 / 5.269862 (2.466150) |
| read_formatted torch 5000 | 6.728298 / 4.565676 (2.162622) |
| read_formatted_batch numpy 5000 10 | 0.651142 / 0.424275 (0.226867) |
| read_formatted_batch numpy 5000 1000 | 0.011969 / 0.007607 (0.004362) |
| shuffled read 5000 | 0.332892 / 0.226044 (0.106848) |
| shuffled read 50000 | 3.393248 / 2.268929 (1.124320) |
| shuffled read_batch 50000 10 | 23.386321 / 55.444624 (-32.058304) |
| shuffled read_batch 50000 100 | 4.232797 / 6.876477 (-2.643680) |
| shuffled read_batch 50000 1000 | 2.332743 / 2.142072 (0.190671) |
| shuffled read_formatted numpy 5000 | 6.401074 / 4.805227 (1.595847) |
| shuffled read_formatted_batch numpy 5000 10 | 2.889343 / 6.500664 (-3.611321) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.037776 / 0.075469 (-0.037693) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 91.404556 / 1.841788 (89.562768) |
| map fast-tokenizer batched | 16.553257 / 8.074308 (8.478949) |
| map identity | 13.962146 / 10.191392 (3.770754) |
| map identity batched | 0.927679 / 0.680424 (0.247256) |
| map no-op batched | 0.602120 / 0.534201 (0.067919) |
| map no-op batched numpy | 0.755745 / 0.579283 (0.176462) |
| map no-op batched pandas | 0.553564 / 0.434364 (0.119200) |
| map no-op batched pytorch | 0.734414 / 0.540337 (0.194076) |
| map no-op batched tensorflow | 1.686214 / 1.386936 (0.299278) |

@github-actions (bot) commented on 62c5111 Sep 10, 2020

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.019148 / 0.011353 (0.007795) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.016170 / 0.011008 (0.005162) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.043038 / 0.038508 (0.004530) |
| read_batch_unformated after write_array2d | 0.029652 / 0.023109 (0.006542) |
| read_batch_unformated after write_flattened_sequence | 0.174417 / 0.275898 (-0.101481) |
| read_batch_unformated after write_nested_sequence | 0.189991 / 0.323480 (-0.133489) |
| read_col_formatted_as_numpy after write_array2d | 0.008643 / 0.007986 (0.000657) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005001 / 0.004328 (0.000672) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006393 / 0.004250 (0.002142) |
| read_col_unformated after write_array2d | 0.043187 / 0.037052 (0.006135) |
| read_col_unformated after write_flattened_sequence | 0.186340 / 0.258489 (-0.072149) |
| read_col_unformated after write_nested_sequence | 0.205890 / 0.293841 (-0.087951) |
| read_formatted_as_numpy after write_array2d | 0.161656 / 0.128546 (0.033110) |
| read_formatted_as_numpy after write_flattened_sequence | 0.114143 / 0.075646 (0.038496) |
| read_formatted_as_numpy after write_nested_sequence | 0.389636 / 0.419271 (-0.029635) |
| read_unformated after write_array2d | 0.492015 / 0.043533 (0.448482) |
| read_unformated after write_flattened_sequence | 0.175067 / 0.255139 (-0.080072) |
| read_unformated after write_nested_sequence | 0.192906 / 0.283200 (-0.090293) |
| write_array2d | 0.080511 / 0.141683 (-0.061172) |
| write_flattened_sequence | 1.562899 / 1.452155 (0.110744) |
| write_nested_sequence | 1.831550 / 1.492716 (0.338833) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.038791 / 0.037411 (0.001379) |
| shard | 0.019983 / 0.014526 (0.005457) |
| shuffle | 0.057667 / 0.176557 (-0.118890) |
| sort | 0.088595 / 0.737135 (-0.648541) |
| train_test_split | 0.026422 / 0.296338 (-0.269916) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.190250 / 0.215209 (-0.024959) |
| read 50000 | 1.929380 / 2.077655 (-0.148274) |
| read_batch 50000 10 | 1.131788 / 1.504120 (-0.372332) |
| read_batch 50000 100 | 1.023444 / 1.541195 (-0.517750) |
| read_batch 50000 1000 | 1.124336 / 1.468490 (-0.344154) |
| read_formatted numpy 5000 | 6.273425 / 4.584777 (1.688648) |
| read_formatted pandas 5000 | 5.615431 / 3.745712 (1.869719) |
| read_formatted tensorflow 5000 | 7.726736 / 5.269862 (2.456875) |
| read_formatted torch 5000 | 6.847425 / 4.565676 (2.281748) |
| read_formatted_batch numpy 5000 10 | 0.657323 / 0.424275 (0.233048) |
| read_formatted_batch numpy 5000 1000 | 0.010595 / 0.007607 (0.002988) |
| shuffled read 5000 | 0.220684 / 0.226044 (-0.005361) |
| shuffled read 50000 | 2.200855 / 2.268929 (-0.068074) |
| shuffled read_batch 50000 10 | 1.660650 / 55.444624 (-53.783975) |
| shuffled read_batch 50000 100 | 1.384939 / 6.876477 (-5.491537) |
| shuffled read_batch 50000 1000 | 1.390074 / 2.142072 (-0.751999) |
| shuffled read_formatted numpy 5000 | 6.639403 / 4.805227 (1.834175) |
| shuffled read_formatted_batch numpy 5000 10 | 4.594911 / 6.500664 (-1.905753) |
| shuffled read_formatted_batch numpy 5000 1000 | 6.956495 / 0.075469 (6.881026) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 113.916272 / 1.841788 (112.074484) |
| map fast-tokenizer batched | 12.179292 / 8.074308 (4.104984) |
| map identity | 13.628782 / 10.191392 (3.437390) |
| map identity batched | 0.461103 / 0.680424 (-0.219321) |
| map no-op batched | 0.261316 / 0.534201 (-0.272885) |
| map no-op batched numpy | 0.751887 / 0.579283 (0.172604) |
| map no-op batched pandas | 0.591470 / 0.434364 (0.157106) |
| map no-op batched pytorch | 0.743311 / 0.540337 (0.202974) |
| map no-op batched tensorflow | 1.522535 / 1.386936 (0.135599) |

PyArrow==1.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.018014 / 0.011353 (0.006661) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.013539 / 0.011008 (0.002531) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.049787 / 0.038508 (0.011279) |
| read_batch_unformated after write_array2d | 0.033919 / 0.023109 (0.010810) |
| read_batch_unformated after write_flattened_sequence | 0.287153 / 0.275898 (0.011255) |
| read_batch_unformated after write_nested_sequence | 0.329634 / 0.323480 (0.006155) |
| read_col_formatted_as_numpy after write_array2d | 0.007911 / 0.007986 (-0.000075) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004574 / 0.004328 (0.000245) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.006028 / 0.004250 (0.001777) |
| read_col_unformated after write_array2d | 0.038866 / 0.037052 (0.001814) |
| read_col_unformated after write_flattened_sequence | 0.304843 / 0.258489 (0.046354) |
| read_col_unformated after write_nested_sequence | 0.337546 / 0.293841 (0.043705) |
| read_formatted_as_numpy after write_array2d | 0.142785 / 0.128546 (0.014238) |
| read_formatted_as_numpy after write_flattened_sequence | 0.118884 / 0.075646 (0.043238) |
| read_formatted_as_numpy after write_nested_sequence | 0.404650 / 0.419271 (-0.014622) |
| read_unformated after write_array2d | 0.403834 / 0.043533 (0.360302) |
| read_unformated after write_flattened_sequence | 0.284522 / 0.255139 (0.029383) |
| read_unformated after write_nested_sequence | 0.316546 / 0.283200 (0.033346) |
| write_array2d | 0.084729 / 0.141683 (-0.056954) |
| write_flattened_sequence | 1.569337 / 1.452155 (0.117182) |
| write_nested_sequence | 1.599256 / 1.492716 (0.106540) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.039430 / 0.037411 (0.002019) |
| shard | 0.021212 / 0.014526 (0.006686) |
| shuffle | 0.055679 / 0.176557 (-0.120877) |
| sort | 0.081856 / 0.737135 (-0.655280) |
| train_test_split | 0.049044 / 0.296338 (-0.247294) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.260783 / 0.215209 (0.045573) |
| read 50000 | 2.587601 / 2.077655 (0.509947) |
| read_batch 50000 10 | 1.846348 / 1.504120 (0.342228) |
| read_batch 50000 100 | 1.799228 / 1.541195 (0.258033) |
| read_batch 50000 1000 | 1.733956 / 1.468490 (0.265466) |
| read_formatted numpy 5000 | 6.191410 / 4.584777 (1.606633) |
| read_formatted pandas 5000 | 5.566109 / 3.745712 (1.820397) |
| read_formatted tensorflow 5000 | 7.702677 / 5.269862 (2.432815) |
| read_formatted torch 5000 | 6.599413 / 4.565676 (2.033737) |
| read_formatted_batch numpy 5000 10 | 0.645236 / 0.424275 (0.220961) |
| read_formatted_batch numpy 5000 1000 | 0.010735 / 0.007607 (0.003128) |
| shuffled read 5000 | 0.263732 / 0.226044 (0.037687) |
| shuffled read 50000 | 2.818850 / 2.268929 (0.549921) |
| shuffled read_batch 50000 10 | 12.288883 / 55.444624 (-43.155741) |
| shuffled read_batch 50000 100 | 2.894013 / 6.876477 (-3.982464) |
| shuffled read_batch 50000 1000 | 1.939767 / 2.142072 (-0.202305) |
| shuffled read_formatted numpy 5000 | 6.252359 / 4.805227 (1.447132) |
| shuffled read_formatted_batch numpy 5000 10 | 1.823086 / 6.500664 (-4.677578) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.026755 / 0.075469 (-0.048714) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 116.977507 / 1.841788 (115.135719) |
| map fast-tokenizer batched | 13.232334 / 8.074308 (5.158026) |
| map identity | 15.156511 / 10.191392 (4.965119) |
| map identity batched | 0.752321 / 0.680424 (0.071897) |
| map no-op batched | 0.500571 / 0.534201 (-0.033630) |
| map no-op batched numpy | 0.773368 / 0.579283 (0.194085) |
| map no-op batched pandas | 0.594382 / 0.434364 (0.160018) |
| map no-op batched pytorch | 0.737019 / 0.540337 (0.196681) |
| map no-op batched tensorflow | 1.450352 / 1.386936 (0.063416) |

@github-actions (bot) commented on f91b29f Sep 10, 2020

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.016950 / 0.011353 (0.005597) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.015546 / 0.011008 (0.004538) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.054088 / 0.038508 (0.015580) |
| read_batch_unformated after write_array2d | 0.033699 / 0.023109 (0.010590) |
| read_batch_unformated after write_flattened_sequence | 0.201062 / 0.275898 (-0.074836) |
| read_batch_unformated after write_nested_sequence | 0.250970 / 0.323480 (-0.072510) |
| read_col_formatted_as_numpy after write_array2d | 0.009599 / 0.007986 (0.001613) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004554 / 0.004328 (0.000225) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.009442 / 0.004250 (0.005192) |
| read_col_unformated after write_array2d | 0.048017 / 0.037052 (0.010964) |
| read_col_unformated after write_flattened_sequence | 0.202123 / 0.258489 (-0.056366) |
| read_col_unformated after write_nested_sequence | 0.224397 / 0.293841 (-0.069444) |
| read_formatted_as_numpy after write_array2d | 0.149750 / 0.128546 (0.021204) |
| read_formatted_as_numpy after write_flattened_sequence | 0.103039 / 0.075646 (0.027393) |
| read_formatted_as_numpy after write_nested_sequence | 0.493502 / 0.419271 (0.074230) |
| read_unformated after write_array2d | 0.553087 / 0.043533 (0.509554) |
| read_unformated after write_flattened_sequence | 0.193669 / 0.255139 (-0.061470) |
| read_unformated after write_nested_sequence | 0.217891 / 0.283200 (-0.065308) |
| write_array2d | 0.087512 / 0.141683 (-0.054171) |
| write_flattened_sequence | 1.748126 / 1.452155 (0.295972) |
| write_nested_sequence | 1.945770 / 1.492716 (0.453054) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.038808 / 0.037411 (0.001397) |
| shard | 0.019459 / 0.014526 (0.004934) |
| shuffle | 0.075463 / 0.176557 (-0.101093) |
| sort | 0.097779 / 0.737135 (-0.639356) |
| train_test_split | 0.027898 / 0.296338 (-0.268440) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.195069 / 0.215209 (-0.020140) |
| read 50000 | 1.970286 / 2.077655 (-0.107369) |
| read_batch 50000 10 | 1.134242 / 1.504120 (-0.369878) |
| read_batch 50000 100 | 1.133443 / 1.541195 (-0.407752) |
| read_batch 50000 1000 | 1.165213 / 1.468490 (-0.303277) |
| read_formatted numpy 5000 | 5.781870 / 4.584777 (1.197093) |
| read_formatted pandas 5000 | 4.751155 / 3.745712 (1.005443) |
| read_formatted tensorflow 5000 | 7.346226 / 5.269862 (2.076364) |
| read_formatted torch 5000 | 6.263107 / 4.565676 (1.697431) |
| read_formatted_batch numpy 5000 10 | 0.607458 / 0.424275 (0.183183) |
| read_formatted_batch numpy 5000 1000 | 0.011152 / 0.007607 (0.003545) |
| shuffled read 5000 | 0.212833 / 0.226044 (-0.013211) |
| shuffled read 50000 | 2.230848 / 2.268929 (-0.038081) |
| shuffled read_batch 50000 10 | 1.637563 / 55.444624 (-53.807061) |
| shuffled read_batch 50000 100 | 1.568548 / 6.876477 (-5.307929) |
| shuffled read_batch 50000 1000 | 1.670517 / 2.142072 (-0.471555) |
| shuffled read_formatted numpy 5000 | 6.204322 / 4.805227 (1.399095) |
| shuffled read_formatted_batch numpy 5000 10 | 3.745892 / 6.500664 (-2.754772) |
| shuffled read_formatted_batch numpy 5000 1000 | 6.738933 / 0.075469 (6.663464) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 89.578056 / 1.841788 (87.736268) |
| map fast-tokenizer batched | 14.106082 / 8.074308 (6.031774) |
| map identity | 12.828463 / 10.191392 (2.637071) |
| map identity batched | 0.426670 / 0.680424 (-0.253754) |
| map no-op batched | 0.268276 / 0.534201 (-0.265925) |
| map no-op batched numpy | 0.755941 / 0.579283 (0.176658) |
| map no-op batched pandas | 0.503389 / 0.434364 (0.069025) |
| map no-op batched pytorch | 0.700248 / 0.540337 (0.159910) |
| map no-op batched tensorflow | 1.561131 / 1.386936 (0.174195) |

PyArrow==1.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.016467 / 0.011353 (0.005114) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.015217 / 0.011008 (0.004208) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.048231 / 0.038508 (0.009722) |
| read_batch_unformated after write_array2d | 0.033341 / 0.023109 (0.010231) |
| read_batch_unformated after write_flattened_sequence | 0.388402 / 0.275898 (0.112504) |
| read_batch_unformated after write_nested_sequence | 0.385269 / 0.323480 (0.061790) |
| read_col_formatted_as_numpy after write_array2d | 0.008497 / 0.007986 (0.000511) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004471 / 0.004328 (0.000143) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.007792 / 0.004250 (0.003541) |
| read_col_unformated after write_array2d | 0.045603 / 0.037052 (0.008550) |
| read_col_unformated after write_flattened_sequence | 0.393505 / 0.258489 (0.135016) |
| read_col_unformated after write_nested_sequence | 0.404203 / 0.293841 (0.110362) |
| read_formatted_as_numpy after write_array2d | 0.144659 / 0.128546 (0.016113) |
| read_formatted_as_numpy after write_flattened_sequence | 0.109299 / 0.075646 (0.033653) |
| read_formatted_as_numpy after write_nested_sequence | 0.463379 / 0.419271 (0.044108) |
| read_unformated after write_array2d | 0.598813 / 0.043533 (0.555280) |
| read_unformated after write_flattened_sequence | 0.380583 / 0.255139 (0.125444) |
| read_unformated after write_nested_sequence | 0.403680 / 0.283200 (0.120480) |
| write_array2d | 0.097553 / 0.141683 (-0.044130) |
| write_flattened_sequence | 1.850292 / 1.452155 (0.398137) |
| write_nested_sequence | 1.852688 / 1.492716 (0.359971) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.043443 / 0.037411 (0.006031) |
| shard | 0.035494 / 0.014526 (0.020968) |
| shuffle | 0.025808 / 0.176557 (-0.150749) |
| sort | 0.092716 / 0.737135 (-0.644419) |
| train_test_split | 0.026975 / 0.296338 (-0.269364) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.257937 / 0.215209 (0.042728) |
| read 50000 | 2.676080 / 2.077655 (0.598425) |
| read_batch 50000 10 | 2.092468 / 1.504120 (0.588348) |
| read_batch 50000 100 | 2.029361 / 1.541195 (0.488166) |
| read_batch 50000 1000 | 2.008844 / 1.468490 (0.540353) |
| read_formatted numpy 5000 | 5.495240 / 4.584777 (0.910463) |
| read_formatted pandas 5000 | 4.749601 / 3.745712 (1.003889) |
| read_formatted tensorflow 5000 | 7.511943 / 5.269862 (2.242082) |
| read_formatted torch 5000 | 6.257181 / 4.565676 (1.691504) |
| read_formatted_batch numpy 5000 10 | 0.622458 / 0.424275 (0.198183) |
| read_formatted_batch numpy 5000 1000 | 0.011663 / 0.007607 (0.004055) |
| shuffled read 5000 | 0.291413 / 0.226044 (0.065368) |
| shuffled read 50000 | 3.058831 / 2.268929 (0.789903) |
| shuffled read_batch 50000 10 | 20.772394 / 55.444624 (-34.672230) |
| shuffled read_batch 50000 100 | 4.201062 / 6.876477 (-2.675415) |
| shuffled read_batch 50000 1000 | 2.342566 / 2.142072 (0.200494) |
| shuffled read_formatted numpy 5000 | 6.182417 / 4.805227 (1.377189) |
| shuffled read_formatted_batch numpy 5000 10 | 2.777193 / 6.500664 (-3.723471) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.037019 / 0.075469 (-0.038450) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 89.487385 / 1.841788 (87.645598) |
| map fast-tokenizer batched | 14.735662 / 8.074308 (6.661354) |
| map identity | 12.797439 / 10.191392 (2.606047) |
| map identity batched | 1.210369 / 0.680424 (0.529945) |
| map no-op batched | 0.531790 / 0.534201 (-0.002411) |
| map no-op batched numpy | 0.747200 / 0.579283 (0.167917) |
| map no-op batched pandas | 0.526717 / 0.434364 (0.092353) |
| map no-op batched pytorch | 0.685780 / 0.540337 (0.145443) |
| map no-op batched tensorflow | 1.438029 / 1.386936 (0.051093) |

@stefan-it (Contributor) commented Sep 10, 2020

❤️

@github-actions (bot) commented on d6fdebb Sep 10, 2020

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.017147 / 0.011353 (0.005794) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.016301 / 0.011008 (0.005292) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.053928 / 0.038508 (0.015420) |
| read_batch_unformated after write_array2d | 0.034899 / 0.023109 (0.011790) |
| read_batch_unformated after write_flattened_sequence | 0.238868 / 0.275898 (-0.037030) |
| read_batch_unformated after write_nested_sequence | 0.250404 / 0.323480 (-0.073076) |
| read_col_formatted_as_numpy after write_array2d | 0.018717 / 0.007986 (0.010731) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004662 / 0.004328 (0.000333) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.009032 / 0.004250 (0.004782) |
| read_col_unformated after write_array2d | 0.055823 / 0.037052 (0.018771) |
| read_col_unformated after write_flattened_sequence | 0.235139 / 0.258489 (-0.023350) |
| read_col_unformated after write_nested_sequence | 0.254224 / 0.293841 (-0.039617) |
| read_formatted_as_numpy after write_array2d | 0.154241 / 0.128546 (0.025694) |
| read_formatted_as_numpy after write_flattened_sequence | 0.122672 / 0.075646 (0.047025) |
| read_formatted_as_numpy after write_nested_sequence | 0.512888 / 0.419271 (0.093616) |
| read_unformated after write_array2d | 0.533524 / 0.043533 (0.489991) |
| read_unformated after write_flattened_sequence | 0.241845 / 0.255139 (-0.013294) |
| read_unformated after write_nested_sequence | 0.257843 / 0.283200 (-0.025356) |
| write_array2d | 0.102125 / 0.141683 (-0.039558) |
| write_flattened_sequence | 1.924407 / 1.452155 (0.472253) |
| write_nested_sequence | 2.097629 / 1.492716 (0.604913) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.044088 / 0.037411 (0.006676) |
| shard | 0.023737 / 0.014526 (0.009211) |
| shuffle | 0.099448 / 0.176557 (-0.077108) |
| sort | 0.152431 / 0.737135 (-0.584704) |
| train_test_split | 0.146864 / 0.296338 (-0.149474) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.193538 / 0.215209 (-0.021671) |
| read 50000 | 1.939486 / 2.077655 (-0.138169) |
| read_batch 50000 10 | 1.225829 / 1.504120 (-0.278291) |
| read_batch 50000 100 | 1.162216 / 1.541195 (-0.378979) |
| read_batch 50000 1000 | 1.221609 / 1.468490 (-0.246881) |
| read_formatted numpy 5000 | 5.792625 / 4.584777 (1.207848) |
| read_formatted pandas 5000 | 4.912239 / 3.745712 (1.166527) |
| read_formatted tensorflow 5000 | 7.844917 / 5.269862 (2.575056) |
| read_formatted torch 5000 | 6.445696 / 4.565676 (1.880019) |
| read_formatted_batch numpy 5000 10 | 0.649975 / 0.424275 (0.225700) |
| read_formatted_batch numpy 5000 1000 | 0.015611 / 0.007607 (0.008003) |
| shuffled read 5000 | 0.252688 / 0.226044 (0.026643) |
| shuffled read 50000 | 2.569997 / 2.268929 (0.301069) |
| shuffled read_batch 50000 10 | 1.811335 / 55.444624 (-53.633289) |
| shuffled read_batch 50000 100 | 1.726897 / 6.876477 (-5.149579) |
| shuffled read_batch 50000 1000 | 1.789983 / 2.142072 (-0.352090) |
| shuffled read_formatted numpy 5000 | 6.540446 / 4.805227 (1.735219) |
| shuffled read_formatted_batch numpy 5000 10 | 4.311511 / 6.500664 (-2.189153) |
| shuffled read_formatted_batch numpy 5000 1000 | 12.085898 / 0.075469 (12.010429) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 94.982961 / 1.841788 (93.141174) |
| map fast-tokenizer batched | 16.317485 / 8.074308 (8.243177) |
| map identity | 14.185236 / 10.191392 (3.993844) |
| map identity batched | 0.920176 / 0.680424 (0.239753) |
| map no-op batched | 0.334963 / 0.534201 (-0.199238) |
| map no-op batched numpy | 0.789213 / 0.579283 (0.209930) |
| map no-op batched pandas | 0.564800 / 0.434364 (0.130436) |
| map no-op batched pytorch | 0.751816 / 0.540337 (0.211479) |
| map no-op batched tensorflow | 1.688929 / 1.386936 (0.301993) |

PyArrow==1.0

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.017771 / 0.011353 (0.006418) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.015591 / 0.011008 (0.004582) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.062600 / 0.038508 (0.024092) |
| read_batch_unformated after write_array2d | 0.035261 / 0.023109 (0.012152) |
| read_batch_unformated after write_flattened_sequence | 0.370024 / 0.275898 (0.094126) |
| read_batch_unformated after write_nested_sequence | 0.404539 / 0.323480 (0.081059) |
| read_col_formatted_as_numpy after write_array2d | 0.011940 / 0.007986 (0.003954) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.005262 / 0.004328 (0.000933) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.008972 / 0.004250 (0.004722) |
| read_col_unformated after write_array2d | 0.051098 / 0.037052 (0.014046) |
| read_col_unformated after write_flattened_sequence | 0.383473 / 0.258489 (0.124984) |
| read_col_unformated after write_nested_sequence | 0.427732 / 0.293841 (0.133891) |
| read_formatted_as_numpy after write_array2d | 0.144292 / 0.128546 (0.015746) |
| read_formatted_as_numpy after write_flattened_sequence | 0.116459 / 0.075646 (0.040813) |
| read_formatted_as_numpy after write_nested_sequence | 0.548033 / 0.419271 (0.128761) |
| read_unformated after write_array2d | 0.614114 / 0.043533 (0.570582) |
| read_unformated after write_flattened_sequence | 0.373160 / 0.255139 (0.118021) |
| read_unformated after write_nested_sequence | 0.387909 / 0.283200 (0.104710) |
| write_array2d | 0.117734 / 0.141683 (-0.023949) |
| write_flattened_sequence | 1.969538 / 1.452155 (0.517384) |
| write_nested_sequence | 2.020225 / 1.492716 (0.527509) |

Benchmark: benchmark_indices_mapping.json

| metric | new / old (diff) |
| --- | --- |
| select | 0.051002 / 0.037411 (0.013591) |
| shard | 0.024714 / 0.014526 (0.010189) |
| shuffle | 0.039624 / 0.176557 (-0.136933) |
| sort | 0.169620 / 0.737135 (-0.567515) |
| train_test_split | 0.144368 / 0.296338 (-0.151970) |

Benchmark: benchmark_iterating.json

| metric | new / old (diff) |
| --- | --- |
| read 5000 | 0.262360 / 0.215209 (0.047151) |
| read 50000 | 2.612547 / 2.077655 (0.534893) |
| read_batch 50000 10 | 2.032326 / 1.504120 (0.528206) |
| read_batch 50000 100 | 2.012698 / 1.541195 (0.471504) |
| read_batch 50000 1000 | 2.087607 / 1.468490 (0.619117) |
| read_formatted numpy 5000 | 5.974431 / 4.584777 (1.389654) |
| read_formatted pandas 5000 | 5.332280 / 3.745712 (1.586567) |
| read_formatted tensorflow 5000 | 7.966859 / 5.269862 (2.696997) |
| read_formatted torch 5000 | 6.909480 / 4.565676 (2.343803) |
| read_formatted_batch numpy 5000 10 | 0.668583 / 0.424275 (0.244308) |
| read_formatted_batch numpy 5000 1000 | 0.011942 / 0.007607 (0.004335) |
| shuffled read 5000 | 0.293542 / 0.226044 (0.067498) |
| shuffled read 50000 | 3.128638 / 2.268929 (0.859710) |
| shuffled read_batch 50000 10 | 21.886702 / 55.444624 (-33.557922) |
| shuffled read_batch 50000 100 | 4.199898 / 6.876477 (-2.676579) |
| shuffled read_batch 50000 1000 | 2.454890 / 2.142072 (0.312818) |
| shuffled read_formatted numpy 5000 | 6.526698 / 4.805227 (1.721471) |
| shuffled read_formatted_batch numpy 5000 10 | 3.057838 / 6.500664 (-3.442826) |
| shuffled read_formatted_batch numpy 5000 1000 | 0.043595 / 0.075469 (-0.031874) |

Benchmark: benchmark_map_filter.json

| metric | new / old (diff) |
| --- | --- |
| filter | 91.421128 / 1.841788 (89.579341) |
| map fast-tokenizer batched | 16.133575 / 8.074308 (8.059267) |
| map identity | 13.473059 / 10.191392 (3.281667) |
| map identity batched | 0.843217 / 0.680424 (0.162793) |
| map no-op batched | 0.581228 / 0.534201 (0.047027) |
| map no-op batched numpy | 0.760811 / 0.579283 (0.181528) |
| map no-op batched pandas | 0.530588 / 0.434364 (0.096224) |
| map no-op batched pytorch | 0.740119 / 0.540337 (0.199781) |
| map no-op batched tensorflow | 1.650669 / 1.386936 (0.263733) |

thomwolf added 2 commits Sep 10, 2020

@github-actions (bot) commented on cc0c0c4 Sep 10, 2020

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

| metric | new / old (diff) |
| --- | --- |
| read_batch_formatted_as_numpy after write_array2d | 0.027163 / 0.011353 (0.015810) |
| read_batch_formatted_as_numpy after write_flattened_sequence | 0.014898 / 0.011008 (0.003890) |
| read_batch_formatted_as_numpy after write_nested_sequence | 0.053381 / 0.038508 (0.014873) |
| read_batch_unformated after write_array2d | 0.034856 / 0.023109 (0.011747) |
| read_batch_unformated after write_flattened_sequence | 0.217819 / 0.275898 (-0.058079) |
| read_batch_unformated after write_nested_sequence | 0.258068 / 0.323480 (-0.065412) |
| read_col_formatted_as_numpy after write_array2d | 0.010237 / 0.007986 (0.002252) |
| read_col_formatted_as_numpy after write_flattened_sequence | 0.004559 / 0.004328 (0.000231) |
| read_col_formatted_as_numpy after write_nested_sequence | 0.010222 / 0.004250 (0.005971) |
| read_col_unformated after write_array2d | 0.051848 / 0.037052 (0.014796) |
| read_col_unformated after write_flattened_sequence | 0.217429 / 0.258489 (-0.041060) |
| read_col_unformated after write_nested_sequence | 0.247196 / 0.293841 (-0.046645) |
| read_formatted_as_numpy after write_array2d | 0.141268 / 0.128546 (0.012722) |
| read_formatted_as_numpy after write_flattened_sequence | 0.108874 / 0.075646 (0.033228) |
| read_formatted_as_numpy after write_nested_sequence | 0.528326 / 0.419271 (0.109054) |
| read_unformated after write_array2d | 0.522601 / 0.043533 (0.479068) |
| read_unformated after write_flattened_sequence | 0.215926 / 0.255139 (-0.039213) |
| read_unformated after write_nested_sequence | 0.235013 / 0.283200 (-0.048186) |
| write_array2d | 0.088748 / 0.141683 (-0.052935) |
| write_flattened_sequence | 1.886227 / 1.452155 (0.434072) |
| write_nested_sequence | 1.946110 / 1.492716 (0.453393) |

Benchmark: benchmark_indices_mapping.json

metric select shard shuffle sort train_test_split
new / old (diff) 0.041341 / 0.037411 (0.003929) 0.021447 / 0.014526 (0.006921) 0.060088 / 0.176557 (-0.116468) 0.101780 / 0.737135 (-0.635355) 0.031756 / 0.296338 (-0.264583)
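
Each benchmark row above pairs with the metric line before it: one `new / old (diff)` triple per metric, in order. A minimal sketch of reading such a row, assuming the diff is simply new minus old (which matches the posted numbers up to rounding). The helper name `parse_benchmark_row` is hypothetical and not part of the datasets repository; the metric names and values are copied from the benchmark_indices_mapping.json row above.

```python
import re

# Hypothetical helper: parses the "new / old (diff)" text posted by the
# benchmark bot. It is an illustration, not code from the datasets repo.
TRIPLE = re.compile(r"(-?\d+\.\d+) / (-?\d+\.\d+) \((-?\d+\.\d+)\)")


def parse_benchmark_row(metrics, row):
    """Map each metric name to its (new, old, diff) values."""
    triples = [tuple(float(v) for v in m) for m in TRIPLE.findall(row)]
    if len(triples) != len(metrics):
        raise ValueError(
            f"expected {len(metrics)} triples, found {len(triples)}"
        )
    return dict(zip(metrics, triples))


# Metric names and row copied from the indices_mapping benchmark above.
metrics = ["select", "shard", "shuffle", "sort", "train_test_split"]
row = (
    "new / old (diff) 0.041341 / 0.037411 (0.003929) "
    "0.021447 / 0.014526 (0.006921) 0.060088 / 0.176557 (-0.116468) "
    "0.101780 / 0.737135 (-0.635355) 0.031756 / 0.296338 (-0.264583)"
)

results = parse_benchmark_row(metrics, row)
for name, (new, old, diff) in results.items():
    # A negative diff means the new value is lower than the old one.
    print(f"{name}: new={new} old={old} diff={diff}")
```

For metrics whose names contain spaces (as in benchmark_iterating.json), the metric list would have to be supplied by hand, since the flattened header line cannot be split on whitespace alone.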

Benchmark: benchmark_iterating.json

metric read 5000 read 50000 read_batch 50000 10 read_batch 50000 100 read_batch 50000 1000 read_formatted numpy 5000 read_formatted pandas 5000 read_formatted tensorflow 5000 read_formatted torch 5000 read_formatted_batch numpy 5000 10 read_formatted_batch numpy 5000 1000 shuffled read 5000 shuffled read 50000 shuffled read_batch 50000 10 shuffled read_batch 50000 100 shuffled read_batch 50000 1000 shuffled read_formatted numpy 5000 shuffled read_formatted_batch numpy 5000 10 shuffled read_formatted_batch numpy 5000 1000
new / old (diff) 0.191958 / 0.215209 (-0.023251) 1.956769 / 2.077655 (-0.120886) 1.239702 / 1.504120 (-0.264418) 1.178080 / 1.541195 (-0.363115) 1.245347 / 1.468490 (-0.223143) 5.783270 / 4.584777 (1.198493) 4.984510 / 3.745712 (1.238798) 7.626517 / 5.269862 (2.356655) 6.541433 / 4.565676 (1.975757) 0.666776 / 0.424275 (0.242501) 0.011595 / 0.007607 (0.003988) 0.238587 / 0.226044 (0.012543) 2.316223 / 2.268929 (0.047295) 1.722403 / 55.444624 (-53.722222) 1.594014 / 6.876477 (-5.282463) 1.698284 / 2.142072 (-0.443788) 6.405526 / 4.805227 (1.600299) 5.743037 / 6.500664 (-0.757627) 7.060866 / 0.075469 (6.985397)

Benchmark: benchmark_map_filter.json

metric filter map fast-tokenizer batched map identity map identity batched map no-op batched map no-op batched numpy map no-op batched pandas map no-op batched pytorch map no-op batched tensorflow
new / old (diff) 89.364012 / 1.841788 (87.522224) 15.593181 / 8.074308 (7.518873) 13.740730 / 10.191392 (3.549338) 0.868651 / 0.680424 (0.188227) 0.291771 / 0.534201 (-0.242430) 0.764619 / 0.579283 (0.185335) 0.540023 / 0.434364 (0.105659) 0.729212 / 0.540337 (0.188875) 1.633085 / 1.386936 (0.246149)
PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric read_batch_formatted_as_numpy after write_array2d read_batch_formatted_as_numpy after write_flattened_sequence read_batch_formatted_as_numpy after write_nested_sequence read_batch_unformated after write_array2d read_batch_unformated after write_flattened_sequence read_batch_unformated after write_nested_sequence read_col_formatted_as_numpy after write_array2d read_col_formatted_as_numpy after write_flattened_sequence read_col_formatted_as_numpy after write_nested_sequence read_col_unformated after write_array2d read_col_unformated after write_flattened_sequence read_col_unformated after write_nested_sequence read_formatted_as_numpy after write_array2d read_formatted_as_numpy after write_flattened_sequence read_formatted_as_numpy after write_nested_sequence read_unformated after write_array2d read_unformated after write_flattened_sequence read_unformated after write_nested_sequence write_array2d write_flattened_sequence write_nested_sequence
new / old (diff) 0.018227 / 0.011353 (0.006874) 0.014906 / 0.011008 (0.003898) 0.055392 / 0.038508 (0.016884) 0.035438 / 0.023109 (0.012328) 0.384287 / 0.275898 (0.108389) 0.418317 / 0.323480 (0.094837) 0.009682 / 0.007986 (0.001696) 0.004552 / 0.004328 (0.000223) 0.007692 / 0.004250 (0.003442) 0.049590 / 0.037052 (0.012537) 0.385283 / 0.258489 (0.126794) 0.424502 / 0.293841 (0.130661) 0.140638 / 0.128546 (0.012092) 0.118228 / 0.075646 (0.042582) 0.532864 / 0.419271 (0.113592) 0.532752 / 0.043533 (0.489219) 0.374659 / 0.255139 (0.119520) 0.397055 / 0.283200 (0.113856) 0.101995 / 0.141683 (-0.039688) 1.902181 / 1.452155 (0.450027) 1.981542 / 1.492716 (0.488826)

Benchmark: benchmark_indices_mapping.json

metric select shard shuffle sort train_test_split
new / old (diff) 0.045445 / 0.037411 (0.008034) 0.021998 / 0.014526 (0.007472) 0.028938 / 0.176557 (-0.147618) 0.098006 / 0.737135 (-0.639129) 0.030808 / 0.296338 (-0.265531)

Benchmark: benchmark_iterating.json

metric read 5000 read 50000 read_batch 50000 10 read_batch 50000 100 read_batch 50000 1000 read_formatted numpy 5000 read_formatted pandas 5000 read_formatted tensorflow 5000 read_formatted torch 5000 read_formatted_batch numpy 5000 10 read_formatted_batch numpy 5000 1000 shuffled read 5000 shuffled read 50000 shuffled read_batch 50000 10 shuffled read_batch 50000 100 shuffled read_batch 50000 1000 shuffled read_formatted numpy 5000 shuffled read_formatted_batch numpy 5000 10 shuffled read_formatted_batch numpy 5000 1000
new / old (diff) 0.256650 / 0.215209 (0.041441) 2.636395 / 2.077655 (0.558741) 1.954436 / 1.504120 (0.450316) 1.904450 / 1.541195 (0.363255) 1.981515 / 1.468490 (0.513025) 5.853469 / 4.584777 (1.268692) 5.155402 / 3.745712 (1.409690) 7.689868 / 5.269862 (2.420006) 6.709082 / 4.565676 (2.143405) 0.662790 / 0.424275 (0.238515) 0.011674 / 0.007607 (0.004067) 0.283535 / 0.226044 (0.057491) 2.959350 / 2.268929 (0.690421) 22.480259 / 55.444624 (-32.964365) 4.070771 / 6.876477 (-2.805706) 2.189003 / 2.142072 (0.046931) 6.345653 / 4.805227 (1.540425) 2.900279 / 6.500664 (-3.600385) 0.037963 / 0.075469 (-0.037507)

Benchmark: benchmark_map_filter.json

metric filter map fast-tokenizer batched map identity map identity batched map no-op batched map no-op batched numpy map no-op batched pandas map no-op batched pytorch map no-op batched tensorflow
new / old (diff) 91.402717 / 1.841788 (89.560929) 16.042984 / 8.074308 (7.968675) 13.621119 / 10.191392 (3.429727) 0.905240 / 0.680424 (0.224816) 0.661231 / 0.534201 (0.127031) 0.766523 / 0.579283 (0.187240) 0.549510 / 0.434364 (0.115146) 0.750552 / 0.540337 (0.210215) 1.668562 / 1.386936 (0.281626)
@thomwolf thomwolf marked this pull request as ready for review Sep 10, 2020
@github-actions github-actions bot commented on 1d20285 Sep 10, 2020

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric read_batch_formatted_as_numpy after write_array2d read_batch_formatted_as_numpy after write_flattened_sequence read_batch_formatted_as_numpy after write_nested_sequence read_batch_unformated after write_array2d read_batch_unformated after write_flattened_sequence read_batch_unformated after write_nested_sequence read_col_formatted_as_numpy after write_array2d read_col_formatted_as_numpy after write_flattened_sequence read_col_formatted_as_numpy after write_nested_sequence read_col_unformated after write_array2d read_col_unformated after write_flattened_sequence read_col_unformated after write_nested_sequence read_formatted_as_numpy after write_array2d read_formatted_as_numpy after write_flattened_sequence read_formatted_as_numpy after write_nested_sequence read_unformated after write_array2d read_unformated after write_flattened_sequence read_unformated after write_nested_sequence write_array2d write_flattened_sequence write_nested_sequence
new / old (diff) 0.018402 / 0.011353 (0.007049) 0.018304 / 0.011008 (0.007296) 0.051084 / 0.038508 (0.012576) 0.031126 / 0.023109 (0.008017) 0.200768 / 0.275898 (-0.075130) 0.209768 / 0.323480 (-0.113712) 0.005940 / 0.007986 (-0.002046) 0.004832 / 0.004328 (0.000504) 0.005831 / 0.004250 (0.001581) 0.044745 / 0.037052 (0.007693) 0.198933 / 0.258489 (-0.059557) 0.211317 / 0.293841 (-0.082524) 0.158468 / 0.128546 (0.029921) 0.123864 / 0.075646 (0.048218) 0.411937 / 0.419271 (-0.007334) 0.508507 / 0.043533 (0.464974) 0.198464 / 0.255139 (-0.056675) 0.211860 / 0.283200 (-0.071340) 0.078483 / 0.141683 (-0.063199) 1.721305 / 1.452155 (0.269150) 1.750566 / 1.492716 (0.257850)

Benchmark: benchmark_indices_mapping.json

metric select shard shuffle sort train_test_split
new / old (diff) 0.035889 / 0.037411 (-0.001522) 0.020833 / 0.014526 (0.006308) 0.023979 / 0.176557 (-0.152578) 0.079108 / 0.737135 (-0.658028) 0.077068 / 0.296338 (-0.219271)

Benchmark: benchmark_iterating.json

metric read 5000 read 50000 read_batch 50000 10 read_batch 50000 100 read_batch 50000 1000 read_formatted numpy 5000 read_formatted pandas 5000 read_formatted tensorflow 5000 read_formatted torch 5000 read_formatted_batch numpy 5000 10 read_formatted_batch numpy 5000 1000 shuffled read 5000 shuffled read 50000 shuffled read_batch 50000 10 shuffled read_batch 50000 100 shuffled read_batch 50000 1000 shuffled read_formatted numpy 5000 shuffled read_formatted_batch numpy 5000 10 shuffled read_formatted_batch numpy 5000 1000
new / old (diff) 0.214033 / 0.215209 (-0.001176) 2.056861 / 2.077655 (-0.020793) 1.204725 / 1.504120 (-0.299395) 1.146148 / 1.541195 (-0.395046) 1.140197 / 1.468490 (-0.328294) 6.740767 / 4.584777 (2.155990) 6.108148 / 3.745712 (2.362436) 8.195293 / 5.269862 (2.925431) 7.257651 / 4.565676 (2.691974) 0.679182 / 0.424275 (0.254907) 0.011368 / 0.007607 (0.003760) 0.221453 / 0.226044 (-0.004591) 2.378028 / 2.268929 (0.109099) 1.673777 / 55.444624 (-53.770848) 1.563412 / 6.876477 (-5.313065) 1.602233 / 2.142072 (-0.539839) 6.947903 / 4.805227 (2.142675) 9.021711 / 6.500664 (2.521047) 8.918943 / 0.075469 (8.843474)

Benchmark: benchmark_map_filter.json

metric filter map fast-tokenizer batched map identity map identity batched map no-op batched map no-op batched numpy map no-op batched pandas map no-op batched pytorch map no-op batched tensorflow
new / old (diff) 121.617759 / 1.841788 (119.775971) 13.197972 / 8.074308 (5.123664) 14.357798 / 10.191392 (4.166406) 0.892160 / 0.680424 (0.211736) 0.292672 / 0.534201 (-0.241529) 0.858964 / 0.579283 (0.279681) 0.640776 / 0.434364 (0.206413) 0.801580 / 0.540337 (0.261242) 1.638385 / 1.386936 (0.251449)
PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric read_batch_formatted_as_numpy after write_array2d read_batch_formatted_as_numpy after write_flattened_sequence read_batch_formatted_as_numpy after write_nested_sequence read_batch_unformated after write_array2d read_batch_unformated after write_flattened_sequence read_batch_unformated after write_nested_sequence read_col_formatted_as_numpy after write_array2d read_col_formatted_as_numpy after write_flattened_sequence read_col_formatted_as_numpy after write_nested_sequence read_col_unformated after write_array2d read_col_unformated after write_flattened_sequence read_col_unformated after write_nested_sequence read_formatted_as_numpy after write_array2d read_formatted_as_numpy after write_flattened_sequence read_formatted_as_numpy after write_nested_sequence read_unformated after write_array2d read_unformated after write_flattened_sequence read_unformated after write_nested_sequence write_array2d write_flattened_sequence write_nested_sequence
new / old (diff) 0.018547 / 0.011353 (0.007194) 0.017904 / 0.011008 (0.006895) 0.044173 / 0.038508 (0.005665) 0.032787 / 0.023109 (0.009678) 0.329578 / 0.275898 (0.053680) 0.358938 / 0.323480 (0.035458) 0.009088 / 0.007986 (0.001102) 0.005257 / 0.004328 (0.000928) 0.007477 / 0.004250 (0.003226) 0.044227 / 0.037052 (0.007174) 0.314710 / 0.258489 (0.056221) 0.365960 / 0.293841 (0.072119) 0.154113 / 0.128546 (0.025567) 0.128782 / 0.075646 (0.053136) 0.423974 / 0.419271 (0.004702) 0.420324 / 0.043533 (0.376791) 0.318135 / 0.255139 (0.062996) 0.319970 / 0.283200 (0.036771) 0.090008 / 0.141683 (-0.051675) 1.698016 / 1.452155 (0.245862) 1.782341 / 1.492716 (0.289625)

Benchmark: benchmark_indices_mapping.json

metric select shard shuffle sort train_test_split
new / old (diff) 0.039290 / 0.037411 (0.001879) 0.022032 / 0.014526 (0.007506) 0.024617 / 0.176557 (-0.151940) 0.080764 / 0.737135 (-0.656371) 0.046343 / 0.296338 (-0.249996)

Benchmark: benchmark_iterating.json

metric read 5000 read 50000 read_batch 50000 10 read_batch 50000 100 read_batch 50000 1000 read_formatted numpy 5000 read_formatted pandas 5000 read_formatted tensorflow 5000 read_formatted torch 5000 read_formatted_batch numpy 5000 10 read_formatted_batch numpy 5000 1000 shuffled read 5000 shuffled read 50000 shuffled read_batch 50000 10 shuffled read_batch 50000 100 shuffled read_batch 50000 1000 shuffled read_formatted numpy 5000 shuffled read_formatted_batch numpy 5000 10 shuffled read_formatted_batch numpy 5000 1000
new / old (diff) 0.282156 / 0.215209 (0.066947) 2.967110 / 2.077655 (0.889456) 2.189141 / 1.504120 (0.685021) 2.108707 / 1.541195 (0.567512) 2.170131 / 1.468490 (0.701641) 6.785898 / 4.584777 (2.201121) 5.800769 / 3.745712 (2.055057) 8.292627 / 5.269862 (3.022765) 7.271448 / 4.565676 (2.705771) 0.746707 / 0.424275 (0.322432) 0.012323 / 0.007607 (0.004716) 0.339691 / 0.226044 (0.113646) 3.428409 / 2.268929 (1.159480) 13.473077 / 55.444624 (-41.971548) 3.525065 / 6.876477 (-3.351412) 2.314538 / 2.142072 (0.172465) 6.923184 / 4.805227 (2.117957) 2.203080 / 6.500664 (-4.297584) 0.025825 / 0.075469 (-0.049644)

Benchmark: benchmark_map_filter.json

metric filter map fast-tokenizer batched map identity map identity batched map no-op batched map no-op batched numpy map no-op batched pandas map no-op batched pytorch map no-op batched tensorflow
new / old (diff) 120.617234 / 1.841788 (118.775446) 14.093931 / 8.074308 (6.019623) 14.679461 / 10.191392 (4.488069) 0.859454 / 0.680424 (0.179030) 0.547555 / 0.534201 (0.013354) 0.829857 / 0.579283 (0.250574) 0.623130 / 0.434364 (0.188766) 0.770317 / 0.540337 (0.229980) 1.607311 / 1.386936 (0.220375)
@github-actions github-actions bot commented on ac3bf60 Sep 10, 2020

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric read_batch_formatted_as_numpy after write_array2d read_batch_formatted_as_numpy after write_flattened_sequence read_batch_formatted_as_numpy after write_nested_sequence read_batch_unformated after write_array2d read_batch_unformated after write_flattened_sequence read_batch_unformated after write_nested_sequence read_col_formatted_as_numpy after write_array2d read_col_formatted_as_numpy after write_flattened_sequence read_col_formatted_as_numpy after write_nested_sequence read_col_unformated after write_array2d read_col_unformated after write_flattened_sequence read_col_unformated after write_nested_sequence read_formatted_as_numpy after write_array2d read_formatted_as_numpy after write_flattened_sequence read_formatted_as_numpy after write_nested_sequence read_unformated after write_array2d read_unformated after write_flattened_sequence read_unformated after write_nested_sequence write_array2d write_flattened_sequence write_nested_sequence
new / old (diff) 0.020115 / 0.011353 (0.008762) 0.016649 / 0.011008 (0.005640) 0.052360 / 0.038508 (0.013852) 0.034708 / 0.023109 (0.011599) 0.235314 / 0.275898 (-0.040584) 0.262319 / 0.323480 (-0.061161) 0.011406 / 0.007986 (0.003420) 0.005670 / 0.004328 (0.001341) 0.007787 / 0.004250 (0.003537) 0.051572 / 0.037052 (0.014520) 0.238301 / 0.258489 (-0.020188) 0.266587 / 0.293841 (-0.027254) 0.171471 / 0.128546 (0.042925) 0.139117 / 0.075646 (0.063471) 0.493063 / 0.419271 (0.073792) 0.568933 / 0.043533 (0.525400) 0.239224 / 0.255139 (-0.015915) 0.254524 / 0.283200 (-0.028675) 0.090973 / 0.141683 (-0.050710) 2.038702 / 1.452155 (0.586547) 2.086991 / 1.492716 (0.594274)

Benchmark: benchmark_indices_mapping.json

metric select shard shuffle sort train_test_split
new / old (diff) 0.042498 / 0.037411 (0.005087) 0.022706 / 0.014526 (0.008180) 0.061266 / 0.176557 (-0.115291) 0.093787 / 0.737135 (-0.643349) 0.028482 / 0.296338 (-0.267856)

Benchmark: benchmark_iterating.json

metric read 5000 read 50000 read_batch 50000 10 read_batch 50000 100 read_batch 50000 1000 read_formatted numpy 5000 read_formatted pandas 5000 read_formatted tensorflow 5000 read_formatted torch 5000 read_formatted_batch numpy 5000 10 read_formatted_batch numpy 5000 1000 shuffled read 5000 shuffled read 50000 shuffled read_batch 50000 10 shuffled read_batch 50000 100 shuffled read_batch 50000 1000 shuffled read_formatted numpy 5000 shuffled read_formatted_batch numpy 5000 10 shuffled read_formatted_batch numpy 5000 1000
new / old (diff) 0.231180 / 0.215209 (0.015971) 2.373326 / 2.077655 (0.295672) 1.436626 / 1.504120 (-0.067494) 1.286577 / 1.541195 (-0.254618) 1.368968 / 1.468490 (-0.099522) 7.227862 / 4.584777 (2.643085) 6.448712 / 3.745712 (2.703000) 9.234240 / 5.269862 (3.964378) 7.999752 / 4.565676 (3.434076) 0.823721 / 0.424275 (0.399446) 0.014258 / 0.007607 (0.006651) 0.267484 / 0.226044 (0.041440) 2.872074 / 2.268929 (0.603145) 1.931501 / 55.444624 (-53.513124) 1.811371 / 6.876477 (-5.065106) 1.890573 / 2.142072 (-0.251499) 7.644046 / 4.805227 (2.838819) 6.367947 / 6.500664 (-0.132717) 7.942578 / 0.075469 (7.867108)

Benchmark: benchmark_map_filter.json

metric filter map fast-tokenizer batched map identity map identity batched map no-op batched map no-op batched numpy map no-op batched pandas map no-op batched pytorch map no-op batched tensorflow
new / old (diff) 131.615758 / 1.841788 (129.773970) 14.716714 / 8.074308 (6.642406) 16.196067 / 10.191392 (6.004675) 0.508164 / 0.680424 (-0.172260) 0.324609 / 0.534201 (-0.209592) 0.927963 / 0.579283 (0.348680) 0.706530 / 0.434364 (0.272166) 0.897164 / 0.540337 (0.356827) 1.897636 / 1.386936 (0.510700)
PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric read_batch_formatted_as_numpy after write_array2d read_batch_formatted_as_numpy after write_flattened_sequence read_batch_formatted_as_numpy after write_nested_sequence read_batch_unformated after write_array2d read_batch_unformated after write_flattened_sequence read_batch_unformated after write_nested_sequence read_col_formatted_as_numpy after write_array2d read_col_formatted_as_numpy after write_flattened_sequence read_col_formatted_as_numpy after write_nested_sequence read_col_unformated after write_array2d read_col_unformated after write_flattened_sequence read_col_unformated after write_nested_sequence read_formatted_as_numpy after write_array2d read_formatted_as_numpy after write_flattened_sequence read_formatted_as_numpy after write_nested_sequence read_unformated after write_array2d read_unformated after write_flattened_sequence read_unformated after write_nested_sequence write_array2d write_flattened_sequence write_nested_sequence
new / old (diff) 0.019631 / 0.011353 (0.008278) 0.019097 / 0.011008 (0.008089) 0.052926 / 0.038508 (0.014418) 0.034757 / 0.023109 (0.011647) 0.406792 / 0.275898 (0.130894) 0.415520 / 0.323480 (0.092040) 0.010856 / 0.007986 (0.002870) 0.005648 / 0.004328 (0.001320) 0.007600 / 0.004250 (0.003350) 0.051793 / 0.037052 (0.014740) 0.389311 / 0.258489 (0.130822) 0.438551 / 0.293841 (0.144710) 0.161687 / 0.128546 (0.033141) 0.143155 / 0.075646 (0.067509) 0.486173 / 0.419271 (0.066901) 0.472172 / 0.043533 (0.428639) 0.405282 / 0.255139 (0.150143) 0.397337 / 0.283200 (0.114137) 0.102010 / 0.141683 (-0.039673) 2.055689 / 1.452155 (0.603535) 2.041724 / 1.492716 (0.549008)

Benchmark: benchmark_indices_mapping.json

metric select shard shuffle sort train_test_split
new / old (diff) 0.049320 / 0.037411 (0.011909) 0.024161 / 0.014526 (0.009635) 0.031122 / 0.176557 (-0.145435) 0.096137 / 0.737135 (-0.640999) 0.047844 / 0.296338 (-0.248495)

Benchmark: benchmark_iterating.json

metric read 5000 read 50000 read_batch 50000 10 read_batch 50000 100 read_batch 50000 1000 read_formatted numpy 5000 read_formatted pandas 5000 read_formatted tensorflow 5000 read_formatted torch 5000 read_formatted_batch numpy 5000 10 read_formatted_batch numpy 5000 1000 shuffled read 5000 shuffled read 50000 shuffled read_batch 50000 10 shuffled read_batch 50000 100 shuffled read_batch 50000 1000 shuffled read_formatted numpy 5000 shuffled read_formatted_batch numpy 5000 10 shuffled read_formatted_batch numpy 5000 1000
new / old (diff) 0.329052 / 0.215209 (0.113843) 3.311940 / 2.077655 (1.234285) 2.318698 / 1.504120 (0.814578) 2.167429 / 1.541195 (0.626234) 2.234394 / 1.468490 (0.765904) 7.437238 / 4.584777 (2.852461) 6.403009 / 3.745712 (2.657296) 9.023030 / 5.269862 (3.753169) 7.935835 / 4.565676 (3.370158) 0.768734 / 0.424275 (0.344459) 0.012872 / 0.007607 (0.005265) 0.382426 / 0.226044 (0.156381) 3.901632 / 2.268929 (1.632704) 27.240068 / 55.444624 (-28.204556) 5.022757 / 6.876477 (-1.853720) 2.524514 / 2.142072 (0.382442) 7.507361 / 4.805227 (2.702134) 3.572879 / 6.500664 (-2.927785) 0.045419 / 0.075469 (-0.030050)

Benchmark: benchmark_map_filter.json

metric filter map fast-tokenizer batched map identity map identity batched map no-op batched map no-op batched numpy map no-op batched pandas map no-op batched pytorch map no-op batched tensorflow
new / old (diff) 136.508863 / 1.841788 (134.667075) 16.189645 / 8.074308 (8.115336) 17.373528 / 10.191392 (7.182136) 0.948859 / 0.680424 (0.268435) 0.667054 / 0.534201 (0.132853) 0.915534 / 0.579283 (0.336251) 0.685494 / 0.434364 (0.251131) 0.898263 / 0.540337 (0.357925) 1.864250 / 1.386936 (0.477314)
@lhoestq lhoestq merged commit 5f4c6e8 into master Sep 10, 2020
3 of 5 checks passed
run
ci/circleci: run_dataset_script_tests_pyarrow_0p17 Your tests failed on CircleCI
ci/circleci: run_dataset_script_tests_pyarrow_1 Your tests failed on CircleCI
ci/circleci: build_doc Your tests passed on CircleCI!
ci/circleci: check_code_quality Your tests passed on CircleCI!
@lhoestq lhoestq deleted the datasets branch Sep 10, 2020
@github-actions github-actions bot commented on 6f6e44c Sep 10, 2020

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric read_batch_formatted_as_numpy after write_array2d read_batch_formatted_as_numpy after write_flattened_sequence read_batch_formatted_as_numpy after write_nested_sequence read_batch_unformated after write_array2d read_batch_unformated after write_flattened_sequence read_batch_unformated after write_nested_sequence read_col_formatted_as_numpy after write_array2d read_col_formatted_as_numpy after write_flattened_sequence read_col_formatted_as_numpy after write_nested_sequence read_col_unformated after write_array2d read_col_unformated after write_flattened_sequence read_col_unformated after write_nested_sequence read_formatted_as_numpy after write_array2d read_formatted_as_numpy after write_flattened_sequence read_formatted_as_numpy after write_nested_sequence read_unformated after write_array2d read_unformated after write_flattened_sequence read_unformated after write_nested_sequence write_array2d write_flattened_sequence write_nested_sequence
new / old (diff) 0.018278 / 0.011353 (0.006925) 0.018655 / 0.011008 (0.007646) 0.051474 / 0.038508 (0.012966) 0.030857 / 0.023109 (0.007748) 0.227897 / 0.275898 (-0.048001) 0.245603 / 0.323480 (-0.077877) 0.010849 / 0.007986 (0.002864) 0.006287 / 0.004328 (0.001958) 0.008152 / 0.004250 (0.003902) 0.050434 / 0.037052 (0.013382) 0.221169 / 0.258489 (-0.037320) 0.258100 / 0.293841 (-0.035741) 0.170605 / 0.128546 (0.042059) 0.137303 / 0.075646 (0.061656) 0.473449 / 0.419271 (0.054177) 0.562509 / 0.043533 (0.518976) 0.321978 / 0.255139 (0.066839) 0.235552 / 0.283200 (-0.047647) 0.088137 / 0.141683 (-0.053546) 1.854300 / 1.452155 (0.402145) 1.891019 / 1.492716 (0.398303)

Benchmark: benchmark_indices_mapping.json

metric select shard shuffle sort train_test_split
new / old (diff) 0.039988 / 0.037411 (0.002577) 0.023647 / 0.014526 (0.009122) 0.296375 / 0.176557 (0.119819) 0.137627 / 0.737135 (-0.599508) 0.184518 / 0.296338 (-0.111821)

Benchmark: benchmark_iterating.json

metric read 5000 read 50000 read_batch 50000 10 read_batch 50000 100 read_batch 50000 1000 read_formatted numpy 5000 read_formatted pandas 5000 read_formatted tensorflow 5000 read_formatted torch 5000 read_formatted_batch numpy 5000 10 read_formatted_batch numpy 5000 1000 shuffled read 5000 shuffled read 50000 shuffled read_batch 50000 10 shuffled read_batch 50000 100 shuffled read_batch 50000 1000 shuffled read_formatted numpy 5000 shuffled read_formatted_batch numpy 5000 10 shuffled read_formatted_batch numpy 5000 1000
new / old (diff) 0.214769 / 0.215209 (-0.000441) 2.130116 / 2.077655 (0.052462) 1.310703 / 1.504120 (-0.193416) 1.210377 / 1.541195 (-0.330818) 1.246437 / 1.468490 (-0.222054) 6.942688 / 4.584777 (2.357911) 6.040424 / 3.745712 (2.294712) 8.958592 / 5.269862 (3.688730) 7.798150 / 4.565676 (3.232474) 0.768507 / 0.424275 (0.344232) 0.012939 / 0.007607 (0.005332) 0.272560 / 0.226044 (0.046516) 2.672491 / 2.268929 (0.403563) 1.836744 / 55.444624 (-53.607880) 1.646636 / 6.876477 (-5.229841) 1.687295 / 2.142072 (-0.454777) 7.491775 / 4.805227 (2.686547) 6.553821 / 6.500664 (0.053157) 5.807233 / 0.075469 (5.731764)

Benchmark: benchmark_map_filter.json

metric filter map fast-tokenizer batched map identity map identity batched map no-op batched map no-op batched numpy map no-op batched pandas map no-op batched pytorch map no-op batched tensorflow
new / old (diff) 123.327349 / 1.841788 (121.485562) 14.810967 / 8.074308 (6.736658) 16.070809 / 10.191392 (5.879417) 0.905213 / 0.680424 (0.224790) 0.303394 / 0.534201 (-0.230807) 0.919602 / 0.579283 (0.340319) 0.646742 / 0.434364 (0.212378) 0.826928 / 0.540337 (0.286591) 1.661580 / 1.386936 (0.274644)
PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric read_batch_formatted_as_numpy after write_array2d read_batch_formatted_as_numpy after write_flattened_sequence read_batch_formatted_as_numpy after write_nested_sequence read_batch_unformated after write_array2d read_batch_unformated after write_flattened_sequence read_batch_unformated after write_nested_sequence read_col_formatted_as_numpy after write_array2d read_col_formatted_as_numpy after write_flattened_sequence read_col_formatted_as_numpy after write_nested_sequence read_col_unformated after write_array2d read_col_unformated after write_flattened_sequence read_col_unformated after write_nested_sequence read_formatted_as_numpy after write_array2d read_formatted_as_numpy after write_flattened_sequence read_formatted_as_numpy after write_nested_sequence read_unformated after write_array2d read_unformated after write_flattened_sequence read_unformated after write_nested_sequence write_array2d write_flattened_sequence write_nested_sequence
new / old (diff) 0.020187 / 0.011353 (0.008834) 0.017209 / 0.011008 (0.006201) 0.057188 / 0.038508 (0.018680) 0.033007 / 0.023109 (0.009898) 0.350680 / 0.275898 (0.074782) 0.385017 / 0.323480 (0.061537) 0.010227 / 0.007986 (0.002241) 0.004829 / 0.004328 (0.000501) 0.007631 / 0.004250 (0.003381) 0.046205 / 0.037052 (0.009152) 0.334735 / 0.258489 (0.076246) 0.389929 / 0.293841 (0.096088) 0.159298 / 0.128546 (0.030752) 0.133890 / 0.075646 (0.058244) 0.464886 / 0.419271 (0.045615) 0.438743 / 0.043533 (0.395210) 0.354771 / 0.255139 (0.099632) 0.379383 / 0.283200 (0.096183) 0.094074 / 0.141683 (-0.047609) 1.821776 / 1.452155 (0.369622) 1.997978 / 1.492716 (0.505262)

Benchmark: benchmark_indices_mapping.json

metric select shard shuffle sort train_test_split
new / old (diff) 0.047715 / 0.037411 (0.010304) 0.023494 / 0.014526 (0.008968) 0.027261 / 0.176557 (-0.149295) 0.090549 / 0.737135 (-0.646587) 0.039873 / 0.296338 (-0.256466)

Benchmark: benchmark_iterating.json

metric read 5000 read 50000 read_batch 50000 10 read_batch 50000 100 read_batch 50000 1000 read_formatted numpy 5000 read_formatted pandas 5000 read_formatted tensorflow 5000 read_formatted torch 5000 read_formatted_batch numpy 5000 10 read_formatted_batch numpy 5000 1000 shuffled read 5000 shuffled read 50000 shuffled read_batch 50000 10 shuffled read_batch 50000 100 shuffled read_batch 50000 1000 shuffled read_formatted numpy 5000 shuffled read_formatted_batch numpy 5000 10 shuffled read_formatted_batch numpy 5000 1000
new / old (diff) 0.276239 / 0.215209 (0.061030) 2.848318 / 2.077655 (0.770663) 2.130219 / 1.504120 (0.626099) 1.986036 / 1.541195 (0.444841) 2.051952 / 1.468490 (0.583462) 7.122611 / 4.584777 (2.537834) 6.144563 / 3.745712 (2.398851) 8.916670 / 5.269862 (3.646808) 7.880003 / 4.565676 (3.314327) 0.728955 / 0.424275 (0.304680) 0.013369 / 0.007607 (0.005762) 0.320207 / 0.226044 (0.094163) 3.240598 / 2.268929 (0.971670) 17.819562 / 55.444624 (-37.625062) 4.051462 / 6.876477 (-2.825015) 2.324801 / 2.142072 (0.182728) 7.406239 / 4.805227 (2.601012) 2.775892 / 6.500664 (-3.724772) 0.044377 / 0.075469 (-0.031092)

Benchmark: benchmark_map_filter.json

metric filter map fast-tokenizer batched map identity map identity batched map no-op batched map no-op batched numpy map no-op batched pandas map no-op batched pytorch map no-op batched tensorflow
new / old (diff) 123.481526 / 1.841788 (121.639739) 14.480595 / 8.074308 (6.406287) 16.000892 / 10.191392 (5.809500) 0.805055 / 0.680424 (0.124631) 0.590393 / 0.534201 (0.056192) 0.883525 / 0.579283 (0.304242) 0.658490 / 0.434364 (0.224126) 0.817504 / 0.540337 (0.277166) 1.719845 / 1.386936 (0.332909)
JetRunner added a commit that referenced this pull request Sep 17, 2020
* Changing the name

* style + quality

* update doc and logo

* clean up

* circle-CI on the branche for now

* fix daily dialog dataset

* fix urls

Co-authored-by: Quentin Lhoest <[email protected]>
3 participants