Quick fix :) #606

thomwolf · 2020-09-10T14:32:06Z

nlp => datasets

github-actions · 2020-09-10T14:44:32Z

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.027638 / 0.011353 (0.016286)	0.016214 / 0.011008 (0.005206)	0.055018 / 0.038508 (0.016510)	0.034608 / 0.023109 (0.011498)	0.223489 / 0.275898 (-0.052409)	0.260111 / 0.323480 (-0.063369)	0.010890 / 0.007986 (0.002904)	0.004676 / 0.004328 (0.000348)	0.009979 / 0.004250 (0.005729)	0.058479 / 0.037052 (0.021427)	0.261165 / 0.258489 (0.002676)	0.284013 / 0.293841 (-0.009828)	0.156151 / 0.128546 (0.027605)	0.134351 / 0.075646 (0.058704)	0.557449 / 0.419271 (0.138177)	0.653178 / 0.043533 (0.609645)	0.229797 / 0.255139 (-0.025342)	0.269457 / 0.283200 (-0.013743)	0.097134 / 0.141683 (-0.044549)	1.939439 / 1.452155 (0.487285)	2.160402 / 1.492716 (0.667685)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.041702 / 0.037411 (0.004291)	0.020076 / 0.014526 (0.005550)	0.073045 / 0.176557 (-0.103511)	0.098905 / 0.737135 (-0.638230)	0.029397 / 0.296338 (-0.266942)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.195881 / 0.215209 (-0.019328)	1.983604 / 2.077655 (-0.094051)	1.236268 / 1.504120 (-0.267852)	1.181048 / 1.541195 (-0.360146)	1.241471 / 1.468490 (-0.227019)	5.791277 / 4.584777 (1.206500)	4.909395 / 3.745712 (1.163683)	8.137659 / 5.269862 (2.867797)	6.745335 / 4.565676 (2.179659)	0.710881 / 0.424275 (0.286605)	0.011655 / 0.007607 (0.004048)	0.261898 / 0.226044 (0.035853)	2.610257 / 2.268929 (0.341329)	1.833505 / 55.444624 (-53.611119)	1.603234 / 6.876477 (-5.273243)	1.672414 / 2.142072 (-0.469658)	6.181796 / 4.805227 (1.376568)	7.731389 / 6.500664 (1.230725)	12.153267 / 0.075469 (12.077798)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	93.975209 / 1.841788 (92.133422)	16.586325 / 8.074308 (8.512017)	14.601410 / 10.191392 (4.410018)	0.900286 / 0.680424 (0.219862)	0.296409 / 0.534201 (-0.237792)	0.808980 / 0.579283 (0.229697)	0.578803 / 0.434364 (0.144440)	0.806875 / 0.540337 (0.266537)	1.743010 / 1.386936 (0.356074)

PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.018024 / 0.011353 (0.006671)	0.014867 / 0.011008 (0.003859)	0.057146 / 0.038508 (0.018638)	0.035096 / 0.023109 (0.011987)	0.422941 / 0.275898 (0.147043)	0.433483 / 0.323480 (0.110003)	0.010246 / 0.007986 (0.002261)	0.004861 / 0.004328 (0.000533)	0.007399 / 0.004250 (0.003148)	0.051161 / 0.037052 (0.014109)	0.403236 / 0.258489 (0.144747)	0.445144 / 0.293841 (0.151303)	0.142694 / 0.128546 (0.014148)	0.113621 / 0.075646 (0.037975)	0.534672 / 0.419271 (0.115401)	0.455303 / 0.043533 (0.411770)	0.394906 / 0.255139 (0.139767)	0.410977 / 0.283200 (0.127777)	0.105491 / 0.141683 (-0.036192)	2.007552 / 1.452155 (0.555397)	1.921263 / 1.492716 (0.428547)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.044594 / 0.037411 (0.007183)	0.021603 / 0.014526 (0.007077)	0.028749 / 0.176557 (-0.147807)	0.096436 / 0.737135 (-0.640699)	0.030064 / 0.296338 (-0.266275)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.275037 / 0.215209 (0.059828)	2.727625 / 2.077655 (0.649970)	2.044187 / 1.504120 (0.540067)	1.968737 / 1.541195 (0.427542)	2.051824 / 1.468490 (0.583334)	5.874491 / 4.584777 (1.289715)	5.450425 / 3.745712 (1.704713)	7.736012 / 5.269862 (2.466150)	6.728298 / 4.565676 (2.162622)	0.651142 / 0.424275 (0.226867)	0.011969 / 0.007607 (0.004362)	0.332892 / 0.226044 (0.106848)	3.393248 / 2.268929 (1.124320)	23.386321 / 55.444624 (-32.058304)	4.232797 / 6.876477 (-2.643680)	2.332743 / 2.142072 (0.190671)	6.401074 / 4.805227 (1.595847)	2.889343 / 6.500664 (-3.611321)	0.037776 / 0.075469 (-0.037693)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	91.404556 / 1.841788 (89.562768)	16.553257 / 8.074308 (8.478949)	13.962146 / 10.191392 (3.770754)	0.927679 / 0.680424 (0.247256)	0.602120 / 0.534201 (0.067919)	0.755745 / 0.579283 (0.176462)	0.553564 / 0.434364 (0.119200)	0.734414 / 0.540337 (0.194076)	1.686214 / 1.386936 (0.299278)
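
Each cell in the tables above has the form `new / old (diff)`, and the printed values are consistent with `diff = new - old` up to rounding in the last digit. As a reading aid, here is a minimal sketch for parsing one header/value row pair and checking that relation. The helper names (`parse_cell`, `parse_row`, `is_consistent`) and the tolerance are assumptions made for illustration; they are not part of the benchmark tooling that produced these comments.

```python
import re

# Matches one cell such as "0.041702 / 0.037411 (0.004291)".
NUM = r"(-?\d+(?:\.\d+)?)"
CELL = re.compile(rf"^\s*{NUM}\s*/\s*{NUM}\s*\({NUM}\)\s*$")


def parse_cell(cell):
    """Return (new, old, diff) as floats from a 'new / old (diff)' cell."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def parse_row(header_line, value_line):
    """Map each metric name to its (new, old, diff) triple.

    Both lines are tab-separated; the first field of each line is a
    row label ("metric" / "new / old (diff)") and is skipped.
    """
    names = header_line.split("\t")[1:]
    cells = value_line.split("\t")[1:]
    if len(names) != len(cells):
        raise ValueError("header and value rows have different lengths")
    return {name: parse_cell(cell) for name, cell in zip(names, cells)}


def is_consistent(new, old, diff, tol=2e-6):
    """Check diff == new - old, allowing for rounding to 6 decimals."""
    return abs((new - old) - diff) <= tol


# Row copied from the first benchmark_indices_mapping.json table (PyArrow==0.17.1).
header = "metric\tselect\tshard\tshuffle\tsort\ttrain_test_split"
values = (
    "new / old (diff)\t0.041702 / 0.037411 (0.004291)\t"
    "0.020076 / 0.014526 (0.005550)\t0.073045 / 0.176557 (-0.103511)\t"
    "0.098905 / 0.737135 (-0.638230)\t0.029397 / 0.296338 (-0.266942)"
)

row = parse_row(header, values)
for name, (new, old, diff) in row.items():
    status = "faster" if diff < 0 else "slower"
    print(f"{name}: {status} by {abs(diff):.6f}s, consistent={is_consistent(new, old, diff)}")
```

A negative `diff` means the new run took less time than the old reference value for that metric.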

github-actions · 2020-09-10T14:46:17Z

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.019148 / 0.011353 (0.007795)	0.016170 / 0.011008 (0.005162)	0.043038 / 0.038508 (0.004530)	0.029652 / 0.023109 (0.006542)	0.174417 / 0.275898 (-0.101481)	0.189991 / 0.323480 (-0.133489)	0.008643 / 0.007986 (0.000657)	0.005001 / 0.004328 (0.000672)	0.006393 / 0.004250 (0.002142)	0.043187 / 0.037052 (0.006135)	0.186340 / 0.258489 (-0.072149)	0.205890 / 0.293841 (-0.087951)	0.161656 / 0.128546 (0.033110)	0.114143 / 0.075646 (0.038496)	0.389636 / 0.419271 (-0.029635)	0.492015 / 0.043533 (0.448482)	0.175067 / 0.255139 (-0.080072)	0.192906 / 0.283200 (-0.090293)	0.080511 / 0.141683 (-0.061172)	1.562899 / 1.452155 (0.110744)	1.831550 / 1.492716 (0.338833)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.038791 / 0.037411 (0.001379)	0.019983 / 0.014526 (0.005457)	0.057667 / 0.176557 (-0.118890)	0.088595 / 0.737135 (-0.648541)	0.026422 / 0.296338 (-0.269916)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.190250 / 0.215209 (-0.024959)	1.929380 / 2.077655 (-0.148274)	1.131788 / 1.504120 (-0.372332)	1.023444 / 1.541195 (-0.517750)	1.124336 / 1.468490 (-0.344154)	6.273425 / 4.584777 (1.688648)	5.615431 / 3.745712 (1.869719)	7.726736 / 5.269862 (2.456875)	6.847425 / 4.565676 (2.281748)	0.657323 / 0.424275 (0.233048)	0.010595 / 0.007607 (0.002988)	0.220684 / 0.226044 (-0.005361)	2.200855 / 2.268929 (-0.068074)	1.660650 / 55.444624 (-53.783975)	1.384939 / 6.876477 (-5.491537)	1.390074 / 2.142072 (-0.751999)	6.639403 / 4.805227 (1.834175)	4.594911 / 6.500664 (-1.905753)	6.956495 / 0.075469 (6.881026)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	113.916272 / 1.841788 (112.074484)	12.179292 / 8.074308 (4.104984)	13.628782 / 10.191392 (3.437390)	0.461103 / 0.680424 (-0.219321)	0.261316 / 0.534201 (-0.272885)	0.751887 / 0.579283 (0.172604)	0.591470 / 0.434364 (0.157106)	0.743311 / 0.540337 (0.202974)	1.522535 / 1.386936 (0.135599)

PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.018014 / 0.011353 (0.006661)	0.013539 / 0.011008 (0.002531)	0.049787 / 0.038508 (0.011279)	0.033919 / 0.023109 (0.010810)	0.287153 / 0.275898 (0.011255)	0.329634 / 0.323480 (0.006155)	0.007911 / 0.007986 (-0.000075)	0.004574 / 0.004328 (0.000245)	0.006028 / 0.004250 (0.001777)	0.038866 / 0.037052 (0.001814)	0.304843 / 0.258489 (0.046354)	0.337546 / 0.293841 (0.043705)	0.142785 / 0.128546 (0.014238)	0.118884 / 0.075646 (0.043238)	0.404650 / 0.419271 (-0.014622)	0.403834 / 0.043533 (0.360302)	0.284522 / 0.255139 (0.029383)	0.316546 / 0.283200 (0.033346)	0.084729 / 0.141683 (-0.056954)	1.569337 / 1.452155 (0.117182)	1.599256 / 1.492716 (0.106540)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.039430 / 0.037411 (0.002019)	0.021212 / 0.014526 (0.006686)	0.055679 / 0.176557 (-0.120877)	0.081856 / 0.737135 (-0.655280)	0.049044 / 0.296338 (-0.247294)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.260783 / 0.215209 (0.045573)	2.587601 / 2.077655 (0.509947)	1.846348 / 1.504120 (0.342228)	1.799228 / 1.541195 (0.258033)	1.733956 / 1.468490 (0.265466)	6.191410 / 4.584777 (1.606633)	5.566109 / 3.745712 (1.820397)	7.702677 / 5.269862 (2.432815)	6.599413 / 4.565676 (2.033737)	0.645236 / 0.424275 (0.220961)	0.010735 / 0.007607 (0.003128)	0.263732 / 0.226044 (0.037687)	2.818850 / 2.268929 (0.549921)	12.288883 / 55.444624 (-43.155741)	2.894013 / 6.876477 (-3.982464)	1.939767 / 2.142072 (-0.202305)	6.252359 / 4.805227 (1.447132)	1.823086 / 6.500664 (-4.677578)	0.026755 / 0.075469 (-0.048714)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	116.977507 / 1.841788 (115.135719)	13.232334 / 8.074308 (5.158026)	15.156511 / 10.191392 (4.965119)	0.752321 / 0.680424 (0.071897)	0.500571 / 0.534201 (-0.033630)	0.773368 / 0.579283 (0.194085)	0.594382 / 0.434364 (0.160018)	0.737019 / 0.540337 (0.196681)	1.450352 / 1.386936 (0.063416)

github-actions · 2020-09-10T14:51:24Z

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.016950 / 0.011353 (0.005597)	0.015546 / 0.011008 (0.004538)	0.054088 / 0.038508 (0.015580)	0.033699 / 0.023109 (0.010590)	0.201062 / 0.275898 (-0.074836)	0.250970 / 0.323480 (-0.072510)	0.009599 / 0.007986 (0.001613)	0.004554 / 0.004328 (0.000225)	0.009442 / 0.004250 (0.005192)	0.048017 / 0.037052 (0.010964)	0.202123 / 0.258489 (-0.056366)	0.224397 / 0.293841 (-0.069444)	0.149750 / 0.128546 (0.021204)	0.103039 / 0.075646 (0.027393)	0.493502 / 0.419271 (0.074230)	0.553087 / 0.043533 (0.509554)	0.193669 / 0.255139 (-0.061470)	0.217891 / 0.283200 (-0.065308)	0.087512 / 0.141683 (-0.054171)	1.748126 / 1.452155 (0.295972)	1.945770 / 1.492716 (0.453054)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.038808 / 0.037411 (0.001397)	0.019459 / 0.014526 (0.004934)	0.075463 / 0.176557 (-0.101093)	0.097779 / 0.737135 (-0.639356)	0.027898 / 0.296338 (-0.268440)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.195069 / 0.215209 (-0.020140)	1.970286 / 2.077655 (-0.107369)	1.134242 / 1.504120 (-0.369878)	1.133443 / 1.541195 (-0.407752)	1.165213 / 1.468490 (-0.303277)	5.781870 / 4.584777 (1.197093)	4.751155 / 3.745712 (1.005443)	7.346226 / 5.269862 (2.076364)	6.263107 / 4.565676 (1.697431)	0.607458 / 0.424275 (0.183183)	0.011152 / 0.007607 (0.003545)	0.212833 / 0.226044 (-0.013211)	2.230848 / 2.268929 (-0.038081)	1.637563 / 55.444624 (-53.807061)	1.568548 / 6.876477 (-5.307929)	1.670517 / 2.142072 (-0.471555)	6.204322 / 4.805227 (1.399095)	3.745892 / 6.500664 (-2.754772)	6.738933 / 0.075469 (6.663464)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	89.578056 / 1.841788 (87.736268)	14.106082 / 8.074308 (6.031774)	12.828463 / 10.191392 (2.637071)	0.426670 / 0.680424 (-0.253754)	0.268276 / 0.534201 (-0.265925)	0.755941 / 0.579283 (0.176658)	0.503389 / 0.434364 (0.069025)	0.700248 / 0.540337 (0.159910)	1.561131 / 1.386936 (0.174195)

PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.016467 / 0.011353 (0.005114)	0.015217 / 0.011008 (0.004208)	0.048231 / 0.038508 (0.009722)	0.033341 / 0.023109 (0.010231)	0.388402 / 0.275898 (0.112504)	0.385269 / 0.323480 (0.061790)	0.008497 / 0.007986 (0.000511)	0.004471 / 0.004328 (0.000143)	0.007792 / 0.004250 (0.003541)	0.045603 / 0.037052 (0.008550)	0.393505 / 0.258489 (0.135016)	0.404203 / 0.293841 (0.110362)	0.144659 / 0.128546 (0.016113)	0.109299 / 0.075646 (0.033653)	0.463379 / 0.419271 (0.044108)	0.598813 / 0.043533 (0.555280)	0.380583 / 0.255139 (0.125444)	0.403680 / 0.283200 (0.120480)	0.097553 / 0.141683 (-0.044130)	1.850292 / 1.452155 (0.398137)	1.852688 / 1.492716 (0.359971)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.043443 / 0.037411 (0.006031)	0.035494 / 0.014526 (0.020968)	0.025808 / 0.176557 (-0.150749)	0.092716 / 0.737135 (-0.644419)	0.026975 / 0.296338 (-0.269364)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.257937 / 0.215209 (0.042728)	2.676080 / 2.077655 (0.598425)	2.092468 / 1.504120 (0.588348)	2.029361 / 1.541195 (0.488166)	2.008844 / 1.468490 (0.540353)	5.495240 / 4.584777 (0.910463)	4.749601 / 3.745712 (1.003889)	7.511943 / 5.269862 (2.242082)	6.257181 / 4.565676 (1.691504)	0.622458 / 0.424275 (0.198183)	0.011663 / 0.007607 (0.004055)	0.291413 / 0.226044 (0.065368)	3.058831 / 2.268929 (0.789903)	20.772394 / 55.444624 (-34.672230)	4.201062 / 6.876477 (-2.675415)	2.342566 / 2.142072 (0.200494)	6.182417 / 4.805227 (1.377189)	2.777193 / 6.500664 (-3.723471)	0.037019 / 0.075469 (-0.038450)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	89.487385 / 1.841788 (87.645598)	14.735662 / 8.074308 (6.661354)	12.797439 / 10.191392 (2.606047)	1.210369 / 0.680424 (0.529945)	0.531790 / 0.534201 (-0.002411)	0.747200 / 0.579283 (0.167917)	0.526717 / 0.434364 (0.092353)	0.685780 / 0.540337 (0.145443)	1.438029 / 1.386936 (0.051093)

stefan-it · 2020-09-10T14:55:31Z

❤️

github-actions · 2020-09-10T14:58:42Z

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.017147 / 0.011353 (0.005794)	0.016301 / 0.011008 (0.005292)	0.053928 / 0.038508 (0.015420)	0.034899 / 0.023109 (0.011790)	0.238868 / 0.275898 (-0.037030)	0.250404 / 0.323480 (-0.073076)	0.018717 / 0.007986 (0.010731)	0.004662 / 0.004328 (0.000333)	0.009032 / 0.004250 (0.004782)	0.055823 / 0.037052 (0.018771)	0.235139 / 0.258489 (-0.023350)	0.254224 / 0.293841 (-0.039617)	0.154241 / 0.128546 (0.025694)	0.122672 / 0.075646 (0.047025)	0.512888 / 0.419271 (0.093616)	0.533524 / 0.043533 (0.489991)	0.241845 / 0.255139 (-0.013294)	0.257843 / 0.283200 (-0.025356)	0.102125 / 0.141683 (-0.039558)	1.924407 / 1.452155 (0.472253)	2.097629 / 1.492716 (0.604913)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.044088 / 0.037411 (0.006676)	0.023737 / 0.014526 (0.009211)	0.099448 / 0.176557 (-0.077108)	0.152431 / 0.737135 (-0.584704)	0.146864 / 0.296338 (-0.149474)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.193538 / 0.215209 (-0.021671)	1.939486 / 2.077655 (-0.138169)	1.225829 / 1.504120 (-0.278291)	1.162216 / 1.541195 (-0.378979)	1.221609 / 1.468490 (-0.246881)	5.792625 / 4.584777 (1.207848)	4.912239 / 3.745712 (1.166527)	7.844917 / 5.269862 (2.575056)	6.445696 / 4.565676 (1.880019)	0.649975 / 0.424275 (0.225700)	0.015611 / 0.007607 (0.008003)	0.252688 / 0.226044 (0.026643)	2.569997 / 2.268929 (0.301069)	1.811335 / 55.444624 (-53.633289)	1.726897 / 6.876477 (-5.149579)	1.789983 / 2.142072 (-0.352090)	6.540446 / 4.805227 (1.735219)	4.311511 / 6.500664 (-2.189153)	12.085898 / 0.075469 (12.010429)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	94.982961 / 1.841788 (93.141174)	16.317485 / 8.074308 (8.243177)	14.185236 / 10.191392 (3.993844)	0.920176 / 0.680424 (0.239753)	0.334963 / 0.534201 (-0.199238)	0.789213 / 0.579283 (0.209930)	0.564800 / 0.434364 (0.130436)	0.751816 / 0.540337 (0.211479)	1.688929 / 1.386936 (0.301993)

PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.017771 / 0.011353 (0.006418)	0.015591 / 0.011008 (0.004582)	0.062600 / 0.038508 (0.024092)	0.035261 / 0.023109 (0.012152)	0.370024 / 0.275898 (0.094126)	0.404539 / 0.323480 (0.081059)	0.011940 / 0.007986 (0.003954)	0.005262 / 0.004328 (0.000933)	0.008972 / 0.004250 (0.004722)	0.051098 / 0.037052 (0.014046)	0.383473 / 0.258489 (0.124984)	0.427732 / 0.293841 (0.133891)	0.144292 / 0.128546 (0.015746)	0.116459 / 0.075646 (0.040813)	0.548033 / 0.419271 (0.128761)	0.614114 / 0.043533 (0.570582)	0.373160 / 0.255139 (0.118021)	0.387909 / 0.283200 (0.104710)	0.117734 / 0.141683 (-0.023949)	1.969538 / 1.452155 (0.517384)	2.020225 / 1.492716 (0.527509)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.051002 / 0.037411 (0.013591)	0.024714 / 0.014526 (0.010189)	0.039624 / 0.176557 (-0.136933)	0.169620 / 0.737135 (-0.567515)	0.144368 / 0.296338 (-0.151970)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.262360 / 0.215209 (0.047151)	2.612547 / 2.077655 (0.534893)	2.032326 / 1.504120 (0.528206)	2.012698 / 1.541195 (0.471504)	2.087607 / 1.468490 (0.619117)	5.974431 / 4.584777 (1.389654)	5.332280 / 3.745712 (1.586567)	7.966859 / 5.269862 (2.696997)	6.909480 / 4.565676 (2.343803)	0.668583 / 0.424275 (0.244308)	0.011942 / 0.007607 (0.004335)	0.293542 / 0.226044 (0.067498)	3.128638 / 2.268929 (0.859710)	21.886702 / 55.444624 (-33.557922)	4.199898 / 6.876477 (-2.676579)	2.454890 / 2.142072 (0.312818)	6.526698 / 4.805227 (1.721471)	3.057838 / 6.500664 (-3.442826)	0.043595 / 0.075469 (-0.031874)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	91.421128 / 1.841788 (89.579341)	16.133575 / 8.074308 (8.059267)	13.473059 / 10.191392 (3.281667)	0.843217 / 0.680424 (0.162793)	0.581228 / 0.534201 (0.047027)	0.760811 / 0.579283 (0.181528)	0.530588 / 0.434364 (0.096224)	0.740119 / 0.540337 (0.199781)	1.650669 / 1.386936 (0.263733)

github-actions · 2020-09-10T15:14:36Z

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.027163 / 0.011353 (0.015810)	0.014898 / 0.011008 (0.003890)	0.053381 / 0.038508 (0.014873)	0.034856 / 0.023109 (0.011747)	0.217819 / 0.275898 (-0.058079)	0.258068 / 0.323480 (-0.065412)	0.010237 / 0.007986 (0.002252)	0.004559 / 0.004328 (0.000231)	0.010222 / 0.004250 (0.005971)	0.051848 / 0.037052 (0.014796)	0.217429 / 0.258489 (-0.041060)	0.247196 / 0.293841 (-0.046645)	0.141268 / 0.128546 (0.012722)	0.108874 / 0.075646 (0.033228)	0.528326 / 0.419271 (0.109054)	0.522601 / 0.043533 (0.479068)	0.215926 / 0.255139 (-0.039213)	0.235013 / 0.283200 (-0.048186)	0.088748 / 0.141683 (-0.052935)	1.886227 / 1.452155 (0.434072)	1.946110 / 1.492716 (0.453393)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.041341 / 0.037411 (0.003929)	0.021447 / 0.014526 (0.006921)	0.060088 / 0.176557 (-0.116468)	0.101780 / 0.737135 (-0.635355)	0.031756 / 0.296338 (-0.264583)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.191958 / 0.215209 (-0.023251)	1.956769 / 2.077655 (-0.120886)	1.239702 / 1.504120 (-0.264418)	1.178080 / 1.541195 (-0.363115)	1.245347 / 1.468490 (-0.223143)	5.783270 / 4.584777 (1.198493)	4.984510 / 3.745712 (1.238798)	7.626517 / 5.269862 (2.356655)	6.541433 / 4.565676 (1.975757)	0.666776 / 0.424275 (0.242501)	0.011595 / 0.007607 (0.003988)	0.238587 / 0.226044 (0.012543)	2.316223 / 2.268929 (0.047295)	1.722403 / 55.444624 (-53.722222)	1.594014 / 6.876477 (-5.282463)	1.698284 / 2.142072 (-0.443788)	6.405526 / 4.805227 (1.600299)	5.743037 / 6.500664 (-0.757627)	7.060866 / 0.075469 (6.985397)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	89.364012 / 1.841788 (87.522224)	15.593181 / 8.074308 (7.518873)	13.740730 / 10.191392 (3.549338)	0.868651 / 0.680424 (0.188227)	0.291771 / 0.534201 (-0.242430)	0.764619 / 0.579283 (0.185335)	0.540023 / 0.434364 (0.105659)	0.729212 / 0.540337 (0.188875)	1.633085 / 1.386936 (0.246149)

PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.018227 / 0.011353 (0.006874)	0.014906 / 0.011008 (0.003898)	0.055392 / 0.038508 (0.016884)	0.035438 / 0.023109 (0.012328)	0.384287 / 0.275898 (0.108389)	0.418317 / 0.323480 (0.094837)	0.009682 / 0.007986 (0.001696)	0.004552 / 0.004328 (0.000223)	0.007692 / 0.004250 (0.003442)	0.049590 / 0.037052 (0.012537)	0.385283 / 0.258489 (0.126794)	0.424502 / 0.293841 (0.130661)	0.140638 / 0.128546 (0.012092)	0.118228 / 0.075646 (0.042582)	0.532864 / 0.419271 (0.113592)	0.532752 / 0.043533 (0.489219)	0.374659 / 0.255139 (0.119520)	0.397055 / 0.283200 (0.113856)	0.101995 / 0.141683 (-0.039688)	1.902181 / 1.452155 (0.450027)	1.981542 / 1.492716 (0.488826)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.045445 / 0.037411 (0.008034)	0.021998 / 0.014526 (0.007472)	0.028938 / 0.176557 (-0.147618)	0.098006 / 0.737135 (-0.639129)	0.030808 / 0.296338 (-0.265531)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.256650 / 0.215209 (0.041441)	2.636395 / 2.077655 (0.558741)	1.954436 / 1.504120 (0.450316)	1.904450 / 1.541195 (0.363255)	1.981515 / 1.468490 (0.513025)	5.853469 / 4.584777 (1.268692)	5.155402 / 3.745712 (1.409690)	7.689868 / 5.269862 (2.420006)	6.709082 / 4.565676 (2.143405)	0.662790 / 0.424275 (0.238515)	0.011674 / 0.007607 (0.004067)	0.283535 / 0.226044 (0.057491)	2.959350 / 2.268929 (0.690421)	22.480259 / 55.444624 (-32.964365)	4.070771 / 6.876477 (-2.805706)	2.189003 / 2.142072 (0.046931)	6.345653 / 4.805227 (1.540425)	2.900279 / 6.500664 (-3.600385)	0.037963 / 0.075469 (-0.037507)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	91.402717 / 1.841788 (89.560929)	16.042984 / 8.074308 (7.968675)	13.621119 / 10.191392 (3.429727)	0.905240 / 0.680424 (0.224816)	0.661231 / 0.534201 (0.127031)	0.766523 / 0.579283 (0.187240)	0.549510 / 0.434364 (0.115146)	0.750552 / 0.540337 (0.210215)	1.668562 / 1.386936 (0.281626)

github-actions · 2020-09-10T15:24:17Z

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.018402 / 0.011353 (0.007049)	0.018304 / 0.011008 (0.007296)	0.051084 / 0.038508 (0.012576)	0.031126 / 0.023109 (0.008017)	0.200768 / 0.275898 (-0.075130)	0.209768 / 0.323480 (-0.113712)	0.005940 / 0.007986 (-0.002046)	0.004832 / 0.004328 (0.000504)	0.005831 / 0.004250 (0.001581)	0.044745 / 0.037052 (0.007693)	0.198933 / 0.258489 (-0.059557)	0.211317 / 0.293841 (-0.082524)	0.158468 / 0.128546 (0.029921)	0.123864 / 0.075646 (0.048218)	0.411937 / 0.419271 (-0.007334)	0.508507 / 0.043533 (0.464974)	0.198464 / 0.255139 (-0.056675)	0.211860 / 0.283200 (-0.071340)	0.078483 / 0.141683 (-0.063199)	1.721305 / 1.452155 (0.269150)	1.750566 / 1.492716 (0.257850)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.035889 / 0.037411 (-0.001522)	0.020833 / 0.014526 (0.006308)	0.023979 / 0.176557 (-0.152578)	0.079108 / 0.737135 (-0.658028)	0.077068 / 0.296338 (-0.219271)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.214033 / 0.215209 (-0.001176)	2.056861 / 2.077655 (-0.020793)	1.204725 / 1.504120 (-0.299395)	1.146148 / 1.541195 (-0.395046)	1.140197 / 1.468490 (-0.328294)	6.740767 / 4.584777 (2.155990)	6.108148 / 3.745712 (2.362436)	8.195293 / 5.269862 (2.925431)	7.257651 / 4.565676 (2.691974)	0.679182 / 0.424275 (0.254907)	0.011368 / 0.007607 (0.003760)	0.221453 / 0.226044 (-0.004591)	2.378028 / 2.268929 (0.109099)	1.673777 / 55.444624 (-53.770848)	1.563412 / 6.876477 (-5.313065)	1.602233 / 2.142072 (-0.539839)	6.947903 / 4.805227 (2.142675)	9.021711 / 6.500664 (2.521047)	8.918943 / 0.075469 (8.843474)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	121.617759 / 1.841788 (119.775971)	13.197972 / 8.074308 (5.123664)	14.357798 / 10.191392 (4.166406)	0.892160 / 0.680424 (0.211736)	0.292672 / 0.534201 (-0.241529)	0.858964 / 0.579283 (0.279681)	0.640776 / 0.434364 (0.206413)	0.801580 / 0.540337 (0.261242)	1.638385 / 1.386936 (0.251449)

PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.018547 / 0.011353 (0.007194)	0.017904 / 0.011008 (0.006895)	0.044173 / 0.038508 (0.005665)	0.032787 / 0.023109 (0.009678)	0.329578 / 0.275898 (0.053680)	0.358938 / 0.323480 (0.035458)	0.009088 / 0.007986 (0.001102)	0.005257 / 0.004328 (0.000928)	0.007477 / 0.004250 (0.003226)	0.044227 / 0.037052 (0.007174)	0.314710 / 0.258489 (0.056221)	0.365960 / 0.293841 (0.072119)	0.154113 / 0.128546 (0.025567)	0.128782 / 0.075646 (0.053136)	0.423974 / 0.419271 (0.004702)	0.420324 / 0.043533 (0.376791)	0.318135 / 0.255139 (0.062996)	0.319970 / 0.283200 (0.036771)	0.090008 / 0.141683 (-0.051675)	1.698016 / 1.452155 (0.245862)	1.782341 / 1.492716 (0.289625)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.039290 / 0.037411 (0.001879)	0.022032 / 0.014526 (0.007506)	0.024617 / 0.176557 (-0.151940)	0.080764 / 0.737135 (-0.656371)	0.046343 / 0.296338 (-0.249996)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.282156 / 0.215209 (0.066947)	2.967110 / 2.077655 (0.889456)	2.189141 / 1.504120 (0.685021)	2.108707 / 1.541195 (0.567512)	2.170131 / 1.468490 (0.701641)	6.785898 / 4.584777 (2.201121)	5.800769 / 3.745712 (2.055057)	8.292627 / 5.269862 (3.022765)	7.271448 / 4.565676 (2.705771)	0.746707 / 0.424275 (0.322432)	0.012323 / 0.007607 (0.004716)	0.339691 / 0.226044 (0.113646)	3.428409 / 2.268929 (1.159480)	13.473077 / 55.444624 (-41.971548)	3.525065 / 6.876477 (-3.351412)	2.314538 / 2.142072 (0.172465)	6.923184 / 4.805227 (2.117957)	2.203080 / 6.500664 (-4.297584)	0.025825 / 0.075469 (-0.049644)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	120.617234 / 1.841788 (118.775446)	14.093931 / 8.074308 (6.019623)	14.679461 / 10.191392 (4.488069)	0.859454 / 0.680424 (0.179030)	0.547555 / 0.534201 (0.013354)	0.829857 / 0.579283 (0.250574)	0.623130 / 0.434364 (0.188766)	0.770317 / 0.540337 (0.229980)	1.607311 / 1.386936 (0.220375)

github-actions · 2020-09-10T16:02:17Z

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.020115 / 0.011353 (0.008762)	0.016649 / 0.011008 (0.005640)	0.052360 / 0.038508 (0.013852)	0.034708 / 0.023109 (0.011599)	0.235314 / 0.275898 (-0.040584)	0.262319 / 0.323480 (-0.061161)	0.011406 / 0.007986 (0.003420)	0.005670 / 0.004328 (0.001341)	0.007787 / 0.004250 (0.003537)	0.051572 / 0.037052 (0.014520)	0.238301 / 0.258489 (-0.020188)	0.266587 / 0.293841 (-0.027254)	0.171471 / 0.128546 (0.042925)	0.139117 / 0.075646 (0.063471)	0.493063 / 0.419271 (0.073792)	0.568933 / 0.043533 (0.525400)	0.239224 / 0.255139 (-0.015915)	0.254524 / 0.283200 (-0.028675)	0.090973 / 0.141683 (-0.050710)	2.038702 / 1.452155 (0.586547)	2.086991 / 1.492716 (0.594274)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.042498 / 0.037411 (0.005087)	0.022706 / 0.014526 (0.008180)	0.061266 / 0.176557 (-0.115291)	0.093787 / 0.737135 (-0.643349)	0.028482 / 0.296338 (-0.267856)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.231180 / 0.215209 (0.015971)	2.373326 / 2.077655 (0.295672)	1.436626 / 1.504120 (-0.067494)	1.286577 / 1.541195 (-0.254618)	1.368968 / 1.468490 (-0.099522)	7.227862 / 4.584777 (2.643085)	6.448712 / 3.745712 (2.703000)	9.234240 / 5.269862 (3.964378)	7.999752 / 4.565676 (3.434076)	0.823721 / 0.424275 (0.399446)	0.014258 / 0.007607 (0.006651)	0.267484 / 0.226044 (0.041440)	2.872074 / 2.268929 (0.603145)	1.931501 / 55.444624 (-53.513124)	1.811371 / 6.876477 (-5.065106)	1.890573 / 2.142072 (-0.251499)	7.644046 / 4.805227 (2.838819)	6.367947 / 6.500664 (-0.132717)	7.942578 / 0.075469 (7.867108)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	131.615758 / 1.841788 (129.773970)	14.716714 / 8.074308 (6.642406)	16.196067 / 10.191392 (6.004675)	0.508164 / 0.680424 (-0.172260)	0.324609 / 0.534201 (-0.209592)	0.927963 / 0.579283 (0.348680)	0.706530 / 0.434364 (0.272166)	0.897164 / 0.540337 (0.356827)	1.897636 / 1.386936 (0.510700)

PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.019631 / 0.011353 (0.008278)	0.019097 / 0.011008 (0.008089)	0.052926 / 0.038508 (0.014418)	0.034757 / 0.023109 (0.011647)	0.406792 / 0.275898 (0.130894)	0.415520 / 0.323480 (0.092040)	0.010856 / 0.007986 (0.002870)	0.005648 / 0.004328 (0.001320)	0.007600 / 0.004250 (0.003350)	0.051793 / 0.037052 (0.014740)	0.389311 / 0.258489 (0.130822)	0.438551 / 0.293841 (0.144710)	0.161687 / 0.128546 (0.033141)	0.143155 / 0.075646 (0.067509)	0.486173 / 0.419271 (0.066901)	0.472172 / 0.043533 (0.428639)	0.405282 / 0.255139 (0.150143)	0.397337 / 0.283200 (0.114137)	0.102010 / 0.141683 (-0.039673)	2.055689 / 1.452155 (0.603535)	2.041724 / 1.492716 (0.549008)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.049320 / 0.037411 (0.011909)	0.024161 / 0.014526 (0.009635)	0.031122 / 0.176557 (-0.145435)	0.096137 / 0.737135 (-0.640999)	0.047844 / 0.296338 (-0.248495)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.329052 / 0.215209 (0.113843)	3.311940 / 2.077655 (1.234285)	2.318698 / 1.504120 (0.814578)	2.167429 / 1.541195 (0.626234)	2.234394 / 1.468490 (0.765904)	7.437238 / 4.584777 (2.852461)	6.403009 / 3.745712 (2.657296)	9.023030 / 5.269862 (3.753169)	7.935835 / 4.565676 (3.370158)	0.768734 / 0.424275 (0.344459)	0.012872 / 0.007607 (0.005265)	0.382426 / 0.226044 (0.156381)	3.901632 / 2.268929 (1.632704)	27.240068 / 55.444624 (-28.204556)	5.022757 / 6.876477 (-1.853720)	2.524514 / 2.142072 (0.382442)	7.507361 / 4.805227 (2.702134)	3.572879 / 6.500664 (-2.927785)	0.045419 / 0.075469 (-0.030050)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	136.508863 / 1.841788 (134.667075)	16.189645 / 8.074308 (8.115336)	17.373528 / 10.191392 (7.182136)	0.948859 / 0.680424 (0.268435)	0.667054 / 0.534201 (0.132853)	0.915534 / 0.579283 (0.336251)	0.685494 / 0.434364 (0.251131)	0.898263 / 0.540337 (0.357925)	1.864250 / 1.386936 (0.477314)

github-actions · 2020-09-10T16:21:21Z

PyArrow==0.17.1

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.018278 / 0.011353 (0.006925)	0.018655 / 0.011008 (0.007646)	0.051474 / 0.038508 (0.012966)	0.030857 / 0.023109 (0.007748)	0.227897 / 0.275898 (-0.048001)	0.245603 / 0.323480 (-0.077877)	0.010849 / 0.007986 (0.002864)	0.006287 / 0.004328 (0.001958)	0.008152 / 0.004250 (0.003902)	0.050434 / 0.037052 (0.013382)	0.221169 / 0.258489 (-0.037320)	0.258100 / 0.293841 (-0.035741)	0.170605 / 0.128546 (0.042059)	0.137303 / 0.075646 (0.061656)	0.473449 / 0.419271 (0.054177)	0.562509 / 0.043533 (0.518976)	0.321978 / 0.255139 (0.066839)	0.235552 / 0.283200 (-0.047647)	0.088137 / 0.141683 (-0.053546)	1.854300 / 1.452155 (0.402145)	1.891019 / 1.492716 (0.398303)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.039988 / 0.037411 (0.002577)	0.023647 / 0.014526 (0.009122)	0.296375 / 0.176557 (0.119819)	0.137627 / 0.737135 (-0.599508)	0.184518 / 0.296338 (-0.111821)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.214769 / 0.215209 (-0.000441)	2.130116 / 2.077655 (0.052462)	1.310703 / 1.504120 (-0.193416)	1.210377 / 1.541195 (-0.330818)	1.246437 / 1.468490 (-0.222054)	6.942688 / 4.584777 (2.357911)	6.040424 / 3.745712 (2.294712)	8.958592 / 5.269862 (3.688730)	7.798150 / 4.565676 (3.232474)	0.768507 / 0.424275 (0.344232)	0.012939 / 0.007607 (0.005332)	0.272560 / 0.226044 (0.046516)	2.672491 / 2.268929 (0.403563)	1.836744 / 55.444624 (-53.607880)	1.646636 / 6.876477 (-5.229841)	1.687295 / 2.142072 (-0.454777)	7.491775 / 4.805227 (2.686547)	6.553821 / 6.500664 (0.053157)	5.807233 / 0.075469 (5.731764)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	123.327349 / 1.841788 (121.485562)	14.810967 / 8.074308 (6.736658)	16.070809 / 10.191392 (5.879417)	0.905213 / 0.680424 (0.224790)	0.303394 / 0.534201 (-0.230807)	0.919602 / 0.579283 (0.340319)	0.646742 / 0.434364 (0.212378)	0.826928 / 0.540337 (0.286591)	1.661580 / 1.386936 (0.274644)

PyArrow==1.0

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.020187 / 0.011353 (0.008834)	0.017209 / 0.011008 (0.006201)	0.057188 / 0.038508 (0.018680)	0.033007 / 0.023109 (0.009898)	0.350680 / 0.275898 (0.074782)	0.385017 / 0.323480 (0.061537)	0.010227 / 0.007986 (0.002241)	0.004829 / 0.004328 (0.000501)	0.007631 / 0.004250 (0.003381)	0.046205 / 0.037052 (0.009152)	0.334735 / 0.258489 (0.076246)	0.389929 / 0.293841 (0.096088)	0.159298 / 0.128546 (0.030752)	0.133890 / 0.075646 (0.058244)	0.464886 / 0.419271 (0.045615)	0.438743 / 0.043533 (0.395210)	0.354771 / 0.255139 (0.099632)	0.379383 / 0.283200 (0.096183)	0.094074 / 0.141683 (-0.047609)	1.821776 / 1.452155 (0.369622)	1.997978 / 1.492716 (0.505262)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.047715 / 0.037411 (0.010304)	0.023494 / 0.014526 (0.008968)	0.027261 / 0.176557 (-0.149295)	0.090549 / 0.737135 (-0.646587)	0.039873 / 0.296338 (-0.256466)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.276239 / 0.215209 (0.061030)	2.848318 / 2.077655 (0.770663)	2.130219 / 1.504120 (0.626099)	1.986036 / 1.541195 (0.444841)	2.051952 / 1.468490 (0.583462)	7.122611 / 4.584777 (2.537834)	6.144563 / 3.745712 (2.398851)	8.916670 / 5.269862 (3.646808)	7.880003 / 4.565676 (3.314327)	0.728955 / 0.424275 (0.304680)	0.013369 / 0.007607 (0.005762)	0.320207 / 0.226044 (0.094163)	3.240598 / 2.268929 (0.971670)	17.819562 / 55.444624 (-37.625062)	4.051462 / 6.876477 (-2.825015)	2.324801 / 2.142072 (0.182728)	7.406239 / 4.805227 (2.601012)	2.775892 / 6.500664 (-3.724772)	0.044377 / 0.075469 (-0.031092)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	123.481526 / 1.841788 (121.639739)	14.480595 / 8.074308 (6.406287)	16.000892 / 10.191392 (5.809500)	0.805055 / 0.680424 (0.124631)	0.590393 / 0.534201 (0.056192)	0.883525 / 0.579283 (0.304242)	0.658490 / 0.434364 (0.224126)	0.817504 / 0.540337 (0.277166)	1.719845 / 1.386936 (0.332909)

* Changing the name
* style + quality
* update doc and logo
* clean up
* circle-CI on the branche for now
* fix daily dialog dataset
* fix urls

Co-authored-by: Quentin Lhoest <[email protected]>

thomwolf added 3 commits Sep 10, 2020

Changing the name

4c79292

style + quality

62c5111

update doc and logo

f91b29f

clean up

d6fdebb

thomwolf added 2 commits Sep 10, 2020

circle-CI on the branche for now

cc0c0c4

fix daily dialog dataset

1d20285

thomwolf marked this pull request as ready for review Sep 10, 2020

fix urls

ac3bf60

Merge branch 'master' into datasets

6f6e44c

lhoestq deleted the datasets branch Sep 10, 2020

huggingface / datasets

github-actions bot commented on `4c79292` Sep 10, 2020

github-actions bot commented on `62c5111` Sep 10, 2020

github-actions bot commented on `f91b29f` Sep 10, 2020

stefan-it commented Sep 10, 2020

github-actions bot commented on `d6fdebb` Sep 10, 2020

github-actions bot commented on `cc0c0c4` Sep 10, 2020

github-actions bot commented on `1d20285` Sep 10, 2020

github-actions bot commented on `ac3bf60` Sep 10, 2020

github-actions bot commented on `6f6e44c` Sep 10, 2020