FUSE micro-opt benchmarking #5110

Open
ThomasWaldmann opened this issue Apr 13, 2020 · 2 comments

ThomasWaldmann (Member) commented on Apr 13, 2020

If somebody has some time for FUSE benchmarking:

diff --git a/src/borg/fuse.py b/src/borg/fuse.py
index 429790e4..27ab1c1a 100644
--- a/src/borg/fuse.py
+++ b/src/borg/fuse.py
@@ -644,12 +644,13 @@ def read(self, fh, offset, size):
                 data = self.data_cache[id]
                 if offset + n == len(data):
                     # evict fully read chunk from cache
-                    del self.data_cache[id]
+                    pass # del self.data_cache[id]
             else:
                 data = self.key.decrypt(id, self.repository_uncached.get(id))
-                if offset + n < len(data):
+                if True: # offset + n < len(data):
                     # chunk was only partially read, cache it
                     self.data_cache[id] = data
+            #data = memoryview(data)
             parts.append(data[offset:offset + n])
             offset = 0
             size -= n

The first 2 changes remove the selective caching of only partially read chunks and the eviction of fully read chunks. While the original behaviour sounds obviously right for sequential reads, it may be counterproductive for repeating chunks (like all-zero chunks).
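To make the repeating-chunk concern concrete: under the original policy, a chunk that is always consumed in full is never cached (only partial reads populate the cache), so every occurrence costs a fetch and decrypt. A toy model of the two policies (hypothetical code, not borg's; FETCH stands in for self.key.decrypt(id, self.repository_uncached.get(id))):

fetches = 0

def FETCH(cid):
    # stand-in for self.key.decrypt(id, self.repository_uncached.get(id))
    global fetches
    fetches += 1
    return b'\0' * 4096

def read_full_chunk(cid, cache, policy):
    # a read() that consumes the whole chunk (offset 0, n == len(data))
    if cid in cache:
        data = cache[cid]
        if policy == 'original':
            del cache[cid]          # original: evict fully read chunk
        return data
    data = FETCH(cid)
    if policy == 'patched':
        cache[cid] = data           # original caches only partially read chunks
    return data

for policy in ('original', 'patched'):
    fetches, cache = 0, {}
    for _ in range(1000):           # sparse image: same all-zero chunk repeated
        read_full_chunk('zero-chunk-id', cache, policy)
    print(policy, fetches)          # original: 1000 fetches, patched: 1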

The 3rd change tries to avoid creating a copy of data just for the sake of slicing it. Not sure whether this helps (a partial slice only happens at the first/last chunk within a read) or is counterproductive due to the additional line of code.

If someone wants to benchmark these (and maybe also try with a bigger sized self.data_cache), that would be helpful! A rough timing harness is sketched after the list below.

Try:

  • big files, small files
  • files with repeating chunks (like sparse [VM] disk images)
  • default chunksize, small chunksize
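
For the measurement itself, something like the following could serve (a sketch only; /mnt/borg and the file paths are placeholders, and remounting between runs avoids the kernel page cache skewing repeats):

import sys
import time

def bench_read(path, bufsize=1024 * 1024):
    """Sequentially read *path* from the FUSE mount and report throughput."""
    total = 0
    t0 = time.perf_counter()
    with open(path, 'rb', buffering=0) as f:  # unbuffered: every read hits FUSE
        while True:
            buf = f.read(bufsize)
            if not buf:
                break
            total += len(buf)
    dt = time.perf_counter() - t0
    print(f'{path}: {total / dt / 1e6:.1f} MB/s ({total} bytes in {dt:.2f} s)')

if __name__ == '__main__':
    for p in sys.argv[1:]:
        bench_read(p)

Run it once against an unpatched and once against a patched borg, e.g. borg mount repo::archive /mnt/borg followed by python bench_read.py /mnt/borg/path/to/bigfile, and repeat with small bufsize values to also exercise partial-chunk reads.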

enkore (Contributor) commented on Apr 13, 2020

data should be bytes, so memoryview(data) does not copy, while data[...] (without the memoryview) does make a copy.
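A quick CPython check of that (exact byte counts vary by version and platform):

>>> import sys
>>> b = bytes(1_000_000)
>>> sys.getsizeof(b[:500_000])              # bytes slice: copies the data
500033
>>> sys.getsizeof(memoryview(b)[:500_000])  # memoryview slice: fixed-size view
184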

ThomasWaldmann (Member, Author) commented on Apr 13, 2020

Seems like Python (CPython, at least) is clever enough not to copy when the slice would be the whole bytestring:

>>> b = b'foobar'
>>> m = b[0:5]
>>> m
b'fooba'
>>> m is b
False
>>> m = b[0:6]
>>> m is b
True

And that is quite often the case in that code fragment: after the first chunk, offset is reset to 0, so every middle chunk of a multi-chunk read is sliced as data[0:len(data)].
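
A toy version of the loop's slicing pattern (simplified from the diff above) shows that only the first and last chunk of a multi-chunk read produce a real slice; every middle chunk comes back as the original object:

>>> chunks = [b'aaaa', b'bbbb', b'cccc']
>>> offset, size, parts = 2, 9, []          # a read spanning all three chunks
>>> for data in chunks:
...     n = min(len(data) - offset, size)
...     part = data[offset:offset + n]
...     print(part is data)                 # middle chunk: whole-object slice
...     parts.append(part)
...     offset = 0
...     size -= n
...
False
True
False
>>> b''.join(parts)
b'aabbbbccc'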
