FUSE micro-opt benchmarking #5110

Open
ThomasWaldmann opened this issue Apr 13, 2020 · 2 comments

ThomasWaldmann (Member) commented on Apr 13, 2020

If somebody has some time for FUSE benchmarking:

diff --git a/src/borg/fuse.py b/src/borg/fuse.py
index 429790e4..27ab1c1a 100644
--- a/src/borg/fuse.py
+++ b/src/borg/fuse.py
@@ -644,12 +644,13 @@ def read(self, fh, offset, size):
                 data = self.data_cache[id]
                 if offset + n == len(data):
                     # evict fully read chunk from cache
-                    del self.data_cache[id]
+                    pass # del self.data_cache[id]
             else:
                 data = self.key.decrypt(id, self.repository_uncached.get(id))
-                if offset + n < len(data):
+                if True: # offset + n < len(data):
                     # chunk was only partially read, cache it
                     self.data_cache[id] = data
+            #data = memoryview(data)
             parts.append(data[offset:offset + n])
             offset = 0
             size -= n

The first 2 changes remove the selective caching of only partially read chunks and the eviction of fully read chunks. While the original behaviour sounds obviously right for sequential reads, it may be counterproductive for repeating chunks (like all-zero chunks).
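To make the repeating-chunk concern concrete: under the original policy, a chunk that is always consumed in full is never cached (only partial reads populate the cache), so every occurrence costs a fetch and decrypt. A toy model of the two policies (hypothetical code, not borg's; FETCH stands in for self.key.decrypt(id, self.repository_uncached.get(id))):

fetches = 0

def FETCH(cid):
    # stand-in for self.key.decrypt(id, self.repository_uncached.get(id))
    global fetches
    fetches += 1
    return b'\0' * 4096

def read_full_chunk(cid, cache, policy):
    # a read() that consumes the whole chunk (offset 0, n == len(data))
    if cid in cache:
        data = cache[cid]
        if policy == 'original':
            del cache[cid]          # original: evict fully read chunk
        return data
    data = FETCH(cid)
    if policy == 'patched':
        cache[cid] = data           # original caches only partially read chunks
    return data

for policy in ('original', 'patched'):
    fetches, cache = 0, {}
    for _ in range(1000):           # sparse image: same all-zero chunk repeated
        read_full_chunk('zero-chunk-id', cache, policy)
    print(policy, fetches)          # original: 1000 fetches, patched: 1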

The 3rd change tries to avoid creating a copy of data just for the sake of slicing it. Not sure whether this helps (a partial slice only happens at the first/last chunk within a read) or is counterproductive due to the additional line of code.

If someone wants to benchmark these (and maybe also try with a bigger sized self.data_cache), that would be helpful! A rough timing harness is sketched after the list below.

Try:

  • big files, small files
  • files with repeating chunks (like sparse [VM] disk images)
  • default chunksize, small chunksize
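
For the measurement itself, something like the following could serve (a sketch only; /mnt/borg and the file paths are placeholders, and remounting between runs avoids the kernel page cache skewing repeats):

import sys
import time

def bench_read(path, bufsize=1024 * 1024):
    """Sequentially read *path* from the FUSE mount and report throughput."""
    total = 0
    t0 = time.perf_counter()
    with open(path, 'rb', buffering=0) as f:  # unbuffered: every read hits FUSE
        while True:
            buf = f.read(bufsize)
            if not buf:
                break
            total += len(buf)
    dt = time.perf_counter() - t0
    print(f'{path}: {total / dt / 1e6:.1f} MB/s ({total} bytes in {dt:.2f} s)')

if __name__ == '__main__':
    for p in sys.argv[1:]:
        bench_read(p)

Run it once against an unpatched and once against a patched borg, e.g. borg mount repo::archive /mnt/borg followed by python bench_read.py /mnt/borg/path/to/bigfile, and repeat with small bufsize values to also exercise partial-chunk reads.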

enkore (Contributor) commented on Apr 13, 2020

data should be bytes, so memoryview(data) does not copy, while data[...] (without the memoryview) does make a copy.
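A quick CPython check of that (exact byte counts vary by version and platform):

>>> import sys
>>> b = bytes(1_000_000)
>>> sys.getsizeof(b[:500_000])              # bytes slice: copies the data
500033
>>> sys.getsizeof(memoryview(b)[:500_000])  # memoryview slice: fixed-size view
184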

ThomasWaldmann (Member, Author) commented on Apr 13, 2020

Seems like Python (CPython, at least) is clever enough not to copy when the slice would be the whole bytestring:

>>> b = b'foobar'
>>> m = b[0:5]
>>> m
b'fooba'
>>> m is b
False
>>> m = b[0:6]
>>> m is b
True

And that is quite often the case in that code fragment: after the first chunk, offset is reset to 0, so every middle chunk of a multi-chunk read is sliced as data[0:len(data)].
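
A toy version of the loop's slicing pattern (simplified from the diff above) shows that only the first and last chunk of a multi-chunk read produce a real slice; every middle chunk comes back as the original object:

>>> chunks = [b'aaaa', b'bbbb', b'cccc']
>>> offset, size, parts = 2, 9, []          # a read spanning all three chunks
>>> for data in chunks:
...     n = min(len(data) - offset, size)
...     part = data[offset:offset + n]
...     print(part is data)                 # middle chunk: whole-object slice
...     parts.append(part)
...     offset = 0
...     size -= n
...
False
True
False
>>> b''.join(parts)
b'aabbbbccc'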
