From 3d960842a7247632557e0bb05c07294e8e6b3570 Mon Sep 17 00:00:00 2001
From: Natalie Adams
Date: Thu, 8 Apr 2021 20:34:34 +0000
Subject: [PATCH]

---
 GitLargeRepositories.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)
 create mode 100644 GitLargeRepositories.md

diff --git a/GitLargeRepositories.md b/GitLargeRepositories.md
new file mode 100644
index 0000000..a77bf4b
--- /dev/null
+++ b/GitLargeRepositories.md
@@ -0,0 +1,12 @@
+# Description
+Limitations with large git repositories (Kernel size)
+
+Memory usage is no longer a problem, but some performance problems remain. See issue 103 for details.
+
+See issue 93 for some context information.
+
+With large git repositories, finding the commit in which a file was created consumes a lot of memory. This lookup is normally done for each file in the tree view. To avoid completely crashing the system, indefero only goes back through the latest 5000 commits when looking at historical data to find the origin of a file.
+
+What should be done is to build a database cache with these details over time, perhaps using only the sha1 for reference and storing the details in a compressed row to save space. That way the cache fills in gradually and eventually holds all the data. An auxiliary script could be used to regenerate this cache.
+
+Note, **a large repository like the Linux kernel will require about 35MB**. This is due to the git ls-tree command, which lists the complete git tree even though only part of it is needed to display the current folder. One way to avoid that would be to pipe the output to grep, but that requires grep to be available. If we consider that git is a POSIX tool anyway, we can consider that to be acceptable.
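
The database cache proposed in the page above could look roughly like the following. This is only a minimal sketch, not indefero's implementation: it assumes SQLite as the store, zlib-compressed JSON as the "compressed row", and a caller-supplied function that does the expensive history walk. The names `FileOriginCache`, `origin_cache`, and `get_or_compute` are hypothetical, and the page does not say which object the sha1 identifies, so the key is treated as an opaque string.

```python
import json
import sqlite3
import zlib


class FileOriginCache:
    """Hypothetical sha1-keyed cache; each row holds zlib-compressed JSON details."""

    def __init__(self, path=":memory:"):
        # Assumption: a local SQLite file (or in-memory DB) stands in for the real database.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS origin_cache ("
            "sha1 TEXT PRIMARY KEY, details BLOB NOT NULL)"
        )

    def put(self, sha1, details):
        # Serialize the details and compress them to save space in the row.
        blob = zlib.compress(json.dumps(details, sort_keys=True).encode("utf-8"))
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO origin_cache (sha1, details) VALUES (?, ?)",
                (sha1, blob),
            )

    def get(self, sha1):
        # Return the decompressed details, or None when the sha1 is not cached yet.
        row = self.db.execute(
            "SELECT details FROM origin_cache WHERE sha1 = ?", (sha1,)
        ).fetchone()
        if row is None:
            return None
        return json.loads(zlib.decompress(row[0]).decode("utf-8"))

    def get_or_compute(self, sha1, compute):
        # Fill the cache lazily: the expensive lookup runs once per sha1.
        details = self.get(sha1)
        if details is None:
            details = compute(sha1)
            self.put(sha1, details)
        return details
```

Because entries are filled in on first access, the cache builds up over time as the page suggests. An auxiliary regeneration script would then only need to call `put` for each sha1 it walks.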