Single predicate causing bottleneck in queries #42

sad-dev · 2020-08-17T08:14:01Z

codeql query run --threads=0 still results in only one core being utilized, which severely affects query speed. I typically see this in my log file:

Creating executor with (many) threads.
...
(0s) Starting to evaluate predicate ...

With 100% usage on 1 core.

Is Visual Studio more performant for this?

sad-dev · 2020-08-17T09:02:44Z

I tried it in Visual Studio Code and things got worse: it still uses only one thread and, unlike the CLI, it does not seem to benefit from a disk cache.

hmakholm · 2020-09-16T15:17:52Z

This is unfortunately often the case, especially when running a single query. The granularity of parallelism in the QL evaluator is a single predicate (or a single group of mutually recursive predicates). So if there is only one predicate ready to run (and everything else in the query depends directly or indirectly on the results of that), then only one thread will be working on it.
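As a rough illustration of that granularity (a toy model, not the QL evaluator's actual scheduler), the sketch below groups a dependency graph into "waves" of predicates that are ready at the same time. The predicate names are hypothetical, and each group of mutually recursive predicates is assumed to have already been collapsed into a single node, since `graphlib` rejects cycles.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def evaluation_waves(deps):
    """Group predicates into waves; members of one wave have no
    unfinished dependencies and could be evaluated in parallel.

    deps maps each predicate name to the set of predicates it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the graph still contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave has finished
    return waves


# Hypothetical predicate names, not taken from any real query.
deps = {
    "base": set(),
    "a": {"base"},
    "b": {"base"},
    "c": {"base"},
    "result": {"a", "b", "c"},
}

waves = evaluation_waves(deps)
for i, wave in enumerate(waves):
    print(f"wave {i}: {wave} (at most {len(wave)} thread(s) busy)")
```

In this toy graph the first wave contains only `base`, so a single thread is busy no matter how many are configured; the extra threads only become useful once `a`, `b` and `c` are ready.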

We do have an ambition of finding some parallelization opportunities within predicate evaluations, but making that work within the overall structure of the QL evaluator is a hard problem, so there's no timeline for when this will bear fruit.

sad-dev · 2020-09-28T06:31:22Z

Hi Makholm,

Thank you for your reply. To the best of my understanding, a query eventually decomposes into predicates of the form table(...). Just parallelizing those might lead to significant improvements, and it appears to be the easiest part to parallelize (compared to the complex logic needed for recursive predicates, fixed points and so on).

adityasharad · 2020-09-30T21:05:16Z

Hi @sad-dev. The QL evaluator will already evaluate the predicates that make up a query in parallel where possible, when the number of threads is configured as you have done. However, as @hmakholm describes above, it is only possible to evaluate two (or more) predicates in parallel when there are no dependencies between them.

In some cases when evaluating a query, there is a single predicate that is required by all the remaining predicates, so the evaluator has to finish evaluating that single predicate first, before it can parallelise any remaining work. This usually explains the bottleneck you observe. I expect that when evaluation of that first predicate completes, more of the remaining work will be done in parallel.

Could you share which query you are running, and the name of the single predicate you observe? This can be seen in the Starting to evaluate predicate log message from the CLI or the Running query progress message in VS Code. With that information, we can explain in a little more detail. In particular, that will help us suggest whether that predicate is a bottleneck dependency for the rest of the query, or whether there is actually room for us to do more in parallel.
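One way to pull the predicate names out of a saved CLI log is sketched below. The line shape is an assumption modelled only on the "(0s) Starting to evaluate predicate ..." line quoted earlier in this thread; the real log format may differ, and the predicate names in the sample are hypothetical.

```python
import re

# Assumed line shape: "(<seconds>s) Starting to evaluate predicate <name>".
START = re.compile(r"\((\d+)s\) Starting to evaluate predicate (\S+)")


def started_predicates(log_text):
    """Return (seconds, predicate_name) pairs in the order they appear."""
    return [(int(m.group(1)), m.group(2)) for m in START.finditer(log_text)]


def longest_gap(events):
    """Return (predicate_name, seconds) for the largest gap between one
    start message and the next. This only approximates evaluation time,
    and only when predicates are being evaluated one at a time."""
    gaps = [(name, nxt - t) for (t, name), (nxt, _) in zip(events, events[1:])]
    return max(gaps, key=lambda g: g[1]) if gaps else None


# Hypothetical sample log, not real CLI output.
sample = """\
Creating executor with (many) threads.
(0s) Starting to evaluate predicate hypothetical::slowPredicate
(95s) Starting to evaluate predicate hypothetical::nextPredicate
(97s) Starting to evaluate predicate hypothetical::lastPredicate
"""

events = started_predicates(sample)
print(events)
print(longest_gap(events))
```

The name reported by `longest_gap` is only a candidate for the bottleneck predicate; whether the rest of the query really depends on it still has to be checked against the query itself.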

github-actions bot added the CLI label Aug 17, 2020

hmakholm added the Complexity: High and enhancement labels Sep 18, 2020


github / codeql-cli-binaries
