The Wayback Machine - https://web.archive.org/web/20201110083136/https://github.com/github/codeql-cli-binaries/issues/42

Single predicate causing bottleneck in queries #42

Open
sad-dev opened this issue Aug 17, 2020 · 4 comments

@sad-dev commented Aug 17, 2020

codeql query run --threads=0 still results in only one core being utilized, which severely affects query speed. I typically see this in my log file:

Creating executor with (many) threads.
...
(0s) Starting to evaluate predicate ...

With 100% usage on 1 core.

Is Visual Studio Code more performant for this?

github-actions bot added the CLI label Aug 17, 2020
@sad-dev (Author) commented Aug 17, 2020

I tried it in Visual Studio Code and things got worse: it still uses only one thread and, unlike the CLI, it does not seem to benefit from a disk cache.

@hmakholm (Contributor) commented Sep 16, 2020

This is unfortunately often the case, especially when running a single query. The granularity of parallelism in the QL evaluator is a single predicate (or a single group of mutually recursive predicates). So if there is only one predicate ready to run (and everything else in the query depends directly or indirectly on its results), then only one thread will be working on it.

We do have an ambition of finding some parallelization opportunities within predicate evaluations, but making that work within the overall structure of the QL evaluator is a hard problem, so there's no timeline for when this will bear fruit.
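
To illustrate the granularity described above, here is a minimal Python sketch. It is not the QL evaluator's actual scheduler, just a toy model under stated assumptions: each key is one unit of work (a predicate, or a group of mutually recursive predicates already merged into one unit), the dependency graph and all names in it are hypothetical, and units are grouped into simple layers even though a real scheduler may start a unit as soon as its own dependencies finish.

```python
from typing import Dict, List, Set


def parallel_layers(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group work units into layers; every unit in a layer has all of its
    dependencies satisfied by earlier layers, so the layer's size is the
    number of threads that could be busy at that step in this toy model."""
    remaining = {unit: set(needed) for unit, needed in deps.items()}
    layers: List[List[str]] = []
    while remaining:
        # A unit is ready when nothing it depends on is still pending.
        ready = sorted(unit for unit, needed in remaining.items() if not needed)
        if not ready:
            raise ValueError(
                "unresolvable dependencies: merge mutually recursive "
                "predicates into a single unit first"
            )
        layers.append(ready)
        for unit in ready:
            del remaining[unit]
        for needed in remaining.values():
            needed.difference_update(ready)
    return layers


# Hypothetical query: every other unit depends on "base", directly or not.
deps = {
    "base": set(),
    "left": {"base"},
    "middle": {"base"},
    "right": {"base"},
    "result": {"left", "middle", "right"},
}

layers = parallel_layers(deps)
for step, layer in enumerate(layers):
    print(f"step {step}: {len(layer)} unit(s) can run at once: {layer}")
```

In this hypothetical graph the first step contains only "base", so a single thread is busy no matter how many threads are configured; only the second step offers work for several threads.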

@sad-dev (Author) commented Sep 28, 2020

Hi Makholm,

Thank you for your reply. To the best of my understanding, a query eventually decomposes into predicates of the form table(...). Parallelizing just those might lead to significant improvements, and it appears to be the easiest part to parallelize (compared with the complex logic needed for recursive predicates, fixed points and so on).

@adityasharad (Contributor) commented Sep 30, 2020

Hi @sad-dev. The QL evaluator will already evaluate the predicates that make up a query in parallel where possible, when the number of threads is configured as you have done. However, as @hmakholm describes above, it is only possible to evaluate two (or more) predicates in parallel when there are no dependencies between them.

In some cases when evaluating a query, there is a single predicate that is required by all the remaining predicates, so the evaluator has to finish evaluating that single predicate first, before it can parallelise any remaining work. This usually explains the bottleneck you observe. I expect that when evaluation of that first predicate completes, more of the remaining work will be done in parallel.

Could you share which query you are running, and the name of the single predicate you observe? This can be seen in the "Starting to evaluate predicate" log message from the CLI or the "Running query" progress message in VS Code. With that information, we can explain in a little more detail. In particular, that will help us suggest whether that predicate is a bottleneck dependency for the rest of the query, or whether there is actually room for us to do more in parallel.
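
One way to collect the predicate names asked for above is to scan the CLI log for those messages. The following Python sketch is only an illustration: the exact line layout is an assumption based on the excerpt in the first comment ("(0s) Starting to evaluate predicate ..."), and the sample log, thread count, and predicate names are hypothetical, not real CodeQL output.

```python
import re
from typing import List, Tuple

# ASSUMPTION: each message looks like "(<seconds>s) Starting to evaluate
# predicate <name>". Adjust the pattern if the real log differs.
START_RE = re.compile(r"^\((\d+)s\)\s+Starting to evaluate predicate\s+(\S+)")


def started_predicates(log_text: str) -> List[Tuple[int, str]]:
    """Return (seconds, predicate name) for every matching log line, in order."""
    found: List[Tuple[int, str]] = []
    for line in log_text.splitlines():
        match = START_RE.match(line.strip())
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found


# Hypothetical sample log; the predicate names are made up for illustration.
sample_log = """\
Creating executor with 8 threads.
(0s) Starting to evaluate predicate hypotheticalSlowPredicate
(95s) Starting to evaluate predicate hypotheticalFollowUp
"""

for seconds, name in started_predicates(sample_log):
    print(f"{seconds:>5}s  {name}")
```

A long gap before the next "Starting to evaluate predicate" line, combined with one core at 100%, would be consistent with the earlier predicate being the bottleneck, although the log alone does not prove it.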
