Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign upperf(ngcc): add support for parallel execution of tasks #32427
Conversation
This comment has been minimized.
This comment has been minimized.
mary-poppins
commented
Aug 30, 2019
|
You can preview 42b152c at https://pr32427-42b152c.ngbuilds.io/. |
This comment has been minimized.
This comment has been minimized.
hiepxanh
commented
Sep 1, 2019
|
do it bro, we need ngc be faster and multi thread. every second you saving even |
|
Nothing here that I think should block the merging of this PR. |
packages/compiler-cli/ngcc/test/dependencies/dependency_resolver_spec.ts
Show resolved
Hide resolved
...es/compiler-cli/ngcc/test/execution/cluster/package_json_updater_spec.ts
Outdated
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
|
Rebased on master and addressed feedback (either by adding fixup commits or by commenting). I, also, created #32486 to capture ideas for future improvements/refactorings (as discussed here and on Slack). |
This comment has been minimized.
This comment has been minimized.
mary-poppins
commented
Sep 5, 2019
|
You can preview 0db6569 at https://pr32427-0db6569.ngbuilds.io/. |
|
reviewed the integration tests part and skimmed through the rest. one nit, otherwise lgtm |
This comment has been minimized.
This comment has been minimized.
|
thanks George for implementing this feature! |
This comment has been minimized.
This comment has been minimized.
|
For reference: |
This comment has been minimized.
This comment has been minimized.
mary-poppins
commented
Sep 5, 2019
|
You can preview ecb794c at https://pr32427-ecb794c.ngbuilds.io/. |
This comment has been minimized.
This comment has been minimized.
mary-poppins
commented
Sep 5, 2019
|
You can preview 077a637 at https://pr32427-077a637.ngbuilds.io/. |
…the overwritten one (#32427) In order to prevent `ngcc`'d packages (e.g. libraries) from getting accidentally published, `ngcc` overwrites the `prepublishOnly` npm script to log a warning and exit with an error. In case we want to restore the original script (e.g. "undo" `ngcc` processing), we keep a backup of the original `prepublishOnly` script. Previously, running `ngcc` a second time (e.g. for a different format) would create a backup of the overwritten `prepublishOnly` script (if there was originally no `prepublishOnly` script). As a result, if we ever tried to "undo" `ngcc` processing and restore the original `prepublishOnly` script, the error-throwing script would be restored instead. This commit fixes it by ensuring that we only back up a `prepublishOnly` script, iff it is not the one we created ourselves (i.e. the error-throwing one). PR Close #32427
…rface (#32427) To persist some of its state, `ngcc` needs to update `package.json` files (both in memory and on disk). This refactoring abstracts these operations behind the `PackageJsonUpdater` interface, making it easier to orchestrate them from different contexts (e.g. when running tasks in parallel on multiple processes). Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd PR Close #32427
…face (#32427) This change does not alter the current behavior, but makes it easier to introduce new types of `Executors` , for example to do the required work in parallel (on multiple processes). Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd PR Close #32427
Previously, `ngcc`'s programmatic API would run and complete synchronously. This was necessary for specific usecases (such as how the `@angular/cli` invokes `ngcc` as part of the TypeScript module resolution process), but not for others (e.g. running `ivy-ngcc` as a `postinstall` script). This commit adds a new option (`async`) that enables turning on asynchronous execution. I.e. it signals that the caller is OK with the function call to complete asynchronously, which allows `ngcc` to potentially run in a more efficient mode. Currently, there is no difference in the way tasks are executed in sync vs async mode, but this change sets the ground for adding new execution options (that require asynchronous operation), such as processing tasks in parallel on multiple processes. NOTE: When using the programmatic API, the default value for `async` is `false`, thus retaining backwards compatibility. When running `ngcc` from the command line (i.e. via the `ivy-ngcc` script), it runs in async mode (to be able to take advantage of future optimizations), but that is transparent to the caller. PR Close #32427
…ty processability (#32427) In the past, a task's processability didn't use to be known in advance. It was possible that a task would be created and added to the queue during the analysis phase and then later (during the compilation phase) it would be found out that the task (i.e. the associated format property) was not processable. As a result, certain checks had to be delayed, until a task's processing had started or even until all tasks had been processed. Examples of checks that had to be delayed are: - Whether a task can be skipped due to `compileAllFormats: false`. - Whether there were entry-points for which no format at all was successfully processed. It turns out that (as made clear by the refactoring in 9537b2f), once a task starts being processed it is expected to either complete successfully (with the associated format being processed) or throw an error (in which case the process will exit). In other words, a task's processability is known in advance. This commit takes advantage of this fact by moving certain checks earlier in the process (e.g. in the analysis phase instead of the compilation phase), which in turn allows avoiding some unnecessary work. More specifically: - When `compileAllFormats` is `false`, tasks are created _only_ for the first suitable format property for each entry-point, since the rest of the tasks would have been skipped during the compilation phase anyway. This has the following advantages: 1. It avoids the slight overhead of generating extraneous tasks and then starting to process them (before realizing they should be skipped). 2. In a potential future parallel execution mode, unnecessary tasks might start being processed at the same time as the first (useful) task, even if their output would be later discarded, wasting resources. Alternatively, extra logic would have to be added to prevent this from happening. The change in this commit avoids these issues. - When an entry-point is not processable, an error will be thrown upfront without having to wait for other tasks to be processed before failing. PR Close #32427
… types (#32427) Previously, `ngcc` needed to store some metadata related to the processing of each entry-point. This metadata was stored in a `Map`, in the form of `EntryPointProcessingMetadata` and passed around as needed. After some recent refactorings, it turns out that this metadata (with its only remaining property, `hasProcessedTypings`) was no longer used, because the relevant information was extracted from other sources (such as the `processDts` property on `Task`s). This commit cleans up the code by removing the unused code and types. PR Close #32427
This change does not alter the current behavior, but makes it easier to introduce `TaskQueue`s implementing different task selection algorithms, for example to support executing multiple tasks in parallel (while respecting interdependencies between them). Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd PR Close #32427
…32427) This commit adds a new `TaskQueue` implementation that supports executing multiple tasks in parallel (while respecting interdependencies between them). This new implementation is currently not used, thus the behavior of `ngcc` is not affected by this change. The parallel `TaskQueue` will be used in a subsequent commit that will introduce parallel task execution. PR Close #32427
`ngcc` supports both synchronous and asynchronous execution. The default mode when using `ngcc` programmatically (which is how `@angular/cli` is using it) is synchronous. When running `ngcc` from the command line (i.e. via the `ivy-ngcc` script), it runs in async mode. Previously, the work would be executed in the same way in both modes. This commit improves the performance of `ngcc` in async mode by processing tasks in parallel on multiple processes. It uses the Node.js built-in [`cluster` module](https://nodejs.org/api/cluster.html) to launch a cluster of Node.js processes and take advantage of multi-core systems. Preliminary comparisons indicate a 1.8x to 2.6x speed improvement when processing the angular.io app (apparently depending on the OS, number of available cores, system load, etc.). Further investigation is needed to better understand these numbers and identify potential areas of improvement. Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd Original design doc: https://hackmd.io/uYG9CJrFQZ-6FtKqpnYJAA?view Jira issue: [FW-1460](https://angular-team.atlassian.net/browse/FW-1460) PR Close #32427
…32427) This commit addresses the review feedback from angular#32052, which was merged before addressing the feedback there. PR Close angular#32427
…the overwritten one (angular#32427) In order to prevent `ngcc`'d packages (e.g. libraries) from getting accidentally published, `ngcc` overwrites the `prepublishOnly` npm script to log a warning and exit with an error. In case we want to restore the original script (e.g. "undo" `ngcc` processing), we keep a backup of the original `prepublishOnly` script. Previously, running `ngcc` a second time (e.g. for a different format) would create a backup of the overwritten `prepublishOnly` script (if there was originally no `prepublishOnly` script). As a result, if we ever tried to "undo" `ngcc` processing and restore the original `prepublishOnly` script, the error-throwing script would be restored instead. This commit fixes it by ensuring that we only back up a `prepublishOnly` script, iff it is not the one we created ourselves (i.e. the error-throwing one). PR Close angular#32427
…rface (angular#32427) To persist some of its state, `ngcc` needs to update `package.json` files (both in memory and on disk). This refactoring abstracts these operations behind the `PackageJsonUpdater` interface, making it easier to orchestrate them from different contexts (e.g. when running tasks in parallel on multiple processes). Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd PR Close angular#32427
…face (angular#32427) This change does not alter the current behavior, but makes it easier to introduce new types of `Executors` , for example to do the required work in parallel (on multiple processes). Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd PR Close angular#32427
Previously, `ngcc`'s programmatic API would run and complete synchronously. This was necessary for specific usecases (such as how the `@angular/cli` invokes `ngcc` as part of the TypeScript module resolution process), but not for others (e.g. running `ivy-ngcc` as a `postinstall` script). This commit adds a new option (`async`) that enables turning on asynchronous execution. I.e. it signals that the caller is OK with the function call to complete asynchronously, which allows `ngcc` to potentially run in a more efficient mode. Currently, there is no difference in the way tasks are executed in sync vs async mode, but this change sets the ground for adding new execution options (that require asynchronous operation), such as processing tasks in parallel on multiple processes. NOTE: When using the programmatic API, the default value for `async` is `false`, thus retaining backwards compatibility. When running `ngcc` from the command line (i.e. via the `ivy-ngcc` script), it runs in async mode (to be able to take advantage of future optimizations), but that is transparent to the caller. PR Close angular#32427
…ty processability (angular#32427) In the past, a task's processability didn't use to be known in advance. It was possible that a task would be created and added to the queue during the analysis phase and then later (during the compilation phase) it would be found out that the task (i.e. the associated format property) was not processable. As a result, certain checks had to be delayed, until a task's processing had started or even until all tasks had been processed. Examples of checks that had to be delayed are: - Whether a task can be skipped due to `compileAllFormats: false`. - Whether there were entry-points for which no format at all was successfully processed. It turns out that (as made clear by the refactoring in 9537b2f), once a task starts being processed it is expected to either complete successfully (with the associated format being processed) or throw an error (in which case the process will exit). In other words, a task's processability is known in advance. This commit takes advantage of this fact by moving certain checks earlier in the process (e.g. in the analysis phase instead of the compilation phase), which in turn allows avoiding some unnecessary work. More specifically: - When `compileAllFormats` is `false`, tasks are created _only_ for the first suitable format property for each entry-point, since the rest of the tasks would have been skipped during the compilation phase anyway. This has the following advantages: 1. It avoids the slight overhead of generating extraneous tasks and then starting to process them (before realizing they should be skipped). 2. In a potential future parallel execution mode, unnecessary tasks might start being processed at the same time as the first (useful) task, even if their output would be later discarded, wasting resources. Alternatively, extra logic would have to be added to prevent this from happening. The change in this commit avoids these issues. - When an entry-point is not processable, an error will be thrown upfront without having to wait for other tasks to be processed before failing. PR Close angular#32427
… types (angular#32427) Previously, `ngcc` needed to store some metadata related to the processing of each entry-point. This metadata was stored in a `Map`, in the form of `EntryPointProcessingMetadata` and passed around as needed. After some recent refactorings, it turns out that this metadata (with its only remaining property, `hasProcessedTypings`) was no longer used, because the relevant information was extracted from other sources (such as the `processDts` property on `Task`s). This commit cleans up the code by removing the unused code and types. PR Close angular#32427
…32427) This change does not alter the current behavior, but makes it easier to introduce `TaskQueue`s implementing different task selection algorithms, for example to support executing multiple tasks in parallel (while respecting interdependencies between them). Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd PR Close angular#32427
…ngular#32427) This commit adds a new `TaskQueue` implementation that supports executing multiple tasks in parallel (while respecting interdependencies between them). This new implementation is currently not used, thus the behavior of `ngcc` is not affected by this change. The parallel `TaskQueue` will be used in a subsequent commit that will introduce parallel task execution. PR Close angular#32427
`ngcc` supports both synchronous and asynchronous execution. The default mode when using `ngcc` programmatically (which is how `@angular/cli` is using it) is synchronous. When running `ngcc` from the command line (i.e. via the `ivy-ngcc` script), it runs in async mode. Previously, the work would be executed in the same way in both modes. This commit improves the performance of `ngcc` in async mode by processing tasks in parallel on multiple processes. It uses the Node.js built-in [`cluster` module](https://nodejs.org/api/cluster.html) to launch a cluster of Node.js processes and take advantage of multi-core systems. Preliminary comparisons indicate a 1.8x to 2.6x speed improvement when processing the angular.io app (apparently depending on the OS, number of available cores, system load, etc.). Further investigation is needed to better understand these numbers and identify potential areas of improvement. Inspired by/Based on @alxhub's prototype: alxhub/angular@cb631bd Original design doc: https://hackmd.io/uYG9CJrFQZ-6FtKqpnYJAA?view Jira issue: [FW-1460](https://angular-team.atlassian.net/browse/FW-1460) PR Close angular#32427
…tions (angular#32427) This gives an overview of how much time is spent in each operation/phase and makes it easy to do rough comparisons of how different configurations or changes affect performance. PR Close angular#32427
…ls (angular#32427) PR Close angular#32427
This comment has been minimized.
This comment has been minimized.
angular-automatic-lock-bot
bot
commented
Oct 10, 2019
|
This issue has been automatically locked due to inactivity. Read more about our automatic conversation locking policy. This action has been performed automatically by a bot. |


gkalpak commentedAug 30, 2019
•
edited
This is a continuation of the work started in #32052 to speed up
ngccby adding support for executing tasks in parallel (when async execution is acceptable).This PR includes numerous refactorings to clean up/simplify the code and prepare the introduction of parallel task execution. It also includes a tiny fix:
only back up the original `prepublishOnly` script and not the overwritten oneSee the individual commit messages for more details.
Notes on testing
I have added unit tests for all added/modified code except for the
ClusterMasterclass, because:The integration tests at ngcc/test/integration/ngcc_spec.ts, which are run under bazel, don't play well with spawning multiple processes, so they are only run on one process (which means they don't exercise the
ClusterExecutor).The integration tests at integration/ngcc/test.sh do run in parallel mode (and only in that mode, since that is the only option with
ivy-ngcc).Other than that, I have manually verified that
ngcccan successfully build angular.io in parallel mode (with various command-line argument combinations) and that the resultingnode_modules/are identical to those produced in sync mode with the following exceptions:package.json > __processed_by_ivy_ngcc__might have properties in different order.This is expected, since now the order is which tests are run/completed is different and non-deterministic.
data_r50vsdata_r450.Not sure why that is and whether it is problematic.
Notes on performance
For processing angular.io, I have observed speed improvement between 1.8x (on my Windows laptop - 7 workers) to 2.6x (on Linux CI - 8 workers). This was a little surprizing, since task processing (which runs on multiple processes) takes up 80%-90% and all workers are busy most of the time (i.e. there are rarely idle workers).
More investigation is needed to better understand/improve those numbers, but this can happen after the PR lands.
Fun fact:
There doesn't seem to be much improvement above 4 workers. Even with up to 20 workers on CI, the duration was the same (if not worse: 29s vs 23s).
TODO
Determine whether we need unit tests forClusterMaster.1: The differences are in "shared context vars" generated via BindingScope#generateSharedContextVar() by the TemplateDefinitionBuilder instantiated in compileComponentFromMetadata(). Since these are local to the template, we don't mind having duplicate names in different entry-points or formats.