This repo contains the core algorithm implementation of ExploraCoder, together with Torchdata-Manual, a benchmark we constructed to evaluate how well ExploraCoder and its baselines handle tasks that require multiple API invocations. Case studies from our paper, comparing how different approaches solve one example task, are provided in example.md.
Each entry in the provided Torchdata-Manual and MonkBeat-Eval benchmarks contains the following fields:
- task requirement (entry 'task')
- full programming problem, including the task requirement and code context (entry 'prompt')
- canonical solution for the problem (entry 'canonical_solution')
- test cases for the problem (entry 'test')
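The sketch below shows one way these fields might be consumed, for instance to sanity-check that each canonical solution passes its own tests. It assumes, without confirmation from this repo, that the benchmark is distributed as JSON Lines and that concatenating 'prompt', 'canonical_solution' and 'test' yields a runnable Python program whose 'test' part raises on failure (the HumanEval convention). `load_entries` and `check_canonical` are hypothetical helpers, and the toy entry is made up rather than taken from the benchmark.

```python
import json

# Field names documented above; the JSON Lines layout is an assumption.
REQUIRED_KEYS = ("task", "prompt", "canonical_solution", "test")


def load_entries(lines):
    """Parse JSON Lines text into entry dicts, rejecting entries that lack a documented field."""
    entries = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        entry = json.loads(raw)
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise KeyError(f"entry is missing fields: {missing}")
        entries.append(entry)
    return entries


def check_canonical(entry):
    """Return True if prompt + canonical_solution + test runs without raising.

    Assumption (HumanEval-style): `test` holds plain assertions that raise on failure.
    """
    program = entry["prompt"] + entry["canonical_solution"] + "\n" + entry["test"]
    try:
        exec(program, {})  # fresh global namespace per entry
    except Exception:
        return False
    return True


# Toy entry for illustration only; it is not taken from Torchdata-Manual.
toy = {
    "task": "Add two numbers.",
    "prompt": "def add(a, b):\n",
    "canonical_solution": "    return a + b\n",
    "test": "assert add(1, 2) == 3\n",
}
entries = load_entries([json.dumps(toy)])
print(check_canonical(entries[0]))  # True
```

Because `exec` runs arbitrary code, real benchmark entries are better checked in a sandboxed subprocess than in the current interpreter.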