Pinned

  1. terminal-bench Public

    A benchmark for LLMs on complicated tasks in the terminal

    Python 1.8k 496

  2. harbor Public

    Harbor is a framework for running agent evaluations and creating and using RL environments.

    Python 1.2k 854

  3. terminal-bench-2 Public

    Shell 151 56

  4. terminal-bench-3 Public

    🚧 Accepting Task Submissions 🚧

    Python 102 106

  5. terminal-bench-science Public

    Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal

    Python 49 31

  6. awesome-harbor Public

    A curated list of awesome Harbor ecosystem projects

    22 2

Repositories

Showing 10 of 12 repositories
  • harbor Public

    Harbor is a framework for running agent evaluations and creating and using RL environments.

    Python 1,214 Apache-2.0 854 82 172 Updated Apr 1, 2026
  • terminal-bench-2 Public
    Shell 151 Apache-2.0 56 10 18 Updated Apr 1, 2026
  • benchmark-template Public template

    Harbor Benchmark Template

    Python 7 7 7 5 Updated Apr 1, 2026
  • terminal-bench-3 Public

    🚧 Accepting Task Submissions 🚧

    Python 102 106 0 68 Updated Apr 1, 2026
  • terminal-bench-science Public

    Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal

    Python 49 Apache-2.0 31 1 10 Updated Apr 1, 2026
  • harbor-docs Public
    MDX 2 9 0 6 Updated Mar 31, 2026
  • t-bench-docs Public
    TypeScript 6 13 2 2 Updated Mar 31, 2026
  • skills Public

    Public agent skills catalog for Harbor

    3 Apache-2.0 1 0 0 Updated Mar 27, 2026
  • harbor-cookbook Public

    Realistic examples of building evals and optimizing agents with Harbor

    Python 33 Apache-2.0 3 0 0 Updated Mar 27, 2026
  • terminal-bench-challenges
    Shell 0 3 0 1 Updated Mar 26, 2026
