The Wayback Machine - https://web.archive.org/web/20200611225439/https://github.com/libgit2/libgit2/pull/5507
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: add libgit2client and a command-line interface #5507

Open
wants to merge 67 commits into
base: master
from
Open

Conversation

@ethomson
Copy link
Member

ethomson commented May 10, 2020

This pull request expands the scope of libgit2 from just being the (relatively) low-level library for dealing with repositories to add two new pieces of functionality:

  • libgit2client: a "middleware" library that is focused on providing the functionality that client applications would find useful: for example, a subtransport that uses the ssh command-line, and a filter implementation that runs the commands specified in the attributes.

    Things like LFS and executing ssh have been well outside the scope of libgit2 since its implementation, however we've been forcing every tool that wants to build a client on top of us to deal with these problems themselves. We should provide this functionality for them.

    I propose putting this in a separate library since not everybody finds it valuable and - frankly - if I were running a git hosting provider, I would prefer not to have fork and exec in the codebase at all if I could avoid it.

  • git2_cli: a command-line interface that emulates git itself for testing. This allows us to test our client library, to eat our own dogfood, and - potentially - to start trying to use git's command-line tests against our own CLI to validate compatibility.

  • util: the shared bits of code (buffer manipulation, etc) that all the layers will want to use. This is not provided to end-users, just used by the two shared libraries and the CLI application.

@ethomson ethomson force-pushed the ethomson/cli branch 7 times, most recently from 5677c44 to fb41349 May 10, 2020
@pks-t
Copy link
Member

pks-t commented May 11, 2020

You finally got around to implementing your dreams 🎉 Quite a lot of changes, so I won't get to it today. I'll first prioritize some of the other PRs that we currently got lingering, but hope to get to it in thenot-too-distant-future.

@ethomson ethomson force-pushed the ethomson/cli branch from fb41349 to 9adba7a May 12, 2020
@ethomson
Copy link
Member Author

ethomson commented May 13, 2020

You finally got around to implementing your dreams 🎉 Quite a lot of changes, so I won't get to it today. I'll first prioritize some of the other PRs that we currently got lingering, but hope to get to it in thenot-too-distant-future.

Indeed, this is a lot of changes. I think that it might make sense to get some high-level discussion out of the way before actually reviewing the code. Here's what I'd propose as a reviewing strategy:

  1. Let's decide on whether this is a useful concept to add to the project. Do we want to take the project in this direction? I think that the answer is yes but it's a valuable place to start the discussion.

    It's obviously not impossible to do this in a new project, outside of libgit2. However, I think that it's both technically easier, easier for the developers, and more valuable to the users of the project.

    As far as the technical side, I did evaluate some other options. For example, having a new project that was the client and/or CLI pieces that simply used libgit2 instead of it being inside this project. But realistically any C-based CLI will have to do a lot of reimplementation of boilerplate utility code (vectors, string buffers) that is tedious, pointless and errorprone and it's just much easier to use the stuff that we've already written.

    I then thought about using libgit2 as a submodule of another project, and then pulling in just the utility code, but realistically, I think that requires a firmer split between utility code and business logic and a little untangling to be usable. This is a viable strategy, but it requires libgit2 being strict about that separation, and

    So I decided that actually having this as part of libgit2 makes the most sense.

  2. Does this separation of concerns make sense? In other words, does it make sense to have three separate components:

    • libgit2: the low-level git repository handling
    • libgit2client: the generic pieces for client libraries to re-use
    • git2_cli: an implementation using libgit2client

    And a shared utility library (implemented as a cmake object library) for those three pieces. Obviously we could put the libgit2client pieces in libgit2 itself which is easier in many regards, but I actually like having it separated out so that the people who don't need it won't have it.

  3. Based on the decisions in #2, let's decide on whether the split in the source and tests directories makes sense. Again, I think that it does, but there are arguments to be made about how that split occurs and where.

  4. After that, there are a handful of refactorings that drive us to a place where we can support this new project structure. I think many of them are useful changes in their own right, and could be reviewed completely independently of this direction. But I think that point 3 really informs these changes.

  5. Finally, the actually net-new pieces of code.

I think that for actually reviewing the code itself that I should cleave off pieces in reviewable chunks. I'll keep this PR open with the mega branch (mostly for visibility, so that anybody can play with the actual bits itself) but I don't think that we should actually be reviewing this PR as a whole for getting merged.

I'll cleave off the first piece soon-ish? There's some refactoring yet to do to get Windows working.

@ethomson ethomson force-pushed the ethomson/cli branch 5 times, most recently from 0ea2e10 to f10691b May 15, 2020
@libszen
Copy link

libszen commented May 26, 2020

A reference client library and CLI would indeed be invaluable! Thank you very much for championing this P.R.

A request: can the CLI be licensed with a more permissive license (public domain or Apache V2, ...), while retaining the current license for the client library? It will allow copy-pasting of CLI code without licensing issues by interested parties. This would also be consistent with existing examples being released in the public domain.

@ethomson
Copy link
Member Author

ethomson commented May 27, 2020

A request: can the CLI be licensed with a more permissive license (public domain or Apache V2, ...), while retaining the current license for the client library?

Huh. I hadn't considered this. It may be that the libgit2 license is not necessarily appropriate for a CLI, but IANAL and I'd want to talk to one before I made any changes here. But I'd like to better understand your motivations around copy/pasting code.

I'd like to have the business logic of dealing with client things in the libgit2client library. The CLI, IMO, would just be a way to parse the command line, and invoke pieces of libgit2 or libgit2client itself. How could we improve these two pieces to allow you to call into them instead of copy/pasting code out of the CLI project?

@ethomson ethomson force-pushed the ethomson/cli branch 5 times, most recently from 30cf040 to de774ce May 28, 2020
@ethomson ethomson force-pushed the ethomson/cli branch 3 times, most recently from 9815952 to c7c1ab6 May 30, 2020
@pks-t
Copy link
Member

pks-t commented Jun 1, 2020

This pull request expands the scope of libgit2 from just being the (relatively) low-level library for dealing with repositories to add two new pieces of functionality:

  • libgit2client: a "middleware" library that is focused on providing the functionality that client applications would find useful: for example, a subtransport that uses the ssh command-line, and a filter implementation that runs the commands specified in the attributes.
    Things like LFS and executing ssh have been well outside the scope of libgit2 since its implementation, however we've been forcing every tool that wants to build a client on top of us to deal with these problems themselves. We should provide this functionality for them.
    I propose putting this in a separate library since not everybody finds it valuable and - frankly - if I were running a git hosting provider, I would prefer not to have fork and exec in the codebase at all if I could avoid it.

I think having a higher-level API would be worthwhile to have and is one thing some people had been asking for in the past. I'm not sure I'm a huge fan of having it as a separate library, though, but personally I'd vote for having a build option for this instead.

The most important benefit would be that we can avoid having a separate "util" library and thus don't require things like the git_userbuf split. Also, by having a "util" library, I fear that some implementation details may leak from the libgit2 library into the CLI code. Having them as strictly separated as possible would be a huge win to me personally, as we start dogfeeding our own API and thus see the good, bad and ugly parts of it with more clarity.

One question I have about the middleware is how its interface should look like. I see that right now, it simply accepts an argv array and does the parsing internally. I wonder whether we want to split concerns here and have each command accept an options sturcture, where the struct's members strictly resemble the API options. The parsing logic would then reside in git2_cli instead of inside the middleware. The benefit would be that potential users of the middleware wouldn't need to guess (or read documentation, which is usually out of date anyway) as to which options are supported and which aren't, but can just take a look at the options struct itself.

  • git2_cli: a command-line interface that emulates git itself for testing. This allows us to test our client library, to eat our own dogfood, and - potentially - to start trying to use git's command-line tests against our own CLI to validate compatibility.

Yeah, this is something we've repeatedly discussed in the past and something I'm happy to have. I'd prefer to avoid the underscore and call it git2cli, but that's mostly cosmetics. I wonder whether we want to drop most of the examples in that case, as they mostly do the same thing and implement a subset of supported options of Git. Instead, we'd just keep "real" examples that don't have any use in and of themselves.

  • util: the shared bits of code (buffer manipulation, etc) that all the layers will want to use. This is not provided to end-users, just used by the two shared libraries and the CLI application.

As said above I'm not much of a huge fan of having a shared internal libary, mostly due to my desire to keep a strict boundary between libgit2 the library and libgit2 the command line interface, and sharing some internal utilities blurs the lines. Most processing that's done in the CLI should be trivial command line parsing and execution of commands as the heavy-lifting is performed by the middleware anyway. And as I'm of the opinion that the middleware should live inside of the libgit2 library, there wouldn't be any need for the util library.

@ethomson
Copy link
Member Author

ethomson commented Jun 1, 2020

I think having a higher-level API would be worthwhile to have and is one thing some people had been asking for in the past. I'm not sure I'm a huge fan of having it as a separate library, though, but personally I'd vote for having a build option for this instead.

I had not considered this route... it feels weird to me that you could have two very different versions of the library, one that has a bunch of functions that are more "client" facing functions, and ones that do not. Would these two different varieties of the library have different sonames? If not, that doesn't feel right, since you'd expect ABI compatibility. Or would we have stubs that just throw? That's sort of meh as well.

Also, by having a "util" library, I fear that some implementation details may leak from the libgit2 library into the CLI code. Having them as strictly separated as possible would be a huge win to me personally, as we start dogfeeding our own API and thus see the good, bad and ugly parts of it with more clarity.

I agree. The CLI should not be like libgit2_clar, where it actually has some insights into private functions. However, we would never want to reimplement git_buf or git_vector in two different places. Once is, surely, enough. And I think that even the most minimal CLI would want simple string and array handling.

But, putting that aside for a moment, it sounds like you're describing something a bit different than the direction that I've pursued: a high level API call that matches a git porcelain command. eg, git_client_checkout(). (I pick this example purposefully.)

But is this really useful? git checkout is a mess. It switches branches, creates branches, puts files on disk from the HEAD, or the index, or from a different branch altogether. I think that there's only one client that would want an API that emulated this monstrosity, and it's this CLI.

I concede, though, that this PR doesn't give people a lot to help in building a client off of. And, of course, I could be wrong. And the data structure that we parse command-line options into could just be the input to an API. And if it's useful for people, great, and if it's not, that's fine, too. There's no obligation for them to use it, but it does give people who want a simple mechanism to "run the CLI" that option.

It might be useful to know what the folks at TortoiseGit (@csware), GitKraken (@implausible) would want in a client API.

ethomson added 29 commits May 28, 2020
The `git_buf` type is now no longer a publicly available structure, and
the `git_buf` family of functions are no longer exported.

The deprecation layer adds a typedef for `git_buf` (as `git_userbuf`)
and macros that define `git_buf` functions as `git_userbuf` functions.
This provides API (but not ABI) compatibility with libgit2 1.0's buffer
functionality.

Within libgit2 itself, we take care to avoid including those deprecated
typedefs and macros, since we want to continue using the `git_buf` type
and functions unmodified.  Therefore, a `GIT_DEPRECATE_BUF` guard now
wraps the buffer deprecation layer.  libgit2 will define that.
`git_strarray` is a public-facing type.  Chagne
`git_buf_text_common_prefix` to not use it, and just take an array of
strings instead.
Our options parsing system can also be used as the basis for displaying
command-line usage.  Add usage information, using knowledge of the
console (if we're attached to one) for wrapping nicely.
Set up a framework for subcommands, and introduce the first, "help".

Help will display the commands available, and information about the
help command itself.

Commands are expected to provide their own usage and help information,
which the help command will proxy to when necessary.
Provide a helper function to copy a number of strings from the source to
the target.
SSH paths come in a variety of formats, either URLs (ssh://user@host/path)
or SCP style (user@host:path).  Provide a mechanism to parse them.
Provide a class that will display progress information to the console.
Initially, it contains callbacks for fetch progress and checkout
progress.
As we consume parts of the libgit2 utility functions (like `git_buf`),
we will inevitably need to allocate.  Since we re-use the libgit2
allocation functions - but linked into our application - we'll need
to configure our allocation strategy ahead of time.
Add a new source directory, `util`, that contains utility functions like
buffers, vectors, etc, that that are general purpose and not necessarily
part of libgit2 itself.  These utility functions can be used by
additional projects.
Introduce libgit2client, a client "middleware" library.  This is an
experimental set of utility functions and classes for client software
that builds on top of libgit2.

This library might contain - for example - code that invokes filters or
other tools.  This is incredibly useful to share and reuse among
consumers but should be excluded from libgit2 itself.  Users may,
understandably, not want code that executes arbitrary other commands in
libgit2 itself.
 Introduce a command-line interface for libgit2.  The goal is to be
 git-compatible, so that:

 1. By creating a git client ourselves, we can understand the needs of
    git clients and produce a common "middleware" for commonly-used
    pieces of client functionality.  For example: interacting with
    other command-line tools, like filter drivers or merge drivers.
    This can assist other git clients.

 2. We can benefit from git's unit tests, running their test suite
    against our own CLI to ensure correct behavior.

 3. We can easily benchmark ourselves against git to understand where we
    are poorly performing, by running identical commands between git and
    ourselves.

 4. We can easily A/B test ourselves against git, at least for read-only
    operations, which will ensure that we are producing identical output.

 This commit introduces a simple infrastructure for the CLI.
The functions exported by libgit2 should stay in the libgit2 directory.
By putting them in `util`, they'll be further exported by any other tool
that uses the util library.
The test tree should - ideally - match the source tree.  The libgit2
specific tests should move into tests/libgit2.

The name of the resulting test binary is now 'libgit2_tests' for ease of
consumption of new contributors.
Other subsystems (like libgit2client) likely want to add testing.
Move clar into its own directory so that it can be reused and not
duplicated.
Put the common clar test function into the tests/CMakeLists.txt for
reusability.
Provide a mechanism to add a signal handler for Unix or Win32.
Provide functions to search through string arrays.
Add a function to add a search/replaced string to a git_buf.
Introduce a helper method to quote a string in a shellsafe manner.  This
wraps the entire buffer in single quotes, escaping single-quotes and
exclamation points.
Provide a mechanism for the client library to report a unique
error class.  This will not be used by libgit2 directly.
@ethomson ethomson force-pushed the ethomson/cli branch from c7c1ab6 to 05c336b Jun 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.