[TF 2.0 API Docs] tf.custom_gradient #26270
Comments
@tsbertalan - Thank you so much for mentioning this, and for creating a gist as an example. I agree that the API documentation for tf.custom_gradient could be clearer. A few things:
|
Can I also work on updating the docs for tf.custom_gradient? |
Thanks for the comments! I'd love to have improved documentation for tf.custom_gradient, and would love to review such a pull request. One minor nit, though: when you say "the sum of the gradient over examples is not 'the gradient'--only the per-example gradient truly deserves that name", I do not quite agree. TensorFlow only differentiates scalar loss functions, so "the gradient" is shorthand for "the gradient of the sum / average of per-example losses", and because the gradient of a sum is the sum of the gradients, I think the text there is mostly correct. That said, I agree with you that correct is not enough, and that we need to clarify these comments so others don't make the same mistake. |
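For readers who find this distinction confusing, the identity is easy to check directly. A minimal sketch, assuming TF 2.x eager mode (the loss function here is illustrative):

```python
import tensorflow as tf

x = tf.constant([1.0, 2.0, 3.0])

# Per-example "gradients": the Jacobian of the per-example losses.
with tf.GradientTape() as tape:
    tape.watch(x)
    per_example_loss = x ** 2
jacobian = tape.jacobian(per_example_loss, x)  # (3, 3), diagonal 2*x_i

# "The gradient" in TensorFlow's sense: gradient of the summed loss.
with tf.GradientTape() as tape:
    tape.watch(x)
    total_loss = tf.reduce_sum(x ** 2)
grad_of_sum = tape.gradient(total_loss, x)     # [2., 4., 6.]

# The gradient of the sum equals the sum of the per-example gradients.
summed_rows = tf.reduce_sum(jacobian, axis=0)  # also [2., 4., 6.]
```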
My original post is below; I'm keeping it only because it might help others trying to get custom_gradient to work with Keras in more complicated situations. OK, this might be a documentation issue, but probably I was not reading enough of the documentation. If you use custom_gradient, you may only use a limited set of Python statements, mainly TensorFlow operations.
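For context, a minimal tf.custom_gradient that stays within TensorFlow ops looks roughly like this (a sketch along the lines of the log1pexp example in the official docs, assuming TF 2.x):

```python
import tensorflow as tf

@tf.custom_gradient
def log1pexp(x):
    e = tf.exp(x)

    def grad(dy):
        # Only TensorFlow ops here: plain Python control flow on tensor
        # values would not be traced into the gradient computation.
        return dy * (1.0 - 1.0 / (1.0 + e))  # numerically stable sigmoid

    return tf.math.log(1.0 + e), grad

x = tf.constant(0.0)
with tf.GradientTape() as tape:
    tape.watch(x)
    y = log1pexp(x)
g = tape.gradient(y, x)  # sigmoid(0) = 0.5
```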
I am not sure if I have a documentation-related issue, or maybe it is a bug. I created a custom layer and I need to add a custom gradient. The problem is: my layer is included in an LSTM network and therefore has changing input and output shapes. Initially it has a shape like (?, ?, 153), which becomes (1, variable size, 153) during calculation. My Keras layer is defined as:
`if s1 is None:` is not what I want. I tried to return a tf tensor placeholder with correct dimensions, but this dimension is not changed to the correct one during evaluation of the net either; later it does not change the shape. Therefore I get the error InvalidArgumentError: shape of dy was [1,324,153] instead of [1,262,153] when a training example has a different dimension. I have the full source of the file on GitHub. I would love to get any suggestion about what the problem might be, and of course would help to fix it in case it is a bug. |
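The variable-length-shape problem described above can often be avoided by reading shapes dynamically with tf.shape rather than the static Tensor.shape (which may contain None). A hypothetical sketch, not the poster's actual layer:

```python
import tensorflow as tf

@tf.custom_gradient
def scale_by_timesteps(x):
    # x: (batch, time, features); `time` may be unknown (None) statically.
    n = tf.cast(tf.shape(x)[1], x.dtype)  # dynamic shape, never None

    def grad(dy):
        # dy arrives with x's *runtime* shape, e.g. [1, 262, 153] on one
        # batch and [1, 324, 153] on the next; build the gradient from dy
        # itself instead of a fixed placeholder shape.
        return dy / n

    return x / n, grad

x = tf.ones((1, 4, 3))
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.reduce_sum(scale_by_timesteps(x))
g = tape.gradient(y, x)  # every element is 1/4
```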
Is this still open? If so, I would like to take on this. |
For me it is closed. I learned to use only a limited set of Python statements, but did not find any detailed documentation of what is allowed. |
@tsbertalan @dynamicwebpaige Has the issue been resolved? |
@bharatnishant I think, actually, the documentation that needs this clarification first is that for the regular gradient machinery. I'd prefer more predictable behavior, with an exception raised rather than an automatic sum in the cases where a Jacobian-like computation is not possible; failing that, it seems reasonable at least to document when to expect that such an automatic sum will be added silently. As with convolution, we should be able to predict a priori the shape of the gradient tensor(s) for all the possible input shapes.

@alextp I get your point about how TensorFlow only differentiates scalar loss functions, but I think a distinction should be made (and, in some fashion, is made) between a batch index and a true index, where computation for each batch-index value is conceptually independent (except for some final aggregation), whereas true indices interact in everyday linear-algebraic computations.

For that matter, it would be nice to know what the real performance tradeoffs are for computing a Jacobian.

I also see now that this is tagged as API v2, though I'm still using the 1.x versions.

Sorry for the long-winded response. TensorFlow has been in a confusing state of flux over the past few versions, and so I (obviously) have many points of confusion on this. |
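The batch-index/true-index distinction above is roughly what tf.GradientTape.batch_jacobian encodes: it treats the leading index as independent examples and computes one Jacobian per example, rather than pairing every output with every input. A small sketch, assuming TF 2.x:

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # (batch=2, features=2)

with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = tf.tanh(x)  # elementwise, so batch items are independent

# Full Jacobian: every output index vs. every input index.
full = tape.jacobian(y, x)                # shape (2, 2, 2, 2)

# Per-example Jacobian: batch index treated as independent.
per_example = tape.batch_jacobian(y, x)   # shape (2, 2, 2)
```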
I think this issue is still open. If it is, I would like to work on it and improve the documentation. |
@tsbertalan @dynamicwebpaige @alextp Since the issue is still open, can I work on it and do the required changes in documentation? |
Sure, please send a PR!
- Alex
|
bump |
Is this issue still occurring? |
Hello, it's been quite some time since this issue was opened, and it looks like there aren't any active linked PRs. If this issue is still open, I will begin to work on it.
Feel free to correct me if I'm wrong, as my knowledge of gradients is fairly limited. Since this is also my first time working on an open-source issue, I would greatly appreciate it if anyone could point me towards the guidelines for PRs and making documentation changes. |
@Harsh188 I am open to this or any other work that needs improvement. I am just beginning my journey as an open-source contributor, so I would appreciate any pointers or useful tips to get started. Please connect with me when you find a suitable time. |
I am a beginner and I would like to improve it; please help me fix it. |
Same here; I am also a beginner and would like to contribute. Are there any places where I can help? |
Hey there, I'm not sure if this is what you're looking for, but the first thing is to clone the repository (https://github.com/tensorflow/tensorflow). |
Please make sure that this is a documentation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.
I've been having some difficulty using `custom_gradient` recently. Before implementing the op I actually have in mind, I'm trying to make a simple polynomial op with `custom_gradient` in an effort to get my head around the requirements. The documentation is not at all clear on how summing over batch and output indices should be done, and (until recently) wasn't clear on whether we should return the full Jacobian of the operation or just the vector-Jacobian-product ("VJP") function. (I did notice some explanation of the latter issue made it in in the 1.12 -> 1.13 update, which was helpful.)

I think this problem applies both to the derivative WRT the op's input and to that WRT the parameters (called `grad_xs` and `grad_vars` in the documentation). However, when I incorporated the VJP tidbit in my sample code, I got correct results for the former but not the latter, I believe because the implicit sum over the batch was already present in the supplied `grad_ys`, and because the VJP also collapses out the index across `grad_ys`.

So, the lack of clarity about sums (accidentally) only hit me when trying to write the `grad_vars` part. In my gist linked above, I had to use a `reduce_sum` when computing $dy/dp$, which, for a polynomial, is the corresponding powers of $x$. (Since $dy/dx$ doesn't depend on $x$, this issue didn't appear when writing the `grad_xs` part.)

In addition to simply saying what you mean in this documentation (the sum of the gradient over examples is not "the gradient"--only the per-example gradient truly deserves that name! But I suppose that particular fight is a lost cause.), it would be good if there were at least some examples that exercised the `grad_vars` part and the `variables=None` keyword argument.
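Based on the description above, here is a hedged sketch of what such a polynomial op with a `variables`-aware gradient might look like (the coefficients and function name are illustrative, not from the original gist; assumes TF 2.x). Note the `reduce_sum` over the batch index when forming the variable gradient:

```python
import tensorflow as tf

p = tf.Variable([1.0, 2.0, 3.0])  # illustrative: y = p0 + p1*x + p2*x^2

@tf.custom_gradient
def poly(x):
    powers = tf.stack([tf.ones_like(x), x, x ** 2], axis=-1)  # (batch, 3)
    y = tf.linalg.matvec(powers, p)

    def grad(dy, variables=None):
        # dL/dx is per-example: no sum over the batch index.
        dx = dy * (p[1] + 2.0 * p[2] * x)
        # dL/dp must be summed over the batch index, since dy already
        # carries the "sum of per-example losses" convention.
        dp = tf.reduce_sum(dy[:, tf.newaxis] * powers, axis=0)
        return dx, [dp]

    return y, grad

x = tf.constant([1.0, 2.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    loss = tf.reduce_sum(poly(x))
dx, dp = tape.gradient(loss, [x, p])
# dx = [2 + 6*1, 2 + 6*2] = [8, 14]; dp = [sum 1, sum x, sum x^2] = [2, 3, 5]
```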