Skip to content

Renamer updates#1710

Merged
garyb merged 1 commit intomasterfrom
renamer-updates
Dec 15, 2015
Merged

Renamer updates#1710
garyb merged 1 commit intomasterfrom
renamer-updates

Conversation

@garyb
Copy link
Copy Markdown
Member

@garyb garyb commented Dec 13, 2015

Resolves #1697

I also took the opportunity to tweak two other aspects of codegen:

  • instead of generating x$prime for x' it will now generate a name like x1 (or x2 if x1 is in scope, or x3 if x1 and x2 are in scope, etc.)
  • the generated class dictionary names are improved, so instead of __dict_Functor_0 it will generate dictFunctor now, and then start numbering if there are multiple in scope

I understand if you object to the x' change, as it's slightly rewriting the input, but I think it's not a very difficult rule to understand, and generates much nicer names IMO.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 13, 2015

I was thinking about the prime thing the other day actually. I'd be in favor of just getting rid of primes altogether actually. JS doesn't allow them, and now we have the new operator rules, there's not much reason for not using JS identifiers IMO (perhaps reserving $ for internal use).

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 13, 2015

This looks great to me.

Let's just chat a bit about the prime thing before merging, since now would be the time to break it.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 13, 2015

Well, I'd argue very strongly against that personally. I understand the "better javascript" angle, but I'd also like PS to be a language that I actually like, and I much prefer priming to numbering variables - you can use them as semantically different things, a1 and a2 are different things entirely although perhaps in the same category of thing, where a and a' are representing different "versions" of a.

Also there are still lots of other things we'd have to deal with, given all the reserved and built-in names.

I forgot to mention, this prime-renaming rule only applies to local variables. If you name something foo' in a module it will still be exported as such.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 13, 2015

What about a -> a_? I consider that to be like priming, although I'd be wary of attaching too much meaning to a naming scheme anyway.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 13, 2015

Actually, come to think of it, why not just rename ticks to underscores? It's deterministic, and unlikely to clash. And on the off chance it does, we can give the user a nice error message.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 13, 2015

Or for that matter, what about ' becoming just a $. It can't clash, at least not if we always generate $s at the front of an identifier.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 13, 2015

I was trying to improve the readability of the JS as well, which is why eliminated the underscores (and numbers where possible) from dictionary names. Both a___ and a$$$ look pretty bad compared to a3 IMO, but yeah, I think that would work too if you want to do it?

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 13, 2015

I definitely see a lot of value in numbering identifiers and I agree underscores and dollars look bad in generated code.

I'd just like to see a$prime removed from generated code, ideally. I'm not sure any of my suggestions above are good solutions. Choosing to use JS identifiers is like the path of least resistance, but maybe not ideal if people really like using primes. I personally don't find myself wanting them that much.

How would you name the equivalent of a' in idiomatic JavaScript?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for using v?

Didn't you want to try the lettering scheme instead here? a, b, c, ..., aa, ab, ... etc.?

I'm not too bothered either way, this is obviously a big improvement.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that, yeah, I couldn't figure out an elegant way to generate identifers like that though. If you have one I'll definitely use that instead!

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like

toIdent :: Int -> ShowS
toIdent = showIntAtBase 26 (chr . (ord 'a' +))

perhaps?

It's not quite right, but not terribly far off.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 13, 2015

I would probably be updating a var, so wouldn't "need" them, as I don't tend to write code SSA-style in JS. Numbering would be my second choice, hence my go-to here.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 13, 2015

Damn, tests failed as I removed some constraints. I thought base-compat was supposed to allow that?

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 13, 2015

Yeah, I ran into that issue the other day too. I'm not sure, I think it might be a transformers thing?

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 13, 2015

If we just use JS identifiers, we could use a unicode prime instead.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 13, 2015

Sorry, this one works.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 14, 2015

Any thoughts on unicode?

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

I'm slightly hesitant, as I have low expectations of what JS engines will accept, but is way nicer than the alternatives we've discussed so far, so I like it (also it should work, as it is in the standard). It does mean we're definitely producing ES5 JS if we go with it though, so we could update a few other things too, however this is un-shimmable (although I guess it could be post-processed to remove them).

I'd suggest we rewrite ' to ʹ however, and not expect our users to have to reach for a unicode reference every time they want to prime a name 😉

@garyb garyb force-pushed the renamer-updates branch 2 times, most recently from 1dc5c7f to 4b26130 Compare December 15, 2015 12:18
@tfausak
Copy link
Copy Markdown
Contributor

tfausak commented Dec 15, 2015

I'd be in favor of just getting rid of primes altogether actually.

I'd be in favor of that too. I think that primes are too hard to notice compared to numbers or underscores. And they quickly get out of control if you have more than one.

@5outh
Copy link
Copy Markdown
Contributor

5outh commented Dec 15, 2015

👍 on removing primes from me as well; even in Haskell code nowadays I use _ over ' because it's easier to spot the difference with more than one.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

So this is obviously a bit of a point of contention, but the only reason I want to figure out what we're doing with it, is that we're at a good point in time to break things. I think if we can aim to use JS identifiers in 0.9 (no mangling at all), that would be a good thing. And if we break this in 0.8 now, we only have to finish off operator aliases to make that be the case.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

Not mangling is something we don't have the option of doing, unless we want to be subject to the arbitrary banning of various other words due to existing keywords and globally available values in JS. I really don't want that.

@jdegoes
Copy link
Copy Markdown

jdegoes commented Dec 15, 2015

I must have prime. But I'm fine with unicode prime. Would be cool if ordinary prime were rewritten to unicode prime to escape it.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

Not mangling is something we don't have the option of doing

So what I'm proposing is:

  • Every PS identifier is a valid JS identifier
  • $ is not a valid character in a PS identifier, but everything else is fine.
  • We use the $xyz... "namespace" for generated names
  • Reserved keywords which are fine as PS identifiers simply get a $ prepended to them.

I'm fine with making an exception for the ' character, which can be turned into a unicode prime.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

Okidoke, I'm totally cool with that - I think the only thing we need to make that happen is start requiring aliases, so they can disappear entirely during codegen? (and a decision on primes)

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

Yes I think so. So here are some options for primes:

  • Get rid completely (unpopular, but an option)
  • Turn them into _s (ES3)
  • Turn them into unicode primes (ES5)

Any others?

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

Oh wait! I assume this keeps the current starts-with-lowercase-or-underscore rule for PS identifiers too?

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

Ah right, yes.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

The ES5 unicode primes solution would be my preference - we've been talking about ES5ifying some of our other codegen too, like for traversing the keys of records when copying, etc.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

Let's discuss "get rid completely" though. Given that converting primes to unicode primes can only be described as a hack, then if the issue is the difficulty of typing a unicode prime, isn't the solution really to set up the editor in such a way as to make the conversion automatic? It would seem that unicode operators will make editor support necessary eventually anyway.

@michaelficarra
Copy link
Copy Markdown
Contributor

So I'm guessing we won't ever be allowing unicode primes in PS identifiers then to avoid the collision?

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

@michaelficarra I'm interpreting this as "they'll be equivalent".

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

Yeah, that's the way I'd see it too - a' and would be treated as the same identifier.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

I mean, any editor worth it's salt should be able to bind CTRL+' to a unicode prime.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

You could say the same about removing \... -> and requiring λ... -> instead...

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

Speaking of which, it'd be nice if that worked 😄

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

Well yes, but there's not really a good reason for that IMO. Closer compatibility with JS identifiers and cleaner codegen is pretty worthwhile I think.

Edit: I mean, I see the value of unicode syntax, but it can be implemented with sufficiently nice tooling since it exists only before compilation.

@tfausak
Copy link
Copy Markdown
Contributor

tfausak commented Dec 15, 2015

a' and would be treated as the same identifier

I don't like that idea at all. A few reasons:

  1. If you want to find where a value is used, now you have to search for two values (a' and ) instead of only one.
  2. I think this sets a bad precedent. Which other characters or strings will be considered equivalent? For instance, can I write a'' as a″?
  3. I typically only write ASCII in my source files. Obviously this is just a preference, but it makes everything easier because you don't really have to worry about character encodings.
  4. I hate aliases. Pick one way to do something and do it that way.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

Prime as a symbol seems common enough that that automatic conversion seems worthwhile to me.

My only other argument is things kinda freak out when you use it at the moment when messages are printed:

> whoopsʹ = true

No type declaration was provided for the top-level declaration of whoopspsc.exe: <stderr>: hPutChar: invalid argument (invalid character)

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

@tfausak no, a'' would be aʹʹ. The conversion would just be a 1:1 character remapping.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

I'm really on the fence. I don't use primes much, but I see their appeal. On the other hand, anything to do with automatic replacement of characters seems unpleasant.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

There are more primed identifiers used in the compiler than there were lines in the initial commit now... I'm sure they can't all be me 😄

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

Ok 😄 I'll say this - I don't use them at the top level much, and for intermediate variables, there are better choices when compiling to JS IMO.

Given we warn on all shadowed names, there needs to be some way of saying "like variable X, but not variable X" in where clauses etc., so in that sense, I see why primes are useful. I just don't think they're worth a hack like character replacement in code gen, so there must be a better approach.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

Well, there's always a P suffix too, but the use of that in types has already confused quite a few people.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

Oh, that's what P stands for! 😆

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

Haha, oh dear 😄

That's probably only obvious if you compare the types in Haskell lens and PureScript lens. In fact, maybe nobody else at all gets it other than Hardy and I, as I think I've probably had to explain the P to anyone I've ever talked to about it.

I guess my concern with the unicode approach is: it will almost certainly be used at the top level at some point, and when it happens it will confuse people that trying to call or foo' or foo’ or foo‘ or foo՚... results in an unknown value. Maybe that isn't our problem, but... it seems less than ideal also. If prime wasn't common, I wouldn't be concerned.

Also if we sort this rule out it would be really nice to enable its use in types too to avoid the P thing 😉

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

The fact there are many things it can be mistaken for is part of the problem too: that's a can of worms unicode opens for almost any character, but given that it's a common symbol it would perhaps be unexpected that when it appears in code it's not the most easy-to-hand variant of it.

@natefaubion
Copy link
Copy Markdown
Contributor

I vote 👎 on unicode prime because I think its confusing and is barely distinguishable from a lot of other quote-like characters. I think the current approach is fine.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

So unicode prime is already possible, since any unicode letter is fine. I'd just like to decide what to do with regular primes, since they currently get codegen'd as $prime, which I don't think is very nice for either generated code or the FFI.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

There shouldn't be any issue with adding primes to the letters allowed in type names, I think, since they don't exist in generated code. Data constructor and class names are different, of course.

@paf31
Copy link
Copy Markdown
Contributor

paf31 commented Dec 15, 2015

In any case, I think we should merge this, and open a "Discussion" issue to discuss options while we're testing the 0.8 release candidate.

@garyb
Copy link
Copy Markdown
Member Author

garyb commented Dec 15, 2015

Ok, cool

garyb added a commit that referenced this pull request Dec 15, 2015
@garyb garyb merged commit f5ed125 into master Dec 15, 2015
@garyb garyb deleted the renamer-updates branch December 15, 2015 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants