What if the canonical source representation were binary-encoded like wasm itself? #2
Hey, just saw this! Thanks for taking the time to make this issue. First of all, I really like this idea, and it aligns very well with what I'm going for. In my mind, the canonical syntax would be something that the project can choose. You could create a binary syntax and then use it as your canonical representation. One reason I wouldn't necessarily want to always require a binary representation as canon is that I want the canonical source files to be the only actual code checked into the repository, with the user's preferred syntax available through a FUSE virtual directory. The intention here is to support stuff like … Is that a problem you've considered for your use case? Storing packages in IPFS is an amazing idea that I hadn't considered. I was probably just going to piggyback off npm in the short term, as they have really great tools, but longer term, if we look into hosting infrastructure for a package manager, I love that idea. I'm interested to see what you're working on as well; could you send me a link?
Thanks for responding @Widdershin, I'll respond inline below:
Of course! Forest seems well aligned with what I'd like to have myself, so starting a discussion seemed like the right thing to do. If there is enough overlap in goals, we could possibly share our efforts.
👍
Here is my argument:
I don't think using a binary representation makes this any more difficult. In fact, it makes it easier: an alternative syntax would not need to parse the canonical source and then translate it; it would just read the binary AST representation and project it into the alternative syntax. In other words, it would avoid the parse phase and, as a consequence, be free of a dependency on Haskell or whatever language the parser might be rewritten in.
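A minimal sketch of this argument in Python, assuming a made-up binary format for a toy expression language (one tag byte per node, little-endian 64-bit integers), not Forest's or wasm's actual encoding. Both printers work on the decoded tree and never see text, so adding a syntax means writing a printer, not a parser. Decoding is still needed, but it is a fixed, tag-driven read rather than a grammar-specific parse.

```python
import struct
from dataclasses import dataclass
from typing import Union

# Hypothetical node types for a toy expression language (not Forest's real AST).
@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]

TAG_NUM, TAG_BINOP = 0, 1
OPS = {"+": 0, "*": 1}
OP_NAMES = {v: k for k, v in OPS.items()}

def encode(e: Expr) -> bytes:
    """Pre-order encoding: a 1-byte tag followed by the node's payload."""
    if isinstance(e, Num):
        return struct.pack("<Bq", TAG_NUM, e.value)
    return struct.pack("<BB", TAG_BINOP, OPS[e.op]) + encode(e.left) + encode(e.right)

def decode(buf: bytes, pos: int = 0) -> tuple[Expr, int]:
    """Read one node starting at `pos`; return it and the offset just past it."""
    tag = buf[pos]
    if tag == TAG_NUM:
        (value,) = struct.unpack_from("<q", buf, pos + 1)
        return Num(value), pos + 9
    if tag == TAG_BINOP:
        op = OP_NAMES[buf[pos + 1]]
        left, pos = decode(buf, pos + 2)
        right, pos = decode(buf, pos)
        return BinOp(op, left, right), pos
    raise ValueError(f"unknown tag {tag} at offset {pos}")

# Two independent "syntaxes", both projected from the same decoded tree.
def print_infix(e: Expr) -> str:
    if isinstance(e, Num):
        return str(e.value)
    return f"({print_infix(e.left)} {e.op} {print_infix(e.right)})"

def print_sexpr(e: Expr) -> str:
    if isinstance(e, Num):
        return str(e.value)
    return f"({e.op} {print_sexpr(e.left)} {print_sexpr(e.right)})"

tree = BinOp("*", BinOp("+", Num(1), Num(2)), Num(3))
blob = encode(tree)          # 31 bytes
decoded, _ = decode(blob)
print(print_infix(decoded))  # ((1 + 2) * 3)
print(print_sexpr(decoded))  # (* (+ 1 2) 3)
```

A real format would more likely come from a schema-driven library such as FlatBuffers or Protocol Buffers, as suggested later in this thread; the hand-rolled `struct` encoding just keeps the sketch dependency-free.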
I agree that having human-readable syntax and files is important for tools like git, which is really unfortunate, and it seems like it would be really difficult to move past human-readable files without reinventing a toolchain. On the other hand, I suspect that users would either choose the canonical presentation so that existing tools like git work as expected, or they would just keep the syntax they want to use in source control so they can see diffs in a syntax they understand. In other words, it's a 🐔 🥚 problem: you can't expect syntax to be truly a user choice unless the tools people use fundamentally support this. Sadly, git does not, and there is no way around that. I think what would make sense is to develop a … Sure, that would not work across all tools, but it does support the "no second-class syntax" goal: a tool that doesn't work would fail equally for every syntax, and fixing support for a tool would fix it across all syntax flavors.
I'm kind of hoping to move from "human-readable files" to "content-addressable code", and I'm not entirely sure how …
Right now it's just a set of notes describing my wish list for this and some ideas on how those can be realized. I could possibly compile them into a document describing the hypothetical language I'd like to build.

Here is also a somewhat abstract vision that touches on some of the topics above. I envision something like a Mathematica Notebook interface in the browser, where you describe a problem and its solution in Markdown and have code blocks in whichever syntax best fits the problem domain (I have an early experiment with this kind of interface at https://gozala.github.io/allusion/). That document is essentially your package, and its code blocks are saved in the binary AST representation, so you could switch syntax live in place. You could reference other such documents/packages as you would refer to other papers, except they are content-addressed, and therefore the reader could navigate to those as well. I would imagine there being a canonical package registry along with other domain-specific ones, as a registry is essentially just a set of content addresses (I imagine it would be stored in the IPFS network). The textual syntax representation of the code is going to be just one of the projections, and I expect it to become less relevant over time. I am interested in visual representations in the vein of flow-based programming, which I think will become more relevant with the explosion of new mediums (tablets, VR, AR) where keyboard input is inconvenient. Most recently I discovered http://www.luna-lang.org/, which is more or less how I imagined it; there is also https://noflojs.org/, but I think it's far less interesting given the lack of a type system, which is essential IMO.

I think wasm as a compile target is a natural choice. I want inferred static types and automatic, deterministic memory management, but via a Rust-like ownership system rather than reference counting (RC), since RC can be built on top of ownership; in fact, that's what RC is in Rust.
I hope to build a higher-level layer with immutable data structures on top, so you could have something like the Elm language, where you don't deal with memory management since you only work with immutable data and there is no way of creating cycles. I am leaning towards the type system found in Carp (https://github.com/carp-lang/Carp/blob/master/docs/Presentation.md), which is fully inferred, and where the choice is ambiguous the compiler just asks you to be more specific about which of the compatible entities you meant (I think that presents some really compelling opportunities in a visual editor). I am also somewhat inspired by the Pony language, which marries ADTs and the actor model; throwing an ownership system and content-addressing of program constructs into the mix would allow treating any function as an actor, but that is far from fleshed out in my mind.

Sadly, I lack expertise in many areas needed to pull this off, the most pressing being type theory, which is what I'm mostly trying to learn about now. I did some experiments in generating wasm modules by adapting a Scheme compiler based on https://github.com/namin/inc, where I created a JS API to build up an AST from which the compiler would generate a wasm module with Binaryen. The idea was that this AST could then be encoded into a binary format with either the FlatBuffers or Protocol Buffers library. That should make creating a syntax easy, as both libraries have pretty wide language support, and would essentially eliminate the need to write a binary representation parser/encoder, since both libraries generate one from a schema definition. But as I was exploring this, I realized that the type checker would significantly affect the generated wasm code, so I'm trying to learn enough type theory to write a type checker (any references would be more than welcome).
I would also very much like to team up with someone more experienced in this 😅 Given some overlap in goals with Forest, I thought I'd see if we could converge; if nothing else, I could probably get some informative feedback. Thanks, and sorry it ended up so long!
Sorry, I think I was not clear. There will be no blessed canonical syntax for Forest as a language; no one syntax will be held above the others. I think that a … As much as possible, syntax is a userland concern. The syntax currently in development will eventually be a package, as will every other syntax. I'm currently designing with the assumption that the user will be using source control such as git, SVN or Mercurial, because I use source control for all of my work, personally and professionally. From this assumption, I conclude that we should have exactly one representation of the source code on disk, in whatever syntax the collaborators choose. If you want, you can choose a minimal binary representation of the source code. I would be open to storing packages in a bytecode syntax with name mappings, but when it comes to source code, I think we're not yet ready to move away from human-readable files in source control. I agree that there is value in moving fully away from textual representation in source control, but I think this would hamper adoption. A progressive enhancement strategy seems more pragmatic to me. If people start using Forest in part because it doesn't require a huge workflow change, and then see that there is power in representations other than text, that's a huge win. Once we're there, we can think about abandoning text on disk altogether.
Yep, totally agree that we should build these tools, along with web versions for code snippets and pull requests.
This is a feature I have been planning on but have yet to document. I want libraries to support translation into different languages, so storing names separately makes sense. I like the idea of using code hashes as the name in the representation, I hadn't considered that. This feature ties in strongly with some plans I have for the type system, but I will write that up another time.
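One way to picture storing names separately and using code hashes as the names in the representation is the hypothetical Python sketch below. The store, the `#<hash>` reference syntax, the truncated 12-character SHA-256 hashes and the per-language name tables are all illustrative assumptions, not Forest's design. For brevity, local parameter names (`x`, `a`, `b`) stay inside the hashed body; a fuller design would remove those too.

```python
import hashlib
import re

# Hypothetical code store: each definition is keyed by a hash of its body.
# Bodies refer to other definitions as "#<hash>", never by human name.
store: dict[str, str] = {}

def define(body: str) -> str:
    """Store a definition body and return its content hash (truncated for readability)."""
    h = hashlib.sha256(body.encode("utf-8")).hexdigest()[:12]
    store[h] = body
    return h

square = define("(lambda (x) (* x x))")
sum_sq = define(f"(lambda (a b) (+ (#{square} a) (#{square} b)))")

# Human-readable names live outside the hashed code, one table per natural language.
names: dict[str, dict[str, str]] = {
    "en": {square: "square", sum_sq: "sum_of_squares"},
    "es": {square: "cuadrado", sum_sq: "suma_de_cuadrados"},
}

REF = re.compile(r"#([0-9a-f]{12})")

def render(h: str, lang: str) -> str:
    """Project a stored definition using the name table for `lang`."""
    table = names[lang]
    body = REF.sub(lambda m: table[m.group(1)], store[h])
    return f"{table[h]} = {body}"

print(render(sum_sq, "en"))  # sum_of_squares = (lambda (a b) (+ (square a) (square b)))
print(render(sum_sq, "es"))  # suma_de_cuadrados = (lambda (a b) (+ (cuadrado a) (cuadrado b)))
```

Because names are not part of the hashed data, renaming or translating `square` only edits the name tables; the hashes of `square` and `sum_sq`, and anything that depends on them, stay the same.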
I'm not sure I understand. How will the representation be reprinted without first parsing into a common data structure? Would the printer take the bytecode? For the record, my current plan is to eventually reimplement the Forest compiler and syntaxes in Forest, so they can more easily be used in a web browser.
I'm not sure that I could say I have any experience in this field. This is the first traditional compiler I've written, and I'm also quite new to Haskell. However, I'd still love to collaborate, even if it's just feedback and bouncing ideas off one another for the moment.
No worries, your enthusiasm is infectious! Thanks for taking the time to respond 😄
I have recently discovered http://unisonweb.org/, which might be interesting, as they use Abstract Binding Trees for the language representation. There are also a lot of other overlaps with the goals of Forest.
I just discovered this project, and this great conversation in particular, and wanted to let you know that I enjoyed reading it. There are a lot of ideas in it that I'd like to see come to life! There's a community of people interested in projects and ideas like this at https://futureofcoding.org/community, and if you like, you can join and discuss with us. I'm looking forward to seeing where this is going; thanks for sharing!
Hi,
I just came across forest-lang as I was exploring some of my own programming language ideas. The idea of decoupling syntax choice from language choice especially resonates with me; at the end of the day, it's no more important than the choice of an editor or a color palette for syntax highlighting. In fact, John McCarthy envisioned Lisp this way and expected that different representations could be used depending on the domain of the problem a program is meant to solve. Sadly, that vision did not materialize.
Another angle from which I have been thinking about this is a language whose modules could be saved and hosted directly in a content-addressable distributed network like https://ipfs.io/. From that angle, it would make a lot of sense for the language representation to be agnostic of spacing and definition order, so that content addressing is free of human factors like a sense of aesthetics. For example, exactly the same JSON data can unfortunately end up with different content addresses depending on how it was formatted or sorted.

In my own exploration I ended up concluding that choosing a binary representation over a textual one would organically make syntax a user choice, and alternative presentations of it could be created. I thought I'd throw this out here, especially since the concept of forest-lang seems to align closely with what I'd like from a language.
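To make the JSON example concrete, here is a minimal Python sketch. It uses a plain SHA-256 hex digest as a stand-in for a content address (not a real IPFS CID), and its "canonical" form only sorts keys and strips insignificant whitespace, a simplification of full schemes such as RFC 8785 that also normalize number formatting. The helper names `address` and `canonical_bytes` are illustrative.

```python
import hashlib
import json

def address(raw: bytes) -> str:
    """Stand-in content address: SHA-256 of the raw bytes (not an IPFS CID)."""
    return hashlib.sha256(raw).hexdigest()

def canonical_bytes(text: str) -> bytes:
    """Re-serialize parsed JSON with sorted keys and no insignificant whitespace."""
    value = json.loads(text)
    return json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

a = '{"name": "forest", "version": 1}'
b = '{\n  "version": 1,\n  "name": "forest"\n}'

assert json.loads(a) == json.loads(b)                             # same data
assert address(a.encode()) != address(b.encode())                 # different addresses as text
assert address(canonical_bytes(a)) == address(canonical_bytes(b))  # same after canonicalizing
```

The text has to be parsed and re-serialized before hashing to get a stable address; a binary AST with a fixed encoding would give that stability by construction, since whitespace and key order never enter the stored bytes.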