[ art / civ / cult / cyb / diy / drg / feels / layer / lit / λ / q / r / sci / sec / tech / w / zzz ] archive provided by lainchan.jp

lainchan archive - /λ/ - 9620

File: 1442750574870.png (1.3 MB, 200x300, 1433792574974.png)


I failed to see a topic on this already, so I thought I would start one on programming language design and implementation. It seems the Compiler discussion atrophied so I thought I would add implementation into the mix.

== Resources ==

> LLVM Language Implementation Tutorial


> Implementing Programming Languages


> CPython internals: A ten-hour codewalk through the Python interpreter source code


> Compilers: Principles, Techniques, and Tools (2nd ed)


> Essentials of Programming Languages (3rd ed)


> Modern Compiler Implementation in C


> Compiler Optimization and Code Generation


> Garbage Collection Algorithms


More to come


File: 1442754002361.png (11.14 KB, 200x119, repl.png)

>CPython internals

Structure and Interpretation of Computer Programs

Programming Languages: Application and Interpretation

Object-Oriented Programming Languages: Application and Interpretation

The Lambda Calculus for Absolute Dummies


Types and Programming Languages

Good Ideas, Through the Looking Glass

Virtual Machine Showdown: Stack Versus Registers

The Implementation of Functional Programming Languages

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management


File: 1442780312243.png (1.97 MB, 200x150, monkey.gif)

Not really related to topic:
How much of a moron must one be to build an LL(*) parser using C++ OOP?

Joking aside, the basic premise is the following: While walking the parse tree the interpreter changes behavior by switching its strategy for handling the input based on past output.

It works as expected but this whole construct feels like kicking dead whales.
Would it be better to just scrap that idea and stick to parser generation tools?


Tools like bison/flex are nice, mainly because you don't need to focus on the dispatch tables and such, seeing as they implement the lexing and parsing as one big state machine. But for the most part, when you use a tool like that, you are locked into the workflow of its output, and in the case of bison/flex that output is rather grimy.

On the other hand you could use a tool like re2c where you have a bit more control as it only generates the minimal code needed for lexing and everything else is up to you.

I've used bison/flex for a basic interpreter, but the bootstrapping needed to get it to do anything was much more of a pain in the ass than hand-writing one. Although the generated code would arguably run faster.

Also, on the implementation side, a hand-written parser in pure C++ OOP style (even though C++ is not a pure OOP language) seems kinda fuzzy.

I would recommend rather than using a massive switch-case or if-else ladder try with computed gotos for two reasons.

1) Computed gotos are optimized by the compiler better than a massive switch-case lookup table

2) Even though "GOTOS ARE EVIL ZOMG", when used properly you can get good control flow out of them if you know what you are doing.

I would experiment with a parser generator on your problem but if you do want to milk it for efficiency then a hand-written one is much more tuneable than a canned / runtime solution, even if it seems more painful than it should.


File: 1442825489081.png (1000.44 KB, 200x146, hexshit.gif)


Thanks for the pointers.
The interpreter I think is solid enough to keep since it works great (as in good enough); it all comes down to a bunch of interchangeable FSMs.

Lexing+parsing though could use a good amount of work since its implementation is dirty as fuck.

re2c reminds me of Ragel, I think I'll use either of those for lexical analysis.


No worries!

On the note of re2c: people have been swapping in re2c and lemon for the bison/flex pair, so it's something to look into if you still want to use a canned alternative, although re2c is less canned than flex.


As an exercise, I would like to try to implement a "programming" language (more exactly, I think, a DSL) for defining UIs in the cleanest and most efficient way possible.
The language should not compile to machine code or anything like that. Instead, the compilation process would involve specifying a target language and toolkit, and the compiler would output files defining the UI in that language.
The "programming" part should stay very minimalist.

I would like to implement this from scratch. Any recommendations except the dragon book ?


Modern Compiler Implementation in C (linked above) is a good start.

As for the "compilation" process, it seems like it's mostly a translation as opposed to compilation, so one would think that you'd have a template of sorts in the target language that you can just pop values into and then write out the resulting code.


>Tools like bison/flex are nice and all mainly due to the fact that you don't need to focus on the dispatch tables and such, seeing as they implement the lexing and parsing using one big state-machine.
It's still insane to think about this.

Every language I use has an incredibly simple syntax and whatnot.


File: 1442886835205.png (3.73 MB, 200x21, 1438973201142.jpg)

I've written a few "programming languages" a year or two ago, mostly languages that resembled lisp written in reverse polish notation, since it was easy to parse. Played around with writing my own lexer/parser in C, and then made 2 different versions of the language: one compiled into bytecode and run by an interpreter, and the other translated into x86 ASM and fed to an assembler/linker.

That was a while ago, and I haven't written anything or programmed in close to a year, so I remember close to nothing and I'm rusty as fuarrrk.

Going to try to rewrite a lexer/parser in C and then write a lisp-like interpreted language as a practice project for getting back into programming.

Does anyone have a good resource for the specifications of an incredibly simple, barebones lisp that I could use as the basis of my language to build off of? I programmed in lisp 3 years ago but I don't remember much.


I'm not a lisper so I can't say but some searching found http://nakkaya.com/2010/08/24/a-micro-manual-for-lisp-implemented-in-c/

It seems to be more of a how-to than a spec though.

It does however link to this paper that seems to be more of a spec: https://drive.google.com/file/d/0B0ZnV_0C-Q7IOTRkNzVjZjMtMWE1NC00YzQ3LTgzMWEtM2UwY2I1YzdmNmM5/view?ddrp=1&pli=1&hl=en#


Yeah, it's crazy to think how intricate that soykaf is.

I found a paper from '68 that was a graduate thesis on implementing an efficient parser in n^3, or best case n^2, time.

I would think that ideally you can get it down to linear, but hey, it's complicated stuff.



File: 1442908548591.png (52.28 KB, 200x179, complexity.png)

This inspired me to put my current project under a quick test. Could surely be a tad better.

Interesting stuff by the way, the part on prediction I find curious.


I thought that some of you might be interested in this:

It's a type theory reading group that is forming. If you are interested, you can signup here:


I need to get back to learning more about compilers and interpreters. I created a little language inspired by the C family of languages using LLVM. I have been spending a lot of time learning other programming languages, and I can't believe just how much I missed by not learning about them earlier. So right now I am thinking about getting back to creating programming languages and trying to work on open source compilers and interpreters. My next project will probably be a Scheme interpreter based on the R7RS standard.


File: 1442982755778.png (582.65 KB, 200x113, J3G0KkJ.gif)

That's good, working on projects is a good way to learn, I wish you the best.

I, myself, am working on developing a language, mainly for fun but also to try to be better than nim (seeing as I have an acquaintance who works on the project and it's some `friendly` competition).

So far I've only decided on the name (Atrophy) and some basic language features such as static typing and reflection. I still need to work out a sane syntax and some other language features. Still a fair deal of work to be done.


Curious, what did you use to make this plot? My first thought was R or GNUPlot.


I guess Y is time and X is size, but why don't you label them?



Gnumeric actually.


> Gnumeric actually.
Hrmm, that would make sense, more so than my ideas I suppose.


I don't understand you. Compilation is just a specific translation.


Yeah, but compilation also implies explicit optimization such as loop unrolling and such, at least that falls under my definition of a compiler.

At that point it's just a formality of terminology.


File: 1443032471399.png (18 KB, 200x200, i320.jpeg)

Thanks again on pointing in the right direction.

Just built a simple arithmetic parser, working great so far.

It's also the first time in years I used a stack to evaluate the resulting RPN of an expression, it's pretty swell.


Don't sweat it!

Stacks do indeed work, I'm more of a fan of trees, but each has its place.

Stacks map to contiguous blocks of memory better than trees do though, so faster access times.



Well, to be honest, I parse the expression into RPN using the shunting-yard algorithm,
then create an abstract syntax tree by iterating through the RPN in reverse, which in turn gets evaluated later using a stack by traversing it postfix, which gets me exactly where I was before, actually, come to think of it.

Seems awful and contrived, especially since the code uses a dynamic cast while visiting the nodes, but this gives me more options, I presume.

I just love multi-dispatch too much to not do such shit.
But hey, I can print trees infix and add new operations and leaf-types later, that must count for something.


Here's an excellent paper that takes you through the process of writing a simple Scheme compiler:


Do you know anything good on compiling logic languages, like Prolog?


File: 1443111134486.png (50.58 KB, 200x193, feels.jpg)

Have a look at mercury's papers section.



>if (no_gf == true)
>feels ++

kimochi warui


File: 1443114078022.png (151.67 KB, 193x200, 1437265677595.jpg)

Some really cool papers, thanks!


File: 1443127141315.png (950.04 KB, 200x194, 1434461938315.gif)

This is fuarrrk ing awesome! But the link at the end is dead. Do you know where can I find the extended tutorial and the test cases?


File: 1443279486065.png (150.7 KB, 200x177, 1442993983634.png)

Those papers look really quite interesting, thanks for sharing.


Hi. I developed an interpreter for a simple language, one or two years ago. Example:
succ: x+1
pred: x-1

test: [
number(0).succ == 1,
number(0).pred == -1
]

The first line is short for
number(x): this

It behaves like the following JS code:
function() {
var number = function(x) {
var succ = function() { return x+1; };
var pred = function() { return x-1; };
var this_ = {"succ": succ, "pred": pred};
return this_;
};

var test = function() {
var this_ = {};
return [number(0).succ == 1, number(0).pred == -1];
};

var this_ = {"number": number, "test": test};
return this_;
}

The language has no way to express side-effects yet, so I imagine it to be embedded in a bigger language.

timeout(sec, cond): stateful(State(null), cond).timeouted
impulse(i): State(start and i ? start : now)
timeouted: now - start >= sec

The idea here ^ is that cond is a data-source of true and false values. And `timeout(5, data)` will be `true` three seconds after the last `false` message on the channel.


What was the idea behind the syntax? It looks like python mixed with JSON.

Also, what was it implemented in? Did you roll your own parser/lexer or did you use a parser generator? So many questions so little information.


It was implemented in python. I made the parser using code from http://effbot.org/zone/simple-top-down-parsing.htm

It was the result of an attempt to have a compact DSL to describe game rules for FPS. From my notes:
obj ctf(game):   // obj is a macro, forgot what it was used for
world: game.world // shorthand
flags: map(flaglogic, game.teams) // ensures there's one flag per team

overlay player: // makes player.can_carry visible within 'cfg'
can_carry: alive // player.can_carry is the same as player.alive

obj flaglogic(team):
// helper function hold(f, g) will hold a value x from g as long as f(x) is true
// the VM will ensure that g is reevaluated only when f(x) makes a downwards transition
// $.attr is a syntax for λx: x.attr
// first(list, key) = sorted(list, key)[0]
carrier: hold($.can_carry, first(eligble_carriers, key=distance))
home: world.lookup((:ctf, "home", team.tag))
// wasn't sure whether to allow python list-comprehensions or use functions like 'filter'
eligible_carriers: filter($.can_carry, colliders)
provide transform:
? ( carrier.attachment("flag") or carrier.attachment("over head") )
: ( expired
? home.location
// for simplephysics the VM was supposed to take the derivative of the
// transform and continue moving the object along trajectory
: simplephysics(collisions) )

expired: timeout(10, not carrier)


That link looks really good; I can't wait to have a play with the code on that page. I've been doing a parser tutorial in Haskell, but it's hard to keep track of the concepts and learn a new language at the same time.


File: 1443981849383.png (2.58 MB, 200x113, Ru77fq7.gif)

Does anyone happen to have an ACM SIGPLAN subscription? There are some papers there that would be nice to add to the resources list, but money.


Wouldn't sci-hub work?


I had no idea about that, I was looking into https://scirate.com/ as well.


What papers are you looking for? I can help hunting them down.


File: 1444223591679.png (556.25 KB, 200x82, 2Pe6lht.gif)

It was just some things on LALR(1) and LL(*) Parsers and the like, I found some good ones, thanks anyway.


> http://compcert.inria.fr/doc/index.html
> CompCert is a compiler that generates PowerPC, ARM and x86 assembly code from CompCert C, a large subset of the C programming language. The particularity of this compiler is that it is written mostly within the specification language of the Coq proof assistant, and its correctness --- the fact that the generated assembly code is semantically equivalent to its source program --- was entirely proved within the Coq proof assistant.


File: 1446343508231.png (50.83 KB, 200x151, 12003216_1683435058555344_3045465365182809952_n.jpg)

CompCert has had some undefined behaviour and compiler bugs as of late, if I recall correctly. But the fact that it was written in Coq astounds me.


Damn awesome

I really want to get into these proving languages but I can't wrap my head around their concepts. (Looking into Microsoft's LEAN right now)

For reverse engineering for example, it would be great to prove that the reconstructed source code does the same as the original

I can imagine jobs in that area to be very well paid.

Also, >Coq Present Day, Present Time! AHAHAHAHAHA!


>>11323 (cont.) so it seems the ebin meme face gets word-filtered. well that's fine with me.


File: 1446470449119.png (276.2 KB, 200x185, coq-tan.png)


Lean is cool. Its syntax is very similar to Coq's and it's even built on the same logic.

>For reverse engineering for example, it would be great to prove that the reconstructed source code does the same as the original

I'm not sure this would be the right job for manual verification. You tend to run into a lot of repetitive and tedious proofs when proving these things from the bottom up. It's a different matter when you already have high-level models of both sides and want to prove specific equivalences.

What you want is to prove functional correspondence of two programs. Depending on the level of abstraction, what you want is something like symbolic execution or model checking, which is a mostly automated (depending on how you obtain your model) method for proving correctness of programs. There are many tools for this, too.

One of them is SAW (http://saw.galois.com/index.html), which uses SMT and symbolic execution to verify properties of C and Java programs. The guys from Galois, Inc. had a talk recently about verification of cryptographic functions using SMT; the slides are on GitHub, for those interested: https://github.com/GaloisInc/sat2015-crypto

>I can imagine jobs in that area to be very well paid.

I can confirm. They're also not the easiest and you must know tons of theory to be efficient and innovative, though.

Here's another one: http://ilyasergey.net/pnp/
I have tons of links to other resources, but I'd be too embarrassed to publicize them in their current form.


BTW, I was considering writing a short article (perhaps even series) about some topic from this area, but I'm not sure how useful it would be given the amount of good resources that have already been written.


That would be really cool. Maybe you could get it into the lainzine?


Oh of course, I meant to write "a short article for the zine", but it got lost somewhere between editing.


Considering that reconstructing C source is already tedious and has to be done function by function (even with the help of a decompiler), adding proofs seems like a linear overhead.
I don't know the exact distinction between all these terms (model checking, etc.), but I think you can always convert them to a proof later.

I imagine that one could develop tools to automate this work: Mark a few correspondences in source code and binary flow graph, or source variables and registers, and let some algorithm figure out the rest.

>One of them is SAW (http://saw.galois.com/index.html), which uses SMT and symbolic execution to verify properties of C and Java programs.

I wonder how well it copes with memory accesses (cryptographic functions are mostly pure).


What's a good language to write a compiler for if I want to learn?


I wrote one in python and that was quite satisfying. Be ready to spend a few weeks though if you're targeting some assembly language.

Basically use any powerful language that you are really good in.


Not what to write it in, but what to compile?


why is it SO HARD?

I want to do this the most but it's so fuarrrking hard. I feel like giving up sometimes but I want to succeed at making my own language and compiler.


making a compiler and making a kernel are probably the two hardest projects there are. If you're a beginner, do something else.


I politely disagree.


If you are a beginner to programming or to C, a gitbook known as Build Your Own Lisp may be of interest to you.


mfw no haskell-chan :(


File: 1449867846391.png (174.9 KB, 200x200, just sayaka.png)


Works for me thanks! ^^


There's a few in different styles.


File: 1451171383735.png (1.17 MB, 200x200, wink.gif)

This is a great article for your langanons:


I found some interesting presentations here: http://pauillac.inria.fr/~fpottier/slides/

The language Mezzo is mentioned on a few of these, but before you get too deep into that, know that one of the co-creators (http://jonathan.protzenko.fr/) has moved on to F*.

Btw, update on >>11323 I now understand LEAN much better.
>>11345 tell me more about what you do

And now for something completely different: I think there should be type-checks at loading time. Not something built on top of the usual string-matching symbol resolution, and not like C++, which will give you a "symbol not found" if you don't know the exact signature. Makes me wonder what type system is expressive enough to cover all languages and still be fast enough (though that can be cached). Something unification-based? Something based on those dependently typed languages?

>>12997 One of the consequences of Midori's design is exactly that actually. I always wanted the project to succeed, but no. Glad to see these articles.


>tell me more about what you do
Not much currently. I'm studying but I'd like to get a career in formal verification or something.


do elaborate, then.


I find static verification of a program to be boring.
A program that can be statically verified is good for automating a task, but not much else.
Code that is difficult or even impossible to statically verify, by modifying itself as it runs in numerous or infinite ways or through other means, is very interesting.
I am considering the skeleton of a language designed with this goal in mind, along with some other goals I have for it.
One of the consequences of this is the difficulty or impossibility of compilation.
I don't particularly believe that compiler introspection has merit. Individuals should write code that will be efficient regardless.
I am not saying that the primitive constructs or runtime of a language should not be optimized, such as, say, garbage collection, but this is very different from the optimization of arbitrary programs.


I can't speak about kernels, but you can write a simple language completely from the bottom up – lexer, parser, compiler, interpreter – even as a beginner. It won't do much and it might be ugly, but it's definitely doable. It's even easier if you use parser generator and a simple target language.


there are lots of possible shortcuts for making a compiler on modern systems, sure. flex and bison cut down on lots of boilerplate, and compiling to C cuts out machine-specific implementations of nasty register juggling. i've put together a "full language" like that in just a few thousand lines of c.

if you don't take any of the shortcuts, though, and work to actually optimise your output, it turns nightmarish pretty quickly
>>13059


>flex and bison cut down on lots of boilerplate
why would you write the code to parse and scan your language when you can simply write down a specification?
That part isn't even the most interesting part of a compiler imho, optimizing the code generated and adding new features to the language is much more rewarding


Why would you do a thing that the smartest people ever alive have dedicated their lives to and countless dumber ones since done over a thousand times?


is that retort supposed to prove it's an unworthy exercise?


Sometimes the generated code is not good enough. I agree otherwise.


flex and bison/(b)yacc have nothing to do with the final generated code.
Or am I missing something in your post?
yes it is. Just study Thompson's algorithm and LL(1), LALR(1) and any other type of parser for grammars you enjoy.


The code generated by the parser generator.


flex just gets you tokens and the parser just tells you what they mean.
They have absolutely nothing to do with how good the final code will be.
They produce exactly what construct you tell them to produce, nothing more nothing less


I'm not talking about the code generated by the compiler, I'm talking about the parser generated by the parser generator.


File: 1451581083181.png (178.24 KB, 200x139, 1442732209372.png)


I'm with you on this. There are cases where the code produced by the parser generator is not up to the task in terms of speed or efficiency, and a hand-written parser and lexer would be better suited. I've had cases where the code produced by bison/flex was leaky as all hell and slow as sin; in that case it was a side-effect of the grammar, but the point still stands: there is no "magic key" to producing a lexer/parser.


>in this case it was a side-effect of the grammar but the point still stands
no it doesn't; flex and bison only tell you there is a keyword here, or a number or a string here, and they're in a declaration, or this is an if-block, nothing more.
If they generate poor code it's because you haven't minimized your grammar. Which is your fault, not the tool's.
All the machine-specific and regular optimizations are the ones that get you a performance increase. Not your "muh super hand written reinvented parser/scanner"


they can; gcc switched to writing their own at some point because it could be made faster and more flexible.


If your grammar is complicated enough for you to even consider automatically generating the devices for it, then it is too complicated.


Tell that to the designers of C++.


> flex and bison only tell you there is a keyword here or a number or a string here, and they're in a declaration, or this is an if-block, nothing more.

That's not true; they do a hell of a lot more than that, and as a side effect of all that they are slow and rigid. As >>13140 said, GCC switched from flex/bison to a custom-built parser due to the limitations and downsides of the generators. Clang from the LLVM project also implements the parser and lexer by hand. As nice as compiler compilers are, they still have limitations. As I said before, there is no "magic key": if you want to get the parser done quickly then flex/bison or re2c/lemon might be the way to go, but if you really need to optimize it until it bleeds, hand-written is the only way to get that.


Hey, forgive me, but I'm an idiot. Are there any pages or books that give a more or less step-by-step guide through flex/bison? I need them for a project, but they've been giving me so much trouble that I've seriously considered writing my own parser out of frustration. I've found a lot of books that discuss them in general, but nothing really substantive.


There's always the official GNU manual. It's even discounted:

They also sell Flex reference cards:

Have you already looked at the official documentation?


Yeah I did. I mostly just forget little things and panic. Like I forgot how to actually compile them into a single program. Thank you.


For a while I've been reading esolangs.org, and considering what might be good for an esoteric language. Some of my ideas sound to me like they might actually be useful, though either way I'd like to try actually implementing them.

Something I'm interested in is a system which uses a form of automatic variable ( https://en.wikipedia.org/wiki/Automatic_variable ) in which all variables are defined at the start of a block, and the assignment is treated as an if statement which holds true if the assignment succeeds and false if it fails. The idea is to basically have something a bit like manual memory management, but more statically checkable and less prone to undefined behavior.

A related idea is to have memory addresses controlled in a manner similar to network subnetting: all pointer addresses would be ANDed with a submemory mask and ORed with its memory prefix before use. In this way a program would be able to sandbox parts of itself recursively, with all array overflows being confined to the data structure they were immediately related to. I figure this might be slightly faster than proper bounds checking while still limiting the scope of an overflow's effects. Though it would also make every data structure exactly a power of two in size, which would be costly.

An entirely different line of thought started with me wanting to code in a way that felt more like working with physical phenomena. Somehow this got me thinking of Fourier transforms, and how a DCT can be used to approximate arbitrary wavelike functions. This seemed like it'd be good for a trial-and-error approach that could feel like carving away at a program, by starting with lower frequency elements of the functions and refining with higher frequencies. The outputs of these wavelike functions can then be used as inputs for others. Multidimensional functions would be simple compositions of one-dimensional functions.

I guess with finite resolution I could also just use anything that can generate a bitmap for 2D functions, which might be interesting as well. Aside from that, it's so far essentially a subset of most any language that supports a cos function, but I'd like to see what I can get from playing around with the idea. Now to start attempting to make a reference implementation again, but not in javascript this time.


If your project isn't a personal one, I'd argue against writing a parser system from scratch.

I had done it, and now I'm backpedaling into flex+bison because it is easier to justify and write about.

I got initially frustrated at ANTLR refusing to work, anyone ever done something using it?


I used ANTLR in Java to write a parser for a previously regexp-based wiki language. It was really awkward to do because I had to try to mimic the behaviour of the interaction of these horrific regexps, whilst still producing a useful AST and fixing behaviours that everyone agreed were bugs.

It was a fun challenge.


Currently working through PLAI after finishing SICP. Am I crazy for thinking PLAI is harder? SICP had a lot more structure and built ideas up incrementally, and had the advantage that I could check my solutions against those posted all over the internet. PLAI just throws exercises at you, then makes no mention of them again, with no way to check answers. Nothing has completely stopped me yet, but it throws out these perhaps half-subjective questions with no introduction, and the exercises ramp up in difficulty rapidly. It goes from a trivial calculator with addition and multiplication to 'now add conditional statements to the language' with 0 hints or discussion of what they even mean by that. If I hadn't done SICP first this would be extremely challenging.


>now add conditional statements to the language
Sounds the kind of stuff LiSP (Lisp in Small Pieces) can help you with.
Haven't read PLAI but the goals of the book suggest that LiSP might suit you better.
Oh and there are also video lectures for PLAI. Maybe it is made to read along watching them like you did with SICP?


But that means Sayaka :: CuteGirl! It's cute though so who cares.


Diving in head first with a Scheme to JavaScript compiler, feels like I finally got things rolling today!
#;31> (define (expand form)
(macroexpand form))
#;32> (define exp
(define (cons a b)
(let ((o (empty-object)))
(object-set! o "car" a)
(object-set! o "cdr" b)
(object-set! o "type" "cons")
(cons 1 (cons 2 (cons 3 4)))))
#;43> (display (expand exp))
var s1 = (function (s2, s3) {
return ((function (s4) {
s4["car"] = s2;
s4["cdr"] = s3;
s4["type"] = "cons";
return s4;
})({}));
});

(s1)(1, (s1)(2, (s1)(3, 4)));
It's certainly ugly output, but it's "readable" in some sense, and more importantly it works.

(Sorry if this is cringey, I know I don't know what I'm doing, but I'm having fun. I know Biwa already exists; I just wanted a toy compiler project.)


I read the SICP about a year ago but didn't really find its approach compelling because it cared about too many "low level" details. I don't want to read the full source of the compiler, I want to understand the concepts and ideas behind it.
I also recently read Luca Cardelli's "Compiling a Functional Language"[1] and found that paper a lot more insightful. Instead of elaborating all of the compiler, he picks out only the details he found the most interesting or innovative. I found that I learned a lot more that way, not caring about most of the implementation details.
Is there a full book that takes a similar approach in that it explains compilers in a language agnostic way instead of walking through the source code of one?

1: http://lucacardelli.name/Papers/CompilingML.A4.pdf


The source code and ideas expressed in the book are two forms of the same thing, you must read both.

"Programs are meant to be read by humans and only incidentally for computers to execute."

SICP is just as much about the practice of programming as it is about how to implement languages. If you want it to be strictly one or another you won't understand it correctly.


Oh god these are so cute I feel horrible for hating many of those languages.

SICP was originally for being used in CS intro courses, so it covers a lot of ground...


>>16898 again.
After a little bit of searching around the internet, I found a book that looks like what I was looking for: http://www.diku.dk/~torbenm/Basics/basics_lulu2.pdf
It's a high level, language agnostic overview of the techniques involved in building a compiler.


C++ is attractive.


>I don't want to read the full source of the compiler
Why not?


Because the full source code of any program has tons of stuff that obscures the central concepts: glue code, user interaction, error handling.


File: 1466323157803.png (2.14 MB, 200x113, o6PfJHZ.gif)

So, on this topic, I decided to attempt to implement a Malbolge compiler. I know there have been interpreters written, and that's no fun, so I decided to actually generate machine code.

One issue I've found is that the language is specified on a ternary VM, so there needs to be some bootstrap/boilerplate assembly to handle the ternary logic. Other than that I think this will turn out to be a fun project. Or am I utterly insane?


There's a reason some languages have interpreters but no compilers: some languages are simply too dynamic to compile to machine code without a huge amount of boilerplate (or without including a VM/interpreter in the binary).

Another thing: you can't just "handle" ternary logic. If you want to compile for a binary architecture, you have to translate the source to work on a binary architecture (at some point of compilation; it's up to you when). The job of a VM is to emulate ternary on binary, so you don't have to do that translation during compilation.

Basically, what you want to do is write a VM that handles ternary (your "bootstrap/boilerplate"), pack it into the compiled binary, and run the program on that built-in interpreter (effectively a low-level VM). It's the same thing, just worded differently.

You're not insane, but you may be wasting your time if you start writing it. Still, go ahead and research the topic; you should find good arguments for why it's not a good idea.
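To make the "emulate ternary on binary" point concrete: Malbolge's memory cells are 10-trit words, and its rotation operator works tritwise. Here's a minimal sketch of that emulation layer (my own Python, not taken from any Malbolge implementation; the crazy-operation trit table is left out):

```python
TRITS = 10
WORD = 3 ** TRITS  # 59049 distinct values per Malbolge word


def to_trits(n):
    # Decompose a word into its trits, least significant first.
    ds = []
    for _ in range(TRITS):
        ds.append(n % 3)
        n //= 3
    return ds


def from_trits(ds):
    # Inverse of to_trits: rebuild the word from least-significant-first trits.
    n = 0
    for d in reversed(ds):
        n = n * 3 + d
    return n


def rotate_right(n):
    # Tritwise rotate-right: the least significant trit wraps to the top.
    ds = to_trits(n % WORD)
    return from_trits(ds[1:] + ds[:1])
```

This is exactly the boilerplate the post describes: every ternary primitive becomes a small binary routine, which is why the compiled output ends up carrying a miniature VM.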


I've often heard that functional languages (especially the ML-likes) are very good for building compilers. Do any of you know why?
The only thing I could come up with so far is that it's easy to construct an AST from a context free grammar using algebraic data types and manipulate it.


I think it's the pattern matching. Not compiler-related, but when we were learning about red-black trees in an imperative setting, rebalancing seemed like black magic that somebody once came up with, probably under the influence of heavy drugs, and the only way to learn it was memorizing what to rotate in what order. With pattern matching, you can just write down the four imbalanced cases and rearrange them into a single balanced one:


I'm designing a DSL for representing RPG dice notation and character sheets. Of course, I need a `d` operator for the rolls, but the lexer treats inputs like `d20` as an identifier called `d20` instead of as `d` and `20`.

Any ideas on how operators like `d` are usually handled?


Don't lexer generators allow you to set precedences for the different kinds of expressions? You'd just have to make sure the d-operator has precedence over normal identifiers.


Indeed. But since `d20` is a longer match than `d`, the lexer chooses the former (maximal munch).
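One common workaround is to stop fighting maximal munch and let it work for you: make the whole roll a single token, longer than any identifier it could be confused with, and split it afterwards. A hedged sketch (the token names and regexes are my own invention, not from any lexer generator):

```python
import re

# ROLL must come before IDENT so that the alternation tries it first;
# maximal munch then prefers "2d6"/"d20" as one ROLL token.
TOKENS = [
    ("ROLL",  r"\d*d\d+"),        # 2d6, d20
    ("NUM",   r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),   # "old20" still lexes as an identifier
    ("OP",    r"[+\-*/()]"),
    ("WS",    r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKENS))


def lex(src):
    out = []
    for m in MASTER.finditer(src):
        kind = m.lastgroup
        if kind == "WS":
            continue
        if kind == "ROLL":
            # Split the roll in the lexer action: "d20" -> count 1, sides 20.
            count, sides = m.group().split("d")
            out.append(("ROLL", int(count or 1), int(sides)))
        else:
            out.append((kind, m.group()))
    return out
```

The same trick works in flex: give the `\d*d\d+` rule its own pattern above the identifier rule, and unpack the count and sides in its action.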


File: 1467674915319-0.png (63.77 KB, 200x112, wat.png)

File: 1467674915319-1.png (332.67 KB, 200x200, Semantics and algorithms for data-dependent grammars.pdf)

This Earley-style parsing algorithm seems top-notch and I wanted to implement it.

I managed to implement Earley but I have no idea how to make sense of this... Anyone know?
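For anyone following along, the plain Earley recognizer that Yakker builds on fits in a page. A hedged sketch (my own minimal version: the grammar is a dict from nonterminal to tuple right-hand sides, terminals are any symbols not in the grammar, and epsilon rules are not supported):

```python
def earley(grammar, start, tokens):
    # An item is (head, body, dot, origin): we've matched body[:dot]
    # of a rule head -> body, starting at input position origin.
    chart = [set() for _ in range(len(tokens) + 1)]
    for body in grammar[start]:
        chart[0].add((start, body, 0, 0))
    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            head, body, dot, origin = agenda.pop()
            if dot < len(body):
                sym = body[dot]
                if sym in grammar:                       # predict
                    for alt in grammar[sym]:
                        new = (sym, alt, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < len(tokens) and tokens[i] == sym:  # scan
                    chart[i + 1].add((head, body, dot + 1, origin))
            else:                                        # complete
                for h2, b2, d2, o2 in list(chart[origin]):
                    if d2 < len(b2) and b2[d2] == head:
                        new = (h2, b2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
    return any(h == start and d == len(b) and o == 0
               for h, b, d, o in chart[len(tokens)])
```

Yakker's extension is essentially to thread semantic values and data-dependent constraints through these same three steps, which is what the paper's inference rules are specifying.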


What are you doing? If you only wanted to implement the Earley algorithm, you don't need this. On the left are rules for translating regular expressions to finite automata, similar to Thompson's construction. On the right, they seem to be laying out a declarative specification of the parsing algorithm so that it's easier to reason about.


I'm trying to understand and implement that Yakker algorithm. It's based on Earley's elegant algorithm.

How do I read and understand those translation rules as well as that declarative specification? I've seen writing of that form in several papers and I don't know where it comes from or what it means.


File: 1468224866114.png (1.1 MB, 200x200, Practical Foundations for Programming Languages.pdf)

Read the first part of 'Practical Foundations for Programming Languages'.





I wrote an interpreter in Golang.

I'm not really done with it, but it's Turing-complete already. It features lists, numbers, loops, a fuckload of conditionals (if, else if, also if, also, else), strings, GET requests, file I/O, and basically everything I could think of that I might need off the top of my head. The current way "functions" work is a shitty hack using the lazySet and run functions, but I'm implementing proper defun{} and f(x) = function{} syntax right now.


This thread is pretty much dead, but here is an update -
defun() is done, as is penis(x) = { return:x }
Comments are '#' and ';;' now, lots of other shit. Check the git commits.

Anyways, what's the best way to format a language's documentation?


That's actually very interesting to me. How does it work? I'm assuming it's different from else.


Inspired by >>18979

How would you design your language's module system?


I think I'd do something like Python's modules, but without all that idiotic package and __init__ bullsoykaf.

Also, it'd be a lot easier to develop and use modules. It took god knows how many years for Python to improve its package manager, and it's still soykaf compared to pretty much any other dynamic language out there. It's super simple to create a Ruby gem or a JS module, but in Python you have to figure out a whole bunch of completely nonsensical bullsoykaf.


>Does anyone have a good resource for the specifications of an incredibly simple, barebones lisp that I could use as the basis of my language to build off of?
There is a book called "Zen Style Programming" by Nils M Holm that shows how to program a Scheme in C


I think you mean Scheme 9 From Empty Space; Zen Style Programming shows how to use a minimal Scheme to get things done.


I would suggest this:
Read A Gentle Introduction to Symbolic Computation and write your lisp based on the language as it is presented chapter by chapter (starting from chapter 3 or 4, when cons cells are introduced).
You'd start off with a very simple parser (if you need help with recursion, read The Little Schemer) and an implementation of cons cells, and gradually build up from that.
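That "very simple parser and an implementation of cons cells" starting point fits in a few lines; Python here for brevity (my own toy version: cons cells as Python pairs, nil as None):

```python
def tokenize(src):
    # Pad parens with spaces so split() does all the lexing.
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    # Recursive-descent over the token list, consuming from the front.
    tok = tokens.pop(0)
    if tok == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the ")"
        # Build a proper list out of cons cells, right to left.
        lst = None
        for item in reversed(items):
            lst = (item, lst)
        return lst
    try:
        return int(tok)  # number atom
    except ValueError:
        return tok       # symbol atom


def car(c):
    return c[0]


def cdr(c):
    return c[1]
```

From here the gradual build-up is just adding an eval function over these cells, one chapter's worth of forms at a time.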


It's just the opposite of an else: it only gets evaluated if the last conditional was true. It functions sort of like putting your code directly into the 'if' block, but because conditionals are functions and not structs in DeviousYarn, you can throw some statements in between. You can also use alf ('also if') like a conditional nested within another conditional. I think I've got them all written up in the wiki on the GitHub, so if you can compile Go programs, feel free to give it a try.
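One plausible reading of those semantics, as a tiny dispatcher; this is my own guess at the behavior, not DeviousYarn's actual implementation:

```python
def run_chain(chain):
    # chain: list of (kind, test, body); test is a 0-arg callable or None.
    fired = []
    last = False  # did the most recent test in the chain pass?
    for kind, test, body in chain:
        if kind == "if":
            last = bool(test())
            if last:
                fired.append(body)
        elif kind == "also":   # opposite of else: runs iff the last test passed
            if last:
                fired.append(body)
        elif kind == "alf":    # "also if": last test passed AND its own test passes
            last = last and bool(test())
            if last:
                fired.append(body)
        elif kind == "else":   # runs iff the last test failed
            if not last:
                fired.append(body)
    return fired
```

Under this reading an `alf` narrows the chain like a nested conditional would, which matches the "conditional within another conditional" description.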