EBNF Grammar for ANSI C (+ Guide on reading EBNF)

ChubakPDP11+TakeWithGrainOfSalt@programming.dev · 9 months ago

OmnipotentEntity@beehaw.org · edit-2 9 months ago

Are digraphs and trigraphs deprecated?

Did you reference the standard?

ChubakPDP11+TakeWithGrainOfSalt@programming.dev · 9 months ago

I think digraphs and trigraphs are part of the preprocessor? I did not add any preprocessor stuff to this grammar. I am adding them to the new version I am working on.

I have read the C17 standard in full, and I wrote parts of this grammar from memory, but it seems I had forgotten a lot of stuff. I am rewriting it, and I am redesigning my AWK grammar too.

I am hoping I could perhaps make a GitHub Pages website called Internet Grammar Database and have all sorts of grammars in it. Thoughts?

navigatron@beehaw.org · 9 months ago

I love grammars. It’s like an API or a data schema, but for a language. This would be very cool and I would love to see it!

ChubakPDP11+TakeWithGrainOfSalt@programming.dev · 9 months ago

Cool! I will make it.

OmnipotentEntity@beehaw.org · 8 months ago

Trigraphs are handled by the preprocessor, so if you’re not handling that, then that’s fine. Digraphs are handled by the tokenizer, however.
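To illustrate the split described above, here is a minimal Python sketch. It assumes the usual arrangement in which trigraphs are replaced on the raw source text before any tokens exist, while digraphs are just alternative spellings of punctuator tokens. The function names `replace_trigraphs` and `normalize_punctuator` are made up for this example and do not come from the grammar in the post.

```python
# Trigraphs: three-character sequences replaced in the raw text,
# before tokenization (so they are replaced inside string literals too).
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}

# Digraphs: alternative spellings of punctuator tokens.
DIGRAPHS = {
    "<:": "[", ":>": "]", "<%": "{", "%>": "}", "%:": "#", "%:%:": "##",
}


def replace_trigraphs(source: str) -> str:
    """Scan left to right and replace each trigraph with its single character."""
    out = []
    i = 0
    while i < len(source):
        chunk = source[i:i + 3]
        if chunk in TRIGRAPHS:
            out.append(TRIGRAPHS[chunk])
            i += 3
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


def normalize_punctuator(token: str) -> str:
    """Map a digraph token to its primary spelling; leave other tokens alone."""
    return DIGRAPHS.get(token, token)


print(replace_trigraphs("??=define FIRST(a) a??(0??)"))
print(normalize_punctuator("<:"), normalize_punctuator("+"))
```

Because digraphs are resolved per token, a `<:` inside a string literal stays as it is, whereas a trigraph inside a string literal is replaced. That difference is why the two features end up in different places in a front end.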

ChubakPDP11+TakeWithGrainOfSalt@programming.dev · 8 months ago

Cool, I am making a second version where things are cleaner. I will add digraphs, trigraphs and preprocessor directives to the grammar as well. Thanks.

sim642@lemm.ee · 8 months ago

“I am currently writing a C compiler, with my own backend (and hopefully, frontend) in OCaml.”

But why write your own C frontend? It’s much more of a pain than people imagine. I maintain a C frontend implemented in OCaml (the project itself goes back 25 years) and it’s still not on par with GCC or Clang.

For any other language, sure, but C has so many “wonderful” features, starting with the lexer hack. Your grammar conveniently overlooks this issue but it’s something you’ll have to deal with to actually implement it. So it simply won’t be as nice as theory suggests.
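For readers who have not run into the lexer hack: the token stream for C depends on which names have been declared with `typedef`, so the lexer needs feedback from the parser's symbol table. The Python sketch below is only an illustration of that idea. The token categories, the tiny keyword set and the `typedef_names` parameter are assumptions for the example, not part of the posted grammar or of any real front end.

```python
import re

# One identifier, one number, or one single punctuation character per match.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\d+|.)")

# A deliberately tiny keyword set, just enough for the example.
KEYWORDS = {"typedef", "int", "char", "return"}


def tokenize(source, typedef_names):
    """Classify tokens, consulting the set of names declared via typedef.

    In a real front end the parser would add to typedef_names while it
    processes declarations; here the caller passes the set in directly.
    """
    tokens = []
    for match in TOKEN_RE.finditer(source):
        text = match.group(1)
        if text in KEYWORDS:
            kind = "KEYWORD"
        elif text[0].isalpha() or text[0] == "_":
            kind = "TYPE_NAME" if text in typedef_names else "IDENT"
        elif text.isdigit():
            kind = "NUMBER"
        else:
            kind = "PUNCT"
        tokens.append((kind, text))
    return tokens


# The same characters produce different token streams:
print(tokenize("T * x;", {"T"}))   # T is a type: declares x as pointer to T
print(tokenize("T * x;", set()))   # T is a variable: multiplies T by x
```

A context-free grammar on its own cannot tell these two readings apart, which is the part that an EBNF specification leaves out.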

ChubakPDP11+TakeWithGrainOfSalt@programming.dev · 8 months ago

You’re right, yeah. Hand-implementing lexers and parsers is kind of ‘inane’. I’m not saying it’s stupid. For a small grammar it makes sense. But for a big grammar, just use a PEG generator, or Yacc/Lex. Rust has Lalrpop and Java has ANTLR. There’s truly no need to implement a parser from scratch. But people on the internet really seem to think using lexer and parser generators ‘limits’ them. There are some hacks involved in most Lex/Yacc or PEG specs, but in the end people should keep in mind that LR parsers MUST be generated!

Maybe implement the scanner? Even that is kinda stupid. Unless you do what Rob Pike says: https://www.youtube.com/watch?v=HxaD_trXwRE

solrize@lemmy.world · 9 months ago

Is there a parser generator you’re going to use with that grammar? Why not C23?

ChubakPDP11+TakeWithGrainOfSalt@programming.dev · 9 months ago

Not with this grammar. There’s this intermediate parser generator called BNFC that uses its own flavor of BNF (Labelled BNF) to generate Yacc/Lex (or ANTLR where it can), an abstract syntax tree, etc., but I don’t like it. There are no EBNF parser generators AFAIK. One could, possibly, feed this to ChatGPT and ask for a Yacc/Lex pair in return, or even a manual parser! I may do that, but I first have to clean this up and add the stuff that isn’t there.

ChatGPT has changed langdev a lot for me. I automate a good portion of the process with it. But one needs solid specs to feed to it.

As I said, I wish to implement the frontend myself, basically the lexer/parser. But I kinda get bored with lexing and parsing because it’s too time-consuming. Plus LR(1) can only be generated, it’s only LL(1) which can be hand-written. I have not decided yet. I wish to focus more on the backend, because that is where you can do innovative shit and perhaps write a paper on it.
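As a rough idea of what “hand-written LL(1)” means in practice, here is a minimal recursive-descent sketch in Python. It is not a C parser. It assumes a toy arithmetic grammar (given in EBNF in the comments) and evaluates the expression directly instead of building a tree. The `Parser` and `evaluate` names are made up for the example.

```python
import re

# Toy grammar in EBNF (an assumption for this example, not taken from the post):
#   expr   = term   { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = number | "(" expr ")" ;


def tokenize(text):
    # Numbers and single-character operators; anything else is skipped.
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # One token of lookahead is all an LL(1) grammar needs.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        self.pos += 1
        return token

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value //= self.factor()  # integer division keeps the toy simple
        return value

    def factor(self):
        if self.peek() == "(":
            self.take("(")
            value = self.expr()
            self.take(")")
            return value
        token = self.take()
        if not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        return int(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))  # prints 11
```

Each grammar rule becomes one method, and each `{ ... }` repetition becomes a `while` loop driven by the single lookahead token, which is why EBNF maps so directly onto this style.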

Also, I’m going to leave C23 to people who have years of experience. ANSI C is the lowest common denominator of C. I am using the C99 standard, which should be able to compile a good portion of code bases. C99 is the last required POSIX standard for C. That’s when C went under ISO.

Thanks.
