This is an EBNF grammar for ANSI C (C99), and it contains almost every rule. It may be missing things; please tell me if you notice something missing.

I am writing a C compiler with my own backend and, hopefully, my own frontend, in OCaml. That is why I wrote this grammar. I have also written an AWK grammar, but it’s not uploaded anywhere. Tell me if you want it.

Thanks.

    • I think digraphs and trigraphs are part of the preprocessor? I did not add any preprocessor stuff to this grammar. I am adding them to the new version I am working on.
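
For reference, going by the C99 text: trigraphs are replaced in translation phase 1, before tokenization and preprocessing, while digraphs are just alternative spellings of six punctuators, so they belong in the lexical grammar. A minimal Python sketch of that split (the table and function names are made up for illustration):

```python
# Trigraph sequences and the single characters they stand for.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}

# Digraphs are ordinary tokens; each one is equivalent to a punctuator.
DIGRAPHS = {"<:": "[", ":>": "]", "<%": "{", "%>": "}", "%:": "#", "%:%:": "##"}


def replace_trigraphs(source: str) -> str:
    """Scan left to right, replacing each trigraph with its character."""
    out = []
    i = 0
    while i < len(source):
        candidate = source[i:i + 3]
        if candidate in TRIGRAPHS:
            out.append(TRIGRAPHS[candidate])
            i += 3
        else:
            out.append(source[i])
            i += 1
    return "".join(out)


print(replace_trigraphs("??=define ARR(a, i) a??(i??)"))
```

Since the replacement runs before anything else, a grammar only needs it as a pre-pass; the digraph table, by contrast, would feed straight into the punctuator rule.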

      I have read the C17 standard in full and relied on my memory of it from time to time, but it seems I had forgotten a lot. I am redefining the grammar, and I am redesigning my AWK grammar too.

      I am hoping I could perhaps make a GitHub Pages website called Internet Grammar Database and have all sorts of grammars in it. Thoughts?

  • sim642@lemm.ee
    8 months ago

    I am currently writing a C compiler, with my own backend (and hopefully, frontend) in OCaml.

    But why write your own C frontend? It’s much more of a pain than people imagine. I maintain a C frontend implemented in OCaml (the project itself goes back 25 years) and it’s still not on par with GCC or Clang.

    For any other language, sure, but C has so many “wonderful” features, starting with the lexer hack. Your grammar conveniently overlooks this issue but it’s something you’ll have to deal with to actually implement it. So it simply won’t be as nice as theory suggests.
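
To make the lexer hack concrete: in `T * x;` a context-free grammar cannot tell a declaration from a multiplication unless the lexer knows whether `T` currently names a type, so the parser has to feed typedef names back into the lexer. A rough Python sketch, assuming a toy statement splitter instead of a real parser (all names are hypothetical, and scoping is ignored entirely):

```python
import re

# One identifier, one number, or any single other character per token.
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\d+|.)")
TYPE_KEYWORDS = {"typedef", "int", "char"}


class Lexer:
    def __init__(self):
        # Filled in by the "parser" as it sees typedef declarations.
        self.typedef_names = set()

    def tokenize(self, text):
        tokens = []
        for lexeme in TOKEN_RE.findall(text):
            if lexeme in TYPE_KEYWORDS:
                kind = "KEYWORD"
            elif lexeme[0].isalpha() or lexeme[0] == "_":
                # The hack: the same spelling lexes differently
                # depending on what has been declared so far.
                kind = "TYPE_NAME" if lexeme in self.typedef_names else "IDENT"
            elif lexeme[0].isdigit():
                kind = "NUMBER"
            else:
                kind = "PUNCT"
            tokens.append((kind, lexeme))
        return tokens


def classify_statements(source):
    """Classify each ';'-terminated statement, updating the lexer as we go."""
    lexer = Lexer()
    results = []
    for stmt in source.split(";"):
        if not stmt.strip():
            continue
        tokens = lexer.tokenize(stmt)
        if tokens[0] == ("KEYWORD", "typedef"):
            # Toy assumption: the last token is the newly declared name.
            lexer.typedef_names.add(tokens[-1][1])
            kind = "typedef"
        elif tokens[0][0] in ("TYPE_NAME", "KEYWORD"):
            kind = "declaration"
        else:
            kind = "expression"
        results.append((stmt.strip(), kind))
    return results


for stmt, kind in classify_statements("T * x; typedef int T; T * x;"):
    print(f"{stmt!r}: {kind}")
```

The same text `T * x` is classified as an expression before the typedef and as a declaration after it, which is exactly the state a pure EBNF grammar has no place to express.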

    • You’re right, yeah. Hand-implementing lexers and parsers is kind of ‘inane’. I’m not saying it’s stupid. For a small grammar it makes sense. But for a big grammar, just use a PEG generator, or Yacc/Lex. Rust has LALRPOP and Java has ANTLR. There’s truly no need to implement a parser from scratch. But people on the internet really seem to think using lexer and parser generators ‘limits’ them. There are some hacks involved in most Lex/Yacc or PEG specs, but in the end people should keep in mind that LR parsers MUST be generated!

      Maybe implement the scanner? Even that is kinda stupid. Unless you do what Rob Pike says: https://www.youtube.com/watch?v=HxaD_trXwRE

    • Not with this grammar. There’s this parser-generator intermediary called BNFC that uses its own flavor of BNF (Labeled BNF) to generate Yacc/Lex (or ANTLR when it can), an abstract syntax tree, etc., but I don’t like it. There are no EBNF parser generators AFAIK. One could, possibly, feed this to ChatGPT and ask for a Yacc/Lex pair in return, or even a manual parser! I may do that, but I first have to clean this up and add the stuff that isn’t there.

      ChatGPT has changed langdev a lot for me. I automate a good portion of the process with it. But one needs solid specs to feed to it.

      As I said, I wish to implement the frontend myself, basically the lexer/parser. But I kinda get bored with lexing/parsing because it’s too time-consuming. Plus, LR(1) can only be generated; it’s only LL(1) which can be hand-written. I have not decided yet. I wish to focus more on the backend, because that is where you can do innovative shit and, perhaps, write a paper on it.
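
On LL(1) being the hand-writable kind: each nonterminal becomes one function, and a single token of lookahead picks the branch. A small Python sketch for a toy expression grammar (not taken from the C grammar above; the names are illustrative, and `/` is integer division here):

```python
import re

# Toy grammar in EBNF:
#   expr   = term   { ("+" | "-") term } ;
#   term   = factor { ("*" | "/") factor } ;
#   factor = NUMBER | "(" expr ")" ;


def tokenize(text):
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # The single token of lookahead that drives every decision.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    def factor(self):
        tok = self.advance()
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token: {tok!r}")


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input: {parser.peek()!r}")
    return value


print(evaluate("2 + 3 * (4 - 1)"))
```

The `{ ... }` repetitions map directly onto `while` loops, which is why EBNF reads so naturally as a recursive-descent parser once left recursion is out of the grammar.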

      Also, I’m going to leave C23 to people who have years of experience. ANSI C is the lowest common denominator of C. I am using the C99 standard, which should be able to compile a good portion of code bases. C99 is the last required POSIX standard for C. That’s when C went under ISO.

      Thanks.