= Hacking the compiler :camel: This document is a work-in-progress attempt to provide useful information for people willing to inspect or modify the compiler distribution's codebase. Feel free to improve it by sending change proposals for it. If you already have a patch that you would like to contribute to the official distribution, please see link:CONTRIBUTING.md[]. === Your first compiler modification 1. Create a new git branch to store your changes. + ---- git checkout -b my-modification ---- Usually, this branch wants to be based on `trunk`. If your changes must be on a specific release, use its release branch (*not* the release tag) instead. For example, to make a fix for 4.11.1, base your branch on *4.11* (not on *4.11.1*). The `configure` step for the compiler recognises a development build from the `+dev` in the version number (see file `VERSION`), and release tarballs and the tagged Git commits do not have this which causes some important development things to be disabled (ocamltest and converting C compiler warnings to errors). 2. Consult link:INSTALL.adoc[] for build instructions. Here is the gist of it: + ---- ./configure make ---- If you are on a release build and need development options, you can add `--enable-ocamltest` (to allow running the testsuite) and `--enable-warn-error` (so you don't get caught by CI later!). 3. Try the newly built compiler binaries `ocamlc`, `ocamlopt` or their `.opt` version. To try the toplevel, use: + ---- make runtop ---- 4. Hack frenetically and keep rebuilding. 5. Run the testsuite from time to time. + ---- make tests ---- 6. You did it, Well done! Consult link:CONTRIBUTING.md[] to send your contribution upstream. See also our <>, for example on how to <> to test your modified compiler. === What to do There is always a lot of potential tasks, both for old and newcomers. Here are various potential projects: * https://github.com/ocaml/ocaml/issues[The OCaml bugtracker] contains reported bugs and feature requests. Some changes that should be accessible to newcomers are marked with the tag link:++https://github.com/ocaml/ocaml/issues?q=is%3Aopen+is%3Aissue+label%3Anewcomer-job++[ newcomer-job]. * The https://github.com/ocamllabs/compiler-hacking/wiki/Things-to-work-on[OCaml Labs compiler-hacking wiki] contains various ideas of changes to propose, some easy, some requiring a fair amount of work. * Documentation improvements are always much appreciated, either in the various `.mli` files or in the official manual (See link:manual/README.md[]). If you invest effort in understanding a part of the codebase, submitting a pull request that adds clarifying comments can be an excellent contribution to help you, next time, and other code readers. * The https://github.com/ocaml/ocaml[github project] contains a lot of pull requests, many of them being in dire need of a review -- we have more people willing to contribute changes than to review someone else's change. Picking one of them, trying to understand the code (looking at the code around it) and asking questions about what you don't understand or what feels odd is super-useful. It helps the contribution process, and it is also an excellent way to get to know various parts of the compiler from the angle of a specific aspect or feature. + Again, reviewing small or medium-sized pull requests is accessible to anyone with OCaml programming experience, and helps maintainers and other contributors. If you also submit pull requests yourself, a good discipline is to review at least as many pull requests as you submit. == Structure of the compiler The compiler codebase can be intimidating at first sight. Here are a few pointers to get started. === Compilation pipeline ==== The driver -- link:driver/[] The driver contains the "main" function of the compilers that drive compilation. It parses the command-line arguments and composes the required compiler passes by calling functions from the various parts of the compiler described below. ==== Parsing -- link:parsing/[] Parses source files and produces an Abstract Syntax Tree (AST) (link:parsing/parsetree.mli[] has lot of helpful comments). See link:parsing/HACKING.adoc[]. The logic for Camlp4 and Ppx preprocessing is not in link:parsing/[], but in link:driver/[], see link:driver/pparse.mli[] and link:driver/pparse.ml[]. ==== Typing -- link:typing/[] Type-checks the AST and produces a typed representation of the program (link:typing/typedtree.mli[] has some helpful comments). See link:typing/HACKING.adoc[]. ==== The bytecode compiler -- link:bytecomp/[] ==== The native compiler -- link:middle_end/[] and link:asmcomp/[] === Runtime system === Libraries link:stdlib/[]:: The standard library. Each file is largely independent and should not need further knowledge. link:otherlibs/[]:: External libraries such as `unix`, `threads`, `dynlink`, `str` and `bigarray`. Instructions for building the full reference manual are provided in link:manual/README.md[]. However, if you only modify the documentation comments in `.mli` files in the compiler codebase, you can observe the result by running ---- make html_doc ---- and then opening link:./ocamldoc/stdlib_html/index.html[] in a web browser. === Tools link:lex/[]:: The `ocamllex` lexer generator. link:yacc/[]:: The `ocamlyacc` parser generator. We do not recommend using it for user projects in need of a parser generator. Please consider using and contributing to link:http://gallium.inria.fr/~fpottier/menhir/[menhir] instead, which has tons of extra features, lets you write more readable grammars, and has excellent documentation. === Complete file listing BOOTSTRAP.adoc:: instructions for bootstrapping Changes:: what's new with each release CONTRIBUTING.md:: how to contribute to OCaml HACKING.adoc:: this file INSTALL.adoc:: instructions for installation LICENSE:: license and copyright notice Makefile:: main Makefile Makefile.common:: common Makefile definitions Makefile.tools:: used by manual/ and testsuite/ Makefiles README.adoc:: general information on the compiler distribution README.win32.adoc:: general information on the Windows ports of OCaml VERSION:: version string. Run `make configure` after changing. asmcomp/:: native-code compiler and linker boot/:: bootstrap compiler build-aux/: autotools support scripts bytecomp/:: bytecode compiler and linker compilerlibs/:: the OCaml compiler as a library configure:: configure script configure.ac: autoconf input file debugger/:: source-level replay debugger driver/:: driver code for the compilers flexdll/:: git submodule -- see link:README.win32.adoc[] lex/:: lexer generator man/:: man pages manual/:: system to generate the manual middle_end/:: the flambda optimisation phase ocamldoc/:: documentation generator ocamltest/:: test driver otherlibs/:: several additional libraries parsing/:: syntax analysis -- see link:parsing/HACKING.adoc[] runtime/:: bytecode interpreter and runtime systems stdlib/:: standard library testsuite/:: tests -- see link:testsuite/HACKING.adoc[] tools/:: various utilities toplevel/:: interactive system typing/:: typechecking -- see link:typing/HACKING.adoc[] utils/:: utility libraries yacc/:: parser generator [#tips] == Development tips and tricks === Keep merge commits when merging and cherry-picking Github PRs Having the Github PR number show up in the git log is very useful for later triaging. We recently disabled the "Rebase and merge" button, precisely because it does not produce a merge commit. When you cherry-pick a PR in another branch, please cherry-pick this merge-style commit rather than individual commits, whenever possible. (Picking a merge commit typically requires the `-m 1` option.) You should also use the `-x` option to include the hash of the original commit in the commit message. ---- git cherry-pick -x -m 1 ---- [#opam-switch] === Testing with `opam` If you are working on a development version of the compiler, you can create an opam switch from it by running the following from the development repository: ----- -opam switch create . --empty -opam install . ----- If you want to test someone else's development version from a public git repository, you can build a switch directly (without cloning their work locally) by pinning: ---- opam switch create my-switch-name --empty # Replace $VERSION by the trunk version opam pin add ocaml-variants.$VERSION+branch git+https://$REPO#branch ---- === Useful Makefile targets Besides the targets listed in link:INSTALL.adoc[] for build and installation, the following targets may be of use: `make runtop` :: builds and runs the ocaml toplevel of the distribution (optionally uses `rlwrap` for readline+history support) `make natruntop`:: builds and runs the native ocaml toplevel (experimental) `make partialclean`:: Clean the OCaml files but keep the compiled C files. `make depend`:: Regenerate the `.depend` file. Should be used each time new dependencies are added between files. `make -C testsuite parallel`:: see link:testsuite/HACKING.adoc[] Additionally, there are some developer specific targets in link:Makefile.dev[]. These targets are automatically available when working in a Git clone of the repository, but are not available from a tarball. === Automatic configure options If you have options to `configure` which you always (or at least frequently) use, it's possible to store them in Git, and `configure` will automatically add them. For example, you may wish to avoid building the debug runtime by default while developing, in which case you can issue `git config --global ocaml.configure '--disable-debug-runtime'`. The `configure` script will alert you that it has picked up this option and added it _before_ any options you specified for `configure`. Options are added before those passed on the command line, so it's possible to override them, for example `./configure --enable-debug-runtime` will build the debug runtime, since the enable flag appears after the disable flag. You can also use the full power of Git's `config` command and have options specific to particular clone or worktree. === Speeding up configure `configure` includes the standard `-C` option which caches various test results in the file `config.cache` and can use those results to avoid running tests in subsequent invocations. This mechanism works fine, except that it is easy to clean the cache by mistake (e.g. with `git clean -dfX`). The cache is also host-specific which means the file has to be deleted if you run `configure` with a new `--host` value (this is quite common on Windows, where `configure` is also quite slow to run). You can elect to have host-specific cache files by issuing `git config --global ocaml.configure-cache .`. The `configure` script will now automatically create `ocaml-host.cache` (e.g. `ocaml-x86_64-pc-windows.cache`, or `ocaml-default.cache`). If you work with multiple worktrees, you can share these cache files by issuing `git config --global ocaml.configure-cache ..`. The directory is interpreted _relative_ to the `configure` script. === Bootstrapping The OCaml compiler is bootstrapped. This means that previously-compiled bytecode versions of the compiler and lexer are included in the repository under the link:boot/[] directory. These bytecode images are used once the bytecode runtime (which is written in C) has been built to compile the standard library and then to build a fresh compiler. Details can be found in link:BOOTSTRAP.adoc[]. === Speeding up builds Once you've built a natively-compiled `ocamlc.opt`, you can use it to speed up future builds by copying it to `boot`: ---- cp ocamlc.opt boot/ ---- If `boot/ocamlc` changes (e.g. because you ran `make bootstrap`), then the build will revert to the slower bytecode-compiled `ocamlc` until you do the above step again. === Using merlin During the development of the compiler, the internal format of compiled object files evolves, and quickly becomes incompatible with the format of the last OCaml release. In particular, even an up-to-date merlin will be unable to use them during most of the development cycle: opening a compiler source file with merlin gives a frustrating error message. To use merlin on the compiler, you want to build the compiler with an older version of itself. One easy way to do this is to use the experimental build rules for Dune, which are distributed with the compiler (with no guarantees that the build will work all the time). Assuming you already have a recent OCaml version installed with merlin and dune, you can just run the following from the compiler sources: ---- ./configure # if not already done make clean && dune build @libs ---- which will do a bytecode build of all the distribution (without linking the executables), using your OCaml compiler, and generate a .merlin file. Merlin will be looking at the artefacts generated by dune (in `_build`), rather than trying to open the incompatible artefacts produced by a Makefile build. In particular, you need to repeat the dune build every time you change the interface of some compilation unit, so that merlin is aware of the new interface. You only need to run `configure` once, but you will need to run `make clean` every time you want to run `dune` after you built something with `make`; otherwise dune will complain that build artefacts are present among the sources. Finally, there will be times where the compiler simply cannot be built with an older version of itself. One example of this is when a new primitive is added to the runtime, and then used in the standard library straight away, since the rest of the compiler requires the `stdlib` library to build, nothing can be build. In such situations, you will have to either live without merlin, or develop on an older branch of the compiler, for example the maintenance branch of the last released version. Developing a patch from a release branch can later introduce a substantial amount of extra work, when you rebase to the current development version. But it also makes it a lot easier to test the impact of your work on third-party code, by installing a local <>: opam packages tend to be compatible with released versions of the compiler, whereas most packages are incompatible with the in-progress development version. === Continuous integration ==== Github's CI: Travis and AppVeyor The script that is run on Travis continuous integration servers is link:tools/ci/travis/travis-ci.sh[]; its configuration can be found as a Travis configuration file in link:.travis.yml[]. For example, if you want to reproduce the default build on your machine, you can use the configuration values and run command taken from link:.travis.yml[]: ---- CI_KIND=build XARCH=x64 bash -ex tools/ci/travis/travis-ci.sh ---- The scripts support two other kinds of tests (values of the `CI_KIND` variable) which both inspect the patch submitted as part of a pull request. `tests` checks that the testsuite has been modified (hopefully, improved) by the patch, and `changes` checks that the link:Changes[] file has been modified (hopefully to add a new entry). These tests rely on the `$TRAVIS_COMMIT_RANGE` variable which you can set explicitly to reproduce them locally. The `changes` check can be disabled by including "(no change entry needed)" in one of your commit messages -- but in general all patches submitted should come with a Changes entry; see the guidelines in link:CONTRIBUTING.md[]. ==== INRIA's Continuous Integration (CI) INRIA provides a Jenkins continuous integration service that OCaml uses, see link:https://ci.inria.fr/ocaml/[]. It provides a wider architecture support (MSVC and MinGW, a zsystems s390x machine, and various MacOS versions) than the Travis/AppVeyor testing on github, but only runs on commits to the trunk or release branches, not on every PR. You do not need to be an INRIA employee to open an account on this jenkins service; anyone can create an account there to access build logs and manually restart builds. If you would like to do this but have trouble doing it, please email ocaml-ci-admin@inria.fr. To be notified by email of build failures, you can subscribe to the ocaml-ci-notifications@inria.fr mailing list by visiting https://sympa.inria.fr/sympa/info/ocaml-ci-notifications[its web page.] ==== Running INRIA's CI on a publicly available git branch If you have suspicions that your changes may fail on exotic architectures (they touch the build system or the backend code generator, for example) and would like to get wider testing than github's CI provides, it is possible to manually start INRIA's CI on arbitrary git branches even before opening a pull request as follows: 1. Make sure you have an account on Inria's CI as described before. 2. Make sure you have been added to the ocaml project. 3. Prepare a branch with the code you'd like to test, say "mybranch". It is probably a good idea to make sure your branch is based on the latest trunk. 4. Make your branch publicly available. For instance, you can fork OCaml's GitHub repository and then push "mybranch" to your fork. 5. Visit https://ci.inria.fr/ocaml/job/precheck and log in. Click on "Build with parameters". 6. Fill in the REPO_URL and BRANCH fields as appropriate and run the build. 7. You should receive a bunch of e-mails with the build logs for each slave and each tested configuration (with and without flambda) attached. ==== Changing what the CI does INRIA's CI "main" and "precheck" jobs run the script tools/ci-build. In particular, when running the CI on a publicly available branch via the "precheck" job as explained in the previous section, you can edit this script to change what the CI will test. For instance, parallel builds are only tested for the "trunk" branch. In order to use "precheck" to test parallel build on a custom branch, add this at the beginning of tools/ci-build: ---- OCAML_JOBS=10 ---- === The `caml-commits` mailing list If you would like to receive email notifications of all commits made to the main git repository, you can subscribe to the caml-commits@inria.fr mailing list by visiting https://sympa.inria.fr/sympa/info/caml-commits[its web page.] Happy Hacking!