A JIT compiler for PHP

How does PHP run?

PHP is a "scripting language", which means it is not directly compiled into machine language.

When you start a PHP program, the Zend Engine parses the code into an abstract syntax tree (AST) and translates it to opcodes. The opcodes are execution units for the Zend Virtual Machine (Zend VM). Opcodes are pretty low-level, so they are much faster to translate to machine code than the original PHP code. PHP has an extension named OPcache in core to cache these opcodes.

The first run of a PHP program has to go from PHP code to opcodes, but subsequent runs use the cached opcodes. For obvious reasons, always make sure Zend OPcache is loaded and enabled.

So, that's the current state of PHP execution, in a nutshell.

Can they make it faster?

On September 1, 2016, Dmitry Stogov posted a message to the PHP internals mailing list announcing that work had started on a new JIT compiler for PHP, targeting release 8.

A just-in-time (JIT) compiler takes the opcodes and, instead of interpreting them, compiles them into machine code and invokes that object code. A JIT should overcome the inefficiency of interpreting opcodes every time a program runs. Sounds interesting, right?

Note that the JVM (Java), the CLR (.NET) and HHVM (Facebook's PHP runtime) all take a JIT approach.

The code is available in ZendTech's jit-dynasm branch https://github.com/zendtech/php-src/tree/jit-dynasm/ext/opcache/jit, so it is not in the official PHP repo yet.

The basics to support JIT on at least some 32-bit and 64-bit platforms should be there. They are using the DynASM project for code generation. The goal now is to research different JIT approaches and how they can benefit PHP.

The approaches differ in what to compile, how often, and at what granularity. Some JITs compile the code only once, a whole object at a time. Others compile one method at a time, and so on.

In statically typed languages, all values have a data type declared at compile time, which limits the values a particular expression can take on at run time. Because a JIT works at run time, it can do a far better job at things like type inference because it can do inner-procedure analysis. In other words, it knows more about the running program than ahead-of-time (AOT) compilation does. On the other hand, the first execution with a JIT is probably slower than with an interpreter because of the extra translation steps it has to perform.

I'm not an expert in this field, and it all seems like a very complicated matter, but it's very exciting to see these kinds of things moving in the PHP world. I'm very curious where this path is going to lead.

 

JIT security

A JIT compiles opcodes to machine code and executes them. This is done in memory. The problem is that, for security reasons, memory should be either writable or executable (W^X), but never both at the same time.

The current PHP implementation disables writing into the JIT buffer during execution, using the mprotect() system call. That means it compiles the code, writes it to memory, and then protects that memory to make it non-writable during execution, preventing all kinds of possible exploits.

There are currently two PHP core extensions that violate the W^X principle: Phar and PCRE's JIT. But the new PHP JIT in OPcache takes W^X into account from the beginning, which is nice.

I tested the current work on OpenBSD 6.0, which has W^X enabled by default, and everything seems to work just fine. No violations. Note that SELinux also enables these kinds of protections.

 

Speed

As the mailing list post notes, no real performance improvements have been made yet. It's possible to test the JIT with the PHP benchmark script like this:


php -d opcache.jit_buffer_size=32M Zend/bench.php


Tom Van Looy