As is too often the case, it doesn't follow what I consider a fundamental OCR rule: the input document type should also be a possible output document type. So PDF in should mean that composited PDF out is possible. This is necessary for ground-truthing.