Implementation Details
To concentrate on the language and paradigm conversion rather than on
parsing C/C++ code, I used IBM's VisualAge C++. It comes with an API (the
CodeStore API) that lets a tool developer query the compiler about the code
it compiles. This way I didn't have to write my own C/C++ parser, but I am
bound by some limitations of IBM's VisualAge C++ and the CodeStore API:
- Availability: The CodeStore API used to be available only under a
non-disclosure agreement on all platforms supported by VisualAge C++ 4.0
(i.e. OS/2, Windows NT, and AIX). With version 5.0 of VisualAge, the
CodeStore API is included and documented with the main package, but IBM
dropped support for OS/2 and Windows NT. So, you will probably only
be able to run my converter if you have access to an AIX machine with
IBM VisualAge C++ 5.0 installed.
- The CodeStore API lets you query information on C++ code only. If you
need to convert plain C code, you first need to get it to compile as C++
(there are tools available to help you with this task).
- If there's a problem with the parser (i.e. VisualAge C++), I won't be able
to fix it - I can only hope that IBM will eventually reply to my bug
reports.
- VisualAge C++ has some extensions that might cause problems when compiling
C++ code written for other compilers. They are usually minor and easy
to fix. Usually the extensions are more helpful than harmful.
The conversion tool handles most of the C and C++ languages, except for parts
that are hard to convert automatically and are only infrequently used. Also,
manual conversions might sometimes be more efficient than the conversion the
tool chooses. Some of these issues are:
- goto statements are not yet converted. I have an algorithm to convert
most gotos, but I have not had a chance to implement it.
- Type casts between completely unrelated data types are not handled.
My opinion is that they are bad and should be avoided. A tool is
provided that detects these type casts and suggests how to change
the data structures involved to avoid them.
- The conversion of unions is not optimal.
- Templates are converted inefficiently. For every instantiation of a
template, a class is created. This is what a C++ compiler does, so there
is no disadvantage as far as performance is concerned, but maintenance
of the generated code becomes far more difficult (instead of maintaining
one template, you have to maintain all of its instances). Ideally,
templates should be converted to classes operating on Java's Object class.
This task is left for some future version of Ephedra.
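
To illustrate the template trade-off described in the list above, here is a
minimal sketch of the two styles. It is not actual Ephedra output: the C++
template in the comment, the class names (Stack_int, Stack_double,
ObjectStack) and the fixed capacity are all assumptions made up for this
example.

```java
// Hypothetical C++ input (not taken from Ephedra):
//   template <class T> class Stack { T items[16]; int top; /* ... */ };
//   Stack<int> a;  Stack<double> b;

public class TemplateSketch {

    // Style 1: one class per template instantiation (names are invented).
    // Each copy must be maintained separately.
    static class Stack_int {
        private final int[] items = new int[16];
        private int top = 0;
        void push(int v) { items[top++] = v; }
        int pop() { return items[--top]; }
    }

    static class Stack_double {
        private final double[] items = new double[16];
        private int top = 0;
        void push(double v) { items[top++] = v; }
        double pop() { return items[--top]; }
    }

    // Style 2: a single class operating on Object. Only one class to
    // maintain, but callers must cast the values they take out.
    static class ObjectStack {
        private final Object[] items = new Object[16];
        private int top = 0;
        void push(Object v) { items[top++] = v; }
        Object pop() { return items[--top]; }
    }

    public static void main(String[] args) {
        Stack_int a = new Stack_int();
        a.push(1);
        a.push(2);
        System.out.println(a.pop());      // prints 2

        ObjectStack c = new ObjectStack();
        c.push(Integer.valueOf(7));
        c.push("text");                   // mixed element types are accepted
        String s = (String) c.pop();      // cast is checked by the JVM
        Integer i = (Integer) c.pop();
        System.out.println(s + " " + i);  // prints text 7
    }
}
```

The Object-based class trades the compile-time element type for a single
class to maintain. A wrong cast then fails at run time with a
ClassCastException instead of being rejected by the compiler.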
That's it for cons. On the pro side:
- Ephedra produces plain Java. The generated code is human readable and
it compiles with Sun's JDK. It does not use the Java Reflection API,
which gives compilers and virtual machines a better chance to optimise
code.
- Ephedra supports pointer arithmetic.
- Ephedra supports function pointers.
- Ephedra does not circumvent Java's type or runtime safety. The JVM
still takes care of checking type conversions, security, and storage
access.
- Ephedra converts C/C++ data types to plain Java classes. Thus, mainstream
Java programs can access and use these data types.
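
The following sketch shows one common way to express pointer arithmetic and
function pointers in plain Java without the Reflection API. Whether Ephedra
generates code of exactly this shape is not stated here, so treat it as an
assumption. The names IntPtr, IntFn and applyAll are hypothetical.

```java
public class PointerSketch {

    // Models a C `int *p` as an (array, index) pair. Hypothetical class.
    static final class IntPtr {
        final int[] base;
        final int offset;

        IntPtr(int[] base, int offset) {
            this.base = base;
            this.offset = offset;
        }

        IntPtr add(int n) { return new IntPtr(base, offset + n); } // p + n
        int get() { return base[offset]; }                         // *p
        void set(int v) { base[offset] = v; }                      // *p = v
    }

    // Models a C `int (*f)(int)` as an interface with a single method.
    interface IntFn {
        int call(int x);
    }

    // Walks `count` elements starting at p and sums f(element).
    static int applyAll(IntPtr p, int count, IntFn f) {
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += f.call(p.add(i).get());
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        IntPtr p = new IntPtr(data, 0);
        p.add(2).set(30);                 // like *(p + 2) = 30 in C

        IntFn twice = new IntFn() {
            public int call(int x) { return 2 * x; }
        };
        System.out.println(applyAll(p, 4, twice)); // prints 74

        // Reading past the end, e.g. p.add(4).get(), is not undefined
        // behaviour as in C: the JVM throws ArrayIndexOutOfBoundsException.
    }
}
```

Because the pointer is an ordinary object wrapping a Java array, every
dereference still goes through the JVM's bounds check, which fits the point
above about not circumventing runtime safety.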
Links