
Ephedra - A C to Java Migration Environment



As part of my Ph.D. thesis, I developed a C/C++ to Java migration tool. The tool reads C/C++ source code and produces Java source code. Though it can convert most kinds of C/C++ source code, the focus is on C/C++ libraries that use little or no GUI code.

The goals of the transliteration are:

  • readability of the generated Java source code,
  • easy integration and interfacing with native Java code,
  • need for little or no user interaction during the transliteration process,
  • good performance,
  • basic C++ support (see below for details).


The source code of Ephedra is the most exact source of information on Ephedra, but I have published some papers that give an overview:
  • My dissertation documents most of the transformations Ephedra does.
    Johannes Martin. Ephedra - A C to Java Migration Environment. Ph.D. Dissertation, University of Victoria, Canada, April 2002.
  • The tool that detects type casts between unrelated data types and suggests improvements is documented in the following book chapter:
    Johannes Martin, Hausi Müller. Discovering Implicit Inheritance Relations in Non Object-Oriented Code. In Advances in Software Engineering: Comprehension, Evaluation, and Evolution. Edited by Erdogmus / Tanir (eds.), Bell Canada, Springer Verlag, New York, December 2001, ISBN 0-387-95109-1.
  • Some of the conversion techniques and strategies of the language conversion tool are explained in the following paper:
    Johannes Martin, Hausi Müller. Strategies for Migration from C to Java. In Proceedings of the 5th European Conference for Software Maintenance and Reengineering, Lisbon, Portugal, 2001.
    This paper also compares Ephedra with some of the other current C/C++ to Java conversion strategies.
  • Some of the experiences using Ephedra are documented in the following paper:
    Johannes Martin, Hausi Müller. C to Java Migration Experiences. In Proceedings of the 6th European Conference for Software Maintenance and Reengineering, Budapest, Hungary, March 2002.


Implementation Details

In order to be able to concentrate on the language and paradigm conversion part of the problem rather than on the problem of parsing C/C++ code, I used IBM's VisualAge C++. It comes with an API (CodeStore API) that lets a tool developer query the compiler about the code it compiles. This way I didn't have to write my own C/C++ parser, but I am also bound to some limitations of IBM's VisualAge C++ and the CodeStore API:

  • Availability: The CodeStore API used to be available only under a non-disclosure agreement on all platforms supported by VisualAge C++ 4.0 (i.e. OS/2, Windows NT, and AIX). With version 5.0 of VisualAge, the CodeStore API is included and documented with the main package, but IBM dropped support for OS/2 and Windows NT. So, you will probably only be able to run my converter if you have access to an AIX machine with IBM VisualAge C++ 5.0 installed.
  • The CodeStore API only provides information on C++ code. If you need to convert plain C code, you first have to get it to compile as C++ (there are tools available to help you with this task).
  • If there's a problem with the parser (i.e. VisualAge C++), I won't be able to fix it - I can only hope that IBM will eventually reply to my bug reports.
  • VisualAge C++ has some extensions that might cause problems when compiling C++ code written for other compilers. They are usually minor and easy to fix. Usually the extensions are more helpful than harmful.

The conversion tool handles most of the C and C++ languages, except for parts that are hard to convert automatically and are only infrequently used. Also, manual conversions might sometimes be more efficient than the conversion the tool chooses. Some of these issues are:

  • gotos are not yet converted. I have an algorithm to convert most gotos, but I have not had a chance to implement it.
  • Type casts between completely unrelated data types will not be handled. My opinion is that they are bad and should be avoided. A tool is provided that detects these type casts and provides suggestions on how to change the data structures involved to avoid these type casts.
  • The conversion of unions is not optimal.
  • Templates are converted inefficiently. For every instantiation of the template, a class is created. This is what a C++ compiler does, so there is no disadvantage as far as performance is concerned, but maintenance of the generated code becomes far more difficult (as opposed to maintaining one template, you will have to maintain all the instances). Ideally, templates should be converted to classes operating on Java's Object class. This task is left for some future version of Ephedra.
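
To illustrate the template trade-off described above, here is a minimal sketch in plain Java. It is not Ephedra's actual output: the C++ template, the class names (IntStack, DoubleStack, ObjectStack), and the method names are all hypothetical. The first two classes show the one-class-per-instantiation style, the third shows a single class operating on Java's Object class.

```java
// Hypothetical C++ input: template <class T> class Stack { ... };
// instantiated as Stack<int> and Stack<double>.

// Style 1: one class per instantiation. IntStack and DoubleStack differ
// only in the element type, so every fix has to be applied to both.
class IntStack {
    private int[] items = new int[4];
    private int size = 0;

    void push(int value) {
        if (size == items.length) {
            // Grow the backing array when it is full.
            int[] bigger = new int[items.length * 2];
            System.arraycopy(items, 0, bigger, 0, size);
            items = bigger;
        }
        items[size++] = value;
    }

    int pop() {
        return items[--size];
    }
}

class DoubleStack {
    private double[] items = new double[4];
    private int size = 0;

    void push(double value) {
        if (size == items.length) {
            double[] bigger = new double[items.length * 2];
            System.arraycopy(items, 0, bigger, 0, size);
            items = bigger;
        }
        items[size++] = value;
    }

    double pop() {
        return items[--size];
    }
}

// Style 2: a single class operating on Object. There is only one class to
// maintain, but callers must wrap primitives and cast on the way out.
class ObjectStack {
    private Object[] items = new Object[4];
    private int size = 0;

    void push(Object value) {
        if (size == items.length) {
            Object[] bigger = new Object[items.length * 2];
            System.arraycopy(items, 0, bigger, 0, size);
            items = bigger;
        }
        items[size++] = value;
    }

    Object pop() {
        Object top = items[--size];
        items[size] = null; // drop the reference so it can be collected
        return top;
    }
}

class TemplateDemo {
    public static void main(String[] args) {
        IntStack ints = new IntStack();
        ints.push(1);
        ints.push(2);
        System.out.println(ints.pop()); // prints 2

        ObjectStack objects = new ObjectStack();
        objects.push(Integer.valueOf(7));
        int seven = ((Integer) objects.pop()).intValue(); // cast checked by the JVM
        System.out.println(seven); // prints 7
    }
}
```

The Object-based version trades some static type checking for maintainability: a wrong cast is only detected at run time, by the JVM.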

That's it for cons. On the pro side:

  • Ephedra produces plain Java. The generated code is human readable and it compiles with Sun's JDK. It does not use the Java Reflection API, which gives compilers and virtual machines a better chance to optimise code.
  • Ephedra supports pointer arithmetic.
  • Ephedra supports function pointers.
  • Ephedra does not circumvent Java's type or runtime safety. The JVM still takes care of checking type conversions, security, and storage access.
  • Ephedra converts C/C++ data types to plain Java classes. Thus, mainstream Java programs can access and use these data types.
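
This page does not show what the generated code looks like, so the following is only a sketch of one common way to express pointer arithmetic and function pointers in plain Java without reflection. It is not taken from Ephedra; IntPointer, IntBinaryFunction, Add, and PointerDemo are hypothetical names chosen for illustration.

```java
// A C pointer into an int array, modelled as (array, offset).
final class IntPointer {
    private final int[] base;
    private final int offset;

    IntPointer(int[] base, int offset) {
        this.base = base;
        this.offset = offset;
    }

    // C: p + n
    IntPointer add(int n) {
        return new IntPointer(base, offset + n);
    }

    // C: *p (the JVM bounds-checks the access)
    int get() {
        return base[offset];
    }

    // C: *p = value
    void set(int value) {
        base[offset] = value;
    }

    // C: p - q, only meaningful for pointers into the same array
    int minus(IntPointer other) {
        if (base != other.base) {
            throw new IllegalArgumentException("pointers into different arrays");
        }
        return offset - other.offset;
    }
}

// A C function pointer of type int (*)(int, int), modelled as an interface.
interface IntBinaryFunction {
    int call(int a, int b);
}

// Stands in for a C function: int add(int a, int b) { return a + b; }
class Add implements IntBinaryFunction {
    public int call(int a, int b) {
        return a + b;
    }
}

class PointerDemo {
    // C: int sum(int *p, int *end) { int s = 0; while (p != end) s += *p++; return s; }
    static int sum(IntPointer p, IntPointer end) {
        int s = 0;
        while (end.minus(p) != 0) {
            s += p.get();
            p = p.add(1);
        }
        return s;
    }

    // C: int apply(int (*f)(int, int), int a, int b) { return f(a, b); }
    static int apply(IntBinaryFunction f, int a, int b) {
        return f.call(a, b);
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        IntPointer begin = new IntPointer(data, 0);
        System.out.println(sum(begin, begin.add(data.length))); // prints 10
        System.out.println(apply(new Add(), 2, 3));             // prints 5
    }
}
```

In a scheme like this, dereferencing a pointer that has moved outside its array raises an ArrayIndexOutOfBoundsException instead of corrupting memory, which is consistent with the point above that the JVM still checks storage access.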