A Review of Clang/LLVM
Embedded Systems
Conor Twomey | R00121583
Louise Walsh | R00128425

Introduction
This project gives a comprehensive review of the Clang and LLVM compiler technologies, with particular emphasis on the Abstract Syntax Tree: its definition, its use and its different variations. It also discusses the advantages of Clang over GCC. Clang is designed for the C family of programming languages and builds on the LLVM compiler infrastructure.
As part of this introduction, each term is first defined. Clang is a C language family front end that uses LLVM as its back end. It is built on top of LLVM and was developed as a replacement for the GNU Compiler Collection (GCC). Clang was originally developed by Apple, as GCC did not offer sufficient support for Objective-C. Its contributors include Apple, Microsoft and Google, among others, and Apple makes extensive use of LLVM in many of its systems, including the iPhone SDK and its IDE. Both Clang and LLVM are open source software.
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. LLVM is organised as a set of libraries, which are used to construct, optimise and produce intermediate and binary machine code. There are several sub-projects of LLVM, with Clang being one of them.
Clang and LLVM are both written in C++. Clang produces an Abstract Syntax Tree (AST), which is also a focus of this review. An Abstract Syntax Tree, or just syntax tree, is a tree-like representation of the abstract syntactic structure of code written in a given language. The tree has many nodes, with each node denoting a construct occurring in the given source code.
What makes the tree ‘abstract’ is that not every detail that occurs within the given source code is represented. Abstract syntax trees are used to aid the analysis of programs and in program transformation systems.
This review begins with a comprehensive overview of Clang. In this overview, the origins of Clang are discussed, including the reason behind its development. The advantages and disadvantages of Clang compared with GCC are also reviewed and discussed.

Clang
We will now discuss Clang in terms of origin and competitiveness. Clang is a front end for LLVM. It is a compiler for languages such as C, C++ and Objective-C, with support for extensions such as OpenMP and CUDA.
Clang was originally released as open source by Apple in July 2007. It was mainly developed as a replacement for GCC. The developers at Apple, who worked extensively with Objective-C, had originally tried to use GCC’s front end, but found that the GCC source code was large and cumbersome to work with. This led to the development of Clang, a new LLVM front end that supports more C-based languages. Its development was quick, and it was able to compile a Linux kernel within three years of being open sourced. The combination of LLVM and Clang led to a comprehensive toolchain that could replace the full GCC stack. However, this also means that the Clang front end is still relatively new.
The advantages of using Clang are discussed in terms of end-user features, utilities and applications, and internal design and implementation. The Clang front end compiles quickly, with very low memory usage, under a series of different tests. As competition between Clang and GCC became more heated, the compilation times of both became more competitive.
Clang has a modular, library-based architecture, which is flexible and easy to extend. A modular architecture is also more intuitive for developers to work with. As Clang was developed to work well with Apple’s IDEs, it allows for tighter integration with IDEs in general.
In terms of internal design and implementation, Clang has a single unified parser for C, C++ and Objective-C, with conformance to the different variants of C. Clang also has a straightforward code base, which makes it more approachable than GCC. Clang also aims to be compatible with GCC.
Next, this review discusses the differences between GCC and Clang.

GCC vs Clang
In this section we discuss the differences between Clang and GCC, and how differences in goals can lead to strengths and weaknesses in different compiler front ends.
In terms of language support, GCC supports more languages than Clang, including Fortran and, in older releases, Java. GCC also supports many more language extensions than Clang. However, Clang’s support for C++ is more compliant than GCC’s.
One of the more practical improvements with Clang is that its error messages and design are understandable to any developer with a basic understanding of the languages being used and of compilers. By contrast, the GCC code base is very old, which can present a steep learning curve for new developers hoping to make use of it.
Conceptually, and from its inception, Clang has been designed as an API, which means that source analysis tools, IDEs and code generation tools can all use it easily. GCC, in comparison, is built as a monolithic static compiler and is extremely difficult to use as an API, which makes integration with other tools difficult. This also makes it difficult to decouple the front end from the rest of the compiler.
Due to its modular design and architecture, Clang is easy to reuse. GCC, however, is very difficult to reuse and to modify because of its underlying design. It also uses a custom garbage collector, makes extensive use of global variables and is not multi-threadable. This leads to further problems, including memory issues, that Clang does not experience.
Clang, in pursuit of a more developer-friendly system, provides much clearer and more concise error messages and diagnostics, with more support for those diagnostics. Some newer versions of GCC have incorporated some of these features to become more usable, but GCC still has progress to make in this area.
For a time Clang was also much faster at compilation, before GCC worked to decrease its compilation times and a healthy competition began. Both compilers now compile much faster than when Clang was first released, and both support a wide range of languages.
GCC still compiles many more languages and is still considered more of a standard, but Clang rivals it very closely for the C family of languages. Clang often has faster compile times and a lower memory footprint; however, it is not consistently the leader. GCC has become a closer rival over time, and which compiler wins on compile time and memory footprint depends entirely on the program being compiled.
To summarise this section, when choosing between GCC and Clang, the developer must first consider their own competencies, the language being used and the requirements of the system.

Abstract Syntax Tree (AST)
“An abstract syntax tree is a tree representation of the abstract syntactic structure of source code written in a programming language.”
The front end, i.e. Clang, is responsible for parsing the source code. It then checks for errors and turns the input code into an AST. An abstract syntax tree aids the comprehension of a program: what happens in a line of code may not be exactly what is expected, whereas the AST shows exactly what happens behind the scenes. This helps when traversing code during reviews and transformations. To use this tree structure, the tree needs to be traversed efficiently and effectively. The tree is inherently more convenient to analyse and modify than the source text.
An AST represents the structure of source code abstracted from the syntax of its original programming language. In principle, this allows conversion from one language to another, by taking the AST built from a program in one language and writing it back out as source code in another language.
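The ideas of traversing a tree for analysis and of writing the same tree back out in a different syntax can be sketched in a few lines of C++. This is a minimal illustration only: the Expr type and the functions num, binary, evaluate, toInfix and toPrefix are hypothetical names invented for this sketch and are not part of Clang.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node type for illustration only; not one of Clang's AST classes.
struct Expr {
    enum class Kind { Number, Add, Mul };
    Kind kind = Kind::Number;
    int value = 0;                    // meaningful only for Number nodes
    std::unique_ptr<Expr> lhs, rhs;   // meaningful only for Add and Mul nodes
};

// Helper that builds a leaf node holding a literal.
std::unique_ptr<Expr> num(int v) {
    auto e = std::make_unique<Expr>();
    e->value = v;
    return e;
}

// Helper that builds an interior node with two children.
std::unique_ptr<Expr> binary(Expr::Kind kind, std::unique_ptr<Expr> l,
                             std::unique_ptr<Expr> r) {
    auto e = std::make_unique<Expr>();
    e->kind = kind;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Analysis: a recursive traversal that computes the value of the tree.
int evaluate(const Expr& e) {
    if (e.kind == Expr::Kind::Number) return e.value;
    assert(e.lhs && e.rhs && "binary nodes need two children");
    const int l = evaluate(*e.lhs);
    const int r = evaluate(*e.rhs);
    return e.kind == Expr::Kind::Add ? l + r : l * r;
}

static const char* symbol(Expr::Kind k) {
    return k == Expr::Kind::Add ? "+" : "*";
}

// Writes the tree out in C-like infix syntax, fully parenthesised.
std::string toInfix(const Expr& e) {
    if (e.kind == Expr::Kind::Number) return std::to_string(e.value);
    return "(" + toInfix(*e.lhs) + " " + symbol(e.kind) + " " +
           toInfix(*e.rhs) + ")";
}

// Writes the same tree out in Lisp-like prefix syntax.
std::string toPrefix(const Expr& e) {
    if (e.kind == Expr::Kind::Number) return std::to_string(e.value);
    return std::string("(") + symbol(e.kind) + " " + toPrefix(*e.lhs) + " " +
           toPrefix(*e.rhs) + ")";
}
```

For the tree built by binary(Expr::Kind::Mul, binary(Expr::Kind::Add, num(1), num(2)), num(3)), evaluate returns 9, toInfix returns "((1 + 2) * 3)" and toPrefix returns "(* (+ 1 2) 3)". The tree stores no parentheses at all; grouping is carried by its shape, which is why the same tree can be printed in either syntax.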
An AST is built by parsing the given source code. It also makes explicit details that the source text leaves implicit, giving a closer representation of what the program actually does while abstracting away purely syntactic detail. For a loop, for example, the AST spells out each stage of the construct.
Another example applies to dynamically typed languages such as Python and JavaScript. A tree annotated with type information can show when a variable changes the type of value it holds (e.g. from an int to a string value), which is not obvious in the source code (e.g. initial_value = new_value).

Clang Abstract Syntax Tree (CAST)
The Clang Abstract Syntax Tree (CAST) is the specific form of the Abstract Syntax Tree used by Clang, which supports the C family of languages. The CAST differs from the ASTs produced by some other compilers in that it closely resembles both the written C++ code and the C++ standard.
The node types in a CAST are organised as class hierarchies. The CAST uses three core groups of classes: statements, declarations and types. These three groups form the base of a range of specialisations. The core groups do not inherit from a single common base class, so each group requires a different interface to visit.
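The effect of having no common base class can be sketched as follows. The Type, Stmt and Decl structs below are simplified stand-ins that only borrow the names of Clang's three groups, and NodeCounter and makeExample are hypothetical helpers for this illustration. Clang's own RecursiveASTVisitor follows the same general pattern, with separate TraverseDecl, TraverseStmt and TraverseType entry points.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins: the names echo Clang's groups, but these are not
// Clang's classes. Note that the three structs share no base class.
struct Type {
    std::string name;                              // e.g. "int"
};

struct Stmt {
    std::string kind;                              // e.g. "ReturnStmt"
    std::vector<std::unique_ptr<Stmt>> children;   // nested statements
};

struct Decl {
    std::string name;                              // declared identifier
    Type type;                                     // declared return type
    std::unique_ptr<Stmt> body;                    // null if there is no body
};

// With no shared base class, the traversal needs one entry point per group.
class NodeCounter {
public:
    int decls = 0;
    int stmts = 0;
    int types = 0;

    void traverseDecl(const Decl& d) {
        ++decls;
        traverseType(d.type);
        if (d.body) traverseStmt(*d.body);
    }

    void traverseStmt(const Stmt& s) {
        ++stmts;
        for (const auto& child : s.children) {
            assert(child && "child statements must not be null");
            traverseStmt(*child);
        }
    }

    void traverseType(const Type&) { ++types; }
};

// Builds a stand-in tree for: int answer() { return 42; }
Decl makeExample() {
    auto ret = std::make_unique<Stmt>();
    ret->kind = "ReturnStmt";

    auto body = std::make_unique<Stmt>();
    body->kind = "CompoundStmt";
    body->children.push_back(std::move(ret));

    Decl fn;
    fn.name = "answer";
    fn.type = Type{"int"};
    fn.body = std::move(body);
    return fn;
}
```

Calling traverseDecl on the result of makeExample visits one declaration, one type and two statements (the compound body and the return statement inside it). A single generic visit(Node&) method is not possible here because there is no Node type for the three groups to share.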
Therefore, each of these node groups has a dedicated traversal method for navigating the tree.
Clang can be run with command line arguments that print the CAST back out as source code. The output is effectively the same source code that it was given, but in a more explicit form. This includes changes such as prepending “this->” to references to class member variables, which may be implicit in the original source code. This makes the code easier to comprehend, as it is immediately clear that the reference is to a member variable, although most programmers would consider the addition unnecessary.

Conclusion