G53CMP-E1 The University of Nottingham SCHOOL OF COMPUTER SCIENCE A LEVEL 3 MODULE, AUTUMN SEMESTER 2014–2015 COMPILERS ANSWERS Time allowed TWO hours Candidates may complete the front cover of their answer book and sign their desk card but must NOT write anything else until the start of the examination period is announced. Answer ALL THREE questions No calculators are permitted in this examination. Dictionaries are not allowed with one exception. Those whose first language is not English may use a standard translation dictionary to translate between that language and English provided that neither language is the subject of this examination. Subject-specific translation directories are not permitted. No electronic devices capable of storing and retrieving text, including electronic dictionaries, may be used. Note: ANSWERS G53CMP-E1 Turn Over 2 G53CMP-E1 Knowledge classification: Following School recommendation, the (sub)questions have been classified as follows, using a subset of Bloom’s Taxonomy: K: Knowledge C: Comprehension A: Application Note that some questions are closely related to the coursework. This is intentional and as advertised to the students; the coursework is a central aspect of the module and as such partly examined under exam conditions. Question 1 (a) Explain and give examples of the following kinds of compile-time error: • lexical error • syntax error (context-free) • contextual error (6) Answer: [C] • A lexical error occurs when the input does not conform to the lexical syntax of a language. Examples include encountering a character that cannot be part of any valid lexeme, or an ill-formed string or numerical literal. • A syntax error occurs when the input does not conform to the context-free syntax of a language. A typical example would be unbalanced parentheses, or missing terminating keyword, such as a repeat without an until. • A contextual error occurs when contextual constraints are violated. Examples include type errors, such as adding a Boolean to an integer in a language that requires the terms of addition to be of the same type, and a missing declaration of a variable in a language that requires declaration before use. G53CMP-E1 3 G53CMP-E1 (b) Draw the parse (or derivation) tree for the following MiniTriangle fragment. The relevant grammar is given in Appendix A. Start from the production for “Command”. if x[i] > 10 then putint(k) else i := (i) - 1 (9) Answer: [A] Command if Expression then Command 1 else Command 3 2 1 Expression BinaryOperator Expression PrimaryExpression > PrimaryExpression VarExpression VarExpression [ Expression Identifier PrimaryExpression x VarExpression IntegerLiteral ] 10 Identifier i G53CMP-E1 Turn Over 4 G53CMP-E1 2 VarExpression Expressions ( Identifier Expressions1 putint Expression ) PrimaryExpression VarExpression Identifier k 3 VarExpression Expression := Identifier Expression BinaryOperator Expression i PrimaryExpression - PrimaryExpression ( Expression PrimaryExpression VarExpression Identifier i G53CMP-E1 ) IntegerLiteral 1 5 G53CMP-E1 (c) Consider the following context-free grammar (CFG): S → AB | BC A → Aa | c B → bbB | d C → c S, A, B, and C are nonterminal symbols, S is the start symbol, and a, b, c, and d are terminal symbols. The DFA below recognizes the viable prefixes for this CFG: I2 I0 S → AB · I1 S → ·AB S → ·BC A → ·Aa A → ·c B → ·bbB B → ·d B I3 S →A·B A→A·a B → ·bbB B → ·d A b a A → Aa · b I4 d B → b · bB d I7 c I11 b B → d· B A→c· I9 I8 C S →B·C C → ·c c b d S → BC · I10 B → bb · B B → ·bbB B → ·d B I5 I6 C →c· B → bbB · Show how an LR(0) shift-reduce parser parses the string caabbbbd by completing the following table (copy it to your answer book; do not write on the examination paper): State I0 I11 .. . Stack ǫ c .. . Input caabbbbd aabbbbd .. . Move Shift Reduce by A → c .. . S ǫ Done (10) G53CMP-E1 Turn Over 6 G53CMP-E1 Answer: [A] State I0 I11 I1 I3 I1 I3 I1 I4 I5 I4 I5 I7 I6 I6 I2 G53CMP-E1 Stack ǫ c A Aa A Aa A Ab Abb Abbb Abbbb Abbbbd AbbbbB AbbB AB S Input caabbbbd aabbbbd aabbbbd abbbbd abbbbd bbbbd bbbbd bbbd bbd bd d ǫ ǫ ǫ ǫ ǫ Move Shift Reduce Shift Reduce Shift Reduce Shift Shift Shift Shift Shift Reduce Reduce Reduce Reduce Done by A → c by A → Aa by A → Aa by by by by B→d B → bbB B → bbB S → AB 7 G53CMP-E1 Question 2 (a) Consider the following expression language: e → | | | n x (e,e) if e then e else e expressions: natural numbers, n ∈ N variables, x ∈ Name pair constructor conditional where Name is the set of variable names. The types are given by the following grammar: t → | | Nat Bool t×t types: natural numbers Booleans pair (product) type The ternary relation Γ ⊢ e : t says that expression e has type t in the typing context Γ. It is defined by the following typing rules: Γ ⊢ n : Nat (T-NAT) x:t∈Γ Γ⊢x:t (T-VAR) Γ ⊢ e1 : t1 Γ ⊢ e2 : t2 Γ ⊢ (e1 ,e2 ) : t1 × t2 (T-PAIR) Γ ⊢ e1 : Bool Γ ⊢ e2 : t Γ ⊢ e3 : t Γ ⊢ if e1 then e2 else e3 : t (T-COND) A typing context, Γ in the rules above, is a comma-separated sequence of variable-name and type pairs, such as x : Nat, y : Bool, z : Nat or empty, denoted ∅. Typing contexts are extended on the right, e.g. Γ, z : Nat, the membership predicate is denoted by ∈, and lookup is from right to left, ensuring recent bindings hide earlier ones. (i) Use the typing rules given above to formally prove that the expression if b then p else (b,7) G53CMP-E1 Turn Over 8 G53CMP-E1 has type Bool × Nat in the typing context Γ1 = b : Bool, p : Bool × Nat The proof should be given as a proof tree. Answer: [A] b : Bool ∈ Γ1 T-VAR Γ1 ⊢ b : Bool (9) p : Bool × Nat ∈ Γ1 T-VAR below Γ1 ⊢ p : Bool × Nat Γ1 ⊢ (b,7) : Bool × Nat T-COND Γ1 ⊢ if b then p else (b,7) : Bool × Nat b : Bool ∈ Γ1 T-VAR T-NAT Γ1 ⊢ b : Bool Γ1 ⊢ 7 : Nat T-PAIR Γ1 ⊢ (b,7) : Bool × Nat (ii) The expression language defined above is to be extended with projection functions on pairs and with an operator for addition as follows: e → ... | | | ... fst e snd e e+e expressions: ... Projection of first component Projection of second component Addition Provide a typing rule for each of the new expression constructs in the same style as the existing rules. The projection functions must be applied to a value of product type and returns the first and second component, respectively, and the arguments to the addition operator must both be natural numbers and the result is a natural number. (6) Answer: [A] Γ ⊢ e : t1 × t2 Γ ⊢ fst e : t1 (T-FST) Γ ⊢ e : t1 × t2 Γ ⊢ snd e : t2 (T-SND) Γ ⊢ e1 : Nat Γ ⊢ e2 : Nat Γ ⊢ e1 + e2 : Nat (T-ADD) (b) Explain what is meant by “well-typed programs do not go wrong”. Your answer should define any key technical terms used, clarify the relation to the notions of progress and preservation, and should specifically discuss the possibility of run-time errors and non-termination. (10) Answer: [C] G53CMP-E1 9 G53CMP-E1 “Well-typed programs do not go wrong” means that it is guaranteed that a well-typed program will not get stuck; i.e., end up in an ill-defined semantic configuration. This breaks down into two parts: progress, which means that a well-typed program either has been evaluated to a value or that it can be evaluated further, and preservation, which means that well-typedness is invariant under evaluation. However, commonly used type systems cannot in general rule out all kinds of errors. For example, division by zero and out-of-bound array indices are usually errors that are only caught at run-time, after type checking. In order for progress and preservation to hold, the semantics has to explicitly account for those remaining kinds of errors by introducing some form of exception and explaining how evaluation is to progress in the event of exceptions. It is then still the case that well-typed programs do not go wrong in the sense of getting stuck in an ill-defined semantic configuration. Also, there are, in general, no termination guarantees: the evaluation of a well-typed program may loop in that a value (or exception) may never be reached. Marking: 5 marks for the first part, 5 marks for good discussion on possibility of run-time errors and non-termination. G53CMP-E1 Turn Over 10 G53CMP-E1 Question 3 Consider the language given by the following abstract syntax: C → | | | | E skip C;C x := E if E then C else C while E do C Commands: Do nothing Sequencing Assignment Conditional command while-loop → | | | Expressions: Literal integer Variable Addition Comparison n x E+E E=E For this question, you will develop code generation functions for the above language, targeting the Triangle Abstract Machine (TAM). See appendix B for a specification of the TAM instructions. Assume conventional (imperative) semantics for the above language constructs, along with the following: • x is the syntactic category of variable identifiers, ranging over the 26 names a, b, . . . , z. They refer to 26 global variables stored at SB + 0 (a) to SB + 25 (z). • The while-loop has the following semantics: the loop expression E is evaluated; if the result is true, the loop body C is executed next and then the process is repeated from the evaluation of the loop expression; otherwise execution continues after the loop. The code generation functions should be specified through code templates in the style used in the lectures. Assume a function addr (x) that returns the address (of the form [SB + d]) for a variable x. Further, you will have to consider generation of fresh labels. Assume a monadic-style operation l ← fresh to bind a variable l to a distinct label that then can be used in jumps and as jump targets. For example: execute [[ if E then C1 else C2 ]] = l1 : G53CMP-E1 l1 ← fresh ... JUMP l1 ... ... 11 G53CMP-E1 (a) Write a code generation function evaluate that generates TAM code for evaluating an expression. The first case should start like: evaluate [[ n ]] = ... (4) Answer: [A] The following function generates code for the specified expressions: evaluate [[ n ]] evaluate [[ x ]] evaluate [[ E1 + E2 ]] = = = evaluate [[ E1 = E2 ]] = LOADL n LOAD addr (x) evaluate E1 evaluate E2 ADD evaluate E1 evaluate E2 EQL Marking: 1 mark for each case. G53CMP-E1 Turn Over 12 G53CMP-E1 (b) Write a code generation function execute that generates stack machine code for executing commands. It should handle the five forms of commands specified by the abstract syntax above. (12) Answer: [A] execute [[ skip ]] execute [[ C1 ; C2 ]] = = execute [[ x := E ]] = execute [[ if E then C1 else C2 ]] = else: endif : execute [[ while E do C ]] = loop: ǫ execute C1 execute C2 evaluate E STORE addr (x) else ← fresh endif ← fresh evaluate E JUMPIFZ else execute C1 JUMP endif execute C2 loop ← fresh out ← fresh evaluate E JUMPIFZ out execute C JUMP loop out: Marking: 1 mark each for skip, sequence; 2 for assignment; 4 marks for conditional; 4 marks for while-loop. G53CMP-E1 13 G53CMP-E1 (c) Now assume we wish to extend the language with the commands break and continue: C → | | | Commands: ... break continue Terminate innermost loop Continue with next loop iteration The semantics is that break will terminate the innermost loop, with execution continuing immediately after the loop, while continue will skip whatever remains of the loop body, and continue execution directly with the next loop iteration. Modify and extend execute to generate code for the extended language. Note that execute will need (an) extra argument(s) for contextual information to keep track of the current innermost loop. You may assume that using break or continue outside any loop is a static error. Thus your code generator does not need to handle that case. Your answer should include the modified execute cases for if and while, as well as the cases for the two new commands. (9) Answer: [A] execute bl cl [[ if E then C1 else C2 ]] = else: endif : execute bl cl [[ while E do C ]] = loop: else ← fresh endif ← fresh evaluate E JUMPIFZ else execute bl cl C1 JUMP endif execute bl cl C2 loop ← fresh out ← fresh evaluate E JUMPIFZ out execute out loop C JUMP loop out: execute bl cl [[ break ]] execute bl cl [[ continue ]] = = JUMP bl JUMP cl Marking: 3 marks for getting the general idea; 2 marks for correct calls to execute in case for if; 2 marks for correct call to execute in case for while; 1 mark each for correct code for break and continue. G53CMP-E1 Turn Over 14 G53CMP-E1 Appendix A: MiniTriangle Grammars This appendix contains the grammars for the MiniTriangle lexical, concrete, and abstract syntax. The following typographical conventions are used to distinguish between terminals and non-terminals: • nonterminals are written like this • terminals are written like this • terminals with variable spelling and special symbols are written like this MiniTriangle Lexical Syntax: Program → (Token | Separator )∗ Token → Keyword | Identifier | IntegerLiteral | Operator | , | ; | : | := | = | ( | ) | [ | ] | eot Keyword → begin | const | do | else | end | fun | if | in | let | out | proc | then | var | while Identifier → Letter | Identifier Letter | Identifier Digit except Keyword IntegerLiteral → Digit | IntegerLiteral Digit Operator → ^ | * | / | + | - | < | <= | == | != | >= | > | && | || | ! Letter → A | B | ...| Z | a | b | ...| z Digit → 0|1|2|3|4|5|6|7|8|9 Separator → Comment | space | eol Comment → // (any character except eol )∗ eol G53CMP-E1 15 G53CMP-E1 MiniTriangle Concrete Syntax: Program → Command Commands → | Command Command ; Commands Command → | | VarExpression := Expression VarExpression ( Expressions ) if Expression then Command else Command while Expression do Command let Declarations in Command begin Commands end | | | Expressions → | ǫ Expressions1 Expressions1 → | Expression Expression , Expressions1 Expression → | PrimaryExpression Expression BinaryOperator Expression PrimaryExpression → | | | | | IntegerLiteral VarExpression UnaryOperator PrimaryExpression VarExpression ( Expressions ) [ Expressions ] ( Expression ) VarExpression → | Identifier VarExpression [ Expression ] BinaryOperator → ^ | * | / | + | - | < | <= | == | != | >= | > | && | || UnaryOperator → -|! G53CMP-E1 Turn Over 16 G53CMP-E1 Declarations → | Declaration Declaration ; Declarations Declaration → | | | | const Identifier : TypeDenoter = Expression var Identifier : TypeDenoter var Identifier : TypeDenoter := Expression fun Identifier ( ArgDecls ) : TypeDenoter = Expression proc Identifier ( ArgDecls ) Command ArgDecls → | ǫ ArgDecls1 ArgDecls1 → | ArgDecl ArgDecl , ArgDecls1 ArgDecl → | | | Identifier : TypeDenoter in Identifier : TypeDenoter out Identifier : TypeDenoter var Identifier : TypeDenoter TypeDenoter → | Identifier TypeDenoter [ IntegerLiteral ] Note that the productions for Expression makes the grammar as stated above ambiguous. Operator precedence and associativity for the binary operators as defined in the following table is used to disambiguate: Operator ^ * / + < <= == != >= > && || Precedence 1 2 3 4 5 6 Associativity right left left non left left A precedence level of 1 means the highest precedence, 2 means second highest, and so on. G53CMP-E1 17 MiniTriangle Abstract Syntax: G53CMP-E1 Name = Identifier ∪ Operator . Program → Command Program Command → | | | Expression := Expression Expression ( Expression ∗ ) begin Command ∗ end if Expression then Command else Command while Expression do Command let Declaration ∗ in Command CmdAssign CmdCall CmdSeq CmdIf | | CmdWhile CmdLet Expression → | | | | IntegerLiteral Name Expression ( Expression ∗ ) [ Expression ∗ ] Expression [ Expression ] ExpLitInt ExpVar ExpApp ExpAry ExpIx Declaration → const Name : TypeDenoter = Expression var Name : TypeDenoter ( := Expression | ǫ ) fun Name ( ArgDecl ∗ ) : TypeDenoter = Expression proc Name ( ArgDecl ∗ ) Command DeclConst | | | DeclVar DeclFun DeclProc ArgDecl → ArgMode Name : TypeDenoter ArgDecl ArgMode → | | | ǫ in out var ByValue ByRefIn ByRefOut ByRefVar TypeDenoter → → Name TypeDenoter [ IntegerLiteral ] TDBaseType TDArray G53CMP-E1 Turn Over 18 G53CMP-E1 Appendix B: Triangle Abstract Machine (TAM) Instructions Meta variable a b ca d l m, n x, y xn Address [SB + [SB [LB + [LB [ST + [ST Instruction LABEL l LOADL n LOADCA l LOAD a LOADA a LOADI d STORE a STOREI d G53CMP-E1 form d] d] d] d] d] d] Meaning Address: one of the forms specified by table below when part of an instruction, specific stack address when on the stack Boolean value (false = 0 or true = 1) Code address; address to routine in the code segment Displacement; i.e., offset w.r.t. address in register or on the stack Label name Integer Any kind of stack data Vector of n items, in this case any kind Description Address given by contents of register SB (Stack Base) +/− displacement d Address given by contents of register LB (Local Base) +/− displacement d Address given by contents of register ST (Stack Top) +/− displacement d Stack effect Description Label Pseudo instruction: symbolic location Load and store . . . ⇒ n, . . . Push literal integer n onto stack . . . ⇒ addr(l), . . . Push address of label l (code segment) onto stack . . . ⇒ [a], . . . Push contents at address a onto stack . . . ⇒ a, . . . Push address a onto stack a, . . . ⇒ [a + d], . . . Load indirectly; push contents at address a + d onto stack n, . . . ⇒ . . . Pop value n from stack and store at address a a, n, . . . ⇒ . . . Store indirectly; store n at address a+d — 19 G53CMP-E1 Instruction Stack effect Description Block operations LOADLB m n . . . ⇒ mn , . . . Push block of n literal integers m onto stack LOADIB n a, . . . ⇒ Load block of size n indirectly [a + (n − 1)], . . . , [a + 0], . . . STOREIB n a, xn , . . . ⇒ . . . Store block of size n indirectly m n m POP m n x , y , ... ⇒ x , ... Pop n values below top m values Arithmetic operations ADD n2 , n1 , . . . ⇒ n1 + n2 , . . . Add n1 and n2 , replacing n1 and n2 with the sum SUB n2 , n1 , . . . ⇒ n1 − n2 , . . . Subtract n2 from n1 , replacing n1 and n2 with the difference MUL n2 , n1 , . . . ⇒ n1 · n2 , . . . Multiply n1 by n2 , replacing n1 and n2 with the product DIV n2 , n1 , . . . ⇒ n1 /n2 , . . . Divide n1 by n2 , replacing n1 and n2 with the (integer) quotient NEG n, . . . ⇒ −n, . . . Negate n, replacing n with the result Comparison & logical operations (false = 0, true = 1) LSS n2 , n1 , . . . ⇒ n1 < n2 , . . . Check if n1 is smaller than n2 , replacing n1 and n2 with the Boolean result EQL n2 , n1 , . . . ⇒ n1 = n2 , . . . Check if n1 is equal to n2 , replacing n1 and n2 with the Boolean result GTR n2 , n1 , . . . ⇒ n1 > n2 , . . . Check if n1 is greater than n2 , replacing n1 and n2 with the Boolean result AND b2 , b1 , . . . ⇒ b 1 ∧ b2 , . . . Logical conjunction of b1 and b2 , replacing b1 and b2 with the Boolean result OR b2 , b1 , . . . ⇒ b 1 ∨ b2 , . . . Logical disjunction of b1 and b2 , replacing b1 and b2 with the Boolean result NOT b, . . . ⇒ ¬b, . . . Logical negation of b, replacing b with the result G53CMP-E1 Turn Over 20 Instruction JUMP l JUMPIFZ l JUMPIFNZ l CALL l CALLI RETURN m n PUTINT PUTCHR GETINT GETCHR HALT G53CMP-E1 G53CMP-E1 Stack effect Description Control transfer — Jump unconditionally to location identified by label l n, . . . ⇒ . . . Jump to location identified by label l if n = 0 (i.e., n is false) n, . . . ⇒ . . . Jump to location identified by label l if n 6= 0 (i.e., n is true) . . . ⇒ pc + 1, lb, 0, . . . Call global subroutine at location identified by location l, setting up activation record by pushing static link (0 for global level), dynamic link (value of LB), and return address (PC+1, address of instruction after the call instruction) onto the stack ca, sl , . . . ⇒ Call subroutine indirectly; pc + 1, lb, sl , . . . address of routine (ca) and static link to use (sl ) on top of the stack; activation record as for CALL m n x , pc, lb, 0, y . . . Return from subroutine, ⇒ xm , . . . replacing activation record by result and restoring LB Input/Output n, . . . ⇒ . . . Print n to the terminal as a decimal integer n, . . . ⇒ . . . Print the character with character code n to the terminal . . . ⇒ n, . . . Read decimal integer n from the terminal and push onto the stack . . . ⇒ n, . . . Read character from the terminal and push its character code n onto the stack TAM Control — Stop execution and halt the machine End

© Copyright 2017 ExploreDoc