재귀 하향 파서

"오늘의AI위키"는 AI 기술로 일관성 있고 체계적인 최신 지식을 제공하는 혁신 플랫폼입니다.
"오늘의AI위키"의 AI를 통해 더욱 풍부하고 폭넓은 지식 경험을 누리세요.

1. 개요
2. 재귀 하향 파서의 기본 원리
- 2.1. 예측 파싱
- 2.2. 파싱 과정
3. 예제 파서
- 3.1. C 구현
- 3.2. 파스칼 구현
4. 재귀 하향 파서 생성기
5. 종류
6. 참고 문헌
참조

1. 개요

재귀 하향 파서는 백트래킹이 없는 경우 예측 파서라고 불리며, 문맥 자유 문법의 LL(k) 문법 클래스에서만 가능하다. 예측 파서는 선형 시간에 동작하며, LL(k) 문법을 LL(k) 형식으로 변환하는 것보다 LR법이나 LALR법의 파서를 만드는 경우가 많다. 재귀 하향 파서는 니클라우스 비르트의 PL/0 언어와 같은 문법을 처리하며, 각 비종단 기호에 대한 프로시저가 존재한다. C, 파스칼 등의 언어로 구현된 예제가 있으며, TMG, JavaCC, ANTLR 등이 재귀 하향 파서 생성기로 사용된다.

2. 재귀 하향 파서의 기본 원리

니클라우스 비르트가 저서 ''알고리즘 + 자료구조 = 프로그램''에서 제시한 PL/0 프로그래밍 언어의 LL(1) 형식 문법은 다음과 같다. 이는 EBNF와 유사하다.

program = block "." .

block =

["const" ident "=" number {"," ident "=" number} ";"]

["var" ident {"," ident} ";"]

{"procedure" ident ";" block ";"} statement .

statement =

ident ":=" expression

| "call" ident

| "begin" statement {";" statement } "end"

| "if" condition "then" statement

| "while" condition "do" statement .

condition =

"odd" expression

| expression ("="|"#"|"<"|"<="|">"|">=") expression .

expression = ["+"|"-"] term {("+"|"-") term} .

term = factor {("*"|"/") factor} .

factor =

ident

| number

| "(" expression ")" .

여기서 따옴표로 표시된 것은 터미널이며, ''ident''와 ''number''는 암묵적으로 정의된 것으로 간주되어 예외이다. 각 비터미널은 문법 규칙에 의해 정의된다.

2. 1. 예측 파싱

백트래킹이 없는 재귀 하향 파서를 '''예측 파서'''(predictive parser)라고 한다. 예측 구문 분석은 문맥 자유 문법의 일종인 LL(k) 문법 클래스에서만 가능하다. 어떤 양의 정수 k가 존재하여 재귀 하향 구문 분석에서 다음에 사용해야 할 생성 규칙을 선택하는 데 k개의 토큰을 조사하여 결정할 수 있다. LL(k) 문법은 모호성이 없고 좌재귀도 포함하지 않는다. 문맥 자유 문법은 좌재귀가 없는 형식으로 변환 가능하지만, 좌재귀를 제거했다고 해서 LL(k) 문법이 되는 것은 아니다. 예측 파서는 선형 시간에 동작한다.

백트래킹이 있는 재귀 하향 구문 분석에서는 각 생성 규칙을 매번 시도하여 적용해야 할 생성 규칙을 결정한다. 백트래킹이 있는 재귀 하향 구문 분석은 LL(k) 문법 이외에도 적용할 수 있지만, LL(k) 이외의 문법에서 유한 시간 내에 구문 분석이 완료될지는 보장되지 않는다. 또한 완료되었다 하더라도, 백트래킹이 있는 재귀 하향 구문 분석은 지수 시간을 필요로 한다.

예측 파서는 자주 사용되지만, 문법을 LL(k) 형식으로 변환하는 것보다 LR법이나 LALR법의 파서를 만드는 경우도 많다.

2. 2. 파싱 과정

EBNF와 유사한 문법으로 작성된 LL(1) 형식은 다음과 같다. (\[\[니클라우스 비르트]]의 PL/0 프로그래밍 언어, ''알고리즘 + 자료구조 = 프로그램''에서 가져옴)

program = block "." .

block =

["const" ident "=" number {"," ident "=" number} ";"]

["var" ident {"," ident} ";"]

{"procedure" ident ";" block ";"} statement .

statement =

ident ":=" expression

| "call" ident

| "begin" statement {";" statement } "end"

| "if" condition "then" statement

| "while" condition "do" statement .

condition =

"odd" expression

| expression ("="|"#"|"<"|"<="|">"|">=") expression .

expression = ["+"|"-"] term {("+"|"-") term} .

term = factor {("*"|"/") factor} .

factor =

ident

| number

| "(" expression ")" .

터미널은 따옴표로 표시된다. 각 비터미널은 문법의 규칙에 의해 정의되며, 암묵적으로 정의된 것으로 간주되는 ''ident''와 ''number''는 예외이다.

백트래킹이 없는 재귀 하향 파서를 '''예측 파서'''(predictive parser)라고 한다. 예측 구문 분석은 문맥 자유 문법의 일종인 LL(k) 문법 클래스에서만 가능하다. LL(k) 문법은 어떤 양의 정수 k가 존재하여 재귀 하향 구문 분석에서 다음에 사용해야 할 생성 규칙을 선택하는 데 k개의 토큰을 조사하여 결정할 수 있다. LL(k) 문법은 모호성이 없고 좌재귀도 포함하지 않는다. 문맥 자유 문법은 좌재귀가 없는 형식으로 변환 가능하지만, 좌재귀를 제거했다고 해서 LL(k) 문법이 되는 것은 아니다. 예측 파서는 선형 시간에 동작한다.

백트래킹이 있는 재귀 하향 구문 분석에서는 각 생성 규칙을 매번 시도하여 적용해야 할 생성 규칙을 결정한다. 백트래킹이 있는 재귀 하향 구문 분석은 LL(k) 문법 이외에도 적용할 수 있지만, LL(k) 이외의 문법에서 유한 시간 내에 구문 분석이 완료될지는 보장되지 않는다. 또한 완료되었다 하더라도, 백트래킹이 있는 재귀 하향 구문 분석은 지수 시간을 필요로 한다.

예측 파서는 자주 사용되지만, 문법을 LL(k) 형식으로 변환하는 것보다 LR법이나 LALR법의 파서를 만드는 경우도 많다.

다음은 LL(1) 형식의 EBNF의 예시 문법이다(니클라우스 비르트의 PL/0 언어). 단순화를 위해, ''ident''와 ''number''는 종단 기호로 간주된다.

```ebnf

program = block "." .

block =

["const" ident "=" number {"," ident "=" number} ";"]

["var" ident {"," ident} ";"]

{"procedure" ident ";" block ";"} statement .

statement =

[ident ":=" expression

| "call" ident

| "begin" statement {";" statement} "end"

| "if" condition "then" statement

| "while" condition "do" statement

] .

condition =

"odd" expression

| expression ("="|"#"|"<"|"<="|">"|">=") expression

.

expression = ["+"|"-"] term {("+"|"-") term} .

term = factor {("*"|"/") factor} .

factor = ident | number | "(" expression ")" .

```

종단 기호는 따옴표로 묶여 있다(''ident''와 ''number'' 제외). 각 비종단 기호는 문법 규칙에 정의되어 있다.

다음은 위 문법을 정확히 반영하는 예측 파서이다. 각 비종단 기호에 해당하는 프로시저가 있다. 구문 분석은 상향식(top-down) 방식으로 수행되며, 모든 비종단 기호가 처리될 때까지 계속된다. 입력 코드(프로그램의 일부)의 첫 번째 단어는 전역 변수 ''sym''에 저장된다. 그리고 전역 함수 ''getsym''을 사용하여 ''sym''을 업데이트한다.

```c

typedef enum {ident, number, lparen, rparen, times, slash, plus,

minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,

endsym, ifsym, whilesym, becomes, thensym, dosym, constsym, comma,

varsym, procsym, period, oddsym} Symbol;

Symbol sym;

void getsym(void);

void error(const char msg[]);

void expression(void);

int accept(Symbol s) {

if (sym == s) {

getsym();

return 1;

}

return 0;

}

int expect(Symbol s) {

if (accept(s))

return 1;

error("expect: unexpected symbol");

return 0;

}

void factor(void) {

if (accept(ident)) {

;

} else if (accept(number)) {

;

} else if (accept(lparen)) {

expression();

expect(rparen);

} else {

error("factor: syntax error");

getsym();

}

}

void term(void) {

factor();

while (sym == times || sym == slash) {

getsym();

factor();

}

}

void expression(void) {

if (sym == plus || sym == minus)

getsym();

term();

while (sym == plus || sym == minus) {

getsym();

term();

}

}

void condition(void) {

if (accept(oddsym)) {

expression();

} else {

expression();

if (sym == eql || sym == neq || sym == lss ||

sym == leq || sym == gtr || sym == geq) {

getsym();

expression();

} else {

error("condition: invalid operator");

getsym();

}

}

}

void statement(void) {

if (accept(ident)) {

expect(becomes);

expression();

} else if (accept(callsym)) {

expect(ident);

} else if (accept(beginsym)) {

do {

statement();

} while (accept(semicolon));

expect(endsym);

} else if (accept(ifsym)) {

condition();

expect(thensym);

statement();

} else if (accept(whilesym)) {

condition();

expect(dosym);

statement();

}

}

void block(void) {

if (accept(constsym)) {

do {

expect(ident);

expect(eql);

expect(number);

} while (accept(comma));

expect(semicolon);

}

if (accept(varsym)) {

do {

expect(ident);

} while (accept(comma));

expect(semicolon);

}

while (accept(procsym)) {

expect(ident);

expect(semicolon);

block();

expect(semicolon);

}

statement();

}

void program(void) {

getsym();

block();

expect(period);

}

3. 예제 파서

다음은 LL(1) 형식의 EBNF 예시 문법(니클라우스 비르트의 PL/0 언어)이다. 단순화를 위해 'ident'와 'number'는 종단 기호로 간주된다.

```ebnf

program = block "." .

block =

["const" ident "=" number {"," ident "=" number} ";"]

["var" ident {"," ident} ";"]

{"procedure" ident ";" block ";"} statement .

statement =

[ident ":=" expression

| "call" ident

| "begin" statement {";" statement} "end"

| "if" condition "then" statement

| "while" condition "do" statement

] .

condition =

"odd" expression

| expression ("="|"#"|"<"|"<="|">"|">=") expression

.

expression = ["+"|"-"] term {("+"|"-") term} .

term = factor {("*"|"/") factor} .

factor = ident | number | "(" expression ")" .

```

종단 기호는 따옴표로 묶여 있으며('ident'와 'number' 제외), 각 비종단 기호는 문법 규칙에 정의되어 있다.

위 문법을 반영하는 예측 파서는 각 비종단 기호에 해당하는 프로시저가 있으며, 구문 분석은 모든 비종단 기호가 처리될 때까지 하향식(top-down)으로 수행된다. 입력 코드의 첫 번째 단어는 전역 변수 'sym'에 저장되며, 전역 함수 'getsym'을 사용하여 'sym'을 업데이트한다.

Haskell이나 ML과 같은 함수형 언어에서의 재귀 하향 구문 분석 구현은 특히 간단한 편이다. 더 자세한 내용은 [http://www.cs.nott.ac.uk/~gmh/pearl.pdf Functional Pearls: Monadic Parsing in Haskell] 문서를 참조할 수 있다.

3. 1. C 구현

다음은 PL/0 언어를 C로 구현한 재귀 하향 파서이다. 이 파서는 소스 코드를 읽어 파싱에 실패하면 오류 메시지를 표시하고, 성공하면 아무 메시지도 표시하지 않는다.

문법의 각 비단말 기호에 해당하는 프로시저가 있으며, 파싱은 마지막 비단말 기호가 처리될 때까지 상향식으로 진행된다. 이 프로그램은 입력의 현재 기호를 저장하는 전역 변수 `sym`과 `sym`을 업데이트하는 함수 `nextsym`에 의존한다.

`nextsym`과 `error` 함수 구현은 단순성을 위해 생략되었다.

```c

typedef enum {ident, number, lparen, rparen, times, slash, plus,

minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,

endsym, ifsym, whilesym, becomes, thensym, dosym, constsym, comma,

varsym, procsym, period, oddsym} Symbol;

Symbol sym;

void nextsym(void);

void error(const char msg[]);

int accept(Symbol s) {

if (sym == s) {

nextsym();

return 1;

}

return 0;

}

int expect(Symbol s) {

if (accept(s))

return 1;

error("expect: 예상치 못한 기호");

return 0;

}

void factor(void) {

if (accept(ident)) {

;

} else if (accept(number)) {

;

} else if (accept(lparen)) {

expression();

expect(rparen);

} else {

error("factor: 구문 오류");

nextsym();

}

}

void term(void) {

factor();

while (sym == times || sym == slash) {

nextsym();

factor();

}

}

void expression(void) {

if (sym == plus || sym == minus)

nextsym();

term();

while (sym == plus || sym == minus) {

nextsym();

term();

}

}

void condition(void) {

if (accept(oddsym)) {

expression();

} else {

expression();

if (sym == eql || sym == neq || sym == lss || sym == leq || sym == gtr || sym == geq) {

nextsym();

expression();

} else {

error("condition: 잘못된 연산자");

nextsym();

}

}

}

void statement(void) {

if (accept(ident)) {

expect(becomes);

expression();

} else if (accept(callsym)) {

expect(ident);

} else if (accept(beginsym)) {

do {

statement();

} while (accept(semicolon));

expect(endsym);

} else if (accept(ifsym)) {

condition();

expect(thensym);

statement();

} else if (accept(whilesym)) {

condition();

expect(dosym);

statement();

} else {

error("statement: 구문 오류");

nextsym();

}

}

void block(void) {

if (accept(constsym)) {

do {

expect(ident);

expect(eql);

expect(number);

} while (accept(comma));

expect(semicolon);

}

if (accept(varsym)) {

do {

expect(ident);

} while (accept(comma));

expect(semicolon);

}

while (accept(procsym)) {

expect(ident);

expect(semicolon);

block();

expect(semicolon);

}

statement();

}

void program(void) {

nextsym();

block();

expect(period);

}

```

다음은 소스에 나타난 EBNF 예시 문법이다.

```ebnf

program = block "." .

block =

["const" ident "=" number {"," ident "=" number} ";"]

["var" ident {"," ident} ";"]

{"procedure" ident ";" block ";"} statement .

statement =

[ident ":=" expression

| "call" ident

| "begin" statement {";" statement} "end"

| "if" condition "then" statement

| "while" condition "do" statement

] .

condition =

"odd" expression

| expression ("="|"#"|"<"|"<="|">"|">=") expression

.

expression = ["+"|"-"] term {("+"|"-") term} .

term = factor {("*"|"/") factor} .

factor = ident | number | "(" expression ")" .

3. 2. 파스칼 구현

pascal

// Modern Pascal 구현 (www.ModernPascal.com)

// C 예제를 Ozz Nixon이 파스칼로 포팅함

{$S-}

type

SYMBOL = (ident, number, lparen, rparen, times, slash, plus,

minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,

endsym, ifsym, whilesym, becomes, thensym, dosym, constsym, comma,

varsym, procsym, period, oddsym);

var

sym: SYMBOL;

procedure expression; forward;

procedure nextsym;

begin

// 사용자 구현 //

end;

procedure error(const msg: String);

begin

// 사용자 구현 //

end;

function accept(S: Symbol): boolean;

begin

if sym = s then

begin

nextsym;

Result := True;

end

else

Result := False;

end;

function expect(S: Symbol): boolean;

begin

Result := accept(s);

if not result then

error('Expect: 예상치 못한 기호');

end;

procedure factor;

begin

if (accept(ident)) then

begin

//

end

else if (accept(number)) then

begin

//

end

else if (accept(lparen)) then

begin

expression;

expect(rparen);

end

else

begin

error('factor: 구문 오류');

nextsym;

end;

end;

procedure term;

begin

factor;

while (sym = times) or (sym = slash) do

begin

nextsym;

factor;

end;

end;

procedure expression;

begin

if (sym = plus) or (sym = minus) then

nextsym;

term;

while (sym = plus) or (sym = minus) do

begin

nextsym;

term;

end;

end;

procedure condition;

begin

if (accept(oddsym)) then

begin

expression;

end

else

begin

expression;

if (sym = eql) or (sym = neq) or (sym = lss) or (sym = leq) or (sym = gtr) or (sym = geq) then

begin

nextsym;

expression;

end

else

begin

error('condition: 유효하지 않은 연산자');

nextsym;

end;

end;

end;

procedure statement;

begin

if (accept(ident)) then

begin

expect(becomes);

expression;

end

else if (accept(callsym)) then

begin

expect(ident);

end

else if (accept(beginsym)) then

begin

repeat

statement;

until not (accept(semicolon));

expect(endsym);

end

else if (accept(ifsym)) then

begin

condition;

expect(thensym);

statement;

end

else if (accept(whilesym)) then

begin

condition;

expect(dosym);

statement;

end

else

begin

error('statement: 구문 오류');

nextsym;

end;

end;

procedure block;

begin

if (accept(constsym)) then

begin

repeat

expect(ident);

expect(eql);

expect(number);

until not (accept(comma));

expect(semicolon);

end;

if (accept(varsym)) then

begin

repeat

expect(ident);

until not (accept(comma));

expect(semicolon);

end;

while (accept(procsym)) do

begin

expect(ident);

expect(semicolon);

block;

expect(semicolon);

end;

statement;

end;

begin

nextsym;

block;

expect(period);

end;

4. 재귀 하향 파서 생성기

TMG – 1960년대와 1970년대 초에 사용된 초기 컴파일러-컴파일러
JavaCC
Coco/R
ANTLR
Spirit 파서 프레임워크 – 사전 컴파일 단계가 필요 없는 C++ 재귀 하향 파서 생성기 프레임워크
parboiled (Java) – PEG 파싱 라이브러리, 자바용 재귀 하향 파서
[http://spirit.sourceforge.net/ Spirit Parser Framework] - C++ 언어에 의한 재귀 하향 파서 생성 프레임워크. 프리컴파일 단계가 불필요.
[http://www.codecodex.com/wiki/index.php?title=Recursive_descent_parsing CodeCodex: Recursive descent parsing] - C 언어, Java 언어, OCaml 에서의 재귀 하향 파서의 예
[http://search.cpan.org/~dconway/Parse-RecDescent-1.94/lib/Parse/RecDescent.pod Parse::RecDescent]: Perl 에 의한 각종 재귀 하향 파서
[http://pyparsing.sourceforge.net/ pyparsing]: Python 에 의한 각종 재귀 하향 파서

5. 종류

백트래킹이 없는 재귀 하향 파서를 '''예측 파서'''(predictive parser)라고 한다. 예측 구문 분석은 문맥 자유 문법의 일종인 LL(k) 문법 클래스에서만 가능하며, 어떤 양의 정수 k가 존재하여 재귀 하향 구문 분석에서 다음에 사용해야 할 생성 규칙을 선택하는 데 k개의 토큰을 조사하여 결정할 수 있다. LL(k) 문법은 모호성이 없고 좌재귀도 포함하지 않는다. 문맥 자유 문법은 좌재귀가 없는 형식으로 변환 가능하지만, 좌재귀를 제거했다고 해서 LL(k) 문법이 되는 것은 아니다. 예측 파서는 선형 시간에 동작한다.

백트래킹이 있는 재귀 하향 구문 분석에서는 각 생성 규칙을 매번 시도하여 적용해야 할 생성 규칙을 결정한다. 백트래킹이 있는 재귀 하향 구문 분석은 LL(k) 문법 이외에도 적용할 수 있지만, LL(k) 이외의 문법에서 유한 시간 내에 구문 분석이 완료될지는 보장되지 않는다. 또한 완료되었다 하더라도, 백트래킹이 있는 재귀 하향 구문 분석은 지수 시간을 필요로 한다.

예측 파서는 자주 사용되지만, 문법을 LL(k) 형식으로 변환하는 것보다 LR법이나 LALR법의 파서를 만드는 경우도 많다.

Pratt 파서

6. 참고 문헌

Compilers: Principles, Techniques, and Tools^영어 (컴파일러: 원리, 기법, 도구), 알프레드 V. 아호, 라비 세티, 제프리 D. 울만 공저, 초판, 4.4절.
Modern Compiler Implementation in Java, Second Edition^영어 (자바를 이용한 현대적인 컴파일러 구현, 2판), 앤드루 아펠, 2002.
Recursive Programming Techniques^영어 (재귀 프로그래밍 기법), W.H. 버지, 1975.
Crafting a Compiler with C^영어 (C로 컴파일러 제작하기), 찰스 N. 피셔와 리처드 J. 르블랑 주니어 공저, 1991.
Compiling with C# and Java^영어 (C#과 자바를 이용한 컴파일링), 팻 테리, 2005, 624쪽.
Algorithms + Data Structures = Programs^영어 (알고리즘 + 자료구조 = 프로그램), 니클라우스 비르트, 1975.
Compiler Construction^영어 (컴파일러 구축), 니클라우스 비르트, 1996.

참조

_[1] FOLDOC Recursive+descent+parser
_[2] 서적 Recursive Programming Techniques https://archive.org/[...] Addison-Wesley Publishing Company
_[3] 서적 A Practical Approach to Compiler Construction https://books.google[...] Springer 2017-03-22
_[4] 서적 Compilers: Principles, Techniques and Tools https://archive.org/[...] Addison Wesley 1986

본 사이트는 AI가 위키백과와 뉴스 기사,정부 간행물,학술 논문등을 바탕으로 정보를 가공하여 제공하는 백과사전형 서비스입니다.
모든 문서는 AI에 의해 자동 생성되며, CC BY-SA 4.0 라이선스에 따라 이용할 수 있습니다.
하지만, 위키백과나 뉴스 기사 자체에 오류, 부정확한 정보, 또는 가짜 뉴스가 포함될 수 있으며, AI는 이러한 내용을 완벽하게 걸러내지 못할 수 있습니다.
따라서 제공되는 정보에 일부 오류나 편향이 있을 수 있으므로, 중요한 정보는 반드시 다른 출처를 통해 교차 검증하시기 바랍니다.

문의하기 : help@durumis.com