GBTokenizer Class Reference
Inherits from: NSObject
Declared in: GBTokenizer.h, GBTokenizer.m
Overview
Provides common methods for tokenizing input source strings.
The main responsibilities of the class are to split the given source string into tokens and to provide simple methods for iterating over the token stream. It builds on the ParseKit framework's PKTokenizer. As different parsers require different tokenizers and setups, the class itself doesn't create a tokenizer, but instead requires the client to provide one. Here's an example of simple usage:
NSString *filename = ...
NSString *input = ...
PKTokenizer *worker = [PKTokenizer tokenizerWithString:input];
GBTokenizer *tokenizer = [[GBTokenizer alloc] initWithSourceTokenizer:worker filename:filename];
while (![tokenizer eof]) {
    NSLog(@"%@", [tokenizer currentToken]);
    [tokenizer consume:1];
}
This example simply iterates over all tokens and prints each one to the log. If you want to parse a block of input with a known start and/or end token, you can use one of the block consuming methods instead. Note that you still need to provide the name of the file, as it is used for creating GBSourceInfo objects for parsed objects!
To make comment parsing simpler, GBTokenizer automatically enables comment reporting on the underlying PKTokenizer. However, to spare higher level parsers the complexity of dealing with comments, the lookahead and consume methods never report them. Instead, these methods skip all comment tokens but make them accessible through properties, so a client that wants to check whether there's a comment associated with the current token can simply send lastComment. Additionally, the client can get the value of the comment just before the last one by sending previousComment. This can be used to get any method section comments which aren't associated with any element. If there is no "stand-alone" comment before the last one, previousComment returns nil. GBTokenizer goes even further when dealing with comments: it automatically groups single line comments into a single comment group and removes all prefixes and suffixes.
Note: Both comment values are persistent until a new comment is found! At that time, the previous comment takes the value of the last comment and the new comment is stored as the last comment. This allows parsing through complex code (like #ifdef / #elif / #else blocks etc.) without fear of losing any comment information. It does require manually resetting the comments whenever a comment is actually attached to an object. Resetting is performed by sending the resetComments message to the receiver.
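The shifting of comment values described in the note can be pictured with a small toy model. The sketch below is plain C (which also compiles as Objective-C) and is not GBTokenizer's implementation: the CommentSlots type and the comment_found and reset_comments functions are hypothetical names introduced only for illustration, and comments are modeled as plain C strings rather than GBComment objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two comment values; not the real storage. */
typedef struct {
    const char *last;      /* models lastComment, NULL when there is none */
    const char *previous;  /* models previousComment, NULL when there is none */
} CommentSlots;

/* A new comment was found: the old last comment becomes the previous one
   and the new comment is stored as the last one. */
void comment_found(CommentSlots *slots, const char *text)
{
    assert(slots != NULL && text != NULL);
    slots->previous = slots->last;
    slots->last = text;
}

/* Models resetComments: clears both values once a comment was attached. */
void reset_comments(CommentSlots *slots)
{
    assert(slots != NULL);
    slots->last = NULL;
    slots->previous = NULL;
}
```

In this model both values stay unchanged between calls to comment_found, which mirrors why a parser has to reset them explicitly after attaching a comment to an object.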
Tasks
Initialization & disposal
- + tokenizerWithSource:filename:
  Returns initialized autoreleased instance using the given source PKTokenizer.
- – initWithSourceTokenizer:filename:
  Initializes tokenizer with the given source PKTokenizer.
Tokenizing handling
- – currentToken
  Returns the current token.
- – lookahead:
  Returns the token by looking ahead the given number of tokens from current position.
- – consume:
  Consumes the given amount of tokens, starting at the current position.
- – consumeTo:usingBlock:
  Enumerates and consumes all tokens starting at current token up until the given end token is detected.
- – consumeFrom:to:usingBlock:
  Enumerates and consumes all tokens starting at current token up until the given end token is detected.
- – eof
  Specifies whether we're at EOF.
Information handling
- – sourceInfoForCurrentToken
  Returns GBSourceInfo for current token and filename.
- – sourceInfoForToken:
  Returns GBSourceInfo object describing the given token source information.
Comments handling
- – resetComments
  Resets lastComment and previousComment values.
- lastComment (property)
  Returns the last comment or nil if comment is not available.
- previousComment (property)
  Returns "stand-alone" comment found immediately before the comment returned from lastComment.
Properties
lastComment
Returns the last comment or nil if comment is not available.
@property (readonly) GBComment *lastComment
Discussion
The returned [GBComment stringValue] contains the whole last comment string, without prefixes or suffixes. To optimize things a bit, the actual comment string value is prepared on the fly, as you send the message, so it's only handled if needed. As creating the comment string adds some computing overhead, you should cache the returned value if possible.
If there's no comment available for the current token, nil is returned.
Declared In
GBTokenizer.h
previousComment
Returns "stand-alone" comment found immediately before the comment returned from lastComment.
@property (readonly) GBComment *previousComment
Discussion
Previous comment is a "stand-alone" comment which is found immediately before lastComment but isn't associated with any language element. These are usually used to provide metadata and other instructions for formatting or grouping of "normal" comments returned with lastComment. The value should be used at the same time as lastComment, as it is automatically cleared on the next consume! If there's no stand-alone comment immediately before the last comment, the value returned is nil.
The returned [GBComment stringValue] contains the whole previous comment string, without prefixes or suffixes. To optimize things a bit, the actual comment string value is prepared on the fly, as you send the message, so it's only handled if needed. As creating the comment string adds some computing overhead, you should cache the returned value if possible.
Declared In
GBTokenizer.h
Class Methods
tokenizerWithSource:filename:
Returns initialized autoreleased instance using the given source PKTokenizer.
+ (id)tokenizerWithSource:(PKTokenizer *)tokenizer filename:(NSString *)filename
Parameters
- tokenizer
- The underlying (worker) tokenizer to use for actual splitting.
- filename
- The name of the file without path used for generating source info.
Return Value
Returns initialized instance or nil if failed.
Exceptions
- NSException
- Thrown if the given tokenizer or filename is nil or filename is empty string.
Declared In
GBTokenizer.h
Instance Methods
consume:
Consumes the given amount of tokens, starting at the current position.
- (void)consume:(NSUInteger)count
Parameters
- count
- The number of tokens to consume.
Discussion
This effectively "moves" currentToken to the new position. If EOF is reached before consuming the given amount of tokens, consuming stops at the end of the stream and currentToken returns the EOF token. If comment tokens are detected while consuming, they are not counted and the count continues with actual language tokens. However, if there is a comment just before the next current token (i.e. after the last consumed token), the comment data is saved and is available through lastComment. Otherwise last comment data is cleared, even if a comment was detected in between.
Declared In
GBTokenizer.h
consumeFrom:to:usingBlock:
Enumerates and consumes all tokens starting at current token up until the given end token is detected.
- (void)consumeFrom:(NSString *)start to:(NSString *)end usingBlock:(void ( ^ ) ( PKToken *token , BOOL *consume , BOOL *stop ))block
Parameters
- start
- Optional starting token or nil.
- end
- Ending token.
- block
- The block to be called for each token.
Discussion
For each token, the given block is called, which gives the client a chance to inspect and handle tokens. If a start token is given and the current token matches it, the token is consumed without reporting it to the block. However, if the token doesn't match, the method returns immediately without doing anything. The end token is also not reported and is automatically consumed after all previous tokens are reported. Also read the consume: documentation to understand how comments are dealt with.
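The start and end token rules above can be sketched over a plain array of strings. This is a toy model in plain C (also valid Objective-C), not the method's implementation: consume_from_to, TokenHandler and count_token are hypothetical names, a function pointer stands in for the block, and the consume and stop flags as well as comment handling are left out. Stopping at the end of the array when the end token never appears is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type standing in for the block parameter. */
typedef void (*TokenHandler)(const char *token, void *context);

/* Example handler: counts reported tokens through an int in `context`. */
void count_token(const char *token, void *context)
{
    (void)token;
    (*(int *)context)++;
}

/* Returns the new current position after consuming from `current`. */
size_t consume_from_to(const char *const *tokens, size_t count, size_t current,
                       const char *start, const char *end,
                       TokenHandler handler, void *context)
{
    assert(tokens != NULL);
    assert(end != NULL); /* the documented method throws for a nil end token */
    size_t pos = current;
    if (start != NULL) {
        /* Start token given but not matched: return without doing anything. */
        if (pos >= count || strcmp(tokens[pos], start) != 0) return current;
        pos++; /* matched start token is consumed without being reported */
    }
    while (pos < count && strcmp(tokens[pos], end) != 0) {
        if (handler != NULL) handler(tokens[pos], context);
        pos++;
    }
    if (pos < count) pos++; /* end token is consumed without being reported */
    return pos;
}
```

Passing NULL for start in this model corresponds to the consumeTo:usingBlock: variant.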
Exceptions
- NSException
- Thrown if the given end token is nil.
Declared In
GBTokenizer.h
consumeTo:usingBlock:
Enumerates and consumes all tokens starting at current token up until the given end token is detected.
- (void)consumeTo:(NSString *)end usingBlock:(void ( ^ ) ( PKToken *token , BOOL *consume , BOOL *stop ))block
Parameters
- end
- Ending token.
- block
- The block to be called for each token.
Discussion
For each token, the given block is called, which gives the client a chance to inspect and handle tokens. The end token is not reported and is automatically consumed after all previous tokens are reported. Sending this message is equivalent to sending consumeFrom:to:usingBlock: and passing nil for the start token. Also read the consume: documentation to understand how comments are dealt with.
Exceptions
- NSException
- Thrown if the given end token is nil.
Declared In
GBTokenizer.h
currentToken
Returns the current token.
- (PKToken *)currentToken
Declared In
GBTokenizer.h
eof
Specifies whether we're at EOF.
- (BOOL)eof
Return Value
Returns YES if we're at EOF, NO otherwise.
Declared In
GBTokenizer.h
initWithSourceTokenizer:filename:
Initializes tokenizer with the given source PKTokenizer.
- (id)initWithSourceTokenizer:(PKTokenizer *)tokenizer filename:(NSString *)filename
Parameters
- tokenizer
- The underlying (worker) tokenizer to use for actual splitting.
- filename
- The name of the file without path that's the source for tokenizer's input string.
Return Value
Returns initialized instance or nil if failed.
Discussion
This is the designated initializer.
Exceptions
- NSException
- Thrown if the given tokenizer or filename is nil or filename is empty string.
Declared In
GBTokenizer.h
lookahead:
Returns the token by looking ahead the given number of tokens from current position.
- (PKToken *)lookahead:(NSUInteger)offset
Parameters
- offset
- The offset from the current position.
Return Value
Returns the token at the given offset, or the EOF token if the offset points past EOF.
Discussion
If offset "points" within a valid token, the token is returned, otherwise EOF token is returned. Note that this method automatically skips any comment tokens and only counts actual language tokens.
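The way comment tokens are skipped while counting can be illustrated with a small model. The sketch below is plain C (also valid Objective-C) and only approximates the documented behavior: the Token type and lookahead_index function are hypothetical, the EOF token is modeled as the index one past the end of the array, and treating offset 0 as the current token is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token record; the real class works with PKToken objects. */
typedef struct {
    const char *text;
    bool is_comment;
} Token;

/* Returns the index of the language token `offset` language tokens ahead of
   `current`, or `count` (modeling EOF) if the offset points past the end. */
size_t lookahead_index(const Token *tokens, size_t count,
                       size_t current, size_t offset)
{
    assert(tokens != NULL || count == 0);
    size_t seen = 0; /* language tokens passed so far */
    for (size_t i = current; i < count; i++) {
        if (tokens[i].is_comment) continue; /* comments are never counted */
        if (seen == offset) return i;
        seen++;
    }
    return count;
}
```

For the token sequence int, a comment, x and ; with the current position at int, offset 1 lands on x because the comment in between is not counted.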
Declared In
GBTokenizer.h
resetComments
Resets lastComment and previousComment values.
- (void)resetComments
Discussion
This message should be sent whenever a comment is "attached" to an object. As comments are persistent, failing to reset would lead to using the same comment for next object as well!
Declared In
GBTokenizer.h
sourceInfoForCurrentToken
Returns GBSourceInfo for current token and filename.
- (GBSourceInfo *)sourceInfoForCurrentToken
Return Value
Returns declared file data.
Discussion
This is equivalent to sending sourceInfoForToken: and passing currentToken as the token parameter.
Exceptions
- NSException
- Thrown if current token is
nil
.
Declared In
GBTokenizer.h
sourceInfoForToken:
Returns GBSourceInfo object describing the given token source information.
- (GBSourceInfo *)sourceInfoForToken:(PKToken *)token
Parameters
- token
- The token for which to get file data.
Return Value
Returns declared file data.
Discussion
The method converts the given token's offset within the input string to a line number and uses that information together with the assigned filename to prepare the token info object.
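Converting an offset to a line number can be done by counting line breaks before the offset. The following is a minimal sketch in plain C (also valid Objective-C), not the method's actual code: line_number_for_offset is a hypothetical helper, and it assumes 1-based line numbers and "\n" as the only line terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the 1-based line number of the character at `offset` in `input`.
   Offsets past the end of the string map to the last line. */
size_t line_number_for_offset(const char *input, size_t offset)
{
    assert(input != NULL);
    size_t line = 1;
    for (size_t i = 0; i < offset && input[i] != '\0'; i++) {
        if (input[i] == '\n') line++; /* each newline before offset starts a new line */
    }
    return line;
}
```

A newline character itself is counted as part of the line it ends, since only newlines strictly before the offset advance the line number.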
Exceptions
- NSException
- Thrown if the given token is
nil
.
Declared In
GBTokenizer.h