GBTokenizer Class Reference
| Inherits from | NSObject |
| Declared in | GBTokenizer.h GBTokenizer.m |
Overview
Provides common methods for tokenizing input source strings.
The class's main responsibilities are to split the given source string into tokens and to provide simple methods for iterating over the token stream. It builds on the ParseKit framework's PKTokenizer. Because different parsers require different tokenizers and setups, the class doesn't create a tokenizer itself; the client must provide one. Here's an example of simple usage:
NSString *filename = ...
NSString *input = ...
PKTokenizer *worker = [PKTokenizer tokenizerWithString:input];
GBTokenizer *tokenizer = [[GBTokenizer alloc] initWithSourceTokenizer:worker filename:filename];
while (![tokenizer eof]) {
    NSLog(@"%@", [tokenizer currentToken]);
    [tokenizer consume:1];
}
This example simply iterates over all tokens and prints each one to the log. If you want to parse a block of input with a known start and/or end token, you can use one of the block-consuming methods instead. Note that you still need to provide the name of the file, as it is used for creating GBSourceInfo objects for parsed objects!
To make comment parsing simpler, GBTokenizer automatically enables comment reporting on the underlying PKTokenizer. To keep higher-level parsers from having to deal with comments, however, the lookahead and consume methods never report them. Instead, these methods skip all comment tokens but make them accessible through properties: to check whether a comment is associated with the current token, the client can simply send lastComment. The client can also get the comment just before the last one by sending previousComment; this can be used to get method section comments which aren't associated with any element. If there is no "stand-alone" comment before the last one, previousComment returns nil. GBTokenizer goes even further when dealing with comments: it automatically groups single-line comments into a single comment group and removes all prefixes and suffixes.
Note: Both comment values persist until a new comment is found! At that point, the previous comment takes the value of the last comment and the new comment is stored as the last comment. This allows parsing through complex code (like #ifdef / #elif / #else blocks etc.) without fear of losing any comment information. It does require manually resetting the comments whenever a comment is actually attached to an object. Resetting is performed by sending the resetComments message to the receiver.
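The attach-then-reset pattern described in the note could look roughly like the sketch below. It uses only messages documented in this reference (tokenizerWithSource:filename:, eof, currentToken, consume:, lastComment, resetComments, [GBComment stringValue]) plus ParseKit's tokenizerWithString: and stringValue. The function name LogTokensWithComments is hypothetical, the GBComment.h header name and the import paths are assumptions that depend on project setup, and "attaching" is reduced to logging.

```objc
#import <Foundation/Foundation.h>
#import <ParseKit/ParseKit.h>   // assumption: import path depends on how ParseKit is linked
#import "GBTokenizer.h"
#import "GBComment.h"           // assumption: header that declares GBComment

// Hypothetical helper: logs every token that has a comment associated with it.
static void LogTokensWithComments(NSString *input, NSString *filename) {
    PKTokenizer *worker = [PKTokenizer tokenizerWithString:input];
    GBTokenizer *tokenizer = [GBTokenizer tokenizerWithSource:worker filename:filename];
    while (![tokenizer eof]) {
        GBComment *comment = tokenizer.lastComment;
        if (comment) {
            // Cache the string value; the reference notes it is prepared on the fly.
            NSString *text = [comment stringValue];
            NSLog(@"%@ <- %@", [[tokenizer currentToken] stringValue], text);
            // The comment is now "attached", so reset to avoid reusing it for the next token.
            [tokenizer resetComments];
        }
        [tokenizer consume:1];
    }
}
```

A real parser would attach the comment to the object it is building rather than log it, but the reset would go in the same place.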
Tasks
Initialization & disposal
+ tokenizerWithSource:filename: Returns initialized autoreleased instance using the given source PKTokenizer.
– initWithSourceTokenizer:filename: Initializes tokenizer with the given source PKTokenizer.
Tokenizing handling
– currentToken: Returns the current token.
– lookahead: Returns the token by looking ahead the given number of tokens from current position.
– consume: Consumes the given amount of tokens, starting at the current position.
– consumeTo:usingBlock: Enumerates and consumes all tokens starting at current token up until the given end token is detected.
– consumeFrom:to:usingBlock: Enumerates and consumes all tokens starting at current token up until the given end token is detected.
– eof: Specifies whether we're at EOF.
Information handling
– sourceInfoForCurrentToken: Returns GBSourceInfo for current token and filename.
– sourceInfoForToken: Returns GBSourceInfo object describing the given token source information.
Comments handling
– resetComments: Resets lastComment and previousComment values.
lastComment (property): Returns the last comment or nil if comment is not available.
previousComment (property): Returns "stand-alone" comment found immediately before the comment returned from lastComment.
Properties
lastComment
Returns the last comment or nil if comment is not available.
@property (readonly) GBComment *lastComment
Discussion
The returned [GBComment stringValue] contains the whole last comment string, without prefixes or suffixes. To optimize things a bit, the actual comment string value is prepared on the fly, as you send the message, so it's only handled if needed. As creating the comment string adds some computing overhead, you should cache the returned value if possible.
If there's no comment available for current token, nil is returned.
Declared In
GBTokenizer.h
previousComment
Returns "stand-alone" comment found immediately before the comment returned from lastComment.
@property (readonly) GBComment *previousComment
Discussion
Previous comment is a "stand-alone" comment which is found immediately before lastComment but isn't associated with any language element. These are usually used to provide metadata and other instructions for formatting or grouping of "normal" comments returned with lastComment. The value should be used at the same time as lastComment as it is automatically cleared on the next consume! If there's no stand-alone comment immediately before the last comment, the value returned is nil.
The returned [GBComment stringValue] contains the whole previous comment string, without prefixes or suffixes. To optimize things a bit, the actual comment string value is prepared on the fly, as you send the message, so it's only handled if needed. As creating the comment string adds some computing overhead, you should cache the returned value if possible.
Declared In
GBTokenizer.h
Class Methods
tokenizerWithSource:filename:
Returns initialized autoreleased instance using the given source PKTokenizer.
+ (id)tokenizerWithSource:(PKTokenizer *)tokenizer filename:(NSString *)filename
Parameters
- tokenizer: The underlying (worker) tokenizer to use for actual splitting.
- filename: The name of the file without path used for generating source info.
Return Value
Returns initialized instance or nil if failed.
Exceptions
- NSException: Thrown if the given tokenizer or filename is nil or filename is an empty string.
Declared In
GBTokenizer.h
Instance Methods
consume:
Consumes the given amount of tokens, starting at the current position.
- (void)consume:(NSUInteger)count
Parameters
- count: The number of tokens to consume.
Discussion
This effectively "moves" currentToken to the new position. If EOF is reached before the given amount of tokens is consumed, consuming stops at the end of the stream and currentToken returns the EOF token. Comment tokens detected while consuming are not counted; the count only includes actual language tokens. However, if there is a comment just before the new current token (i.e. after the last consumed token), the comment data is saved and is available through lastComment. Otherwise the last comment data is cleared, even if a comment was detected in between.
Declared In
GBTokenizer.h
consumeFrom:to:usingBlock:
Enumerates and consumes all tokens starting at current token up until the given end token is detected.
- (void)consumeFrom:(NSString *)start to:(NSString *)end usingBlock:(void (^)(PKToken *token, BOOL *consume, BOOL *stop))block
Parameters
- start: Optional starting token or nil.
- end: Ending token.
- block: The block to be called for each token.
Discussion
For each token, the given block is called, which gives the client a chance to inspect and handle tokens. If a start token is given and the current token matches it, the token is consumed without being reported to the block. However, if the token doesn't match, the method returns immediately without doing anything. The end token is also not reported and is automatically consumed after all previous tokens are reported. Also read the consume: documentation to understand how comments are dealt with.
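As a rough illustration of the behavior described above, the sketch below collects the string values of all tokens between an opening and a closing brace. The helper name TokensInsideBraces is hypothetical and the import path for ParseKit depends on project setup. The meaning of the consume and stop block arguments is not spelled out in this reference, so the sketch leaves both untouched; it also makes no attempt to handle nested braces.

```objc
#import <Foundation/Foundation.h>
#import <ParseKit/ParseKit.h>   // assumption: import path depends on how ParseKit is linked
#import "GBTokenizer.h"

// Hypothetical helper: returns the string values of the tokens reported
// between "{" and "}". Neither delimiter is reported to the block.
static NSArray *TokensInsideBraces(GBTokenizer *tokenizer) {
    NSMutableArray *values = [NSMutableArray array];
    // If the current token is not "{", the method returns without calling the block,
    // so the result is simply an empty array.
    [tokenizer consumeFrom:@"{" to:@"}" usingBlock:^(PKToken *token, BOOL *consume, BOOL *stop) {
        NSString *value = [token stringValue];
        if (value) [values addObject:value];
        // *consume and *stop are intentionally not modified here.
    }];
    return values;
}
```

After the call returns, currentToken should be the token following the closing brace, since the end token is consumed automatically.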
Exceptions
- NSException: Thrown if the given end token is nil.
Declared In
GBTokenizer.h
consumeTo:usingBlock:
Enumerates and consumes all tokens starting at current token up until the given end token is detected.
- (void)consumeTo:(NSString *)end usingBlock:(void (^)(PKToken *token, BOOL *consume, BOOL *stop))block
Parameters
- end: Ending token.
- block: The block to be called for each token.
Discussion
For each token, the given block is called, which gives the client a chance to inspect and handle tokens. The end token is not reported and is automatically consumed after all previous tokens are reported. Sending this message is equivalent to sending consumeFrom:to:usingBlock: and passing nil for the start token. Also read the consume: documentation to understand how comments are dealt with.
Exceptions
- NSException: Thrown if the given end token is nil.
Declared In
GBTokenizer.h
currentToken
Returns the current token.
- (PKToken *)currentToken
Declared In
GBTokenizer.h
eof
Specifies whether we're at EOF.
- (BOOL)eof
Return Value
Returns YES if we're at EOF, NO otherwise.
Declared In
GBTokenizer.h
initWithSourceTokenizer:filename:
Initializes tokenizer with the given source PKTokenizer.
- (id)initWithSourceTokenizer:(PKTokenizer *)tokenizer filename:(NSString *)filename
Parameters
- tokenizer: The underlying (worker) tokenizer to use for actual splitting.
- filename: The name of the file without path that's the source for tokenizer's input string.
Return Value
Returns initialized instance or nil if failed.
Discussion
This is the designated initializer.
Exceptions
- NSException: Thrown if the given tokenizer or filename is nil or filename is an empty string.
Declared In
GBTokenizer.h
lookahead:
Returns the token by looking ahead the given number of tokens from current position.
- (PKToken *)lookahead:(NSUInteger)offset
Parameters
- offset: The offset from the current position.
Return Value
Returns the token at the given offset or the EOF token if the offset points past EOF.
Discussion
If offset "points" within a valid token, the token is returned, otherwise EOF token is returned. Note that this method automatically skips any comment tokens and only counts actual language tokens.
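A parser might use lookahead: to decide which branch to take before consuming anything. The sketch below checks whether the token one position ahead has a given string value. The helper name NextTokenMatches is hypothetical and the ParseKit import path depends on project setup. Because an offset past the end yields the EOF token rather than nil, the helper also guards with ParseKit's [PKToken EOFToken].

```objc
#import <Foundation/Foundation.h>
#import <ParseKit/ParseKit.h>   // assumption: import path depends on how ParseKit is linked
#import "GBTokenizer.h"

// Hypothetical helper: returns YES if the language token one position after
// the current token has the expected string value. Comment tokens are skipped
// by lookahead: itself, so they don't affect the offset.
static BOOL NextTokenMatches(GBTokenizer *tokenizer, NSString *expected) {
    PKToken *next = [tokenizer lookahead:1];
    if (next == [PKToken EOFToken]) return NO;   // offset points past the end of input
    return [[next stringValue] isEqualToString:expected];
}
```

Nothing is consumed by this check, so the caller can still process the current token normally afterwards.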
Declared In
GBTokenizer.h
resetComments
Resets lastComment and previousComment values.
- (void)resetComments
Discussion
This message should be sent whenever a comment is "attached" to an object. As comments are persistent, failing to reset would lead to using the same comment for next object as well!
Declared In
GBTokenizer.h
sourceInfoForCurrentToken
Returns GBSourceInfo for current token and filename.
- (GBSourceInfo *)sourceInfoForCurrentToken
Return Value
Returns declared file data.
Discussion
This is equivalent to sending sourceInfoForToken: and passing currentToken as the token parameter.
Exceptions
- NSException: Thrown if current token is nil.
Declared In
GBTokenizer.h
sourceInfoForToken:
Returns GBSourceInfo object describing the given token source information.
- (GBSourceInfo *)sourceInfoForToken:(PKToken *)token
Parameters
- token: The token for which to get file data.
Return Value
Returns declared file data.
Discussion
The method converts the given token's offset within the input string to a line number and uses that information, together with the assigned filename, to prepare the token info object.
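The offset-to-line conversion mentioned above can be sketched with Foundation alone. This is not GBTokenizer's actual implementation, only a minimal illustration that assumes lines are 1-based and separated by "\n"; the function name LineNumberForOffset is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the 1-based line number of the character at
// `offset` by counting the newline characters that precede it. Offsets past
// the end of the string are clamped to the string length.
static NSUInteger LineNumberForOffset(NSString *input, NSUInteger offset) {
    NSUInteger limit = MIN(offset, [input length]);
    NSUInteger line = 1;
    for (NSUInteger i = 0; i < limit; i++) {
        if ([input characterAtIndex:i] == '\n') line++;
    }
    return line;
}

// Example: in @"a\nb\nc" the character at offset 4 is 'c', which sits on line 3.
// LineNumberForOffset(@"a\nb\nc", 4) returns 3
```

Scanning from the start for every token is linear in the offset; a tokenizer that resolves many tokens could precompute the newline positions once instead.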
Exceptions
- NSException: Thrown if the given token is nil.
Declared In
GBTokenizer.h