colonel.conllu.lexer module

Module providing the ConlluLexerBuilder class and related exception classes.

class colonel.conllu.lexer.ConlluLexerBuilder[source]

Bases: object

Class containing PLY Lex rules for processing the CoNLL-U format and for creating new related PLY Lexer instances.

In most cases you can simply call the class method build(), which returns a PLY Lexer that is ready to process your input using the rules defined by ConlluLexerBuilder.

classmethod build()[source]

Returns a PLY Lexer instance for CoNLL-U processing.

The returned lexer makes use of the rules defined by ConlluLexerBuilder.

Return type:Lexer
static find_column(token)[source]

Returns the column number of the given LexToken.

Return type:int
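The source does not show how find_column() computes its result. As a minimal sketch, the usual PLY recipe derives the column from the token's lexpos and the lexer's input text. The function below is a stand-in and not the library's code; the 1-based numbering and the sample text are assumptions, and SimpleNamespace objects replace real PLY tokens so the snippet runs without PLY installed.

```python
from types import SimpleNamespace


def find_column(token):
    """Return the column of token within its line (illustrative sketch, 1-based)."""
    data = token.lexer.lexdata
    # rfind returns -1 when no newline precedes the token, so line_start becomes 0.
    line_start = data.rfind("\n", 0, token.lexpos) + 1
    return token.lexpos - line_start + 1


# Stand-in objects exposing only the attributes used above (hypothetical data).
text = "# comment\n1\tdog\n"
lexer = SimpleNamespace(lexdata=text)
token = SimpleNamespace(lexpos=text.index("dog"), lexer=lexer)
column = find_column(token)
```

Here "dog" is the third character of the second line, so under the 1-based assumption the sketch reports column 3.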
states = (('v0', 'exclusive'), ('v1', 'exclusive'), ('v2', 'exclusive'), ('v3', 'exclusive'), ('v4', 'exclusive'), ('v5', 'exclusive'), ('v6', 'exclusive'), ('v7', 'exclusive'), ('v8', 'exclusive'), ('v9', 'exclusive'), ('c1', 'exclusive'), ('c2', 'exclusive'), ('c3', 'exclusive'), ('c4', 'exclusive'), ('c5', 'exclusive'), ('c6', 'exclusive'), ('c7', 'exclusive'), ('c8', 'exclusive'), ('c9', 'exclusive'))
static t_ANY_error(token)[source]
Return type:None
static t_COMMENT(token)[source]

[#][^\n]*

Return type:LexToken
static t_DECIMAL_ID(token)[source]

([1-9][0-9]+|[0-9])\.[1-9][0-9]*

Return type:LexToken
t_INITIAL_v9_NEWLINE(token)[source]

\n

Return type:LexToken
static t_INTEGER_ID(token)[source]

[1-9][0-9]*

Return type:LexToken
static t_RANGE_ID(token)[source]

[1-9][0-9]*-[1-9][0-9]*

Return type:LexToken
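To see what the three ID patterns accept, they can be tried with Python's re module. This is only a sketch: classify_id is a hypothetical helper, not part of colonel. It assumes the separator in DECIMAL_ID is a literal dot, and it matches whole strings with re.fullmatch rather than reproducing PLY's rule ordering.

```python
import re

# Patterns as documented for the three ID rules; the "\." in DECIMAL_ID
# assumes the separator is a literal dot.
ID_PATTERNS = {
    "RANGE_ID": r"[1-9][0-9]*-[1-9][0-9]*",
    "DECIMAL_ID": r"([1-9][0-9]+|[0-9])\.[1-9][0-9]*",
    "INTEGER_ID": r"[1-9][0-9]*",
}


def classify_id(text):
    """Return the name of the first pattern matching text entirely, else None."""
    for name, pattern in ID_PATTERNS.items():
        if re.fullmatch(pattern, text):
            return name
    return None
```

With these patterns, "7" classifies as INTEGER_ID, "3-4" as RANGE_ID, and "0.1" as DECIMAL_ID, while "0", "01", and "1.0" match none of them.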
static t_c1_FORM(token)[source]

[^\n\t]+

Return type:LexToken
static t_c2_LEMMA(token)[source]

[^\n\t]+

Return type:LexToken
static t_c3_UPOS(token)[source]
Return type:LexToken
static t_c4_XPOS(token)[source]

[^\n\t ]+

Return type:LexToken
static t_c5_FEATS(token)[source]
Return type:LexToken
static t_c6_HEAD(token)[source]

([1-9][0-9]+|[0-9])|_

Return type:LexToken
static t_c7_DEPREL(token)[source]

[^\n\t ]+

Return type:LexToken
static t_c8_DEPS(token)[source]
Return type:LexToken
static t_c9_MISC(token)[source]

[^\n\t ]+

Return type:LexToken
t_v0_v1_v2_v3_v4_v5_v6_v7_v8_TAB(token)[source]

\t

Return type:LexToken
tokens = ('NEWLINE', 'TAB', 'COMMENT', 'INTEGER_ID', 'RANGE_ID', 'DECIMAL_ID', 'FORM', 'LEMMA', 'UPOS', 'XPOS', 'FEATS', 'HEAD', 'DEPREL', 'DEPS', 'MISC')
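The rule names suggest that a word line is tokenized as an ID, then nine TAB-separated field tokens, then a NEWLINE. The sketch below spells out that presumed sequence using the names from the tokens tuple. It is inferred from the state and rule names rather than taken from the library's code; expected_token_types is a hypothetical helper, and it does not validate field contents.

```python
# Token names from the documented `tokens` tuple, in CoNLL-U column order.
FIELD_TOKENS = ("FORM", "LEMMA", "UPOS", "XPOS", "FEATS",
                "HEAD", "DEPREL", "DEPS", "MISC")


def expected_token_types(line, id_type="INTEGER_ID"):
    """List the token types a well-formed word line would presumably yield."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
    types = [id_type]
    for name in FIELD_TOKENS:
        # Every field after the ID is preceded by a TAB token.
        types += ["TAB", name]
    types.append("NEWLINE")
    return types
```

For a ten-field line this yields 20 token types, starting with INTEGER_ID, TAB, FORM and ending with MISC, NEWLINE.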
exception colonel.conllu.lexer.IllegalCharacterError(token)[source]

Bases: colonel.conllu.lexer.LexerError

Exception raised by ConlluLexerBuilder when a lexer error caused by invalid input is encountered.

An instance must be initialized with the LexToken that the lexer could not process, so that line_number and column_number can be extracted from it; the constructor also generates a short error message.

column_number = None

Column position, within line_number, of the illegal character or of the start of an illegal sequence.

line_number = None

Line number containing the illegal character, or the start of an illegal sequence.

exception colonel.conllu.lexer.LexerError[source]

Bases: Exception

Generic error class for ConlluLexerBuilder.