colonel.conllu.parser module

Module providing the ConlluParserBuilder class and related exception classes.

class colonel.conllu.parser.ConlluParserBuilder[source]

Bases: object

Class containing PLY Yacc rules for processing the CoNLL-U format and for creating new related PLY LRParser instances.

In most cases you can simply call the class method build(), which returns a PLY LRParser; the returned parser instance is ready to process your input using the rules provided by the ConlluParserBuilder class itself.

As usual with PLY, this class is paired with an associated lexer, which in this case is provided by ConlluLexerBuilder.
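PLY Yacc reads each grammar production from the docstring of a function whose name starts with p_, which is why the methods listed below are documented by their grammar rule alone. The following is a minimal sketch of that convention using only the standard library; TinyRules and collect_rules are hypothetical names invented for illustration and are not part of colonel or PLY.

```python
import inspect


class TinyRules:
    """Hypothetical stand-in; not the real ConlluParserBuilder."""

    @staticmethod
    def p_comments_many(prod):
        """comments : comments comment"""

    @staticmethod
    def p_comments_one(prod):
        """comments : comment"""

    @staticmethod
    def helper():
        """Not a grammar rule: the name lacks the p_ prefix."""


def collect_rules(cls):
    """Return {function name: grammar text} for every p_* rule on cls.

    This only imitates the naming convention; PLY itself does much more
    (table construction, conflict detection, and so on).
    """
    rules = {}
    for name, func in inspect.getmembers(cls, inspect.isfunction):
        # p_error is the error hook, not a production, so it is skipped.
        if name.startswith("p_") and name != "p_error":
            rules[name] = inspect.getdoc(func)
    return rules


rules = collect_rules(TinyRules)
for name, grammar in rules.items():
    print(f"{name}: {grammar}")
```

Keeping one production per function, as ConlluParserBuilder does with its _one and _many pairs, makes each rule easy to locate from the generated documentation.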

classmethod build()[source]

Returns a PLY LRParser instance for CoNLL-U processing.

The returned parser makes use of the rules defined by ConlluParserBuilder.

Return type:LRParser
static p_comment(prod)[source]

comment : COMMENT NEWLINE

Return type:None
static p_comments_many(prod)[source]

comments : comments comment

Return type:None
static p_comments_one(prod)[source]

comments : comment

Return type:None
static p_error(token)[source]
Return type:None
static p_sentence_with_comments(prod)[source]

sentence : comments wordlines NEWLINE

Return type:None
static p_sentence_without_comments(prod)[source]

sentence : wordlines NEWLINE

Return type:None
static p_sentences_many(prod)[source]

sentences : sentences sentence

Return type:None
static p_sentences_one(prod)[source]

sentences : sentence

Return type:None
static p_wordline_emptynode(prod)[source]

wordline : DECIMAL_ID TAB FORM TAB LEMMA TAB UPOS TAB XPOS TAB FEATS TAB HEAD TAB DEPREL TAB DEPS TAB MISC NEWLINE

Return type:None
static p_wordline_multiword(prod)[source]

wordline : RANGE_ID TAB FORM TAB LEMMA TAB UPOS TAB XPOS TAB FEATS TAB HEAD TAB DEPREL TAB DEPS TAB MISC NEWLINE

Return type:None
static p_wordline_word(prod)[source]

wordline : INTEGER_ID TAB FORM TAB LEMMA TAB UPOS TAB XPOS TAB FEATS TAB HEAD TAB DEPREL TAB DEPS TAB MISC NEWLINE

Return type:None
static p_wordlines_many(prod)[source]

wordlines : wordlines wordline

Return type:None
static p_wordlines_one(prod)[source]

wordlines : wordline

Return type:None
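The three wordline rules above differ only in the token type of the first field (INTEGER_ID, RANGE_ID or DECIMAL_ID). As a rough illustration of how a line ends up in one rule or another, here is a standalone sketch that classifies the ID field with regular expressions. The patterns are assumptions made for this example; the real token definitions live in ConlluLexerBuilder and may differ.

```python
import re

# Assumed patterns, for illustration only; not copied from colonel's lexer.
ID_PATTERNS = [
    ("RANGE_ID", re.compile(r"[1-9][0-9]*-[1-9][0-9]*")),
    ("DECIMAL_ID", re.compile(r"(0|[1-9][0-9]*)\.[1-9][0-9]*")),
    ("INTEGER_ID", re.compile(r"[1-9][0-9]*")),
]

# Which documented rule consumes a wordline starting with each ID token.
RULE_FOR_ID = {
    "INTEGER_ID": "p_wordline_word",
    "RANGE_ID": "p_wordline_multiword",
    "DECIMAL_ID": "p_wordline_emptynode",
}


def classify_wordline(line):
    """Return (token type, rule name) for a tab-separated CoNLL-U word line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, found {len(fields)}")
    for token_type, pattern in ID_PATTERNS:
        if pattern.fullmatch(fields[0]):
            return token_type, RULE_FOR_ID[token_type]
    raise ValueError(f"unrecognised ID field: {fields[0]!r}")


def make_line(word_id, form):
    """Build a sample word line with every other field left as '_'."""
    return "\t".join([word_id, form] + ["_"] * 8) + "\n"


for sample_id in ("1", "1-2", "8.1"):
    print(sample_id, classify_wordline(make_line(sample_id, "x")))
```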
exception colonel.conllu.parser.IllegalEmptyNodeError(prod)[source]

Bases: colonel.conllu.parser.ParserError

Exception raised by ConlluParserBuilder when a word line has been parsed correctly and recognised as an empty node line, but its data is not valid for this kind of element.

An exception instance must be initialized with the YaccProduction related to the word line containing illegal data, so that the line_number can be extracted; a short error message is also generated by the constructor.

exception colonel.conllu.parser.IllegalEofError[source]

Bases: colonel.conllu.parser.ParserError

Exception raised by ConlluParserBuilder when a parser error caused by an unexpected end of file is encountered.

When this exception is raised, the end of the input data has been reached, but additional tokens were still expected for the input to be valid CoNLL-U.
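In PLY, the p_error hook receives the offending token, or None when the parser runs out of input. A plausible way to separate the end-of-file case from the illegal-token case is sketched below. The Sketch* classes and this p_error body are assumptions written for illustration; they are not the actual colonel implementation.

```python
from types import SimpleNamespace


class SketchParserError(Exception):
    """Stand-in for a generic parser error base class."""


class SketchIllegalEofError(SketchParserError):
    def __init__(self):
        super().__init__("unexpected end of file")


class SketchIllegalTokenError(SketchParserError):
    def __init__(self, token):
        super().__init__(
            f"illegal token {token.type!r} at line {token.lineno}"
        )
        self.token = token


def p_error(token):
    """Error hook: PLY passes None when the input ended too early."""
    if token is None:
        raise SketchIllegalEofError()
    raise SketchIllegalTokenError(token)


# A fake token with the attributes a PLY LexToken carries.
fake_token = SimpleNamespace(type="TAB", value="\t", lineno=3, lexpos=42)

for candidate in (None, fake_token):
    try:
        p_error(candidate)
    except SketchParserError as error:
        # Catching the base class handles both failure modes.
        print(type(error).__name__, "->", error)
```

Because both exceptions share one base class, callers can catch the generic error when they do not care which of the two cases occurred.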

exception colonel.conllu.parser.IllegalMultiwordError(prod)[source]

Bases: colonel.conllu.parser.ParserError

Exception raised by ConlluParserBuilder when a word line has been parsed correctly and recognised as a multiword token line, but its data is not valid for this kind of element.

An exception instance must be initialized with the YaccProduction related to the word line containing illegal data, so that the line_number can be extracted; a short error message is also generated by the constructor.

exception colonel.conllu.parser.IllegalTokenError(t)[source]

Bases: colonel.conllu.parser.ParserError

Exception raised by ConlluParserBuilder when a parser error caused by an invalid token is encountered.

An exception instance must be initialized with the LexToken which the parser was not able to process, so that all the exception attributes can be extracted; a short error message is also generated by the constructor.

column_number = None

Column position within the line given by line_number of the illegal token encountered, or of the first token of an illegal token sequence.

line_number = None

Line number of the illegal token encountered, or of the first token of an illegal token sequence.

type = None

The type of the illegal token encountered, or of the first token of an illegal token sequence.

value = None

The value of the illegal token encountered, or of the first token of an illegal token sequence.
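A PLY LexToken exposes type, value, lineno and lexpos, so the four attributes above can all be derived from the token. The sketch below shows one way this could be done, computing the column from lexpos as suggested in the PLY documentation. TokenErrorSketch and find_column are hypothetical names, and passing the input text to the constructor is an assumption of this example: the real IllegalTokenError takes only the token.

```python
from types import SimpleNamespace


def find_column(text, lexpos):
    """Return the 1-based column of the character at offset lexpos."""
    line_start = text.rfind("\n", 0, lexpos) + 1
    return lexpos - line_start + 1


class TokenErrorSketch(Exception):
    """Illustrative only; mirrors the documented attribute names."""

    def __init__(self, token, text):
        self.type = token.type
        self.value = token.value
        self.line_number = token.lineno
        self.column_number = find_column(text, token.lexpos)
        super().__init__(
            f"illegal token {self.type} ({self.value!r}) at line "
            f"{self.line_number}, column {self.column_number}"
        )


text = "# comment\n1\tThe\n"
# Fake token pointing at "The" on the second line of the input.
token = SimpleNamespace(
    type="FORM", value="The", lineno=2, lexpos=text.index("The")
)
error = TokenErrorSketch(token, text)
print(error)
```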

exception colonel.conllu.parser.ParserError[source]

Bases: Exception

Generic error class for ConlluParserBuilder.