colonel.conllu package

Module contents

This package provides functions and modules to process data in the CoNLL-U format.

In most situations, the parse() and to_conllu() functions are all you need, without having to worry about the implementation under the hood.

In more detail, this package provides a lexical analyzer (see lexer) and a parser (see parser) that together transform raw string input into the corresponding Sentence objects.

The lexer and parser classes are built with the PLY (Python Lex-Yacc) library; you can learn more from the PLY documentation and from the Lex & Yacc Page.
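The lexer-then-parser pipeline can be illustrated with a minimal standard-library sketch. It is a simplification, not colonel's implementation: it does not use PLY, and the token kinds, the hypothetical `lex_lines` and `parse_tokens` functions, and the plain-dict sentences (standing in for `Sentence`) are assumptions made for illustration. It relies only on the basic CoNLL-U layout: `#` comment lines, one word per line with ten tab-separated fields, and a blank line closing each sentence.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical token kinds; colonel's real lexer (built on PLY) defines its own.
Token = Tuple[str, object]


def lex_lines(content: str) -> Iterator[Token]:
    """Stage 1 (sketch): turn raw text into a flat stream of line-level tokens."""
    for line in content.split("\n"):
        if line.startswith("#"):
            yield ("COMMENT", line[1:].strip())
        elif line == "":
            yield ("BLANK", None)
        else:
            fields = line.split("\t")
            if len(fields) != 10:
                raise ValueError(f"expected 10 tab-separated fields, got {len(fields)}")
            yield ("WORD", fields)


def parse_tokens(tokens: Iterator[Token]) -> List[Dict[str, list]]:
    """Stage 2 (sketch): group tokens into sentences (dicts standing in for Sentence)."""
    sentences: List[Dict[str, list]] = []
    current: Dict[str, list] = {"comments": [], "words": []}
    for kind, value in tokens:
        if kind == "BLANK":
            # A blank line closes the current sentence, if it has any content.
            if current["comments"] or current["words"]:
                sentences.append(current)
                current = {"comments": [], "words": []}
        else:
            current["comments" if kind == "COMMENT" else "words"].append(value)
    if current["comments"] or current["words"]:
        sentences.append(current)
    return sentences


SAMPLE = (
    "# text = Hello world\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tworld\tworld\tNOUN\t_\t_\t1\tvocative\t_\t_\n"
    "\n"
)

sentences = parse_tokens(lex_lines(SAMPLE))
print(len(sentences), [word[1] for word in sentences[0]["words"]])
```

A real CoNLL-U parser also has to handle multiword-token ranges, empty nodes and the internal syntax of individual fields, which is where a grammar-based approach such as PLY helps.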

colonel.conllu.parse(content)

Parses CoNLL-U formatted string content and returns a list of sentences.

Raises:
  • lexer.LexerError – (or any specific subclass) if the input violates the rules of the CoNLL-U lexer
  • parser.ParserError – (or any specific subclass) if the input violates the rules of the CoNLL-U parser
Parameters:

content (str) – CoNLL-U formatted string to be parsed

Return type:

List[Sentence]

Returns:

list of parsed Sentence items
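Because parse() documents its errors as lexer.LexerError or parser.ParserError "or any specific subclass", a caller can catch the two base classes and handle every variant at once. The sketch below shows that pattern with stand-ins: `LexerError`, `ParserError`, the two subclasses and `parse_stub` are hypothetical local definitions, not imports from colonel, and the two failure conditions in `parse_stub` are arbitrary placeholders rather than colonel's actual lexing or parsing rules.

```python
# Stand-in exception hierarchy mirroring the documented contract. These are
# hypothetical local classes, NOT imported from colonel.conllu.lexer / .parser.
class LexerError(Exception):
    """Base class for lexing failures (stand-in)."""


class ParserError(Exception):
    """Base class for parsing failures (stand-in)."""


class IllegalCharacterError(LexerError):
    """Hypothetical specific lexer error."""


class UnexpectedTokenError(ParserError):
    """Hypothetical specific parser error."""


def parse_stub(content: str) -> list:
    """Hypothetical stand-in for parse(); the checks below are arbitrary placeholders."""
    if "\r" in content:
        raise IllegalCharacterError("carriage return not allowed")
    if content and not content.endswith("\n"):
        raise UnexpectedTokenError("input must end with a newline")
    return []


def safe_parse(content: str) -> tuple:
    """Catch only the documented base classes; every subclass ends up here."""
    try:
        return parse_stub(content), None
    except (LexerError, ParserError) as err:
        # type(err).__name__ reveals which specific subclass was raised.
        return None, f"{type(err).__name__}: {err}"


print(safe_parse("1\tword\n"))
print(safe_parse("bad\r\n"))
```

Catching the base classes keeps callers independent of which concrete subclass a given malformed input happens to trigger.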

colonel.conllu.to_conllu(sentences)

Serializes a list of sentences to a formatted CoNLL-U string.

This function simply concatenates the output of Sentence.to_conllu() for each given sentence and does not perform any validity check; sentences and elements that are not compatible with the CoNLL-U format can produce incorrect output or raise exceptions.

Parameters:

sentences (List[Sentence]) – list of Sentence items

Return type:

str

Returns:

a CoNLL-U formatted representation of the sentences
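Since to_conllu() is described as a plain concatenation of Sentence.to_conllu() outputs with no validation, its behaviour can be sketched as below. `ToySentence` and `to_conllu_sketch` are hypothetical stand-ins, not colonel's classes, and the sketch assumes that each sentence's own serialization already ends with the blank line that terminates a CoNLL-U sentence.

```python
from typing import List, Sequence, Tuple


class ToySentence:
    """Hypothetical stand-in for colonel's Sentence: stores (id, form) pairs only."""

    def __init__(self, words: Sequence[Tuple[int, str]]) -> None:
        self.words = list(words)

    def to_conllu(self) -> str:
        # Assumption: each sentence emits its word lines plus a terminating
        # blank line; the eight remaining fields are written as "_".
        lines = [f"{i}\t{form}" + "\t_" * 8 for i, form in self.words]
        return "\n".join(lines) + "\n\n"


def to_conllu_sketch(sentences: List[ToySentence]) -> str:
    # Plain concatenation with no validity checks, mirroring the documented behaviour.
    return "".join(sentence.to_conllu() for sentence in sentences)


doc = to_conllu_sketch([ToySentence([(1, "Hi")]), ToySentence([(1, "Bye"), (2, "now")])])

# No validation: a form containing a tab silently yields an 11-field line.
broken = to_conllu_sketch([ToySentence([(1, "a\tb")])])
print(repr(doc))
print(repr(broken))
```

The second call illustrates the caveat above: nothing stops malformed data from reaching the output, so any checks have to happen before serialization.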