colonel.conllu package
Module contents
This package provides methods and modules to process the CoNLL-U format.
In most situations it’s sufficient to use the parse() and to_conllu() functions, without caring too much about the implementation under the hood.
In more detail, this package provides a lexical analyzer (see lexer) and a parser (see parser) to transform the raw string input into related Sentence objects.
The lexer and parser classes are built on the PLY (Python Lex-Yacc) library; you can learn more from the PLY documentation and from the Lex & Yacc Page.
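To make the two stages concrete, the following is a deliberately simplified, pure-Python sketch: a line-level tokenizer followed by a grouping step that builds sentences. It does not use PLY and is not the package's implementation; `tokenize` and `group_sentences` are hypothetical names, and the sketch assumes only the ten tab-separated CoNLL-U columns and blank-line sentence separators.

```python
from typing import Iterable, Iterator, List, Tuple

# The ten columns defined by the CoNLL-U format, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


def tokenize(content: str) -> Iterator[Tuple[str, object]]:
    """Lexing stage (simplified): classify each line of the raw input."""
    for line in content.split("\n"):
        if not line.strip():
            yield ("BLANK", None)
        elif line.startswith("#"):
            yield ("COMMENT", line[1:].strip())
        else:
            columns = line.split("\t")
            if len(columns) != len(FIELDS):
                raise ValueError(
                    f"expected {len(FIELDS)} columns, got {len(columns)}")
            yield ("WORD", dict(zip(FIELDS, columns)))


def group_sentences(tokens: Iterable[Tuple[str, object]]) -> List[dict]:
    """Parsing stage (simplified): group line tokens into sentences."""
    sentences = []
    current = {"comments": [], "words": []}
    for kind, value in tokens:
        if kind == "COMMENT":
            current["comments"].append(value)
        elif kind == "WORD":
            current["words"].append(value)
        elif current["words"]:  # a blank line closes a non-empty sentence
            sentences.append(current)
            current = {"comments": [], "words": []}
    if current["words"]:  # input without a trailing blank line
        sentences.append(current)
    return sentences


sample = (
    "# text = Hello world\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tworld\tworld\tNOUN\t_\t_\t1\tvocative\t_\t_\n"
    "\n"
)
sketch_sentences = group_sentences(tokenize(sample))
print(len(sketch_sentences), [w["form"] for w in sketch_sentences[0]["words"]])
# prints: 1 ['Hello', 'world']
```

The real lexer and parser report problems through their own exception hierarchies rather than the `ValueError` used here.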
colonel.conllu.parse(content)
Parses a CoNLL-U string content, returning a list of sentences.
Raises:
- lexer.LexerError – (any specific subclass) in case of invalid input breaking the rules of the CoNLL-U lexer
- parser.ParserError – (any specific subclass) in case of invalid input breaking the rules of the CoNLL-U parser
Parameters: content (str) – CoNLL-U formatted string to be parsed
Returns: list of parsed Sentence items
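A minimal usage sketch with error handling, based only on the signature and exceptions documented for parse(). The import paths `colonel.conllu.lexer.LexerError` and `colonel.conllu.parser.ParserError` are inferred from the names in the Raises list, and the helper `parse_or_report` is hypothetical. The import is guarded so the snippet also runs where the package is not installed.

```python
from typing import Optional

# A one-sentence CoNLL-U document: a comment line, two word lines,
# and the blank line that terminates the sentence.
SAMPLE = (
    "# text = Hello world\n"
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "2\tworld\tworld\tNOUN\t_\t_\t1\tvocative\t_\t_\n"
    "\n"
)


def parse_or_report(content: str) -> Optional[list]:
    """Return the parsed sentences, or None if parsing was not possible."""
    try:
        # Assumed import paths, inferred from the documented names.
        from colonel.conllu import parse
        from colonel.conllu.lexer import LexerError
        from colonel.conllu.parser import ParserError
    except ImportError:
        print("colonel is not installed; nothing parsed")
        return None
    try:
        return parse(content)
    except (LexerError, ParserError) as error:
        # Catching these classes also catches their specific subclasses.
        print(f"invalid CoNLL-U input: {error!r}")
        return None


result = parse_or_report(SAMPLE)
if result is not None:
    print(f"parsed {len(result)} sentence(s)")
```

Catching the two documented exception classes separately from other errors keeps malformed input distinguishable from unrelated failures.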
colonel.conllu.to_conllu(sentences)
Serializes a list of sentences to a formatted CoNLL-U string.
This function simply concatenates the output of Sentence.to_conllu() for each given sentence and does not perform any validity check; sentences and elements not compatible with the CoNLL-U format could lead to an incorrect output value or to exceptions being raised.
Parameters: sentences (List[Sentence]) – list of Sentence items
Return type: str
Returns: a CoNLL-U formatted representation of the sentences
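The concatenation behaviour documented for to_conllu() can be sketched without the library. `StubSentence` and `join_conllu` are hypothetical stand-ins, not part of colonel; the sketch assumes only that each sentence object exposes a `to_conllu()` method returning its own CoNLL-U block, and it shows why incompatible elements pass through unchecked.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class StubSentence:
    """Hypothetical stand-in for a sentence: holds pre-built lines."""
    lines: List[str]

    def to_conllu(self) -> str:
        # Each sentence block is terminated by a blank line.
        return "\n".join(self.lines) + "\n\n"


def join_conllu(sentences: Iterable[StubSentence]) -> str:
    """Concatenate per-sentence output; no validity check is performed."""
    return "".join(sentence.to_conllu() for sentence in sentences)


first = StubSentence(["1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_"])
second = StubSentence(["this line is not valid CoNLL-U"])  # passes through as-is

output = join_conllu([first, second])
print(output.count("\n\n"))  # prints: 2 (one terminator per sentence)
```

Because nothing is validated at this stage, re-parsing the result is one way to confirm that it is well-formed CoNLL-U.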