colonel.multiword module

Module providing the Multiword class.

class colonel.multiword.Multiword(first_index=None, last_index=None, **kwargs)

Bases: colonel.base_sentence_element.BaseSentenceElement

Representation of a Multiword Token sentence element.

first_index

The first word index (inclusive) covered by the multiword token.

This usually corresponds to the Word.index of the first Word that is part of this multiword token.

It is compatible with the CoNLL-U ID field, which for a multiword token is a range of integers whose first and last bounds are separated by a hyphen (-): first_index corresponds to the value on the left.

is_valid()

Returns whether the object can be considered valid on its own, ignoring the context of the sentence in which the element may be inserted.

In compliance with the CoNLL-U format, an instance of type Multiword is considered valid only when first_index is set to a value greater than zero (0) and last_index is set to a value greater than first_index.
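The rule above can be sketched as a standalone check. This is only an illustration of the documented condition, not colonel's actual implementation: the function name multiword_range_is_valid is hypothetical, and treating None as "not set" is an assumption based on the constructor defaults.

```python
from typing import Optional


def multiword_range_is_valid(first_index: Optional[int],
                             last_index: Optional[int]) -> bool:
    """Hypothetical stand-in mirroring the documented rule (not colonel's code)."""
    # Assumption: None means the attribute has not been set.
    if first_index is None or last_index is None:
        return False
    # first_index must be greater than zero, last_index greater than first_index.
    return first_index > 0 and last_index > first_index


print(multiword_range_is_valid(3, 4))     # a well-formed range
print(multiword_range_is_valid(0, 2))     # first_index is not greater than zero
print(multiword_range_is_valid(4, 4))     # last_index does not exceed first_index
print(multiword_range_is_valid(None, 4))  # first_index is unset
```

Note that a range covering a single word (first_index equal to last_index) fails the check, since a multiword token is expected to span at least two words.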

Return type: bool

last_index

The last word index (inclusive) covered by the multiword token.

This usually corresponds to the Word.index of the last Word that is part of this multiword token.

It is compatible with the CoNLL-U ID field, which for a multiword token is a range of integers whose first and last bounds are separated by a hyphen (-): last_index corresponds to the value on the right.
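To illustrate how the two bounds of a CoNLL-U ID range map onto first_index and last_index, here is a minimal sketch. The helper split_id_range is hypothetical and is not part of colonel; it assumes a well-formed range such as "3-4".

```python
from typing import Tuple


def split_id_range(id_field: str) -> Tuple[int, int]:
    """Hypothetical helper: split a multiword ID such as "3-4" into its bounds."""
    # The value left of the hyphen plays the role of first_index,
    # the value right of it plays the role of last_index.
    left, right = id_field.split("-")
    return int(left), int(right)


first_index, last_index = split_id_range("3-4")
print(first_index, last_index)
```

Both bounds are inclusive, so the range "3-4" covers the words with index 3 and 4.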

to_conllu()

Returns a CoNLL-U formatted representation of the element.

No validity check is performed on the attributes; values that are not compatible with the CoNLL-U format may produce incorrect output or raise exceptions.
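For orientation, a multiword token line in CoNLL-U has ten tab-separated fields: the ID range, the surface form, and underscores for the remaining fields, with MISC optionally filled in. The sketch below builds such a line by hand. It is not colonel's to_conllu(): the function format_multiword_line and its parameters are hypothetical, and the library's exact output may differ.

```python
def format_multiword_line(first_index, last_index, form, misc="_"):
    """Hypothetical formatter for a CoNLL-U multiword token line."""
    # Fields: ID, FORM, then LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS
    # (all left as "_" for a multiword token), and finally MISC.
    fields = [f"{first_index}-{last_index}", form] + ["_"] * 7 + [misc]
    return "\t".join(fields)


line = format_multiword_line(1, 2, "vámonos")
print(line.split("\t"))

# Like the method described above, this sketch performs no validity check:
# unset bounds are formatted as-is and yield a malformed ID field.
print(format_multiword_line(None, None, "vámonos").split("\t")[0])
```

In this sketch, unset bounds end up as the text "None-None" in the ID field, which is one way incompatible values can lead to incorrect output.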

Return type: str