MaltParser

MaltParser 1.9 - Available options

All options are categorized into one of the following option groups: system, config, singlemalt, input, output, graph, nivre, multiplanar, planar, 2planar, covington, lib, guide, pproj. Every option can have the following attributes:

Attribute

Description

name

The name of the option

type

The following option types are available:

unary	The option has no value; this type is only used by the help option to indicate that help should be displayed.
bool	Boolean option, can take either true or false value.
integer	Integer option, can take an integer value.
string	String option, can take a string value.
enum	Enum option, can only take a predefined value.
stringenum	StringEnum option, can either take a string value or a predefined value.
class	Class option, can take a predefined value that corresponds to a class in the MaltParser distribution.

flag

A short version option indicator.

default

If there is a default value it is specified by this attribute.

usage

Indicates the usage of the option:

train	The option is only relevant during learning.
process	The option is only relevant during processing (parsing).
both	The option is relevant both during learning and processing (parsing).
save	The option is saved during learning and cannot be overridden during processing (parsing).

All the option groups and options are described in detail below. An option begins with the following format if the attribute is applicable:

name -flag type default value usage

system

The system option group contains options that have a special status, because they control the overall system. These options can only have one value each. For instance, you cannot specify more than one option file.

option_file -f string

There are several ways to control MaltParser and one way is to supply all options in an option file. The option_file option can be used to specify the path to this option file.

help -h unary

Displays a short description of all available options.

verbosity -v enum info

There are several levels of verbosity for the system output stream, from showing all debugging messages (which can be useful when modifying or extending the source code of MaltParser) to turning off all messages. MaltParser uses Apache log4j logging services. To find out more about the different levels please consult the Apache log4j documentation. The default verbosity level is info, which means that all error, warning and informational messages are displayed.

off	Logging turned off
fatal	Logging of very severe error events
error	Logging of error events
warn	Logging of harmful situations
info	Logging of informational messages
debug	Logging of debugging messages
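
As a rough illustration of how a threshold-style verbosity setting filters messages, the sketch below orders the levels as listed above and checks whether a message would be shown. The function name is_displayed is hypothetical, and this is a simplified model rather than log4j's actual implementation.

```python
# Levels ordered from least to most verbose, as listed above.
LEVELS = ["off", "fatal", "error", "warn", "info", "debug"]

def is_displayed(message_level, verbosity="info"):
    """Return True if a message at message_level is shown at the given verbosity."""
    if message_level == "off":
        raise ValueError("'off' is a verbosity setting, not a message level")
    return LEVELS.index(message_level) <= LEVELS.index(verbosity)

# With the default verbosity (info), warnings are shown but debug messages are not.
print(is_displayed("warn"), is_displayed("debug"))
```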

config

The config option group contains general options for a configuration.

name -c string both

The configuration name is the name of the configuration and also the name of the MaltParser configuration file, which ends with the file suffix .mco. The name is your own choice, but it is appropriate to give the configuration a name that reflects the content. This option must always be specified, except when the url option is used instead of name.

url -u string both

It is possible to specify a URL to the configuration file instead of specifying the configuration name. For example, if you have a configuration file with the following URL: http://maltparser.org/mco/test.mco you can write -u http://maltparser.org/mco/test.mco.

flowchart -m enum parse both

The following predefined flow charts are available.

learn	Learn a Single MaltParser configuration
parse	Parse with a Single MaltParser configuration
info	Prints the info file of a configuration
unpack	Unpacks a configuration
convert	Simple format converter
analyze
proj	Projectivizes input data using a configuration
deproj	Deprojectivizes input data using a configuration
learnwo	Same as learn, but also outputs the graphs to file specified by the flag -o
testdata	Generates test instances to run experiments with a learner outside MaltParser. Use for example the flag -li true to save instances.
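
To make the interplay of the configuration name and the flow chart concrete, here is a minimal sketch that assembles argument lists for a learn run and a parse run. The helper name build_command and the data file names are assumptions made for illustration; the flags -c, -m, -i and -o are the ones documented on this page, and the jar name is the one mentioned for the 1.9.2 distribution.

```python
def build_command(jar, config_name, flowchart, infile=None, outfile=None):
    """Assemble an argument list for one MaltParser run (illustrative helper)."""
    cmd = ["java", "-jar", jar, "-c", config_name, "-m", flowchart]
    if infile is not None:
        cmd += ["-i", infile]
    if outfile is not None:
        cmd += ["-o", outfile]
    return cmd

# Hypothetical data files; adjust the paths to your own setup.
learn_cmd = build_command("maltparser-1.9.2.jar", "test", "learn",
                          infile="train.conll")
parse_cmd = build_command("maltparser-1.9.2.jar", "test", "parse",
                          infile="test.conll", outfile="out.conll")
print(" ".join(learn_cmd))
print(" ".join(parse_cmd))
```

The lists can be passed to subprocess.run as they are, which avoids shell quoting issues.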

type -t class singlemalt both

MaltParser 1.9.2 has one available configuration type: singlemalt. Later releases may contain additional configuration types. For example, one type could be an ensemble parser configuration containing many single malt configurations.

singlemalt

Single Malt Parser configuration

workingdir -w string user.dir both

By default the working directory is the directory where MaltParser is started from, but it is possible to specify another directory with the workingdir option.

logging -cl enum info both

In contrast to the system-verbosity option, the logging option controls the level of verbosity of an individual configuration. The different verbosity or logging levels are the same as for the system-verbosity option.

off	Logging turned off
fatal	Logging of very severe error events
error	Logging of error events
warn	Logging of harmful situations
info	Logging of informational messages
debug	Logging of debugging messages

logfile -lfi string stdout both

By default the logging will be output to the standard output stream, but it is possible to direct this output stream to a logging file by specifying the logfile option.

singlemalt

The singlemalt option group is used when the singlemalt configuration type is specified.

mode -sm enum parse both

This option has been replaced by --config-flowchart and should not be used anymore. The value of this option will be mapped to --config-flowchart.

learn
parse

parsing_algorithm -a class nivreeager save

The single malt configuration contains nine deterministic parsing algorithms. Four algorithms produce projective dependency graphs: Nivre arc-eager, Nivre arc-standard, Covington projective and Stack projective. Three algorithms are able to produce non-projective graphs: Covington non-projective, Stack eager and Stack lazy. The remaining two are the Planar eager and 2-Planar eager algorithms. Nivre's parsing algorithms have an option group called nivre for controlling the behavior of the algorithm; Covington's algorithms have a corresponding option group called covington, and the planar algorithms are controlled by the multiplanar, planar and 2planar option groups. For more information about the parsing algorithms see the user guide: Parsing Algorithms.

nivreeager	Nivre arc-eager
nivrestandard	Nivre arc-standard
covnonproj	Covington non-projective
covproj	Covington projective
stackproj	Stack projective
stackeager	Stack eager
stacklazy	Stack lazy
planar	Planar eager
2planar	2-Planar eager

guide_model -gm class single save

MaltParser 1.9.2 has one available guide model type: single. Later releases may contain additional guide model types.

single

Classic guide

null_value -nv enum one save

MaltParser 1.9.2 and later versions (implemented in Java) have the possibility of distinguishing between different kinds of null-values when extracting the feature vector. For input columns like POSTAG it is possible to differentiate two null-values:

NO NODE: There exists no corresponding dependency graph node (e.g., because the lookahead extends beyond the end of the string), which means that the feature is really undefined.
ROOT NODE: The dependency graph node is a root node, which means that it is not possible to extract an input column value (for example, the word form or the part-of-speech).

In addition to the two null value categories for input columns, there is one more for the output columns:

NO VALUE: The dependency graph node exists and is not the root, but has not yet been assigned a value for the output column requested (e.g., has not been assigned a head and therefore does not have a dependency type).

With this option it is possible to specify the degree of differentiation of null-values.

none: Excludes all kinds of null-values when extracting the feature vector. This option value is not possible for learning methods that have symbolic feature vector encoding.
one: Maps all kinds of null values to one symbol.
rootnode: Distinguishes between NO NODE and ROOT NODE, and the NO VALUE null-value case is mapped to the ROOT NODE null-value for output columns.
novalue: Distinguishes between NO NODE and ROOT NODE for both input and output columns, and NO VALUE for output columns.

none	Excludes all types of null values
one	Maps all kinds of null values to one symbol
rootnode	Distinguish between no node and root node
novalue	Distinguish between no node and root node, and no value for output column
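
The four settings can be summarized as a mapping from the kind of null-value to the symbol that ends up in the feature vector. The sketch below is one way to express that mapping; the function name and the symbol strings are placeholders chosen for illustration and are not taken from the MaltParser source.

```python
NO_NODE, ROOT_NODE, NO_VALUE = "NO NODE", "ROOT NODE", "NO VALUE"

def null_symbol(kind, setting):
    """Return the symbol for a null-value kind under a null_value setting.

    Returns None when the null-value is excluded from the feature vector.
    The symbol strings are illustrative placeholders.
    """
    if setting == "none":
        return None
    if setting == "one":
        return "#null#"
    if setting == "rootnode":
        # NO VALUE is folded into the ROOT NODE symbol.
        return "#null#" if kind == NO_NODE else "#rootnode#"
    if setting == "novalue":
        return {NO_NODE: "#null#",
                ROOT_NODE: "#rootnode#",
                NO_VALUE: "#novalue#"}[kind]
    raise ValueError(f"unknown null_value setting: {setting}")

for setting in ("none", "one", "rootnode", "novalue"):
    print(setting, [null_symbol(k, setting) for k in (NO_NODE, ROOT_NODE, NO_VALUE)])
```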

diagnostics -di bool false both

If true, then diagnostics are written to standard output or to the file specified by the diafile option. By default this option is false.

diafile -dif string stdout both

By default the diagnostics will be output to the standard output stream, but it is possible to direct this output stream to a diagnostics file by specifying the diafile option.

use_partial_tree -up bool false save

If true, then partial trees are allowed as input and the parser will construct these partial trees before parsing. By default this option is false. Please see the user guide: Partial trees.

propagation -fp string save

The propagation option is used for specifying the propagation specification file, which is an XML file (see user guide: Propagation).

input

The input option group contains options that control the input data. In MaltParser 1.9.2, the values of options in the input option group must match the values of corresponding options in the output option group. This restriction is likely to be removed in later releases.

infile -i string both

The input data file is specified by the infile option. It is important that the input data file is formatted according to the format specified by the format option. For example, if format=conllx the input file should contain at least eight columns during learning and six columns during parsing.
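
A quick sanity check of the column counts mentioned above might look like the following sketch. The function name has_enough_columns and the sample token line are made up for illustration; the thresholds (eight columns for learning, six for parsing) are the ones stated for format=conllx.

```python
def has_enough_columns(line, learning):
    """Check one tab-separated conllx token line against the minimum column count."""
    required = 8 if learning else 6
    return len(line.rstrip("\n").split("\t")) >= required

# A hypothetical token line with six columns (no head or dependency type yet).
six_cols = "1\tThe\tthe\tDT\tDT\t_\n"
print(has_enough_columns(six_cols, learning=False))  # enough for parsing
print(has_enough_columns(six_cols, learning=True))   # too few for learning
```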

format -if stringenum conllx save

This option tells the parser which format is used in the input data file. The format is defined in an XML file. For more information see the user guide: Input and output format. There are three data format specification files in the MaltParser distribution (included in maltparser-1.9.2.jar):

conllx defines the CoNLL-X shared task format
conllu defines the CoNLL-U format
malttab defines the Malt-TAB format.

conllx	CoNLL-X data format
conllu	CoNLL-U data format
malttab	MaltTAB data format

reader -ir class tab both

In MaltParser 1.9.2 there is one input reader:

tab reads tab-separated files with columns defined by the input format.

tab	Tab-separated reader

charset -ic string UTF-8 save

The charset option defines the character set of the input data file, for example, UTF-8 or ISO-8859-1.

reader_options -iro string both

MaltParser has several data readers and with this option it is possible to control individual data readers.

iterations -it integer 1 both

Number of iterations over the input file.

output

The output option group contains options that control the output data. In MaltParser 1.9.2, the values of options in the output option group must match the values of corresponding options in the input option group. This restriction is likely to be removed in later releases.

outfile -o string both

The output data file is specified by the outfile option.

format -of stringenum both

This option tells the parser which format is used for the output data file. The format is defined in an XML file. For more information see the user guide: Define your own input/output format. There are three data format specification files in the MaltParser distribution (included in maltparser-1.9.2.jar):

conllx defines the CoNLL-X shared task format
conllu defines the CoNLL-U format
malttab defines the Malt-TAB format.

conllx	CoNLL-X data format
conllu	CoNLL-U data format
malttab	MaltTAB data format

writer -ow class tab both

In MaltParser 1.9.2 there is one output writer:

tab writes tab-separated files with columns defined by the input format.

tab	Tab-separated writer

charset -oc string UTF-8 save

The charset option defines the character set of the output data file, for example, UTF-8 or ISO-8859-1.

writer_options -owo string both

MaltParser has several data writers and with this option it is possible to control individual data writers.

graph

The graph option group controls internal data structures, such as the sentence and the dependency graph.

max_sentence_length -gsl integer 256 both

By default, the maximum sentence length is 256 tokens. If the input data file has sentences that are longer than 256 tokens, this option may be used to adjust the internal data structures, so that longer sentences can be loaded. This option is deprecated; there is no longer an upper limit on the sentence length.

root_label -grl string ROOT save

Default label used for unattached tokens that are automatically attached to the special root node after parsing is completed.
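
The post-processing step described here can be pictured with a small sketch. It assumes a simple representation in which each token has a head index (0 for the special root node) or None if it was left unattached; the function name attach_unattached and the sample labels are hypothetical.

```python
def attach_unattached(heads, labels, root_label="ROOT"):
    """Attach tokens without a head to the special root node (index 0)."""
    fixed_heads, fixed_labels = [], []
    for head, label in zip(heads, labels):
        if head is None:
            # Unattached token: use the root node and the default label.
            fixed_heads.append(0)
            fixed_labels.append(root_label)
        else:
            fixed_heads.append(head)
            fixed_labels.append(label)
    return fixed_heads, fixed_labels

# Token 3 was left unattached by the parser in this made-up example.
heads, labels = attach_unattached([2, 0, None], ["SS", "ROOT", None])
print(heads, labels)
```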

head_rules -ghr string save

It is possible to define head finding rules to control the transformation from phrase structure to dependency structure. For more information see the user guide: Head-finding rules.

nivre

The nivre option group controls the Nivre arc-eager and Nivre arc-standard parsing algorithms.

allow_root -nr bool true save

If allow_root=true, the parser treats the special root node as a token during parsing, allowing root dependents to be attached with a RightArc transition; otherwise root dependents are not attached during parsing. In both cases, unattached tokens are attached to the special root node with the default label after parsing is completed.

allow_reduce -ne bool false save

If allow_reduce=true, the Reduce transition is permissible even if the node on top of the stack does not have a head. As a result, this node will be attached to the special root node after parsing is completed, which may give rise to non-projective trees.

enforce_tree -nt bool false process

If enforce_tree=true, the parser will use an extended transition system that makes sure that the output parse is a tree by (if necessary) unshifting unattached tokens remaining on the stack after the buffer has been emptied and forcing them to be attached.

multiplanar

The multiplanar option group contains options that are common to the multiplanar family of algorithms (planar and 2-planar algorithms).

planar_root_handling -prh enum normal save

The planar_root_handling option specifies how dependents of the special root node are handled in the planar or 2-planar parser.

relaxed	Root dependents not attached during parsing (attached with default label afterwards).
normal	Root dependents attached by RightArc transition during parsing (unattached tokens attached with default label afterwards).

planar

The planar option group controls the Nivre planar parsing algorithm.

connectedness -pcon enum none save

The connectedness option specifies to what degree the parser enforces connected dependency graphs.

none	Don't enforce connectedness at all; words whose head the parser doesn't know will be linked to the root node. With this option, the parser will work with planar dependency forests. A forest may be seen as a tree by considering all the roots linked to the dummy root node, but it needn't be planar when seen this way.
reduceonly	The last node in a connected component cannot be reduced. No restrictions on shift transitions. This option guarantees that the dependency graph obtained counting links to the dummy root node is planar and connected.
full	Enforce full connectedness: the last node in a component cannot be reduced, and the last word cannot be shifted if the graph is not connected. The produced graph will be connected and planar even without considering the dummy root node.

acyclicity -pacy bool true save

If acyclicity=true, the parser only generates acyclic dependency graphs.

no_covered_roots -pcov bool false save

If no_covered_roots=true, the parser disallows covered roots (i.e., it disallows non-projective structures); if the option is set to false, the parser allows planar structures that are not projective.

2planar

The 2-planar option group controls the 2-planar parsing algorithm.

reduceonswitch -2pr bool false save

If reduceonswitch=true, the parser reduces the active stack immediately after switching stacks.

covington

allow_root -cr bool true save

allow_shift -cs bool false save

If allow_shift=true, Shift is a valid transition, allowing the parser to skip remaining tokens in Left; otherwise all tokens in Left must be inspected before the next token is shifted.

lib

This group contains options that are specific to the liblinear and libsvm learners.

options -lo string save

There are many LIBSVM options (see LIBSVM Documentation). Note that all whitespace must be replaced by underscores when this option is specified on the command line. For example, it could look like this: -lo -s_0_-t_1_-d_2_-g_0.2_-c_1_-r_0_-e_1.0. Liblinear also has several options (see liblinear Documentation) that you can specify with this option, with whitespace replaced by underscores in the same way. For example, it could look like this: -lo -s_4_-c_0.1
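
The underscore convention is easy to get wrong by hand, so a small helper can do the conversion. This is a minimal sketch; the function name encode_learner_options is an assumption for illustration and is not part of MaltParser.

```python
def encode_learner_options(options):
    """Replace whitespace between learner options with underscores.

    Runs of whitespace collapse to a single underscore.
    """
    return "_".join(options.split())

print(encode_learner_options("-s 0 -t 1 -d 2 -g 0.2 -c 1 -r 0 -e 1.0"))
print(encode_learner_options("-s 4 -c 0.1"))
```

The resulting string is what would follow the -lo flag on the command line.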

external -lx string train

Path to train or svm-train executable file of the liblinear or the libsvm package.

save_instance_files -li bool false save

If save_instance_files=true, training instance files are saved in the configuration, otherwise these files are deleted. The training instance files are not used during parsing.

verbosity -lv enum silent train

Verbosity of the liblinear or the libsvm package

silent	No output from the liblinear or the libsvm package is logged.
error	Only the error stream of the liblinear or the libsvm package is logged.
all	All output of the liblinear or the libsvm package is logged.

guide

Contains options that are specific for the guide, which can be seen as an interface (or glue) between the parsing algorithm and the learner. During learning, the parsing algorithm sends training instances to the guide, which prepares the corresponding feature vectors that are sent to the learner. During parsing, the parsing algorithm requests the prediction of parser actions from the guide, which means that the guide prepares the feature vectors that are sent to the classifier (which makes use of the model induced in the learning phase).

features -F stringenum save

The features option is used for specifying the feature model specification file, which is an XML file (see user guide: Feature model) or a text file with the file suffix .par (see user guide of MaltParser 0.x (C-impl) Feature Models). If no feature specification file is specified, the parser will use a default feature model specification for the given parsing algorithm that is included in the MaltParser distribution (included in the maltparser-1.9.2.jar file).

nivreeager	Nivre arc-eager default model
nivrestandard	Nivre arc-standard default model
covnonproj	Covington non-projective default model
covproj	Covington projective default model
stackproj	Stack projective default model
stackeager	Stack projective default model
stacklazy	Stack projective default model
planar	Planar arc-eager default model
2planar	2-Planar arc-eager default model

data_split_column -d string save

For some learning methods (like LIBSVM) it is impractical to induce a single model based on all training instances. With the data_split_column, data_split_structure and data_split_threshold options it is possible to define how the guide should split up the training instances to train several models. Note: Usually this will result in a slight drop in accuracy but a significant decrease in learning time. The option data_split_column indicates which input column in the data format specification file should be used for splitting up the training instances, for example, -d POSTAG or -d CPOSTAG. It is not a good idea to use fine-grained features, such as LEMMA or FORM, since this would result in thousands of models.

data_split_structure -s string save

For some learning methods (like LIBSVM) it is impractical to induce a single model based on all training instances. With the data_split_column, data_split_structure and data_split_threshold options it is possible to define how the guide should split up the training instances to train several models. Note: Usually this will result in a slight drop in accuracy but a significant decrease in learning time. The option data_split_structure specifies the data structure that should be used for splitting up the training instances. For example, with Nivre's parsing algorithm it is possible to use the top token on the stack (-s Stack[0]) or the next input token (-s Input[0]); for Covington's algorithms it should be either -s Left[0] or -s Right[0].

data_split_threshold -T integer 50 save

For some learning methods (like LIBSVM) it is impractical to induce a single model based on all training instances. With the data_split_column, data_split_structure and data_split_threshold options it is possible to define how the guide should split up the training instances to train several models. Note: Usually this will result in a slight drop in accuracy but a significant decrease in learning time. The option data_split_threshold specifies the frequency threshold for training a separate model. For example, -T 100 means that all training sets that contain fewer than 100 instances will be merged into a default training set.
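
The effect of the three data split options can be sketched as a grouping step followed by a merge of the small groups. The code below assumes each training instance has already been paired with its split value (for example, the POSTAG of Stack[0]); the function name split_instances and the sample data are hypothetical.

```python
from collections import defaultdict

def split_instances(instances, threshold=50):
    """Group (split_value, instance) pairs into per-value training sets.

    Sets with fewer than `threshold` instances are merged into a default set.
    """
    groups = defaultdict(list)
    for split_value, instance in instances:
        groups[split_value].append(instance)

    models, default = {}, []
    for value, items in groups.items():
        if len(items) < threshold:
            default.extend(items)
        else:
            models[value] = items
    return models, default

# Made-up instances: three with split value NN, one with VB.
data = [("NN", "a"), ("NN", "b"), ("VB", "c"), ("NN", "d")]
models, default = split_instances(data, threshold=2)
print(sorted(models), default)
```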

kbest -k integer -1 save

The classifier can produce a k-best list of predicted parser actions. The kbest option indicates how many items the k-best list should contain. If -k -1, all possible parser actions are ranked in the k-best list. If -k 1, there is only one prediction in the k-best list. MaltParser 1.9.2 (behavior ≠ malt0.4) only makes use of the k-best list when the parser action is not permissible. Later releases of MaltParser will make use of the k-best list in a more intelligent way. If --malt0.4-behavior=true, this option will be overridden with k=1.
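
One way to picture the fallback described above is a loop over the ranked list that returns the first permissible action. This is only a sketch of the idea: the function name choose_action, the action strings, and the decision to return None when no candidate is permissible are assumptions, not MaltParser's actual behavior.

```python
def choose_action(ranked_actions, is_permissible, k=-1):
    """Return the best-ranked permissible action among the first k candidates.

    k=-1 considers every ranked action; k=1 considers only the top prediction.
    """
    candidates = ranked_actions if k == -1 else ranked_actions[:k]
    for action in candidates:
        if is_permissible(action):
            return action
    return None  # assumption: the caller handles the "nothing permissible" case

# Made-up ranking in which the top prediction is not permissible.
ranked = ["LeftArc", "Reduce", "Shift"]
allowed = {"Reduce", "Shift"}
print(choose_action(ranked, allowed.__contains__))
print(choose_action(ranked, allowed.__contains__, k=1))
```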

kbest_type -kt class rank process

The classifier can produce a k-best list of predicted parser actions.

rank	Only ranked list

learner -l class liblinear save

This option specifies the learning method (learner package). MaltParser 1.9.2 includes the LIBSVM learner and the LIBLINEAR learner.

libsvm	LIBSVM learner
liblinear	LIBLINEAR learner

decision_settings -gds string T.TRANS+A.DEPREL save

This option specifies how a parser action is combined or divided. By default, arc label(s) and transition are combined into one individual decision. For more information see the user guide: Prediction strategy.

classitem_separator -gcs string ~ save

The transition and the dependency type that are combined into one class are joined by the classitem_separator string (default: ~). If a dependency label contains the separator, for example an underscore when the separator is an underscore, this could mess up the separation of the class. In this case another classitem_separator should be used.
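
The following sketch shows why a separator that also occurs in a dependency label is a problem. The functions combine and split_class, the transition name RA and the label nsubj_pass are invented for illustration; the actual class encoding inside MaltParser may differ.

```python
def combine(transition, deprel, sep="~"):
    """Join a transition and a dependency type into one class string."""
    return f"{transition}{sep}{deprel}"

def split_class(label, sep="~"):
    """Split a class string back into its items."""
    return label.split(sep)

# An underscore separator collides with the underscore inside the label.
ambiguous = split_class(combine("RA", "nsubj_pass", sep="_"), sep="_")
# A separator that does not occur in the label splits cleanly.
clean = split_class(combine("RA", "nsubj_pass", sep="~"), sep="~")
print(ambiguous)
print(clean)
```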

pproj

marking_strategy -pp enum none save

Marking strategy for pseudo-projective transformation.

none	No pseudo-projective transformation
baseline	Projectivizes input data
head	Projectivizes input data with head encoding for labels
path	Projectivizes input data with path encoding for labels
head+path	Projectivizes input data with head and path encoding for labels

covered_root -pcr enum none save

Attachment strategy for covered roots.

none	No covered root transformation; covered roots treated as any other node
ignore	No covered root transformation; covered roots ignored in projectivity tests (old implementation of none)
left	Attach covered roots to the left end of the shortest covering arc
right	Attach covered roots to the right end of the shortest covering arc
head	Attach covered roots to the head of the shortest covering arc

lifting_order -plo enum shortest save

Lifting order, in case a dependency graph contains multiple non-projective arcs.

shortest	Lift the shortest arcs first (break ties from left to right)
deepest	Lift the most deeply nested arcs first (break ties from left to right)
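
To illustrate the shortest lifting order, the sketch below finds the non-projective arcs of a dependency tree and sorts them by length. It assumes the tree is given as a list of head indices with index 0 as the special root node (heads[0] is None) and that it contains no cycles. The function names are hypothetical, the left-to-right tie break is interpreted as "leftmost endpoint first", and the lifting operation itself is not shown.

```python
def dominates(heads, ancestor, node):
    """True if ancestor is reachable from node by following head links."""
    while node is not None:
        if node == ancestor:
            return True
        node = heads[node]
    return False

def nonprojective_arcs(heads):
    """Return (head, dependent) arcs that span a token not dominated by the head."""
    arcs = []
    for dep in range(1, len(heads)):
        head = heads[dep]
        lo, hi = min(head, dep), max(head, dep)
        if any(not dominates(heads, head, k) for k in range(lo + 1, hi)):
            arcs.append((head, dep))
    return arcs

def shortest_first(arcs):
    """Order arcs by length, breaking ties by the leftmost endpoint."""
    return sorted(arcs, key=lambda arc: (abs(arc[0] - arc[1]), min(arc)))

# Token 1 depends on token 3, but token 2 in between is not dominated by 3.
heads = [None, 3, 0, 2, 2]
print(nonprojective_arcs(heads))
print(shortest_first([(1, 4), (5, 3), (2, 4)]))
```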
