MaltParser

MaltParser 1.0.1 - Available options

All options are categorized into one of the following option groups: system, config, singlemalt, malt0.4, input, output, graph, nivre, covington, libsvm, guide, pproj. Every option can have the following attributes:

Attribute  Description
name       The name of the option.
type       The option type, which is one of the following:
             unary       The option has no value. This type is only used by the help option to indicate that help should be displayed.
             bool        Boolean option; takes the value true or false.
             integer     Integer option; takes an integer value.
             string      String option; takes a string value.
             enum        Enum option; takes only a predefined value.
             stringenum  StringEnum option; takes either a string value or a predefined value.
             class       Class option; takes a predefined value that corresponds to a class in the MaltParser distribution.
flag       A short version of the option indicator.
default    The default value, if there is one.
usage      Indicates when the option is relevant:
             train    The option is only relevant during learning.
             process  The option is only relevant during processing (parsing).
             both     The option is relevant both during learning and during processing (parsing).
             save     The option is saved during learning and cannot be overridden during processing (parsing).

All the option groups and options are described in detail below. Each option description begins with a header line in the following format, listing the attributes that apply:

name  -flag  type  default value  usage

system

The system option group contains options that have a special status, because they control the overall system. These options can only have one value each. For instance, you cannot specify more than one option file.

option_file -f string  

There are several ways to control MaltParser and one way is to supply all options in an option file. The option_file option can be used to specify the path to this option file.

help -h unary  

Displays a short description of all available options.

verbosity -v enum info 

There are several levels of verbosity for the system output stream, from showing all debugging messages (which can be useful when modifying or extending the source code of MaltParser) to turning off all messages. MaltParser uses Apache log4j logging services. To find out more about the different levels please consult the Apache log4j documentation. The default verbosity level is info, which means that all error, warning and informational messages are displayed.

 off    Logging turned off
 fatal  Logging of very severe error events
 error  Logging of error events
 warn   Logging of harmful situations
 info   Logging of informational messages
 debug  Logging of debugging messages

config

The config option group contains general options for a configuration.

name -c string  both

The configuration name is the name of the configuration and also the name of the MaltParser configuration file, which ends with the file suffix .mco. The name is your own choice, but it is appropriate to give the configuration a name that reflects the content. This option must always be specified, except when the url option is used instead of name.

url -u string  both

It is possible to specify a URL to the configuration file instead of specifying the configuration name. For example, if you have a configuration file with the following URL: http://w3.msi.vxu.se/~jha/maltparser/configs/test.mco you can write -u http://w3.msi.vxu.se/~jha/maltparser/configs/test.mco.

type -t class singlemalt both

MaltParser 1.0.1 has one available configuration type: singlemalt. Later releases may contain additional configuration types. For example, one type could be an ensemble parser configuration containing many single malt configurations.

 singlemalt  Single Malt configuration

workingdir -w string user.dir both

By default the working directory is the directory where MaltParser is started from, but it is possible to specify another directory with the workingdir option.

logging -cl enum info both

In contrast to the system-verbosity option, the logging option controls the level of verbosity of an individual configuration. The different verbosity or logging levels are the same as for the system-verbosity option.

 off    Logging turned off
 fatal  Logging of very severe error events
 error  Logging of error events
 warn   Logging of harmful situations
 info   Logging of informational messages
 debug  Logging of debugging messages

logfile -lfi string stdout both

By default the logging will be output to the standard output stream, but it is possible to direct this output stream to a logging file by specifying the logfile option.

singlemalt

The singlemalt option group is used when the singlemalt configuration type is specified.

mode -m enum parse both

The mode option is used to specify the type of processing that MaltParser should perform. For example, if the value is learn, MaltParser will create a Single Malt configuration and induce a parsing model from the input data. If the value is parse, it will parse new data using a Single Malt configuration. More information about the different modes can be found in the user guide.

 learn   Creates a configuration and induces a parsing model from input data
 parse   Parses the input using a configuration
 info    Prints the info file of a configuration
 unpack  Unpacks a configuration
 proj    Projectivizes input data using a configuration
 deproj  Deprojectivizes input data using a configuration
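
As an illustration of how the options described so far fit together, the following Python sketch assembles argument lists for a learning run and a parsing run. The flags -c, -m, -i and -o are the ones documented on this page; the launcher prefix java -jar malt.jar, the helper name malt_args and the file names are assumptions made for the example.

```python
MODES = {"learn", "parse", "info", "unpack", "proj", "deproj"}


def malt_args(config, mode, infile=None, outfile=None,
              launcher=("java", "-jar", "malt.jar")):
    """Build an argument list from the -c, -m, -i and -o options.

    The launcher tuple is an assumption; adjust it to your installation.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    args = list(launcher) + ["-c", config, "-m", mode]
    if infile is not None:
        args += ["-i", infile]
    if outfile is not None:
        args += ["-o", outfile]
    return args


# Hypothetical file names, used only for illustration.
learn_cmd = malt_args("test", "learn", infile="train.conll")
parse_cmd = malt_args("test", "parse", infile="test.conll", outfile="out.conll")
```

The resulting lists could be passed to subprocess.run; keeping them as lists avoids shell quoting problems with file names.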

parsing_algorithm -a class nivreeager save

The single malt configuration contains four deterministic parsing algorithms. Three algorithms produce projective dependency graphs: Nivre arc-eager, Nivre arc-standard and Covington projective. One algorithm is able to produce non-projective graphs: Covington non-projective. Nivre's parsing algorithms have an option group called nivre for controlling the behavior of the algorithms; Covington's algorithms have a corresponding option group called covington. For more information about the parsing algorithms, see the user guide: Parsing Algorithms.

 nivreeager     Nivre arc-eager
 nivrestandard  Nivre arc-standard
 covnonproj     Covington non-projective
 covproj        Covington projective
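
The difference between projective and non-projective graphs can be made concrete with a small check. The sketch below uses the common definition that an arc is projective if every token strictly between the head and the dependent is a descendant of the head. The representation (a list of head indices, with 0 standing for the special root node and index 0 unused) and the function names are choices made for this example, not MaltParser's internal API, and the input is assumed to be a well-formed tree.

```python
def is_descendant(heads, node, ancestor):
    """Return True if `ancestor` is reached by following heads upward from `node`."""
    while node != 0:
        node = heads[node]
        if node == ancestor:
            return True
    return False


def is_projective(heads):
    """Check every arc (head, dependent) of a dependency tree for projectivity."""
    for dep in range(1, len(heads)):
        head = heads[dep]
        lo, hi = min(head, dep), max(head, dep)
        for between in range(lo + 1, hi):
            if not is_descendant(heads, between, head):
                return False
    return True


# heads[i] is the head of token i; heads[0] is a placeholder.
projective_tree = [None, 2, 0, 2]         # 1 <- 2 -> 3, 2 attached to root
nonprojective_tree = [None, 3, 0, 2, 2]   # arc 3 -> 1 spans token 2, which 3 does not dominate
```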

null_value -nv enum one save

MaltParser 1.0.1 and later versions (implemented in Java) can distinguish between different kinds of null values when extracting the feature vector. For input columns like POSTAG, two kinds of null value can be differentiated: no node and root node.

In addition to these two null-value categories for input columns, there is one more for output columns: no value. With this option it is possible to specify the degree of differentiation of null values.

 none       Excludes all types of null values
 one        Maps all kinds of null values to one symbol
 rootlabel  Same as 'one', but null value for output column is mapped to the root label
 rootnode   Distinguish between no node and root node
 novalue    Distinguish between no node and root node, and no value for output column

malt0.4

To be able to reproduce the results of MaltParser 0.x (C implementation), the new MaltParser can emulate the behavior of the old MaltParser 0.x. In some cases this means that the new parser actually has to replicate some minor bugs in the old implementation. Note: It is only possible to reproduce the results for the Nivre arc-eager and Nivre arc-standard algorithms. For the two Covington algorithms, the results are likely to be slightly better with the new implementation, even if behavior is set to true.

Setting behavior=true creates dependencies on other options, whose values are reset regardless of how they are otherwise specified. For example, the option null_value is assigned the value rootlabel even if it is specified to have another value. It is only possible to reproduce the results of MaltParser 0.x if the options posset, cposset and depset point to the appropriate tagset files.

behavior -mcb bool false save

If behavior=true, emulate the behavior of MaltParser 0.4.

posset -mcp string  train

To be able to reproduce the results of MaltParser 0.x (C implementation) it is important to preload some of the tagsets in the symbol table during learning. The part-of-speech tagset must be specified in a text file with one tag per line (and no blank lines).

cposset -mcc string  train

To be able to reproduce the results of MaltParser 0.x (C implementation) it is important to preload some of the tagsets in the symbol table during learning. The coarse-grained part-of-speech tagset must be specified in a text file with one tag per line (and no blank lines).

depset -mcd string  both

To be able to reproduce the results of MaltParser 0.x (C implementation) it is important to preload some of the tagsets in the symbol table during learning. The dependency type tagset must be specified in a text file with one tag per line (and no blank lines).
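
The tagset file requirement shared by posset, cposset and depset (one tag per line, no blank lines) is easy to check before training. The following Python sketch validates the text of such a file; the function name read_tagset and the sample tags are made up for the example.

```python
def read_tagset(text):
    """Parse tagset file content: one tag per line, no blank lines allowed."""
    tags = text.splitlines()
    for number, tag in enumerate(tags, start=1):
        if not tag.strip():
            raise ValueError(f"blank line at line {number}")
    return tags


# Hypothetical tagset content, as it might be read from a file.
tags = read_tagset("NN\nVB\nJJ\n")
```

A file could be checked with read_tagset(open(path, encoding="utf-8").read()), where the encoding is an assumption about how the tagset file was saved.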

input

The input option group contains options that control the input data. In MaltParser 1.0.1, the values of options in the input option group must match the values of corresponding options in the output option group. This restriction is likely to be removed in later releases.

infile -i string  both

The input data file is specified by the infile option. It is important that the input data file is formatted according to the format specified by the format option. For example, if format=conllx, the input file should contain at least eight columns during learning and at least six columns during parsing.
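
The column requirement for format=conllx can be checked with a short script before starting a run. This Python sketch assumes tab-separated columns and blank lines between sentences, as in CoNLL-X data; the names MIN_COLUMNS and check_columns and the sample tokens are illustrative only.

```python
# Minimum column counts stated above for format=conllx.
MIN_COLUMNS = {"learn": 8, "parse": 6}


def check_columns(lines, mode):
    """Return the line numbers of token lines with too few tab-separated columns."""
    minimum = MIN_COLUMNS[mode]
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:  # blank line marks a sentence boundary
            continue
        if len(line.split("\t")) < minimum:
            bad.append(number)
    return bad


sample = [
    "1\tJohn\tjohn\tN\tNNP\t_\t2\tSBJ\n",
    "2\tsleeps\tsleep\tV\tVBZ\t_\t0\tROOT\n",
    "\n",
    "1\tHi\thi\tI\tUH\t_\n",  # only six columns
]
bad_for_learning = check_columns(sample, "learn")
bad_for_parsing = check_columns(sample, "parse")
```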

format -if stringenum conllx save

This option tells the parser which format is used in the input data file. The format is defined in an XML file. For more information see the user guide: Input and output format. There are already two data format specification files in the MaltParser distribution (included in malt.jar):

 conllx   CoNLL-X data format
 malttab  MaltTAB data format

reader -ir class tab both

In MaltParser 1.0.1, the only possible choice is the tab-separated reader, but later releases may include other readers, for example XML readers.

 tab  Tab-separated reader

charset -ic string UTF-8 save

The charset option defines the character set of the input data file, for example, UTF-8 or ISO-8859-1.

output

The output option group contains options that control the output data. In MaltParser 1.0.1, the values of options in the output option group must match the values of corresponding options in the input option group. This restriction is likely to be removed in later releases.

outfile -o string  both

The output data file is specified by the outfile option.

format -of stringenum conllx save

This option tells the parser which format is used for the output data file. The format is defined in an XML file. For more information see the user guide: Define your own input/output format. There are already two data format specification files in the MaltParser distribution (included in malt.jar):

 conllx   CoNLL-X data format
 malttab  MaltTAB data format

writer -ow class tab both

In MaltParser 1.0.1, the only possible choice is the tab-separated writer, but later releases may include other writers, for example XML writers.

 tab  Tab-separated writer

charset -oc string UTF-8 save

The charset option defines the character set of the output data file, for example, UTF-8 or ISO-8859-1.

graph

The graph option group controls internal data structures, such as the sentence and the dependency graph.

max_sentence_length -gsl integer 256 both

By default, the maximum sentence length is 256 tokens. If the input data file has sentences that are longer than 256 tokens, this option may be used to adjust the internal data structures, so that longer sentences can be loaded.
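
To decide whether -gsl needs to be raised, the longest sentence in the input file can be measured first. The Python sketch below counts token lines between blank lines, assuming one token per line and blank lines as sentence separators (as in CoNLL-X data); the function name is made up for the example.

```python
DEFAULT_MAX_SENTENCE_LENGTH = 256  # default value of max_sentence_length


def longest_sentence(lines):
    """Return the largest number of consecutive non-blank lines."""
    longest = current = 0
    for line in lines:
        if line.strip():
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


sample = ["a\n", "b\n", "\n", "c\n", "d\n", "e\n"]
length = longest_sentence(sample)
needs_larger_limit = length > DEFAULT_MAX_SENTENCE_LENGTH
```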

root_label -grl string ROOT save

Default label used for unattached tokens that are automatically attached to the special root node after parsing is completed.
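
The effect of root_label can be sketched as a post-processing step over the parser output. In this Python example, an unattached token is represented by a head of None, and 0 stands for the special root node; this representation and the function name are assumptions for illustration, not MaltParser's internal data structures.

```python
def attach_unattached(heads, labels, root_label="ROOT"):
    """Attach tokens without a head to the root node (0) with the default label."""
    new_heads, new_labels = [], []
    for head, label in zip(heads, labels):
        if head is None:
            new_heads.append(0)
            new_labels.append(root_label)
        else:
            new_heads.append(head)
            new_labels.append(label)
    return new_heads, new_labels


# Token 2 was left unattached by the (hypothetical) parse.
heads, labels = attach_unattached([2, None, 2], ["SBJ", None, "OBJ"])
```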

nivre

The nivre option group controls the Nivre arc-eager and Nivre arc-standard parsing algorithms.

root_handling -r enum normal save

The root_handling option specifies how dependents of the special root node are handled.

 strict   Root dependents not attached during parsing (attached with default label afterwards), reduction of unattached tokens not permissible
 relaxed  Root dependents not attached during parsing (attached with default label afterwards), reduction of unattached tokens permissible
 normal   Root dependents attached by RightArc transition during parsing (unattached tokens attached with default label afterwards)

post_processing -npp bool false save

If post_processing=true, the parser will make a second pass over the input where only unattached tokens are processed.

covington

allow_root -cr bool true save

If allow_root=true, the parser treats the special root node as a token during parsing, allowing root dependents to be attached with a RightArc transition; otherwise root dependents are not attached during parsing. In both cases, unattached tokens are attached to the special root node with the default label after parsing is completed.

allow_shift -cs bool false save

If allow_shift=true, Shift is a valid transition, allowing the parser to skip remaining tokens in Left; otherwise all tokens in Left must be inspected before the next token is shifted.

libsvm

This group contains options that are specific for the LIBSVM learner.

libsvm_options -lso string  save

There are many LIBSVM options (see the LIBSVM documentation). Note that all whitespace in the option string is written as underscores when this option is given on the command line. For example, it could look like this: -lso -s_0_-t_1_-d_2_-g_0.2_-c_0.5_-r_0_-e_1.0
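
The underscore convention can be applied mechanically. This small Python sketch converts an ordinary LIBSVM option string into the form shown in the example above and back again; the function names are made up for illustration.

```python
def encode_libsvm_options(options):
    """Replace every run of whitespace with a single underscore."""
    return "_".join(options.split())


def decode_libsvm_options(encoded):
    """Turn the underscore form back into a space-separated option string."""
    return encoded.replace("_", " ")


encoded = encode_libsvm_options("-s 0 -t 1 -d 2 -g 0.2 -c 0.5 -r 0 -e 1.0")
```

The decoding direction only works because none of the values in this example contains an underscore of its own.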

libsvm_external -lsx string  train

If you have the LIBSVM package installed on your system, it is possible to use the C++ implementation of the LIBSVM learner instead of the internal Java implementation (libsvm.jar) during learning. It is very likely that the external C++ implementation is faster and uses less memory on your system. By specifying this option with the path to the executable file svm-train (svm-train.exe on Microsoft Windows), the parser will train LIBSVM models with svm-train instead of using libsvm.jar. Note: There can be slight differences in accuracy between the internal and the external LIBSVM learner, due to different versions of LIBSVM and the precision in assigning floating-point parameters.

save_instance_files -lsi bool false save

If save_instance_files=true, training instance files are saved in the configuration; otherwise these files are deleted. The training instance files are not used during parsing.

guide

Contains options that are specific for the guide, which can be seen as an interface (or glue) between the parsing algorithm and the learner. During learning, the parsing algorithm sends training instances to the guide, which prepares the corresponding feature vectors that are sent to the learner. During parsing, the parsing algorithm requests the prediction of parser actions from the guide, which means that the guide prepares the feature vectors that are sent to the classifier (which makes use of the model induced in the learning phase).

features -F stringenum  save

The features option is used for specifying the feature model specification file, which is an XML file (see user guide: Feature model) or a text file with the file suffix .par (see user guide of MaltParser 0.x (C-impl) Feature Models). If no feature specification file is specified, the parser will use a default feature model specification for the given parsing algorithm that is included in the MaltParser distribution (included in the malt.jar file).

 nivreeager     Nivre arc-eager default model
 nivrestandard  Nivre arc-standard default model
 covnonproj     Covington non-projective default model
 covproj        Covington projective default model

data_split_column -d string  save

For some learning methods (like LIBSVM) it is impractical to induce a single model based on all training instances. With the data_split_column, data_split_structure and data_split_threshold options it is possible to define how the guide should split up the training instances to train several models. Note: Usually this will result in a slight drop in accuracy but a significant decrease in learning time. The option data_split_column indicates which input column in the data format specification file should be used for splitting up the training instances, for example, -d POSTAG or -d CPOSTAG. It is not a good idea to use fine-grained features, such as LEMMA or FORM, since this would result in thousands of models.

data_split_structure -s string  save

For some learning methods (like LIBSVM) it is impractical to induce a single model based on all training instances. With the data_split_column, data_split_structure and data_split_threshold options it is possible to define how the guide should split up the training instances to train several models. Note: Usually this will result in a slight drop in accuracy but a significant decrease in learning time. The option data_split_structure specifies the data structure that should be used for splitting up the training instances. For example, with Nivre's parsing algorithms it is possible to use the top token on the stack (-s Stack[0]) or the next input token (-s Input[0]); for Covington's algorithms it should be either -s Left[0] or -s Right[0].

data_split_threshold -T integer 50 save

For some learning methods (like LIBSVM) it is impractical to induce a single model based on all training instances. With the data_split_column, data_split_structure and data_split_threshold options it is possible to define how the guide should split up the training instances to train several models. Note: Usually this will result in a slight drop in accuracy but a significant decrease in learning time. The option data_split_threshold specifies the frequency threshold for training a separate model. For example, -T 100 means that all training sets that contain fewer than 100 instances will be merged into a default training set.
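
The interaction of the three data split options can be sketched in a few lines of Python. The example assumes that each training instance has already been paired with its split value (for instance, the POSTAG of Stack[0]); the function name, the data layout and the sample values are illustrative and do not reflect how the guide is implemented.

```python
from collections import defaultdict


def split_instances(instances, threshold=50):
    """Group (split_value, instance) pairs into per-value training sets.

    Sets with fewer than `threshold` instances are merged into a default set.
    """
    groups = defaultdict(list)
    for value, instance in instances:
        groups[value].append(instance)

    models = {}
    default = []
    for value, members in groups.items():
        if len(members) < threshold:
            default.extend(members)
        else:
            models[value] = members
    return models, default


# Toy data: three NN instances, two VB instances, one JJ instance.
toy = [("NN", 0), ("NN", 1), ("NN", 2), ("VB", 0), ("VB", 1), ("JJ", 0)]
models, default = split_instances(toy, threshold=2)
```

With a threshold of 2, NN and VB each get their own training set, and the single JJ instance ends up in the default set.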

kbest -k integer -1 save

The classifier can produce a k-best list of predicted parser actions. The kbest option indicates how many items the k-best list should contain. With -k -1, all possible parser actions are ranked in the k-best list. With -k 1, there is only one prediction in the k-best list. MaltParser 1.0.1 (when malt0.4 behavior is not enabled) only makes use of the k-best list when the predicted parser action is not permissible. Later releases of MaltParser will make use of the k-best list in a more intelligent way. If --malt0.4-behavior=true, this option will be overridden with k=1.
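
The fallback use of the k-best list described above can be sketched as follows: take the top-ranked prediction and move down the list only if it is not permissible. The function names, the action names and the permissibility test in this Python example are hypothetical.

```python
def kbest_list(ranked_actions, k=-1):
    """Keep all ranked actions if k is -1, otherwise only the first k."""
    ranked = list(ranked_actions)
    return ranked if k == -1 else ranked[:k]


def choose_action(kbest, is_permissible):
    """Return the highest-ranked permissible action, or None if there is none."""
    for action in kbest:
        if is_permissible(action):
            return action
    return None


ranked = ["LeftArc", "Shift", "Reduce"]


def not_left_arc(action):
    """Toy permissibility test: pretend LeftArc is not allowed in this state."""
    return action != "LeftArc"


with_all = choose_action(kbest_list(ranked, k=-1), not_left_arc)
with_one = choose_action(kbest_list(ranked, k=1), not_left_arc)
```

With k=1 the list offers no alternative, so the sketch returns None when the single prediction is not permissible.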

learner -l class libsvm save

This option specifies the learning method (learner package). MaltParser 1.0.1 only includes the LIBSVM learner. It is very likely that later releases will have support for other learner packages.

 libsvm  LIBSVM learner

prediction_strategy -gps class combined save

A combined prediction strategy combines the transition and the dependency type into one class. For example, the RightArc transition and the dependency type object will form one class, RA_object.

 combined  Combined model

classitem_separator -gcs string _ save

By default, the transition and the dependency type are joined into one class name with an underscore as separator. If a dependency label itself contains an underscore, the class name can no longer be split unambiguously, so a different classitem_separator should be used in that case.
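
The problem with underscores in dependency labels is easy to reproduce. This Python sketch joins a transition and a dependency type into one class name and splits it again with a naive split on the separator; the function names, the label dir_obj and the alternative separator ~ are made up for the example, and the splitting shown is not necessarily how MaltParser does it.

```python
def combine(transition, deptype, separator="_"):
    """Join a transition and a dependency type into one class name."""
    return f"{transition}{separator}{deptype}"


def split_class(class_name, separator="_"):
    """Naively split a class name on every occurrence of the separator."""
    return class_name.split(separator)


plain = split_class(combine("RA", "object"))
ambiguous = split_class(combine("RA", "dir_obj"))
safe = split_class(combine("RA", "dir_obj", separator="~"), separator="~")
```

Here ambiguous has three parts instead of two, while the alternative separator recovers the transition and the label intact.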

pproj

marking_strategy -pp enum none save

Marking strategy for pseudo-projective transformation.

 none       No pseudo-projective transformation
 baseline   Projectivizes input data
 head       Projectivizes input data with head encoding for labels
 path       Projectivizes input data with path encoding for labels
 head+path  Projectivizes input data with head and path encoding for labels

covered_root -pcr enum none save

Attachment strategy for covered roots.

 none   No covered root transformation
 left   Attach covered roots to the left end of the covering arc
 right  Attach covered roots to the right end of the covering arc
 head   Attach covered roots to the head of the covering arc

lifting_order -plo enum shortest save

Lifting order, in case a dependency graph contains multiple non-projective arcs.

 shortest  Lift the shortest arcs first (break ties from left to right)
 deepest   Lift the most deeply nested arcs first (break ties from left to right)