MaltParser

Start using MaltParser

This is a short guide to getting familiar with MaltParser. We start by running MaltParser without any arguments. Type the following at the command-line prompt (make sure that you are in the malt-1.0.1 directory):

prompt> java -jar malt.jar
This command will display the following output:
-----------------------------------------------------------------------------
                          MaltParser 1.0.1
-----------------------------------------------------------------------------
         MALT (Models and Algorithms for Language Technology) Group
             School of Mathematics and Systems Engineering (MSI)
                        Vaxjo University, Sweden
-----------------------------------------------------------------------------

Usage:
   java -jar malt.jar -f  
   java -jar malt.jar -h for more help

help                  (  -h) : Show help
-----------------------------------------------------------------------------
option_file           (  -f) : Path to the option file
-----------------------------------------------------------------------------
verbosity             (  -v) : Verbosity level
  debug      - Logging of debugging messages
  error      - Logging of error events
  fatal      - Logging of very severe error events
  info       - Logging of informational messages
  off        - Logging turned off
  warn       - Logging of harmful situations
-----------------------------------------------------------------------------
Here you can see the basic usage and options. To get all available options:
prompt> java -jar malt.jar -h
All of these options are documented both in a short version and in a full version with complete descriptions.

Train a parsing model

Now we are ready to train our first parsing model. The directory examples/data contains two treebank data files, talbanken05_train.conll and talbanken05_test.conll, which hold a very small portion of the treebank Talbanken05. The example data is formatted according to the CoNLL data format. Note that this data set is very small and that you need more training data to create a useful parsing model.
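
To make the data format concrete, the following Python sketch reads CoNLL-formatted text into sentences. It assumes the ten tab-separated columns of the CoNLL-X format (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with a blank line between sentences. The function name read_conll and the two-token sample are illustrative only; they are not part of MaltParser or of the Talbanken05 data.

```python
import io

# Column names of the CoNLL-X format (an assumption; check your own data files).
COLUMNS = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
           "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]


def read_conll(lines):
    """Yield sentences as lists of token dicts; blank lines separate sentences."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
            continue
        values = line.split("\t")
        # zip stops at the shorter sequence, so lines with fewer columns still load.
        sentence.append(dict(zip(COLUMNS, values)))
    if sentence:
        yield sentence


# Made-up two-token sample, only to show the layout.
SAMPLE = (
    "1\tHello\t_\tINTJ\tINTJ\t_\t0\tROOT\t_\t_\n"
    "2\tworld\t_\tNOUN\tNOUN\t_\t1\tDEP\t_\t_\n"
    "\n"
)

sentences = list(read_conll(io.StringIO(SAMPLE)))
print(len(sentences), [token["FORM"] for token in sentences[0]])
```

To inspect a real file, pass an open file object (opened with encoding="utf-8") instead of the in-memory sample.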

To train a default parsing model with MaltParser, type the following at the command-line prompt:

prompt> java -jar malt.jar -c test -i examples/data/talbanken05_train.conll -m learn
This line tells MaltParser to create a parsing model named test.mco from the treebank in the file examples/data/talbanken05_train.conll. The parsing model gets its name from the configuration name, which is specified with the option flag -c and given without the file suffix .mco. The configuration name is a name of your own choice. The option flag -i tells the parser where to find the input data. The last option flag, -m, specifies the parsing mode. In this case the mode is learn, because we want to induce a model using the default learning method (LIBSVM). MaltParser outputs the following information:
Started: Sat Oct 13 15:03:52 CEST 2007
Initialize the parsing algorithm...
Reading sentences from 'examples/data/talbanken05_train.conll':1
Number of sentences: 32
Creating all models
Creating LIBSVM model libsvm.mod
Saving the symbol table...
Saving the configuration specific options...
Creates configuration file 'd:\exp\malt1.0\install\malt-1.0.1\test.mco' ...
Finished: Sat Oct 13 15:03:54 CEST 2007
Learning time: 00:00:02 (2234 ms)
Most of the logging information is self-explanatory: it tells you that the parser started at a certain date and time and that it read sentences from the specified file, which contains 32 sentences. It continues with information about which learning models are created; in this case only one LIBSVM model is created. Finally, the parser saves the symbol table and all options that cannot be changed during parsing, and stores everything in a configuration file named test.mco. At the end, the parser reports the learning time.
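
If you prefer to start training from a script instead of typing the command by hand, the command can be assembled as an argument list. The Python sketch below is one possible way to do this. The helper names learn_command and run are hypothetical; only the option flags -c, -i and -m are taken from this guide. The sketch assumes that java is on the PATH and that malt.jar is in the current working directory.

```python
import subprocess


def learn_command(config_name, train_file, jar="malt.jar"):
    """Build the argument list for training a parsing model in learn mode."""
    return ["java", "-jar", jar,
            "-c", config_name,   # configuration name, without the .mco suffix
            "-i", train_file,    # input treebank
            "-m", "learn"]       # parsing mode


def run(command):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(command, check=True)


command = learn_command("test", "examples/data/talbanken05_train.conll")
print(" ".join(command))
# run(command)  # uncomment where Java and malt.jar are available
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.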

Parse data with your parsing model

We now have a parsing model that we can use to parse new sentences from the same language. It is important that the unparsed sentences are formatted according to the same data format and contain all the required input information, such as part-of-speech tags. In this case the sentences are formatted with the first six columns of the CoNLL data format. To parse, type the following:

prompt> java -jar malt.jar -c test -i examples/data/talbanken05_test.conll -o out.conll -m parse
where -c test is the name of the configuration (the file name test.mco without its suffix), -i examples/data/talbanken05_test.conll tells the parser where to find the input data, -o out.conll is the output file name, and finally -m parse specifies that the parser should be executed in parsing mode.
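
If your own data still carries gold-standard annotation, one way to produce input with only the first six columns is to cut each token line after the sixth tab-separated field. The Python sketch below does this. The function name keep_input_columns and the sample line are illustrative, and the sketch makes no claim about how MaltParser itself treats files in which the remaining columns are present.

```python
def keep_input_columns(lines, n_columns=6):
    """Return CoNLL lines truncated to the first n_columns tab-separated fields.

    Blank lines (sentence separators) are kept as empty lines.
    """
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if line.strip():
            line = "\t".join(line.split("\t")[:n_columns])
        result.append(line)
    return result


# Made-up token line with ten columns, for illustration only.
gold = ["1\tHello\t_\tINTJ\tINTJ\t_\t0\tROOT\t_\t_", ""]
print(keep_input_columns(gold))
```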

Get configuration information

Sometimes it is useful to get information about a configuration, for instance which settings were used when the parsing model was created. To get this information, type:

prompt> java -jar malt.jar -c test -m info
This will output a lot of information about the configuration:
CONFIGURATION
Configuration name:   test
Configuration type:   singlemalt
Created:              Sat Oct 13 15:03:52 CEST 2007

SYSTEM
Operating system architecture: x86
Operating system name:         Windows XP
JRE vendor name:               Sun Microsystems Inc.
JRE version number:            1.6.0_03

MALTPARSER
Version:                       1.0.1
Build date:                    October 13 2007

SETTINGS
config
  workingdir (  -w)                     user.dir
  name (  -c)                           test
  logging ( -cl)                        info
  type (  -t)                           singlemalt
  logfile (-lfi)                        stdout
  url (  -u)
covington
  allow_root ( -cr)                     true
  allow_shift ( -cs)                    false
graph
  max_sentence_length (-gsl)            256
  root_label (-grl)                     ROOT
guide
  data_split_structure (  -s)
  learner (  -l)                        libsvm
  kbest (  -k)                          -1
  features (  -F)
  classitem_separator (-gcs)            _
  prediction_strategy (-gps)            combined
  data_split_column (  -d)
  data_split_threshold (  -T)           50
input
  infile (  -i)                         examples/data/talbanken05_train.conll
  reader ( -ir)                         tab
  charset ( -ic)                        UTF-8
  format ( -if)                         /appdata/dataformat/conllx.xml
libsvm
  libsvm_exclude_null (-lse)            no
  libsvm_options (-lso)
  libsvm_exclude_columns (-lsc)
nivre
  root_handling (  -r)                  normal
  post_processing (-npp)                false
output
  charset ( -oc)                        UTF-8
  format ( -of)                         /appdata/dataformat/conllx.xml
  writer ( -ow)                         tab
  outfile (  -o)
pproj
  covered_root (-pcr)                   none
  marking_strategy ( -pp)               none
singlemalt
  parsing_algorithm (  -a)              nivreeager
  behavior (-mcb)                       malt1.0
  symbol_table (-mct)
  special_symbols (-mcs)                /appdata/specialsymbols/malt1.0.xml
  mode (  -m)                           learn

DEPENDENCIES
--guide-features (  -F)                 /appdata/features/NivreEager.par

FEATURE MODEL
InputColumn(POSTAG, Stack[0])
InputColumn(POSTAG, Input[0])
InputColumn(POSTAG, Input[1])
InputColumn(POSTAG, Input[2])
InputColumn(POSTAG, Input[3])
InputColumn(POSTAG, Stack[1])
OutputColumn(DEPREL, Stack[0])
OutputColumn(DEPREL, ldep(Stack[0]))
OutputColumn(DEPREL, rdep(Stack[0]))
OutputColumn(DEPREL, ldep(Input[0]))
InputColumn(FORM, Stack[0])
InputColumn(FORM, Input[0])
InputColumn(FORM, Input[1])
InputColumn(FORM, head(Stack[0]))

LIBSVM INTERFACE
  LIBSVM version: 2.84
  SVM-param string:
  Null-value handling: INCLUDE_NULL_VALUES
LIBSVM SETTINGS
  SVM type      : C_SVC (0)
  Kernel        : POLY (1)
  Degree        : 2
  Gamma         : 0.2
  Coef0         : 0.0
  Cache Size    : 40.0 MB
  C             : 0.5
  Eps           : 1.0
  Shrinking     : 1
  Probability   : 0
  #Weight       : 0
The output consists of several types of information:

Information type      Description
CONFIGURATION         The name and the type of the configuration and when it was created.
SYSTEM                Information about the system that was used when creating the configuration, such as the processor, the operating system and the version of the Java Runtime Environment (JRE).
MALTPARSER            The version of MaltParser and when it was built.
SETTINGS              All option settings, divided into several categories.
DEPENDENCIES          In some cases the parser corrects itself when an illegal combination of options is specified or an option is missing. In the example above no feature specification file is specified, so the parser uses the default feature specification file for the Nivre arc-eager parsing algorithm.
FEATURE MODEL         The content of the feature specification file.
<LEARNER> INTERFACE   Information about and settings of the interface to the learner; in the example above LIBSVM is used.
<LEARNER> SETTINGS    All settings of the learner-specific options; in the example above LIBSVM is used.
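
Because the SETTINGS section has a regular layout (a group name on its own line, followed by indented lines of the form name (flag) value), it is easy to post-process. The Python sketch below parses that layout into a nested dictionary. The function name parse_settings and the regular expression are assumptions based only on the sample output above, not on a documented format, so treat this as a starting point.

```python
import re

# Matches lines such as "  learner (  -l)                        libsvm".
OPTION_LINE = re.compile(r"^\s+(\w+) \(\s*(-\w+)\)\s*(.*)$")


def parse_settings(text):
    """Map option group -> option name -> (flag, value) for a SETTINGS excerpt."""
    settings = {}
    group = None
    for line in text.splitlines():
        match = OPTION_LINE.match(line)
        if match and group is not None:
            name, flag, value = match.groups()
            settings[group][name] = (flag, value.strip())
        elif line.strip() and not line[0].isspace():
            # An unindented, non-empty line starts a new option group.
            group = line.strip()
            settings[group] = {}
    return settings


# Excerpt copied from the sample output above.
EXCERPT = """\
guide
  learner (  -l)                        libsvm
  kbest (  -k)                          -1
  features (  -F)
singlemalt
  parsing_algorithm (  -a)              nivreeager
  mode (  -m)                           learn
"""

settings = parse_settings(EXCERPT)
print(settings["guide"]["learner"])
print(settings["singlemalt"]["parsing_algorithm"])
```

Options without a value, such as features in the excerpt, are stored with an empty string.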

Unpack a configuration

It is possible to unpack the configuration file test.mco by typing:

prompt> java -jar malt.jar -c test -m unpack
This command will create a new directory named test containing the following files:

File                  Description
libsvm.mod            The LIBSVM model that is used for predicting the next parsing action.
savedoptions.sop      Contains all option settings that cannot be changed when parsing.
symboltables.sym      Contains all distinct values of the training data, divided into different columns. For example, the column POSTAG in the CoNLL format has its own symbol table with all distinct values occurring in the training data.
test_singlemalt.info  Information about the configuration; contains the same information as described above.

Different ways to specify options

MaltParser provides several ways to specify option settings:

Method: Command-line option flag
Description: Uses a single minus sign (-) before the option flag and a blank between the option flag and the value.
Example: -c test

Method: Command-line option group and option name
Description: Uses both the option group name and the option name to specify the option. It always begins with two minus signs (--) before the option group name, and one minus sign (-) separates the option group name from the option name. The equals sign (=) separates the option from the value.
Example: --config-name=test

Method: Command-line option name
Description: A shorter version of the option group and option name form. It can only be used when the option name is unambiguous.
Example: --name=test

Method: Option file
Description: The option settings are specified in an option file, formatted in XML. To read the option file, use the option flag -f. Note that command-line option settings override the settings in the option file if an option is specified in both places.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<experiment>
	<optioncontainer>
		<optiongroup groupname="config">
			<optionvalue name="name" value="test"/>
		</optiongroup>
	</optioncontainer>
</experiment>
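
An option file like the one above can also be generated programmatically. The Python sketch below builds the same element structure (experiment, optioncontainer, optiongroup, optionvalue) with the standard xml.etree.ElementTree module. The function name build_option_file and the nested-dictionary input are illustrative choices, not part of MaltParser.

```python
import xml.etree.ElementTree as ET


def build_option_file(options):
    """Build the <experiment> element from {group name: {option name: value}}."""
    experiment = ET.Element("experiment")
    container = ET.SubElement(experiment, "optioncontainer")
    for group_name, group_options in options.items():
        group = ET.SubElement(container, "optiongroup", groupname=group_name)
        for name, value in group_options.items():
            ET.SubElement(group, "optionvalue", name=name, value=str(value))
    return experiment


root = build_option_file({"config": {"name": "test"}})
# Serialize to a string; the XML declaration is not included in this form.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

To use the result, write xml_text to a file, preceded by the XML declaration shown above, and pass that file to MaltParser with the option flag -f.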