From aloft!blink!att!ips.id.ethz.ch!roth Fri Jan 21 21:25:40 1994
Errors-To: aloft!blink!att!ips.id.ethz.ch!omr-request
Reply-To: aloft!blink!att!ips.id.ethz.ch!omr
Sender: aloft!blink!att!ips.id.ethz.ch!omr-request
Message-Id: <9401211523.AA01960@sitter>
To: att!ips.id.ethz.ch!omr
Cc: att!ips.id.ethz.ch!roth
Subject: MidiScan tested...
Date: Fri, 21 Jan 1994 16:23:49 +0100
From: Martin Roth <aloft!blink!att!ips.id.ethz.ch!roth>

    Musitek MidiScan
    ----------------

Abstract:

    This article describes the commercial OMR system MidiScan
    for Windows. It contains symbol counts for four pages
    and recognition rates (the total recognition rate is 87%).
    Some recognition properties and basic assumptions of
    the program are listed. The opinions given are my own...

Author:

    Martin Roth, Eng. CS
    Steinstr. 58, CH-8003 Zurich, Switzerland
    e-mail: roth@ips.id.ethz.ch

Date:
	
    January 1994

Table of Contents:

    1) Product information
    2) Description
    3) Overall performance
    4) Basic constraints
    5) TIFF and TIF
    6) Recognition Properties
    7) Overall behaviour, personal opinion
    8) Recognition results

***********************************************************
*   If you don't want to read it all, skip ahead to the   *
*       first table in chapter eight, named "TOTALS"      *
*       (search for the next occurrence of "TOTALS")      *
***********************************************************

My apologies for all the English spelling and grammar
mistakes...

Feel free to post any other opinions to omr@ips.id.ethz.ch!

---------------------------------------------------------------

1) PRODUCT INFORMATION

MidiScan is a software package for PC/Windows. The program was
written by Christopher Newell (ZH Computer, Minneapolis) and
Wladyslaw Homenda (CPZH, Warsaw, Poland). It is published by:

Musitek, 410 Bryant Cir., Suite K, Ojai, CA 93023-4200,
tel. (805) 646 8051

The software comes on a single 3.5" disk and uses about one MByte
of disk space when installed. The manual suggests a 386 with
4 MBytes RAM as the minimum configuration. MidiScan costs about
sFr. 800.- here in Switzerland/Europe (US$ 550.-).

---------------------------------------------------------------

2) DESCRIPTION

The program does not have an interface to a scanner (planned
for a future extension); instead, it reads a TIF file (for the
problems with the PC-TIF format, see chapter 5). Recognition
first searches the page for staves, marking the beginning and
end of each with a small inverted square. If the automatic staff
search is wrong, it can be corrected by hand by moving and, if
necessary, resizing the beginning and end marks with the mouse.
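The review does not say how MidiScan locates staves. Purely as a hedged illustration, a common textbook approach is a horizontal projection profile: count the black pixels in each row, treat rows that are mostly black as staffline rows, and group five lines into a staff. The sketch assumes a deskewed bitmap given as a list of 0/1 rows; the function names and the `min_fill` threshold are hypothetical. Like MidiScan (see chapter 6), it assumes perfectly straight, horizontal staves.

```python
from typing import List, Sequence

def staffline_rows(bitmap: Sequence[Sequence[int]], min_fill: float = 0.5) -> List[int]:
    """Rows whose share of black (1) pixels reaches min_fill (hypothetical threshold)."""
    return [y for y, row in enumerate(bitmap)
            if row and sum(row) / len(row) >= min_fill]

def group_staves(rows: List[int], lines_per_staff: int = 5) -> List[List[float]]:
    """Merge touching rows into lines, then group consecutive lines into staves."""
    lines: List[List[int]] = []
    for y in rows:
        if lines and y == lines[-1][-1] + 1:
            lines[-1].append(y)          # a thick line spans several pixel rows
        else:
            lines.append([y])
    centers = [sum(line) / len(line) for line in lines]
    # Leftover lines that do not fill a whole staff are dropped.
    return [centers[i:i + lines_per_staff]
            for i in range(0, len(centers) - lines_per_staff + 1, lines_per_staff)]

# Tiny synthetic page, 10 pixels wide: lines at rows 2, 4, 6, 8 and a thick
# line at rows 10-11; row 5 holds a short note-head fragment (below threshold).
page = [[0] * 10 for _ in range(14)]
for y in (2, 4, 6, 8, 10, 11):
    page[y] = [1] * 10
page[5][3:6] = [1, 1, 1]

print(group_staves(staffline_rows(page)))   # -> [[2.0, 4.0, 6.0, 8.0, 10.5]]
```

A bent staff breaks this approach in the same way the review describes for MidiScan: a curved line no longer fills one pixel row, so its row sums drop below the threshold.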

Once the staves are located, the rest of the recognition runs
as a batch process that does not allow any kind of interaction.

When recognition is finished, the program creates a symbolic
description called MNOD (Music Notation Object Description),
from which the image is reconstructed as recognized.

An MNOD editor then lets you edit the MNOD structures and
correct recognition mistakes by adding, deleting or changing
symbols. The editor splits the screen horizontally, displaying
the original bitmap in the upper half and the MNOD image in the
lower half; the two windows can be enlarged or scrolled
together. With this layout, it's easy to compare the two
versions and find mistakes.

When editing is finished, the program creates a Standard MIDI
File from the MNOD document. The MNOD document can be saved and
loaded again later if further serious errors are found. Musitek
plans to add the ability to print MNOD files in the future.
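The review does not describe how MidiScan converts an MNOD document to MIDI. The sketch below only illustrates the target format: a minimal format-0 Standard MIDI File with one track, built from a hypothetical list of (MIDI key number, duration in ticks) pairs played one after another. It ignores rests, chords, tempo and key information; `smf_single_track`, `vlq` and the default of 480 ticks per quarter note are assumptions for illustration.

```python
import struct
from typing import Iterable, Tuple

def vlq(n: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [n & 0x7F]                    # last byte: high bit clear
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)   # earlier bytes: high bit set
        n >>= 7
    return bytes(reversed(out))

def smf_single_track(notes: Iterable[Tuple[int, int]], division: int = 480,
                     channel: int = 0, velocity: int = 64) -> bytes:
    """Format-0 Standard MIDI File playing (key, ticks) notes one after another."""
    events = bytearray()
    for key, ticks in notes:
        events += vlq(0) + bytes([0x90 | channel, key, velocity])  # note on, no delay
        events += vlq(ticks) + bytes([0x80 | channel, key, 0])     # note off `ticks` later
    events += vlq(0) + b"\xff\x2f\x00"                             # end-of-track meta event
    # MThd: chunk length 6, format 0, one track, ticks per quarter note
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, division)
    track = b"MTrk" + struct.pack(">I", len(events)) + bytes(events)
    return header + track

# C4 and D4 as quarter notes, E4 as a half note, at 480 ticks per quarter.
midi = smf_single_track([(60, 480), (62, 480), (64, 960)])
# open("page.mid", "wb").write(midi) would save it for a sequencer.
```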

---------------------------------------------------------------

3) OVERALL PERFORMANCE

If you like a simple percentage: in my few tests, MidiScan
correctly recognized 87 percent of the symbols. It seems to do a
good job on high-quality scans at 300 dpi.

Recognition of a page takes on the order of a few minutes on a
reasonable PC (e.g. a 33 MHz 486 with at least 4 MB RAM).

In my opinion, the system's performance cannot be judged from
the total percentage (87%) alone. The first table in chapter 8
shows that stafflines, barlines and black note heads are all
recognized with 90%-99% accuracy, while performance for white
note heads, for example, is rather poor (only about 60%). It
also shows that black notes outside the stafflines were wrong
(wrong pitch) more often than black notes on lines or spaces,
which hints that the recognition of ledger lines is insufficient.

---------------------------------------------------------------

4) BASIC CONSTRAINTS

The recognition, of course, only searches for symbols that can
be expressed in the final MIDI file. This means it looks for
stafflines, notes, accidentals, rests, ties (but not slurs),
dots, barlines (including repetition marks and double barlines),
clefs, key signatures and time measures. It ignores all other
symbols, such as accents, slurs, text, tempo, fingering, volume
markings...
---------------------------------------------------------------

5) TIFF and TIF: PC versus the rest of the world

MidiScan reads only uncompressed files in 'TIF' format. However,
PC people seem to mean something different from everyone else
when talking about TIFF. Usually, TIFF (Tagged Image File Format)
refers to the image file format defined in: TIFF Revision 6.0,
June 3, 1992, Aldus Developers Desk, Aldus Corp, 411 1st Av.
South, Seattle, WA 98104-2871.

This file format is very general. Depending on the byte order,
a file starts with the four bytes "MM\0*" (Motorola) or "II*\0"
(Intel). The files MidiScan reads start with neither of these
two sequences...
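A quick way to tell which variant a file is before handing it to MidiScan is to look at these four header bytes (the byte-order mark followed by the number 42 in that byte order). The sketch below checks only those bytes and does not validate the rest of the file; the function names are hypothetical.

```python
from typing import Optional

def tiff_byte_order(header: bytes) -> Optional[str]:
    """Classify the first four bytes of a file against the TIFF 6.0 header."""
    if header[:4] == b"MM\x00*":      # "MM\0*": Motorola (big-endian) order, magic 42
        return "big-endian"
    if header[:4] == b"II*\x00":      # "II*\0": Intel (little-endian) order, magic 42
        return "little-endian"
    return None                        # not a standard TIFF header

def check_file(path: str) -> Optional[str]:
    """Read only the header bytes of the file at `path`."""
    with open(path, "rb") as f:
        return tiff_byte_order(f.read(4))
```

By the observation above, files that MidiScan accepts would return `None` here, while the standard TIFF files from the UNIX scanner return one of the two byte orders.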

Nevertheless, 'PC-TIF' seems to be the format that most
scanning software on PCs produces. I use a scanner on a UNIX
workstation that generates standard TIFF files, which MidiScan
rejects. The files first have to be converted from TIFF to
PC-TIF on the PC, using a program like Windows PaintShop
(shareware).

In any case, MidiScan should be able to read at least a few of
the real standards, such as TIFF or GIF. It should also read
compressed files: a full page of 300 dpi bitmap takes about
1.5 MBytes uncompressed, whereas even simple run-length encoding
would store the same document in only 200 kBytes.
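Run-length encoding works so well here because a page of music is mostly white paper crossed by thin black lines. The minimal sketch below stores each bilevel row as alternating white/black run lengths. It is a simplified scheme chosen for illustration, not one of the compressions defined in TIFF 6.0 (such as PackBits or the CCITT codings), and the function names are hypothetical.

```python
from typing import List, Sequence

def encode_row(row: Sequence[int]) -> List[int]:
    """Alternating run lengths starting with white (0); a leading 0 means the row starts black."""
    runs: List[int] = []
    color, length = 0, 0
    for pixel in row:
        if pixel == color:
            length += 1
        else:
            runs.append(length)
            color, length = pixel, 1
    runs.append(length)
    return runs

def decode_row(runs: Sequence[int]) -> List[int]:
    """Inverse of encode_row."""
    row: List[int] = []
    color = 0
    for n in runs:
        row.extend([color] * n)
        color ^= 1
    return row

# A 2550-pixel row (8.5 inches at 300 dpi) crossed by one 3-pixel stem.
row = [0] * 2550
row[1200:1203] = [1, 1, 1]
print(encode_row(row))   # -> [1200, 3, 1347]
```

Uncompressed, this row takes 2550 bits (about 319 bytes); encoded, it is three numbers, and a completely white row is a single number.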

---------------------------------------------------------------

6) RECOGNITION PROPERTIES

The recognition part seems to do a good job if the scans are of
high quality. If the quality is not optimal, the recognition
rate drops rapidly; at 200 dpi, for example, the program is not
of much use.

Some assumptions were obviously made:

- Staves must be straight. MidiScan seems to have built-in
  routines to correct for skew, but always assumes straight
  lines for the staves. If staves are slightly bent (as is often
  the case when scanning from books), the recognized notes show
  a systematic error wherever the lines bend by more than half a
  staffline distance (the pitches of all note heads in that area
  are wrong).

- Each staff must begin with a clef. At the beginning of each
  staff, MidiScan always finds one of the clef symbols,
  regardless of whether there is anything there that even
  roughly looks like a clef.
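Both assumptions matter because a note's pitch follows from its vertical position relative to the staff and from the clef. The sketch below is a simplified model, not MidiScan's documented method. It assumes a straight staff described by the y coordinate of its bottom line and the line spacing, rounds a note head to the nearest line or space, and names the pitch from the clef's bottom-line note (E4 for the treble clef, G2 for the bass clef). It ignores accidentals and key, and the function names are hypothetical.

```python
import math

LETTERS = "CDEFGAB"
TREBLE_BOTTOM = ("E", 4)   # bottom line of a treble (G) clef staff
BASS_BOTTOM = ("G", 2)     # bottom line of a bass (F) clef staff

def staff_step(y_head: float, y_bottom_line: float, spacing: float) -> int:
    """0 = bottom line, 1 = space above it, 2 = second line, ...; negative = below.

    Image y grows downward; rounds to the nearest half spacing (half up).
    """
    return math.floor((y_bottom_line - y_head) / (spacing / 2) + 0.5)

def step_to_pitch(step: int, bottom=TREBLE_BOTTOM) -> str:
    """Diatonic pitch name for a staff step, given the note on the bottom line."""
    index = LETTERS.index(bottom[0]) + step
    return f"{LETTERS[index % 7]}{bottom[1] + index // 7}"

# Straight staff: bottom line at y=100, lines 10 pixels apart.
print(step_to_pitch(staff_step(80, 100, 10)))                 # third line -> B4
# A head on the bottom line where that line has sagged 6 pixels (more than half
# the spacing) is read one step too low if the staff is assumed straight.
print(step_to_pitch(staff_step(106, 100, 10)))                # -> D4 instead of E4
# The same position under a bass clef names a different pitch.
print(step_to_pitch(staff_step(100, 100, 10), BASS_BOTTOM))   # -> G2
```

In this model, a bend larger than half the line spacing always shifts the computed position by at least one step. A wrong clef at the start of a staff shifts every pitch on that staff.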

Evidently, MidiScan searches for note heads and stems first and
then tries to count the beams or flags. Sometimes this results in
the wrong type (flags instead of beams), but this doesn't affect
the MIDI file as long as the number is correct.
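The type does not matter because, in MIDI, a stemmed black note's duration depends only on how many times the quarter note is halved, and each flag or beam halves it once. Here is a minimal sketch, assuming a division (ticks per quarter note) of 480 in the Standard MIDI File header. The function name is hypothetical, and augmentation dots are included because the tables count them too.

```python
def note_ticks(flags_or_beams: int, division: int = 480, dots: int = 0) -> int:
    """MIDI duration in ticks of a stemmed black note (quarter or shorter).

    Flags and beams count the same: each one halves the quarter-note value.
    Each augmentation dot adds half of the previously added value.
    Integer division truncates very short notes if division is not a multiple of 2**n.
    """
    part = division // (2 ** flags_or_beams)
    total = part
    for _ in range(dots):
        part //= 2
        total += part
    return total

print(note_ticks(0), note_ticks(1), note_ticks(2))   # quarter, eighth, sixteenth -> 480 240 120
print(note_ticks(1, dots=1))                          # dotted eighth -> 360
```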

MidiScan has severe problems recognizing time measures (4/4,
3/4 and so on). Tuplet numbers (triplets etc.) have to be added
by hand most of the time.

---------------------------------------------------------------

7) OVERALL BEHAVIOUR, PERSONAL OPINION

For good scans, MidiScan does a fairly good job. Counting the
work of scanning, running the program, and carefully finding and
editing the mistakes, the overall time is still comparable to the
time needed to enter a page of music with a keyboard or mouse.
For dense pages with good print quality and a good scan, working
with MidiScan is probably faster; in all other cases, entering
the notes with keyboard and mouse is probably still faster.

Bugs and crashes occur far too often. For example, MidiScan
simply crashed without any message when loading one of my image
files; after I deleted a very small part of the white border, it
accepted the file. With other files, MidiScan always crashed
after a few pages when processing all the pages of a document,
but worked without crashing when run in small steps, one page at
a time. Since it could not process all pages at once, the MIDI
files for the individual pages had to be assembled later in the
sequencer program...

Locating stafflines does not seem to be one of MidiScan's strong
points: often, one line on a page was omitted or recognized
completely wrong. Because of the program's structure, you always
have to run the automatic staffline search first and only then
decide whether to correct it by hand. If you don't pay attention
while correcting the stafflines by hand and miss the line that
was wrong, you have to start all over again, including the
automatic search (which you already know won't work correctly).

IMHO, the program in its current form is NOT worth its price for
a musician. The crashes occur far too often, and not being able
to process documents with several pages at once means additional
patchwork in the sequencer or notation package when you import
the MIDI files. Software sold for sFr. 800.- (approx. US$ 550.-)
should not have this kind of bug, even if it's PC Windows
software (sorry, couldn't resist THAT one...). Let's wait for
the first updates (which I really hope are free for the few
people who have already bought the package!).

The recognition part seems fairly good (compared with my own
work and with what I have seen in papers by others). The
staffline recognition certainly can be improved (IMHO, the
algorithms used by others and by myself are better at finding
and tracking the stafflines). However, musicians who don't want
to know about pattern recognition problems and just expect a
program to quickly scan printed pages and generate a MIDI file
within 10 minutes will be disappointed by MidiScan.

The manual is okay as far as the usage of the program is
concerned, but it should give some hints on how to obtain good
results. For example, my dealer didn't realize that scanning
with a slight skew might affect the recognition result; the
manual should say whether rescanning several times to get the
lines horizontal is worth the time it takes. It should show
examples of how to set the 'contrast' or 'darkness' of the
scanner: is it better if the lines are very thin, or if the scan
is so dark that the lines are quite thick? Can the system handle
breaks in the lines? What about breaks in the stems? It should
also mention the general assumptions (straight lines, first
symbol on each staff is a clef).

(note that the opinions expressed above are my own; your mileage
may vary. This is NOT an official report of ETH Zurich, just a
personal text from a single MidiScan user)

---------------------------------------------------------------

8) RECOGNITION RESULTS IN DETAIL

In order to find out more than just a global percentage, I
counted some examples by hand (quite a time-consuming process)
according to the following scheme:

- a symbol which MidiScan cannot recognize due to its basic
  limitations (no representation in MIDI) is not counted.

- a symbol that should be recognized can either be CORRECT,
  WRONG (correct symbol, but wrong parameters, such as a wrong
  pitch on a note head), MISSING (not found as the correct
  symbol by MidiScan) or FALSE-HIT (meaning MidiScan found a
  symbol where none, or a different one, was).

- Percentages are the number of correctly recognized symbols
  divided by the number of all symbols that should have been
  recognized. Note that FALSE-HITs have no influence on the
  percentage.

- Because of the counting procedure, some errors count twice: if
  MidiScan recognized a quarter rest instead of a time measure
  (like 3/4), this counts as a "MISSING" time measure as well as
  a "FALSE-HIT" for rests.

- Notes are counted separately by position, for both black and
  white notes: on a space, on a line, and outside the staff
  (above or below). Stems are usually no problem for recognition
  and are not counted. Beams and flags are counted separately.

- Barlines were mostly found, but attributes such as "double
  barline" or "repetition mark" were missing. Although these
  attributes show up in the MNOD representation, they don't
  change the generated MIDI file. Counts were made for 'barlines'
  (correct if the barline was located) and 'barline+' (correct if
  the style (simple, double, repetition) was right).

- Beams and flags were sometimes confused with each other, which
  does not affect the MIDI file if the number is right. Thus flags
  and beams were counted together ('flags|beams'): the type (beam
  or flag) has no influence; only a wrong number of flags or beams
  makes the symbol "wrong".
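The counting scheme above can be written as a small scoring routine. This is a hedged sketch under a simplifying assumption of mine, not how the counts were actually made (they were done by hand). Each symbol is identified by a hypothetical location index plus its class; ground truth and MidiScan output are matched on that pair, and at most one detection per pair is assumed. The example reproduces the double counting for a time measure read as a quarter rest ('brk quarter').

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional

class Symbol(NamedTuple):
    location: int                 # hypothetical position index on the page
    kind: str                     # symbol class, e.g. "black/line", "measure"
    param: Optional[str] = None   # pitch, duration, ... (None if parameterless)

def score(truth: Iterable[Symbol], detected: Iterable[Symbol]) -> Counter:
    """Tally (class, outcome) pairs with outcomes corr, wrong, missed and f-hit."""
    found = {(s.location, s.kind): s for s in detected}
    tally: Counter = Counter()
    matched = set()
    for t in truth:
        key = (t.location, t.kind)
        d = found.get(key)
        if d is None:
            tally[(t.kind, "missed")] += 1      # not found as the correct symbol
        else:
            matched.add(key)
            outcome = "corr" if d.param == t.param else "wrong"
            tally[(t.kind, outcome)] += 1
    for key, d in found.items():
        if key not in matched:
            tally[(d.kind, "f-hit")] += 1       # found where none (or another) was
    return tally

truth = [Symbol(0, "measure", "3/4"), Symbol(1, "black/line", "G4"),
         Symbol(2, "black/space", "A4")]
seen = [Symbol(0, "brk quarter"), Symbol(1, "black/line", "G4"),
        Symbol(2, "black/space", "C5")]
t = score(truth, seen)
# The 3/4 read as a quarter rest counts twice: a missed measure and a rest f-hit.
print(t[("measure", "missed")], t[("brk quarter", "f-hit")])   # -> 1 1
```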

Abbreviations for the symbol classes used below:

staffs      correct if staff is found without manual changes
barlines    correct if barlines are recognized, regardless of
            type (double, repetition..)
barline+    correct if special barlines (double, repetition)
            are found correctly. These errors don't affect
            the MIDI file, thus THIS LINE IS EXCLUDED FROM
            THE TOTAL given at the bottom of the table.
black/line  black notes (head plus stem) on a staffline
black/space black notes (head plus stem) between two stafflines
black/out   black notes (head plus stem) above or below staff
white/line  white notes (head, with or without stem) on a line
white/space white notes between two stafflines
white/out   white notes above or below the staff
flags|beams flags or beams (correct if number recognized is
            correct. Type does not affect the MIDI file).
brk whole   whole rest         \  These symbols have no
brk half    half rest           > parameters and thus 
brk quarter quarter rest       /  cannot be 'wrong'
brk 8|16    shorter rests (wrong if duration is not right)
dot         prolongation dot after a note head
natural     natural before a note (not for key changes)
sharp       sharp before a note (not for key changes)
flat        flat before a note (not for key changes)
key         key changes (group of flats, sharps, naturals)
measure     time measure, like 2/4, 3/4 or 'C'
clef vi     violin (G) clef 
clef ba     bass (F) clef
clef c      C clef
ties        ties (only counted if notes tied were correctly found)
tuplets     triplets and the like

counted as:

indoc       number of symbols in the document
corr        correctly recognized symbol
corr%       percentage of correctly recognized symbols
f-hit       false hits (symbol found where there is none, or
            another one), sometimes called "misdetection"
wrong       symbol found, but wrong parameters (wrong pitch
            for notes, wrong duration for flags|beams...),
            often called "misclassification"
missed      symbol not found (detection failed)

(empty table entry means zero)


=============== TOTALS (all counted pages) ====================

The names of the table rows (symbol classes) and columns
(recognition results) are explained just above!

To help in reading the table: the number of symbols in the
document (indoc) equals the correctly recognized ones (corr)
plus the wrong ones (wrong) plus the missed ones (missed).
The false hits are fake symbols generated by MidiScan which
are not present in the document at that location; they don't
affect 'corr%'. 'barline+' does not count towards the totals,
hence the brackets.
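The relation indoc = corr + wrong + missed, with false hits and 'barline+' left out, can be checked mechanically. The sketch below copies the indoc, corr, wrong and missed columns of the TOTALS table. It asserts the relation for every row, then recomputes the column sums and the overall corr%, rounded half up. 'clef c' is left out because it has no symbols in the documents; the names are illustrative.

```python
# (indoc, corr, wrong, missed) per symbol class, copied from the TOTALS table.
TOTALS = {
    "staffs":      (41, 39, 1, 1),
    "barlines":    (228, 226, 2, 0),
    "black/line":  (307, 279, 10, 18),
    "black/space": (227, 206, 6, 15),
    "black/out":   (255, 235, 19, 1),
    "white/line":  (40, 24, 6, 10),
    "white/space": (29, 16, 2, 11),
    "white/out":   (20, 13, 2, 5),
    "flags|beams": (107, 92, 2, 13),
    "brk whole":   (19, 2, 0, 17),
    "brk half":    (1, 0, 0, 1),
    "brk quarter": (104, 82, 0, 22),
    "brk 8|16":    (16, 14, 0, 2),
    "dot":         (52, 44, 0, 8),
    "natural":     (29, 26, 0, 3),
    "sharp":       (19, 17, 0, 2),
    "flat":        (4, 3, 0, 1),
    "key":         (43, 32, 6, 5),
    "measure":     (4, 0, 0, 4),
    "clef vi":     (25, 25, 0, 0),
    "clef ba":     (12, 12, 0, 0),
    "ties":        (20, 10, 0, 10),
    "tuplets":     (4, 0, 0, 4),
}

def corr_percent(corr: int, indoc: int) -> int:
    """Percentage of correctly recognized symbols, rounded half up."""
    return (200 * corr + indoc) // (2 * indoc)

for name, (indoc, corr, wrong, missed) in TOTALS.items():
    # every symbol in the document is exactly one of: correct, wrong, missed
    assert indoc == corr + wrong + missed, name

indoc, corr, wrong, missed = (sum(column) for column in zip(*TOTALS.values()))
print(indoc, corr, wrong, missed, f"{corr_percent(corr, indoc)}%")   # -> 1606 1397 56 153 87%
```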

            indoc   corr    corr%   f-hit   wrong   missed
staffs       41      39      95%              1       1
barlines    228     226      99%              2
barline+    (21)      7      33%     (1)    (14)
black/line  307     279      91%      3      10      18
black/space 227     206      91%              6      15
black/out   255     235      92%             19       1
white/line   40      24      60%              6      10
white/space  29      16      55%      1       2      11
white/out    20      13      65%     11       2       5
flags|beams 107      92      86%      3       2      13
brk whole    19       2      11%                     17
brk half      1       0       0%                      1
brk quarter 104      82      79%      8              22
brk 8|16     16      14      88%      2               2
dot          52      44      85%      2               8
natural      29      26      90%                      3
sharp        19      17      89%                      2
flat          4       3      75%                      1
key          43      32      74%              6       5
measure       4       0       0%      2               4
clef vi      25      25     100%      3
clef ba      12      12     100%      1
clef c                                1
ties         20      10      50%      3              10
tuplets       4       0       0%                      4
=========================================================
total      1606    1397      87%     40      56     153
=========================================================

Here are the tables for the documents I counted, one table per
page (all documents were scanned at 300 dpi):

image:      aeber02
contents:   one-voice jazz standard
    
            indoc   corr    corr%   f-hit   wrong   missed
staffs        5       5     100%          
barlines     29      27      93%              2
barline+     (6)      4      67%     (1)     (2)
black/line   37      13      35%              8      16
black/space  26      10      38%              5      11
black/out     1               0%              1
white/line   11       4      36%              6       1
white/space   8       1      13%              1       6
white/out     2       2     100%      4
flags|beams  10       8      80%              2
brk whole                   
brk half      1               0%                      1
brk quarter   1               0%      7               1
brk 8|16      6       5      83%      2               1
dot          14      11      79%      1               3
natural       1               0%                      1
sharp                       
flat          1               0%                      1
key           1               0%              1
measure       1               0%      2               1
clef vi       1       1     100%      3
clef ba                               1
clef c                                1
ties          9       5      56%                      4
tuplets       4               0%                      4
---------------------------------------------------------
total       169      92      54%     21      26      51
    

images:     boni1-3
contents:   piano and (smaller) solo line, classical (waltz)

            indoc   corr    corr%   f-hit   wrong   missed
staffs       12      12     100%            
barlines     59      59     100%        
barline+     (3)              0%             (3)
black/line   95      94      99%              1
black/space  70      69      99%              1
black/out    71      69      97%              2
white/line   13      13     100%
white/space   7       7     100%
white/out     6       6     100%      2
flags|beams  64      58      91%                      6
brk whole     9       2      22%                      7
brk half          
brk quarter  11       9      82%      1               2
brk 8|16             
dot          17      15      88%                      2
natural       9       9     100%
sharp         9       9     100%
flat          1       1     100%
key          12      12     100%
measure       3               0%                      3
clef vi       8       8     100%
clef ba       4       4     100%
clef c    
ties          2       2     100%
tuplets     
---------------------------------------------------------
total       482     458      95%      3       4      20



            indoc   corr    corr%   f-hit   wrong   missed
staffs       12      11      92%              1
barlines     75      75     100%
barline+     (6)              0%             (6)
black/line   92      89      97%      1       1       2
black/space  51      50      98%                      1
black/out    92      79      86%             12       1
white/line    7       1      14%                      6
white/space   4       3      75%                      1
white/out     6       3      50%      4               3
flags|beams  17      11      65%      2               6
brk whole    10               0%                     10
brk half    
brk quarter  37      23      62%                     14
brk 8|16      6       5      83%                      1
dot           8       6      75%                      2
natural       9       8      89%                      1
sharp         4       3      75%                      1
flat          2       2     100%
key          15       9      60%              2       4
measure     
clef vi       8       8     100%
clef ba       4       4     100%
clef c    
ties          6       3      50%      3               3
tuplets     
---------------------------------------------------------
total       465     393      85%     10      16      56



            indoc   corr    corr%   f-hit   wrong   missed
staffs       12      11      92%                      1
barlines     65      65     100%
barline+     (6)      3      50%             (3)
black/line   83      83     100%      2
black/space  80      77      96%                      3
black/out    91      87      96%              4
white/line    9       6      67%                      3
white/space  10       5      50%      1       1       4
white/out     6       2      33%      1       2       2
flags|beams  16      15      94%      1               1
brk whole   
brk half    
brk quarter  55      50      91%                      5
brk 8|16      4       4     100%
dot          13      12      92%      1               1
natural      10       9      90%                      1
sharp         6       5      83%                      1
flat                  
key          15      11      73%              3       1
measure             
clef vi       8       8     100%
clef ba       4       4     100%
clef c    
ties          3               0%                      3
tuplets     
---------------------------------------------------------
total       490     454      93%      6      10      26



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  This mail was distributed by the omr mailing list.
  Please send contributions to: omr@ips.id.ethz.ch
  Contact the list administrator as: omr-request@ips.id.ethz.ch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

