From aloft!blink!att!ips.id.ethz.ch!roth Fri Jan 21 21:25:40 1994
Received: from blink.UUCP by aloft (4.1/DCS-aloft-M3.1)
        id AA24793; Fri, 21 Jan 94 21:25:40 EST
Errors-To: aloft!blink!att!ips.id.ethz.ch!omr-request
Received: from att.UUCP by blink.att.com (4.1/SMI-3.2)
        id AA04927; Fri, 21 Jan 94 21:21:01 EST
Received: by att.att.com; Fri Jan 21 10:34:56 EST 1994
Received: from sitter (actually sitter-gw.ethz.ch) by bernina.ethz.ch
        with SMTP inbound; Fri, 21 Jan 1994 16:24:01 +0100
Received: by sitter id AA01976; Fri, 21 Jan 94 16:23:57 +0100
Reply-To: aloft!blink!att!ips.id.ethz.ch!omr
Errors-To: aloft!blink!att!ips.id.ethz.ch!omr-request
Sender: aloft!blink!att!ips.id.ethz.ch!omr-request
Message-Id: <9401211523.AA01960@sitter>
Received: from ips.id.ethz.ch (julia) by sitter
        id AA01960; Fri, 21 Jan 94 16:23:54 +0100
Received: by julia.ethz.ch; Fri, 21 Jan 94 16:23:49 +0100
To: att!ips.id.ethz.ch!omr
Cc: att!ips.id.ethz.ch!roth
Subject: MidiScan tested...
Date: Fri, 21 Jan 1994 16:23:49 +0100
From: Martin Roth

Musitek MidiScan
----------------

Abstract: This article describes the commercial OMR system MidiScan
          for Windows. It contains symbol counts for four pages and
          recognition rates (the total recognition rate is 87%). Some
          recognition properties and basic assumptions of the program
          are listed. Opinions given are my own...

Author:   Martin Roth, Eng. CS
          Steinstr. 58, CH-8003 Zurich, Switzerland
          e-mail: roth@ips.id.ethz.ch

Date:     January 1994

Table of Contents:
   1) Product information
   2) Description
   3) Overall performance
   4) Basic constraints
   5) TIFF and TIF
   6) Recognition Properties
   7) Overall behaviour, personal opinion
   8) Recognition results

***********************************************************
* If you don't want to read it all, skip to the first     *
* table in chapter eight, named "TOTALS"!                 *
* (search for the next occurrence of "TOTALS")            *
***********************************************************

My apologies for all the English spelling and grammar mistakes...
Feel free to post any other opinions to omr@ips.id.ethz.ch!

---------------------------------------------------------------
1) PRODUCT INFORMATION

MidiScan is a software package for PC/Windows. The program was written
by Christopher Newell (ZH Computer, Minneapolis) and Wladyslav Homenda
(CPZH, Warsaw, Poland). The company is located at:

   Musitek, 410 Bryant Cir., Suite K, Ojai, CA 93023-4200,
   tel. (805) 646 8051

The software comes on one 3.5" disk and uses about one MByte of disk
space when installed. The manual suggests a 386 with 4 MBytes RAM as
the minimum configuration. MidiScan costs about sFr. 800.- here in
Switzerland/Europe (US$ 550.-).

---------------------------------------------------------------
2) DESCRIPTION

The program does not have an interface to a scanner (planned as a
future extension); instead it reads a TIF file (for the problems with
the PC-TIF format see below).

The recognition first searches the page for staves, marking the
beginning and end of each with a small inverted square. If the
automatic recognition is not correct, it can be redone by hand by
moving and, if necessary, resizing the beginning and ending marks with
the mouse.

After the staves are located, the recognition runs entirely as a batch
process and does not allow any kind of interaction.

After recognition is finished, a symbolic description is created
(called MNOD, Music Notation Object Description), from which the image
as recognized is reconstructed. An MNOD editor then allows you to edit
the MNOD structures and correct recognition mistakes by adding,
deleting or changing symbols. The editor divides the screen
horizontally and displays the original bitmap in the upper half and
the MNOD image in the lower half; the two windows can be enlarged or
scrolled together. With this layout, it's easy to compare the two
versions and find mistakes.
Once finished, the program creates a Standard MIDI file from the MNOD
document. The MNOD document can be stored and loaded again later, in
case further severe errors are found.

For the future, Musitek plans to add the possibility to print MNOD
files.

---------------------------------------------------------------
3) OVERALL PERFORMANCE

If you like a simple percentage: MidiScan correctly recognized 87
percent of the symbols in my few tests. It seems to do a good job for
high quality scans at 300 dpi. Recognition of a page takes on the
order of a few minutes on a reasonable PC (like a 33 MHz 486, >=4 MB
RAM).

In my opinion, the system performance cannot be judged from the total
percentage (87%) alone. Looking at the first table of part 8, you'll
see that stafflines, barlines and black note heads are all recognized
with 90%-99% accuracy, while, for example, performance for white note
heads is rather poor (only 60%). You can also see that black notes
outside the stafflines were often wrong (wrong pitch), which hints
that the recognition of ledger lines is insufficient.

---------------------------------------------------------------
4) BASIC CONSTRAINTS

The recognition, of course, searches only for symbols that can be
expressed in the final MIDI file. This means it looks for stafflines,
notes, accidentals, rests, ties (but not slurs), dots, barlines
(including repetition marks and double bar lines), clefs and measures
(time signatures). It ignores all other symbols, such as accents,
slurs, texts, tempo, fingering, volume markings...
... parameters and thus
brk quarter   quarter note break / cannot be 'wrong'
brk 8|16      smaller breaks (wrong if duration is not right)
dot           prolongation dot after a note head
natural       natural before a note (not for key changes)
sharp         sharp before a note (not for key changes)
flat          flat before a note (not for key changes)
key           key changes (group of flats, sharps, naturals)
measure       time measure, like 2/4, 3/4 or 'C'
clef vi       violin (G) clef
clef ba       bass (F) clef
clef c        C clef
ties          ties (only counted if the tied notes were correctly found)
tuplets       triplets and such

counted as:

indoc         number of symbols in the document
corr          correctly recognized symbols
corr%         percentage of correctly recognized symbols
f-hit         false hits (symbol found where there is none, or
              another one), sometimes called "misdetection"
wrong         symbol found, but with wrong parameters (wrong pitch for
              notes, wrong duration for flags|beams...), often called
              "misclassification"
missed        symbol not found (detection failed)

(empty table entry means zero)

=============== TOTALS (all counted pages) ====================

The names of the table rows (symbol classes) and columns (recognition
results) are explained just above!

To help in understanding the table: the number of symbols in the
document (indoc) equals the correctly recognized ones (corr) plus the
wrong ones (wrong) plus the missed ones (missed). The false hits are
fake symbols generated by MidiScan which are not present in the
document at that location; they don't affect 'corr%'. 'barline+' does
not count towards the totals, hence the brackets.
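The bookkeeping rule just stated (indoc = corr + wrong + missed, with
false hits kept out of the percentage) can be checked mechanically.
The following Python lines are only a sketch of that check, not
anything that belongs to MidiScan: the class `Row` and its field names
are made up for illustration, and rounding corr% to the nearest whole
percent is my assumption about how the figures were rounded. The three
sample rows are taken from the TOTALS table, choosing rows in which
all three error columns are filled.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One table row; the field names are hypothetical, chosen for this sketch."""
    name: str
    indoc: int       # symbols present in the document
    corr: int        # correctly recognized symbols
    f_hit: int = 0   # false hits (not part of indoc, do not affect corr%)
    wrong: int = 0   # found, but with wrong parameters
    missed: int = 0  # not found at all

    def is_consistent(self) -> bool:
        # indoc must equal corr + wrong + missed; f_hit is deliberately left out
        return self.indoc == self.corr + self.wrong + self.missed

    def corr_percent(self) -> int:
        # assumption: percentages are rounded to the nearest whole number
        return round(100 * self.corr / self.indoc)


# Sample rows from the TOTALS table
rows = [
    Row("black/line", 307, 279, f_hit=3, wrong=10, missed=18),
    Row("white/space", 29, 16, f_hit=1, wrong=2, missed=11),
    Row("flags|beams", 107, 92, f_hit=3, wrong=2, missed=13),
]

for row in rows:
    print(row.name, row.is_consistent(), f"{row.corr_percent()}%")
```

For these three rows the identity holds and the computed percentages
agree with the corr% column.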
              indoc corr corr% f-hit wrong missed
staffs        41 39 95% 1 1
barlines      228 226 99% 2
barline+      (21) 7 33% (1) (14)
black/line    307 279 91% 3 10 18
black/space   227 206 91% 6 15
black/out     255 235 92% 19 1
white/line    40 24 60% 6 10
white/space   29 16 55% 1 2 11
white/out     20 13 65% 13 2 5
flags|beams   107 92 86% 3 2 13
brk whole     19 2 11% 17
brk half      1 0 0% 1
brk quarter   104 82 78% 8 22
brk 8|16      16 14 88% 2 2
dot           52 44 85% 2 8
natural       29 26 90% 3
sharp         19 17 89% 2
flat          4 3 75% 1
key           43 32 74% 6 5
measure       4 0 0% 2 4
clef vi       25 25 100% 3
clef ba       12 12 100% 1
clef c        1
ties          20 10 50% 3 10
tuplets       4 0 0% 4
=========================================================
total         1606 1397 87% 40 56 153
=========================================================

Here are the tables for the documents (each table a page) I counted:
(all documents scanned at 300 dpi).

image:    aeber02
contents: one-voice jazz standard

              indoc corr corr% f-hit wrong missed
staffs        5 5 100%
barlines      29 27 93% 2
barline+      (6) 4 80% (1) (2)
black/line    37 13 35% 8 16
black/space   26 10 38% 5 11
black/out     1 0% 1
white/line    11 4 36% 6 1
white/space   8 1 13% 1 6
white/out     2 2 100% 4
flags|beams   10 8 80% 2
brk whole
brk half      1 0% 1
brk quarter   1 0% 7 1
brk 8|16      6 5 83% 2 1
dot           14 11 79% 1 3
natural       1 0% 1
sharp
flat          1 0% 1
key           1 0% 1
measure       1 0% 2 1
clef vi       1 1 100% 3
clef ba       1
clef c        1
ties          9 5 56% 4
tuplets       4 0% 4
---------------------------------------------------------
total         169 92 55% 21 26 51

images:   boni1-3
contents: piano and (smaller) solo line, classical (waltz)

              indoc corr corr% f-hit wrong missed
staffs        12 12 100%
barlines      59 59 100%
barline+      (3) 0% (3)
black/line    95 94 99% 1
black/space   70 69 99% 1
black/out     71 69 97% 2
white/line    13 13 100%
white/space   7 7 100%
white/out     6 6 100% 2
flags|beams   64 58 91% 6
brk whole     9 2 22% 7
brk half
brk quarter   11 9 82% 1 2
brk 8|16
dot           17 15 88% 2
natural       9 9 100%
sharp         9 9 100%
flat          1 1 100%
key           12 12 100%
measure       3 0% 3
clef vi       8 8 100%
clef ba       4 4 100%
clef c
ties          2 2 100%
tuplets
---------------------------------------------------------
total         482 458 95% 3 4 20

              indoc corr corr% f-hit wrong missed
staffs        12 11 92% 1
barlines      75 75 100%
barline+      (6) 0% (6)
black/line    92 89 97% 1 1 2
black/space   51 50 98% 1
black/out     92 79 86% 12 1
white/line    7 1 14% 6
white/space   4 3 75% 1
white/out     6 3 50% 4 3
flags|beams   17 11 65% 2 6
brk whole     10 0% 10
brk half
brk quarter   37 23 62% 14
brk 8|16      6 5 83% 1
dot           8 6 75% 2
natural       9 8 89% 1
sharp         4 3 75% 1
flat          2 2 100%
key           15 9 60% 2 4
measure
clef vi       8 8 100%
clef ba       4 4 100%
clef c
ties          6 3 50% 3 3
tuplets
---------------------------------------------------------
total         465 393 85% 10 16 56

              indoc corr corr% f-hit wrong missed
staffs        12 11 92% 1
barlines      65 65 100%
barline+      (6) 3 50% (3)
black/line    83 83 100% 2
black/space   80 77 96% 3
black/out     91 87 96% 4
white/line    9 6 67% 3
white/space   10 5 50% 1 1 4
white/out     6 2 33% 1 2 2
flags|beams   16 15 94% 1 1
brk whole
brk half
brk quarter   55 50 91% 5
brk 8|16      4 4 100%
dot           13 12 92% 1 1
natural       10 9 90% 1
sharp         6 5 83% 1
flat
key           15 11 73% 3 1
measure
clef vi       8 8 100%
clef ba       4 4 100%
clef c
ties          3 0% 3
tuplets
---------------------------------------------------------
total         490 454 93% 6 10 26

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This mail was distributed by the omr mailing list.
Please send contributions to:        omr@ips.id.ethz.ch
Contact the list administrator as:   omr-request@ips.id.ethz.ch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~