Data abstraction

The primary purpose of computer systems is to manipulate data. However, data can be represented in different ways; even the same data can be encoded in several forms. As a consequence, when applications are developed by different people in different places to serve related purposes (for example, different aspects of music analysis), related and even identical data can end up in different, non-interoperable encodings. Very often these encodings are built into the software, so that one researcher’s software can be used only with that researcher’s particular choice of data encoding.

Data abstraction is a means to circumvent this problem. The idea is to identify and make explicit the properties of the data which are specifically relevant to a particular task or situation to be addressed by a program, and then supply library functions that will allow the program to manipulate that data without explicit reference to its actual syntax. This means that other data, represented in other ways, can then be used with the same program, conditional only upon the supply of appropriate library functions for that other data representation.
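The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `contour`, `compare_names` and `NAME_ORDER` are hypothetical, and the single-octave note-name encoding is invented for the example rather than taken from any existing system. The point is that the program (`contour`) never inspects a pitch directly; it only calls a library function supplied for the encoding in use.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")  # a pitch, in whatever encoding the library understands


def contour(melody: Sequence[P], compare: Callable[[P, P], int]) -> list[int]:
    """Return the up/down/same shape of a melody.

    The program relies only on `compare`, which must return 1 if the
    second pitch is higher, -1 if it is lower, and 0 if they are equal.
    """
    return [compare(a, b) for a, b in zip(melody, melody[1:])]


# Hypothetical library for one encoding: note names within a single octave.
NAME_ORDER = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def compare_names(a: str, b: str) -> int:
    """Compare two note names by their position in NAME_ORDER."""
    ia, ib = NAME_ORDER.index(a), NAME_ORDER.index(b)
    return (ib > ia) - (ib < ia)


# An invented four-note fragment, not a quotation from any piece.
shape = contour(["A", "G", "E", "G"], compare_names)
print(shape)
```

To use `contour` with a different encoding, only a different `compare` function would need to be supplied; `contour` itself would stay as it is.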

To put this in context, consider the example given in a paper published in the early 1990s. First, we represented Syrinx by Debussy in a rich representation using the words of the UK version of common music notation (CMN): crotchet, quaver and so on; and a full equal-tempered chromatic representation of pitch: A, G, E and so on. Next, we built a library of software functions for comparing, adding, subtracting, multiplying and supplying constants for data represented in this language. Then we built a program to carry out a procedure of paradigmatic analysis as defined by Ruwet, using the library defined over the data. The program was then used to produce an analysis of the piece similar to that of Nattiez.

The difference between this and the more conventional approach to programming is the existence of the library, mediating between the analysis program and the data. However, so far, little is gained, except a more intellectually complete representation.

Next, we encoded the same piece in a second representation, this time based on integers, and wrote a new library of the same functions for it. The analysis program was then run on the new representation, using the new library, but without any change to the analysis program itself. The resulting analysis was compared with the original and found to be identical. Thus, the program was shown to be operable over an alternative representation encoding the necessary musical features of the data, notwithstanding the detail of the computer encoding.
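The experiment of running one unchanged program over two encodings can be illustrated with a toy example. The sketch below is not Ruwet's procedure: `repeated_patterns` is a hypothetical stand-in that merely groups positions sharing the same interval pattern. Both encodings (note name plus octave, and integers counted in semitones) and the six-note melody are assumptions invented for the illustration.

```python
from collections import defaultdict
from typing import Callable, Hashable, Sequence, TypeVar

P = TypeVar("P")  # a pitch in some encoding


def repeated_patterns(
    melody: Sequence[P], interval: Callable[[P, P], Hashable], length: int = 2
) -> list[list[int]]:
    """Return groups of start positions whose next `length` intervals match.

    Only `interval` touches the pitch encoding; the result is a list of
    positions, so it does not depend on how pitches are written down.
    """
    steps = [interval(a, b) for a, b in zip(melody, melody[1:])]
    groups: dict[tuple, list[int]] = defaultdict(list)
    for i in range(len(steps) - length + 1):
        groups[tuple(steps[i:i + length])].append(i)
    return sorted(pos for pos in groups.values() if len(pos) > 1)


# Library 1 (hypothetical): pitches as (note name, octave) pairs.
NAME_INDEX = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def interval_names(a: tuple[str, int], b: tuple[str, int]) -> int:
    """Interval in semitones between two (name, octave) pitches."""
    return (NAME_INDEX[b[0]] + 12 * b[1]) - (NAME_INDEX[a[0]] + 12 * a[1])


# Library 2 (hypothetical): pitches as integers counted in semitones.
def interval_ints(a: int, b: int) -> int:
    """Interval in semitones between two integer pitches."""
    return b - a


# The same invented melody in both encodings.
as_names = [("A", 4), ("G", 4), ("E", 4), ("D", 4), ("C", 4), ("A", 3)]
as_ints = [69, 67, 64, 62, 60, 57]

by_names = repeated_patterns(as_names, interval_names)
by_ints = repeated_patterns(as_ints, interval_ints)
print(by_names == by_ints)
```

Because the program returns only positions in the melody, the two results can be compared directly, which mirrors the comparison of the two analyses described above.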

Finally, to demonstrate the power of the simple abstraction used, we represented a piece of music in a completely different pitch system using integers, in a way similar to the second representation above. This piece was the first of the Three Pieces for Two Quarter-tone Pianos by Charles Ives; as the title suggests, these pieces are written for a scale with 24 divisions per octave instead of the more familiar 12. Although the piece uses this very different pitch vocabulary, the paradigmatic analysis approach is in principle applicable to it, and we were able to run our analysis program on the new data, again without changing the program. All that was necessary was to implement a library containing functions for the musically relevant operations on the data, in terms of which the analysis program was specified. The theoretical line drawn between the operations of the analysis process and those which define the data types is called the abstraction boundary.
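A small sketch may help to show where the abstraction boundary falls when the pitch system itself changes. Everything here is hypothetical: the class `IntegerPitchLibrary`, the function `octave_related` and the four integer pitches are invented for illustration and are not drawn from the Ives piece. The library sits on one side of the boundary and knows how many divisions make an octave; the program sits on the other side and never mentions 12 or 24.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntegerPitchLibrary:
    """Hypothetical library for pitches encoded as integer scale steps."""

    divisions_per_octave: int

    def interval(self, a: int, b: int) -> int:
        """Signed distance from a to b, in scale steps."""
        return b - a

    def same_pitch_class(self, a: int, b: int) -> bool:
        """True if a and b lie a whole number of octaves apart."""
        return self.interval(a, b) % self.divisions_per_octave == 0


def octave_related(melody: list[int], lib: IntegerPitchLibrary) -> list[tuple[int, int]]:
    """Return index pairs of notes related by octave, using only `lib`."""
    return [
        (i, j)
        for i in range(len(melody))
        for j in range(i + 1, len(melody))
        if lib.same_pitch_class(melody[i], melody[j])
    ]


semitone = IntegerPitchLibrary(divisions_per_octave=12)
quarter_tone = IntegerPitchLibrary(divisions_per_octave=24)

# The same invented integers mean different things in the two systems.
melody = [0, 7, 12, 24]
in_semitones = octave_related(melody, semitone)
in_quarter_tones = octave_related(melody, quarter_tone)
print(in_semitones, in_quarter_tones)
```

In the 12-division reading, steps 0, 12 and 24 are all octave-related; in the 24-division reading only 0 and 24 are. The program `octave_related` is identical in both cases, and only the library on the other side of the boundary differs.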
