How much more could art history be about?

The art historian Dr. Griselda Pollock, best known for her work on feminist art history, recently published a short essay in response to a computer vision research paper written by Babak Saleh, Kanako Abe, Ravneet Singh Arora, and Ahmed Elgammal. The project, called “Toward Automated Discovery of Artistic Influence,” uses computer vision algorithms and machine learning techniques to find compositional similarities between paintings. Dr. Pollock is quite critical, not of the project’s implementation, in which she appears to have no interest, but of the terms in which it is framed and its purported goals.

Some of Dr. Pollock’s objections are flimsy. Her claim that this algorithm buttresses the Western Art Historical Canon is trivial, easily addressed by adding works from non-Western artists and craftsmen to the corpus of images for comparison. More opaque are her comments about the algorithm’s possible complicity in the art market’s perpetuation of this same canon via a shared emphasis on connoisseurship. In light of the global flow of capital, her assumption that auction houses are currently invested in the defense of a single, Western canon is questionable. Granted, these are minor points in her critique.

Others are more serious. She goes so far as to assert that computers have no place in art history:

Even at the most basic level, machines would not be helpful in developing these larger narratives. The idea that machines can see or notice what human beings do not is a fallacy, because the machine is only doing what it is told – and it is the programmers who are setting parameters. But those parameters are based on a woefully old-fashioned and dull misunderstanding of what art historians do, and what they look for.

Setting aside the fact that computers equipped with infra-red and multispectral sensors are literally capable of perceiving works of art in ways that humans cannot, Dr. Pollock offers nothing to support her unqualified dismissal of technology. Her objection that the algorithm is based on a “woefully old-fashioned and dull misunderstanding of what art historians do” is not proof that computers cannot contribute to developing larger art historical narratives, but in fact an argument for why art historians should be involved in designing such algorithms.

Dr. Pollock’s critique suffers from her own failure (one might go so far as to call it woeful) to grasp the methodological standards of another discipline. If we recognize that computer scientists solve complex problems by first breaking them up into discrete tasks that can be reintegrated once solved, then it makes little sense to chastise them for failing to program an artificially intelligent art historian that operates in accordance with current methodological trends. We might instead congratulate them on beginning to solve one of its many components.

The big question is not that Caillebotte (one of the examples given) was influenced by Degas. Instead it is what he did with that “Degas-ness”. Did he get what Degas was doing?

If we are to use computers to help us answer broader social questions about art, they must first be able to recognize compositional, technical, and iconographic parallels between works – just as any student of art history must. Many of the uninspiring or quirky compositional comparisons produced by this particular algorithm would be perfectly at home in a freshman’s first awkward writing assignment.

Dr. Pollock’s essay is marked not by the critical rigor of her superb art historical analysis, but instead by the anxiety of everyone who has ever seen their industry transformed by a new technology. Pollock sounds like a manuscript scribe writing a reaction essay to an early version of the printing press, pointing out its flaws and ignoring its revolutionary potential. However, this anxiety, like much anxiety about the digital humanities, is fundamentally misplaced. We are not going to be replaced by machines. Considering how cheap the labor of art historians currently is, it is doubtful that anyone will invest the time and money necessary to create a program that would do the myriad tasks that Dr. Pollock outlines. Anyone, that is, besides an art historian.

If members of our discipline want to ensure that the methodological insights of the last century make their way into digital art history, then we should spend less time criticizing the efforts of computer scientists and more time participating in them. Someday, perhaps even within the next decade, we may see the development of programs that combine text mining tools, visual object recognition algorithms, formal comparison algorithms, and social network analyses in order to produce highly contextualized studies of social art historical phenomena. Art historians must at least learn to use, or better yet to help develop, these technologies if we are to benefit from them rather than be rendered obsolete by them.

Ironically, Dr. Pollock’s article is illustrated by a photograph of a salon-style gallery that perpetuates the idea of art history as a discipline that defends an arbitrary and exclusive Western Canon. It is accompanied by a propagandistic caption: “Thinking does not need machines.” Thinking may not need computers, but their ability to process enormous amounts of information systematically and quickly allows us to think about previously overwhelming questions in a new way. To blithely dismiss the utility of these tools is to willfully cripple one’s own discipline out of fear and misunderstanding.

GLAMHackPhilly: Empire and the Collection of the British Museum


This weekend I got together with a bunch of other brilliant, dedicated people to participate in GLAMHackPhilly (that’s GLAM as in Galleries, Libraries, Archives, and Museums, not the genre of music popularized by David Bowie). My team worked with the British Museum’s linked open data, trying to see whether we could trace trends in the museum’s acquisitions that are easy to hypothesize about but, until now at least, have been time-consuming and difficult to collect meaningful data on. The main question was how many objects came into the museum from colonies of the British Empire versus non-colonies over the course of the museum’s history.

Acquiring the Data

Part of our team tried using SPARQL to find objects in the British Museum’s linked open data that have 1) a date associated with their acquisition by the museum and 2) a location of origin (linked via the “was present at” predicate). The tricky part is getting uniform specificity in the location, because object records contain locations ranging from archaeological sites to cities, regions, countries, or even continents. Fortunately, locations have a “broader” predicate that allows you to navigate from more specific to less specific geographic locations. Taking the country as our unit, we tried to use SPARQL to find the country associated with each location, but because no one was very familiar with SPARQL, it’s entirely possible that our solution isn’t actually accomplishing what we wanted it to do.

So what is SPARQL? SPARQL is a query language designed to allow people to search structured semantic data written in RDF (Resource Description Framework). In RDF, different types of “objects” are connected to each other using relationships. People talk about these objects and relationships in grammatical terms, as subject, predicate, and object. For example, a museum item can be connected to a geographical location using a relationship (or “predicate”) such as “was present at.” Together, the subject, predicate, and object are known as a “triple.” SPARQL works by allowing you to substitute variables for one or more entities in a triple. There are four different types of SPARQL queries. We used the SELECT query to retrieve entities and return them in a table, but you can also return results in RDF using a CONSTRUCT query, ask true or false questions about entities and predicates with an ASK query, or extract an RDF graph using a DESCRIBE query.
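For instance, a minimal SELECT query – a generic sketch, not one of the queries we actually ran – that asks a triple store for every entity with a preferred label might look like this. I’ve declared the standard skos: namespace explicitly, though many endpoints, possibly including the British Museum’s, predefine it:

# The skos: prefix maps short names like skos:prefLabel onto full URIs.
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

# ?place and ?label are variables; SPARQL returns every subject/object
# pair that matches the triple pattern "?place skos:prefLabel ?label".
SELECT ?place ?label WHERE {
  ?place skos:prefLabel ?label .
}
LIMIT 10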

While the British Museum’s SPARQL access point has a couple of examples of SPARQL queries, there are many more examples of more complex queries available at the Europeana SPARQL endpoint. I’d also recommend this How to SPARQL walk-through.

Part of our query is pretty straightforward – find all of the objects that have a date of acquisition associated with them. I think we accomplished this in SPARQL with this query:

SELECT DISTINCT ?item ?yearAcquired WHERE
{
  ?item ecrm:P30i_custody_transferred_through ?hasCustody .
  ?hasCustody ecrm:P4_has_time-span ?hasTime .
  ?hasTime rdfs:label ?yearAcquired .
}

Every word with a “?” prepended to it is a variable, while all of the predicates are tied to a namespace specified using ecrm:, rdfs:, and skos:. So what does this particular query do? First, it finds all entities that have acquisition data associated with them: ?item ecrm:P30i_custody_transferred_through ?hasCustody. (If we were being more rigorous in our query, which we probably should have been, we might check to ensure that this custody transfer had the British Museum as its object, but I’m guessing that most of the objects in the museum would only have a “custody transfer” related to their acquisition by the museum.) Then it narrows these down to entities for which the acquisition event is linked to a date: ?hasCustody ecrm:P4_has_time-span ?hasTime. Finally, it gets the human readable “label” of this date: ?hasTime rdfs:label ?yearAcquired.

At least, that’s what I think this query does. We can test this query by substituting a concrete date for the final variable:

SELECT DISTINCT * WHERE {
  ?item ecrm:P30i_custody_transferred_through ?hasCustody .
  ?hasCustody ecrm:P4_has_time-span ?hasTime .
  ?hasTime rdfs:label "1900" .
}

This query should return only those entities which have a “custody transfer”/acquisition date of 1900. While I’m not sure that it returns all of the objects that were acquired in 1900, it certainly returns a lot of them, and it doesn’t return any objects that were acquired in other years.
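As for being more rigorous about which custody transfers we count: CIDOC-CRM does have a property for the party that receives custody (P29), so something along these lines might work – untested, and with a made-up placeholder standing in for the museum’s actual URI in the dataset:

SELECT DISTINCT ?item ?yearAcquired WHERE {
  ?item ecrm:P30i_custody_transferred_through ?hasCustody .
  # Keep only transfers of custody received by the British Museum itself.
  # The URI below is a placeholder; the real identifier for the museum
  # as an actor in the dataset would need to be looked up.
  ?hasCustody ecrm:P29_custody_received_by <http://example.org/the-british-museum> .
  ?hasCustody ecrm:P4_has_time-span ?hasTime .
  ?hasTime rdfs:label ?yearAcquired .
}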

Now that we have all of the items with acquisition dates, we can narrow these items down to only those that also have a location (any location besides the British Museum) associated with them: ?item ecrm:P12i_was_present_at ?placeThing. Because of how the British Museum specifies places, we must go a step further and find the actual location where the “being present” of the item “took place at”: ?placeThing ecrm:P7_took_place_at ?locality. Finally, we can get the human-readable label of the location: ?locality skos:prefLabel ?countryLabel. The whole query then looks as follows:

SELECT ?countryLabel ?yearAcquired WHERE {
  ?item ecrm:P30i_custody_transferred_through ?hasCustody .
  ?hasCustody ecrm:P4_has_time-span ?hasTime .
  ?hasTime rdfs:label ?yearAcquired .
  ?item ecrm:P12i_was_present_at ?placeThing .
  ?placeThing ecrm:P7_took_place_at ?locality .
  ?locality skos:prefLabel ?countryLabel .
}

But this is where we run into a problem. Although I have optimistically used the variable ?countryLabel to name the location of the object, in fact the types of locations specified in the item records are quite variable. They can range from specific obscure archaeological sites to entire continents, depending on the provenance of the object. Trying to identify these locations using a place-name authority would be something of a nightmare, and probably an abysmal failure. Fortunately, the British Museum has linked each of these locations to more general locations with the predicate skos:broader. These more general locations are in turn linked to even more general locations. So every single location theoretically can be traced back to a continent using the skos:broader predicate.

At the very least, we need to filter the results of this query to return only those which actually have a country as the type of location linked from their record. We can do this by adding a filter which makes sure that the locality type is country (or city-state, if we’re in Ancient Greece). Here’s the whole query, because the syntax is important:

SELECT * WHERE {
  ?item ecrm:P30i_custody_transferred_through ?hasCustody .
  ?hasCustody ecrm:P4_has_time-span ?hasTime .
  ?hasTime rdfs:label ?yearAcquired .
  ?item ecrm:P12i_was_present_at ?placeThing .
  ?placeThing ecrm:P7_took_place_at ?locality .
  ?locality skos:prefLabel ?countryLabel .
  ?locality ecrm:P2_has_type ?localityType .
  ?localityType skos:prefLabel ?localityTypeLabel .
  FILTER(?localityTypeLabel = "country or city-state")
}

For the question we initially wanted to ask of the data, we need to find the country associated with the more or less precise location that is actually linked to each item. We figured out a verbose (read: very ugly) way to do this using the skos:broader predicate and UNION to join together multiple filtered subqueries, but it was very slow to run. I suspect there is a more elegant way to do this, although I have yet to test one. In any case, because this more robust search took too long, we ended up presenting the data from the preliminary search, which only returned items that had countries (rather than sites or cities or regions) as their location type – a small fraction of the total items with location data, much less of all of the items in the British Museum’s Semantic Web Collection.
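One candidate for that more elegant approach would be SPARQL 1.1 property paths, which let a single pattern follow skos:broader any number of times up the place hierarchy. This is only a sketch – I haven’t run it, and it assumes the endpoint supports property paths – but it shows the idea:

SELECT ?item ?yearAcquired ?countryLabel WHERE {
  ?item ecrm:P30i_custody_transferred_through ?hasCustody .
  ?hasCustody ecrm:P4_has_time-span ?hasTime .
  ?hasTime rdfs:label ?yearAcquired .
  ?item ecrm:P12i_was_present_at ?placeThing .
  ?placeThing ecrm:P7_took_place_at ?locality .
  # Follow zero or more skos:broader links up the place hierarchy until
  # we reach a place whose type is "country or city-state", then take its label.
  ?locality skos:broader* ?country .
  ?country ecrm:P2_has_type ?countryType .
  ?countryType skos:prefLabel ?countryTypeLabel .
  FILTER(str(?countryTypeLabel) = "country or city-state")
  ?country skos:prefLabel ?countryLabel .
}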

Binning the Data

Once we had a query that was able to find all of the items acquired in a given year and group them by country, another member of the group wrote a small Python program that iterated over years and added the number of items acquired from each country to a table. She also modified this program to sum the yearly counts into decades, making the data easier to read and analyze. I’m pretty sure she was using something like this SPARQL wrapper.
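(In retrospect, SPARQL itself could probably have done much of the counting for us: SPARQL 1.1 supports aggregates, so a GROUP BY query like the untested sketch below should return the number of items acquired from each country in each year directly, leaving only the decade binning for Python.)

# Count items per country per year of acquisition.
SELECT ?countryLabel ?yearAcquired (COUNT(DISTINCT ?item) AS ?numItems) WHERE {
  ?item ecrm:P30i_custody_transferred_through ?hasCustody .
  ?hasCustody ecrm:P4_has_time-span ?hasTime .
  ?hasTime rdfs:label ?yearAcquired .
  ?item ecrm:P12i_was_present_at ?placeThing .
  ?placeThing ecrm:P7_took_place_at ?locality .
  ?locality skos:prefLabel ?countryLabel .
}
GROUP BY ?countryLabel ?yearAcquired
ORDER BY ?yearAcquired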

Analysis and Visualization

Finally, I summed the acquisitions from the former colonies of the British Empire and subtracted that sum from the total acquisitions in each year in order to arrive at two numbers for each year: acquisitions from countries that were at some point British colonies, and acquisitions from countries that were never officially part of the British Empire. Feel free to check out the spreadsheet with acquisitions by year, and the one with acquisitions by decade. (Again, these are only the items that had a country as the object of their was_present_at predicate, because the more comprehensive query took too long to run on the second day.)

I graphed the final colony vs. non-colony totals, first by year:

and then by decade, which is a lot easier to read even if it is less granular:

Another member of the group made choropleth maps showing the number of items received from each country, marking the top ten sources of items in red, the next ten countries in orange, and the ten after that in yellow. I’m working on making my own maps using Google Fusion Tables.

Conclusions

I’m pretty reluctant to draw any conclusions based on these graphs, for many reasons. I don’t have a sense of how exhaustive our query was, either in terms of looking in the right places for acquisition data or in terms of finding all of the items that had locations and acquisition dates. I’m pretty sure we also lost a lot, at least in the query that produced this data, by only looking at items that had a country as their specified location. Objects with more precise locations, although retrievable with our more sophisticated query, aren’t represented in these graphs. Beyond that, I’m not sure to what extent the digital records reflect the actual acquisitions of the museum. It may simply not have records available for some of the earlier years.

That said, I’m pretty proud of how much we learned in only two days of trying to answer this question. Considering that no one on the team had ever used SPARQL before, we managed to make a surprising amount of progress towards the goal that we set for ourselves yesterday morning. While SPARQL queries take a little getting used to, I can definitely see myself playing around with them more. It seems like it should be possible to use them for everything from automatic bibliography generation (just find all of the sources in which items related to a category are mentioned) to more complex network analyses linking humans and museum objects in surprising ways. I’m really excited to see how linked open data continues to change scholarship and curatorial practice.

Appendix

I made a gist of our hot mess of a SPARQL query. If you have any ideas on how to make this better, please let me know.

Reflectance Transformation Imaging

One of the techniques that I was able to use during my curatorial internship in the Bryn Mawr Special Collections last spring was Reflectance Transformation Imaging (also known as polynomial texture mapping). The technique uses an algorithm to reconstruct the surface normals of an object based on changes in illumination position. That I was able to use this technique at all is thanks to the non-profit Cultural Heritage Imaging project’s excellent tutorials, as well as to funding for equipment from Bryn Mawr College Special Collections.
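Under the hood – at least in the classic polynomial texture mapping formulation described by Malzbender and his colleagues, which is what I understand the standard tools to be based on – each pixel’s brightness is modeled as a simple polynomial in the light direction. If l_u and l_v are the horizontal and vertical components of the normalized light direction, the luminance at a pixel is fit as

L(l_u, l_v) ≈ a_0·l_u² + a_1·l_v² + a_2·l_u·l_v + a_3·l_u + a_4·l_v + a_5

with the six coefficients estimated per pixel by least squares from the stack of differently lit photographs. The light direction that maximizes this polynomial gives an estimate of the surface normal at that pixel, which is what lets the viewer relight the object interactively after the fact.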

I haven’t blogged about my use of this technology, however, because until this point it has been somewhat awkward to share RTI files online. Possible solutions involved either sharing the RTI files directly (placing the burden on others to install the RTI viewer software) or Java plugins, such as the excellent RTI viewer WordPress plugin. Neither is particularly elegant.

That’s why I’m so thrilled that Gianpaolo Palma has released the code for his WebGL-based RTI viewer, which neatly combines an RTI viewer with tile-based deep-zoom. I’ll be putting up some of the coin RTIs that I made as soon as I get the chance to process them on a Windows machine.