Uncovering cultural databases.

What can data visualizations of cultural databases tell us? What can they uncover and reveal that was not known before?

To find out what the process of mapping data onto a plane can uncover, I talked to experts from different fields about a visualization of movie references over the last 100 years. The data came from the Internet Movie Database (IMDB), a community of movie enthusiasts who linked movies by the references they make to other movies.


Communication model for visualizations

Visualizations are encoded from two elements: the content, in the form of data, and the methods (Meirelles, I., 2013) that map this content onto a medium such as paper, a computer screen or a physical sculpture.
Together, these two elements create the visualization, which can then be decoded by humans to accomplish two main tasks: analysis as knowledge discovery, and look-up as knowledge management (Baur, D., 2013). In the context of decoding information, Card, Mackinlay & Shneiderman describe visualizations as external cognition (Card, S.K., Mackinlay, J.D. & Shneiderman, B., 1999). While this model is sound, the real challenge lies beyond it. Analysis and look-up, understood as external cognition, are part of a process of communication and understanding that is only helpful if we can act on its results, that is, if it helps us make better decisions.

The path thus leads from representation, through exploration and analysis, to discovery, understanding and action. With the help of scientists from the fields of network science, sociology and media studies, this paper tries to answer the question of what visualizations are able to reveal, using the example of a recent project called ‘culturegraphy’.

The visualizations

The visualizations are based on data from the Internet Movie Database (IMDB), an online movie community with 42 million users. IMDB collects all kinds of information about movies, including a collection of references between movies. In the database these references are stored under the menu item ‘connections’. There are nine different reference types (alternate language version of, edited from, features, follows, references, remake of, spin off from, spoofs and version of). For the visualization we focused on the category ‘references’.

To give a better impression of what these references look like, here are two examples from ‘Star Wars’ (1977): the robot C-3PO was modelled after the robot from Metropolis (1927), and C-3PO uses a line from Seven Samurai (1954): “It seems we are made to suffer. It’s our lot in life.” Star Wars did not only reference other movies; it has also been referenced on many occasions. The line “May the Force be with you.” was referenced by productions such as Barney Miller: Quo Vadis? (1978), The Big Fix (1978), Sledge Hammer!: They Shoot Hammers, Don’t They? (1986), Beverly Hills Cop II (1987) and many others.

We mapped this copying, transforming and combining of ideas over time as a network. The nodes of the graph are positioned along the y-axis on a timeline from 1900 to 2015, while the x-axis comes in different versions: degree (the number of connections), modularity (membership in clusters of densely interconnected movies) or the genres the movies are assigned to. The different visualizations can be filtered by year, by degree and by the characteristic of each graphic, such as genre or modularity. These graphics give a summary (rather than the possibly misleading ‘overview’) of inspiration in the history of cinema.
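To make the degree-over-time layout more concrete, here is a minimal Python sketch. It assumes the ‘references’ connections have been exported as (citing, cited) pairs and the release years as a dictionary; the sample titles and years come from the Star Wars examples in the text, but the data structure, the function names `degrees` and `layout`, and the filter parameters are hypothetical illustrations, not the project's actual code.

```python
from collections import defaultdict

# Hypothetical sample of IMDB-style 'references' connections, stored as
# (citing movie, cited movie) pairs. Titles and years come from the
# Star Wars examples in the text; the real dataset is far larger.
MOVIES = {
    "Metropolis": 1927,
    "Seven Samurai": 1954,
    "Star Wars": 1977,
    "The Big Fix": 1978,
    "Beverly Hills Cop II": 1987,
}
REFERENCES = [
    ("Star Wars", "Metropolis"),
    ("Star Wars", "Seven Samurai"),
    ("The Big Fix", "Star Wars"),
    ("Beverly Hills Cop II", "Star Wars"),
]


def degrees(edges):
    """Count all connections per movie: references made plus references received."""
    deg = defaultdict(int)
    for citing, cited in edges:
        deg[citing] += 1
        deg[cited] += 1
    return dict(deg)


def layout(movies, edges, min_year=1900, max_year=2015, min_degree=0):
    """Place each movie at (x=degree, y=release year), keeping only movies
    that pass the (hypothetical) year and degree filters."""
    deg = degrees(edges)
    positions = {}
    for title, year in movies.items():
        d = deg.get(title, 0)
        if min_year <= year <= max_year and d >= min_degree:
            positions[title] = (d, year)
    return positions


if __name__ == "__main__":
    # Keep only movies with at least two connections.
    positions = layout(MOVIES, REFERENCES, min_degree=2)
    for title, (x, y) in sorted(positions.items(), key=lambda kv: kv[1][1]):
        print(f"{y}  degree={x}  {title}")  # prints: 1977  degree=4  Star Wars
```

Replacing the x-coordinate with a genre or a modularity class would give the other variants of the graphic; modularity classes would typically come from a community-detection step in a graph library, which this sketch leaves out.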

To understand what insights these summary visualizations can provide, this paper goes through two stages of the communication model for visualizations presented above and looks more closely at two of its building blocks: content and analysis.


The first objection concerns data itself. In my definition, data are units that can be processed by computer systems. At the lowest level, such data are nothing more than 0 or 1, yes or no, on or off. This means that everything we can visualize must be built out of single discrete entities: data readable by computers cannot represent continuous objects (Kittler, F., 1991). In addition, any data about our natural world has to be measured in one way or another, which turns it into binned components of reality, “Weltbilder” (world pictures) that are a reflection of reality rather than reality itself. Everything we can visualize with a computer thus becomes a “shadow” of reality. Rather than freeing us from Plato’s cave, visualizations show the same shadows, just in a different format. Alfred Korzybski raised this objection about the very objects I want to create from the data, maps, when he wrote:

“A map is not the territory it represents, but if correct, it has a similar structure to the territory, which accounts for its usefulness.”

On the other hand, without these abstract entities we would not have the advantage of being able to compute them. Computation opens up new possibilities for processing and understanding data, and visualization is one tool that has become more powerful with the help of machines. As Ben Shneiderman puts it:

“Like Galileo’s telescope (1564-1642), Hooke’s microscope (1635-1703), or Roentgen’s x-rays (1845-1923), new information analysis tools are creating visualizations of never before seen structures. Jupiter’s moon, plant cells, and skeletons of living creatures were all revealed by previous technologies. Today, new network science concepts and analysis tools are making isolated groups, influential participants, and community structures visible in ways never before possible.”

For datasets concerned with cultural information in particular, understanding the data is crucial. Problems occur not only at the lowest level of data as digital entities, but mainly in the process of collecting the data. Unlike datasets about natural phenomena, such as temperature measurements, these datasets are not collected by machines. They are created by humans, with all their biases, influences, imprecisions and variations from person to person. In the case of the IMDB data, this means that the data is not simply a reflection of movie history. As Prof. Jan Distelmeyer, professor of media studies at the University of Potsdam, suggested in an interview, the data is rather a fusion of three forces: movie history, the internet and the users.

The users who created the content can only draw their conclusions post hoc. In some cases the director, screenwriter or others may have mentioned their references in interviews or books, but often the references are based on the audience’s own knowledge of movie history and their guesses about where a reference came from. Media philosopher Vilém Flusser wrote a book about this disconnection between intention and gesture, which questions the subjectivity between these two entities.

The internet as a medium might be influencing the way we interpret and act on cultural artefacts like movies. A platform like IMDB would not be possible through letters, radio, television or the telephone. The network structure of the internet itself might influence how we perceive network processes in society and communication. As physicist Cristopher Moore said in an interview with Melanie Mitchell:

“… as George Lakoff says, every metaphor illuminates certain aspects of a real-world system and obscures others. … I like to talk about how in the eighteen hundreds everything was a steam engine. And then everything was a computer and then in like the 80s and the 90s everything was an economy or maybe ecology and now at the moment, everything is a network, these are ultimately just words and each of these ways of -- each of these lenses as you put it, are going to highlight certain aspects of the system and maybe confuse or obscure others.”

Movie history is only one of the three forces; the information itself is created by the users with the help of the internet. Prof. Distelmeyer’s final thoughts in the interview point to the dataset as a portrait of the IMDB users: the users themselves are not visible in the visualization, only their reflections on the history of cinema as they see it.

None of this should be seen negatively. These considerations do not make the data useless or meaningless; they might even make it more interesting. In 1973, cultural studies scholar Stuart Hall developed a model of communication called encoding/decoding (Hall, S., 1973). The model describes the relationship between mass media communication channels, such as the movie industry, and the audience. The mass media encode content for us through their frameworks of knowledge, relations of production and technical infrastructure. These encoded programmes are then decoded through the audience’s own frameworks of knowledge, relations of production and technical infrastructure. If any of these encoding or decoding components fails, the communication fails. Hall’s sketch of his model ends at this point. But in an interview with Ian Angus et al. in 1989, Hall discussed the problems of his model and explained that he might in some way have seen it as a full circle rather than an open-ended system: a circle in which the mass media are influenced by their consumers. If someone creates a movie that no one watches, the movie fails. The creators of mass media products need to fulfil the wishes of their audience; otherwise no one will consume their media. This feedback loop might be a better clue to the real meaning of the database behind IMDB.
The visualization might point in this direction: it is the reflection of movie enthusiasts on the industry and on the thinking behind the system.

In 1774 Johann Gottfried Herder wrote: “jede Nation hat ihren Mittelpunkt der Glückseligkeit in sich wie jede Kugel ihren Schwerpunkt!” (every nation has its centre of happiness within itself, just as every orb has its centre of gravity) (Herder, J.G., 1774). This idea of cultures as separate orbs that can coexist but not interact with each other lives on in the 21st century in the idea of multiculturalism (Welsch, W., 2009). Multiculturalism accepts cultural diversity, but exchange between the cultural systems is not anticipated or is even unwanted. The image of orbs suggests that when different cultures collide, they bounce off each other. In our globalised world, where our shoes are from China, our movies from America, our food from Italy and our music from South Korea, this model can no longer be seen as a good model of reality.
Philosopher Wolfgang Welsch suggested the concept of transculturality as a new model (Welsch, W., 2009). Here culture is seen as a network rather than as a set of separate orbs. In this model there is no clearly bounded cultural space that each of us inhabits. We are all nodes connected to many other nodes through our neural network, social network and communication network (Castells, M., 2012).

Bringing these two thoughts of Welsch and Hall together, we can read the visualization as showing the rise and acceptance of the transcultural model in the movie industry as well as among IMDB users.

But where does this come from? How and why did the “rise of the network society” (Castells, M., 1996) come about in the field of cinema?

One answer may lie in the pressure to innovate. Goods produced for consumption must hold value for the consumer, often through some kind of innovation, something new that these goods offer. The avant-garde of the modernist movement, with its zenith in the 1920s, sought this constant innovation in literature, theatre and the visual arts through a radical break with tradition. Artistic experiments led to innovative forms of originality (Bleicher, J.K., 2008). The two main problems were detachment from the audience and the pressure to keep innovating. Because a work was entirely different from tradition, nothing indicated the quality of the innovation, and with it the favour of the audience. At the same time, another movement searched for a different way out of the pressure to innovate: the “non-art” of Dadaism used randomness as a route to innovation, for example in Tristan Tzara’s random poems created with the cut-up technique.

Postmodern cinema avoids this pressure to innovate by copying, combining and transforming existing material. The claim is not to create something entirely new, but to join familiar objects into a new, emergent whole. This self-referentiality not only provided a useful path to creativity; it also worked very well with audiences. These heavily inspiration-driven movies disrupt the classic movie genres. A good example is The Matrix (1999), a mixture of science fiction, kung fu, adventure and computer game, with allegories of Buddhism, self-discovery, love story, John Woo-style action, myth-hopping (from “Alice’s Adventures in Wonderland” to “The Wizard of Oz”), religious salvation story and system criticism (Distelmeyer, J., 2008). This allows multiple readings of the same movie and thereby addresses a multitude of different audiences.

The audience did not only become wider; the way people watched movies also changed. When movies were only screened in cinemas, people tended to watch a movie once. The rise of the retail and rental industry changed this. The movie industry became interested in an audience that bought a movie even after having seen it in the cinema. References to movie history gave movies a complexity that made them worth watching again. They create a fan base and plant clues that only become visible when watching the movie with a pause button.

It was not only the audience whose viewing habits were strongly influenced by these new distribution media. Directors, writers and producers were only able to create such reference-rich movies because of the new technology: only through these new distribution channels did it become possible to watch movies that were no longer running in cinemas. The adjacent possible, as Stuart Kauffman would call it (Kauffman, S., 2003), made space for a new kind of cinematic experience. That directors like Quentin Tarantino worked in video rental stores before becoming famous directors of this new kind of cinema points to the connection between technical and cultural change. Tarantino told the BBC: “When people ask me if I went to film school, I tell them, ‘no, I went to films’.”


Classical cinema fits the orb model of culture very well: movies are single units, entities with an exact position in terms of genre. Postmodern cinema breaks with this model. IMDB classifies movies into 22 categories such as action, drama and romance. To me, these categories reflect the distinctions of the pre-network era. A recent article in The Atlantic (Madrigal, A., 2014) counted 76,897 genre distinctions on Netflix. This might come closer to a categorisation of postmodern cinema, but we need to ask ourselves whether it still makes sense to distinguish orbs of genres in a network model. To me, the rise of the network society also feels like a break with the categorisation model itself: we can no longer stereotype cultural objects. This is the main message that the visualizations make visible. They break with the idea of movies as independent works, and with the idea of the creator as a pure genius who invents out of nowhere.

In the book “The Overview Effect”, Frank White shows how the experience of circling the Earth from space changes space travellers’ perceptions of themselves, their world and their future. Or, in the words of a proverb recorded in 1639 and often associated with the apostle Thomas: “seeing is believing”. I argue that these visualizations have a similar effect on us. They make visible the effects of a network culture in which orbs no longer exist. Objects are not single entities but are deeply embedded in their surroundings. Once we see movies as just one example of these processes of cultural transmission, the idea can spread into fields where this model of thought genuinely changes the way we act. Political decisions, for instance, would look very different depending on whether they are made from an orb or a network perspective. Obviously these visualizations are only a small contribution towards a network-oriented society. But as Duncan Watts (2004) suggested in his book “Six Degrees: The Science of a Connected Age”, any person on earth is connected to any other through a chain of, on average, about six individuals. Our society is a dense network in which ideas can spread quickly.

References

Meirelles, I., 2013. Design for Information, Rockport Publishers.

Baur, D., 2013. Big Pictures in the Small: Visualizations on Mobile Devices, Available at:

Card, S.K., Mackinlay, J.D. & Shneiderman, B., 1999. Readings in Information Visualization, Morgan Kaufmann.

Kittler, F., 1991. Es gibt keine Software.

Hall, S., 1973. Encoding and Decoding in the Television Discourse. Birmingham: Centre for Contemporary Cultural Studies, University of Birmingham, pp. 507-17.

Herder, J.G., 1774. Auch eine Philosophie der Geschichte zur Bildung der Menschheit. Frankfurt/Main: Suhrkamp, 1967, p. 44 f.

Welsch, W., 2009. Was ist eigentlich Transkulturalität? transcript-Verlag.

Castells, M., 1996. The Rise of the Network Society, Blackwell.

Bleicher, J.K., 2008. Zurück in die Zukunft. In Formen intertextueller Selbstreferentialität im postmodernen Film. Lit Verlag.

Distelmeyer, J., 2008. Die Tiefe der Oberfläche. In Oberflächenrausch. Bewegung auf dem Spielfeld des postklassischen Hollywood-Kinos.

Kauffman, S., 2003. The Adjacent Possible. Available at:

BBC, 2004. Faces of the week. 14 May 2004.

Madrigal, A., 2014. How Netflix Reverse Engineered Hollywood. The Atlantic. Available at: [Accessed February 21, 2014].

Watts, D.J., 2004. Six Degrees: The Science of a Connected Age, W. W. Norton & Company.