CONTENT ANALYSIS
This term has come to be identified with research that renders written, and in some cases visual, material as countable – measurable – content. This is understood to mean the words, phrases and images that comprise manifest content: what is visible to the seeing/reading eye. There is a message, and that message is decipherable in a number of ways.
- Nowadays this notion of content applies to both traditional – written and audio-visual – and digital, web-based formats.
- News and entertainment media, policy documents, literature, and colour spectra lend themselves to this treatment of content as manifest and measurable in quantifiable terms.
- As documents and media messages widely construed become increasingly web-embedded this ‘content’ can also be broken down into byte-sized data; if not words, then phrases or search results based on keywords.
- As the use and pursuit of web-content as ‘keywords’ have become the staple of everyday and academic uses of the web, aided and abetted by increasingly able automated search engine software, related products and services (see Chapter 5), pre-digital era notions of content have taken on a new dimension.
- Indeed they lend themselves very well to automated sorts of analysis, for example, commercial search engines for the open web or more specialized tools for research or surveillance purposes.
- There are some who argue that content – as a product of social and cultural life – has become digitized and with this so has (hyper)textual production and (hypermedia) circulation and their role in how societies reproduce themselves meaningfully and in memory.
For the time being, I am going to refer to the variety of approaches under discussion here as content in a generic sense of the term precisely because of the way that the web has created a swathe of ‘content’ and interactions for emergent research. This also means resorting to the quantitative–qualitative division to acknowledge their respective research traditions as these set off along the paths sketched above.
Quantitative traditions of content analysis lay the emphasis on forensic analysis of aggregated, countable evidence: keyword frequency, terms, collocations. The unit of analysis is thereby the ‘message component’ (Abdelal et al., citing Neuendorf, 2009: 5) that is visible to any reader/s. Here, whilst there is a level of opaqueness (otherwise there would be no point embarking on a research inquiry), this approach proceeds on the assumption that transparency can be made evident by unpacking and then analysing the content on its own terms. This is basically the ‘idea that the individual text is meaningful on its own and that a summary of the message within it is the desirable outcome’ (Abdelal et al. 2009: 6).
Qualitative traditions beg to differ. The demarcation line here is that qualitative understandings treat content as opaque by definition; sub-text – the different meanings (and thereby receptions) that lie ‘between the lines’ of any manifest content – is as important as, if not more significant in psychological or political terms than, the manifest, explicit message. We will look at how this approach operates for research in due course.
How then do researchers adhering to these broad traditions go about deciphering their material – text/message?
- For quantitative modes this is managed by devising a coding scheme as a device for breaking the message down into suitably manageable components: phrases, transitions, single words. These schemes are devised around familiar notions of probability and non-probability sampling (see Chapter 6, surveys and questionnaires section) for particular sorts of research questions.
- For qualitative modes, we see that this literal notion of ‘coding’ is replaced by a more figurative one: that of ‘encoding’ and ‘decoding’, to borrow here from Stuart Hall (1996). This means that the researcher’s task is to unearth the inner – perhaps even hidden – significance of the message in context. The assumption here is that there is a hidden meaning, if not several. Texts are treated here as phenomena, representations of thought and experience, and as sociocultural artefacts.
- Nestled somewhere in between are what we could call ‘architectural’ modes that examine the mechanics, and so the structure, of texts. How they function linguistically is the primary focus. These deep forensic analyses have developed their respective procedures for exploring these structures and functionalities based on whether meaning is determined in the final analysis (no pun intended) by these underlying structures or by the way language is enacted, or practised; for example, the difference between looking at grammatical structures (nouns, verbs, prepositional phrases) and looking at inflections of speech such as pauses, accentuation, rhythm.⁴
Back to content analysis within the quantitative tradition. Coding schemes, human or computer-aided, can be divided into two broad forms: (1) frequency counts (keywords, phrases, word pairs) and (2) keyword-in-context analysis. In essence, this is what a search engine on the web does for us every day; based on our search terms (in themselves a form of elementary coding), its algorithm sets about sifting and collating millions of texts stored in the service provider’s data banks and then presenting the results according to a particular, and highly effective, code: the ‘top ten hits’. Whilst ordinary web-surfers (including researchers) seldom ask why or how their search engine came up with that particular list, scholars pose just these sorts of questions because of the influence of the top ten on status and significance, to wit citation indexes. More on this below.
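The two broad forms of coding named above can be illustrated with a minimal sketch in Python. It is only an illustration under stated assumptions: the function names, the simple word tokenizer, the window of three words, and the sample sentences are all hypothetical and are not drawn from any particular study or software package.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_counts(text, keywords):
    """Form 1: count how often each keyword occurs in the text."""
    counts = Counter(tokenize(text))
    return {kw: counts[kw.lower()] for kw in keywords}


def keyword_in_context(text, keyword, window=3):
    """Form 2: return each hit with up to `window` words on either side."""
    tokens = tokenize(text)
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Hypothetical sample text, invented for this illustration.
sample = (
    "The minister said security is a priority. "
    "Critics argue that security cannot come at the cost of privacy."
)

print(frequency_counts(sample, ["security", "privacy"]))
for left, kw, right in keyword_in_context(sample, "security"):
    print(f"{left} [{kw}] {right}")
```

The frequency count answers ‘how often?’, whereas the keyword-in-context listing keeps the neighbouring words so that the researcher can still read each occurrence in its immediate setting.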
Practicalities
Let’s for the moment treat the term content analysis as a rubric under which both qualitative and quantitative approaches make diverse, and sometimes overlapping, claims about how societies and cultures make sense of their world.
As noted already, real-life research scenarios and ‘texts’ now encompass more than words on a page of hard copy. Moreover, in computer-mediated and multimedia settings there are many ways to conduct ‘content analysis’ by combining quantitative and qualitative indicators of significance in symbolic and pragmatic forms:
Quantitative notions of content analysis
Quantitative notions of content analysis are generally based on research questions that entail hypothesis-testing (see Chapter 2, 000 section) and which regard texts as transparent carriers of meaning. These approaches treat written texts as made up of units of analysis: single words, phrases, or sentences. They also look at positioning as well as context.
- Significance is ascertained in terms of frequency (the number of times a word appears) and/or placement (where in a text keywords appear).
- Whilst words are the traditional unit of analysis, visual texts can also be broken down in this way, albeit not exclusively. For visual texts, for example, a television programme or film fragment, frequency and placement can be coded by looking at timing, frame-lengths, editing, as well as the script.
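Placement, as distinct from frequency, can be coded in many ways. The sketch below shows one possible convention, assumed here purely for illustration: each occurrence of a keyword is given a relative position between 0.0 (the first word of the text) and 1.0 (the last word). The function name and the sample speech are hypothetical.

```python
import re


def keyword_placement(text, keyword):
    """Return the relative position (0.0 = start, 1.0 = end) of each hit."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target = keyword.lower()
    if len(tokens) < 2:
        # A text of one word has no meaningful spread; report 0.0 for a hit.
        return [0.0 for tok in tokens if tok == target]
    last = len(tokens) - 1
    return [round(i / last, 2) for i, tok in enumerate(tokens) if tok == target]


# Hypothetical speech fragment, invented for this illustration.
speech = (
    "Reform comes first. We will fund schools and hospitals, "
    "and then we will discuss reform again."
)

print(keyword_placement(speech, "reform"))
```

A keyword that clusters near 0.0 or 1.0 sits in the opening or closing of the text, which a researcher may or may not treat as significant depending on the research question.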
Coding schemes
Analysing the content in this way requires the researcher to set up some sort of coding scheme,⁵ which they will then apply to the selection.
- The results are then translated into numerical values, organized and then presented as tables or graphs according to the research question in hand.
- Developing and then applying coding schemes in order to make sense of the material in its manifest form (words on a page or screen, colour frequencies, or images) is based on deductive reasoning: an analytical framework is applied to the data in a preconceived form. From there inferences are drawn, and the strength of the findings depends on the integrity and validity of the coding scheme used.
- To be effective, a coding scheme has to be applied to a clear sample; this could be based on various sorts of selectivity. For research looking to ascertain the frequency of keywords or their placement in a comprehensive way, the stress is on ensuring that the sampling techniques, the coding scheme, and the findings are in a logical relationship.
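To make the steps in this list concrete, here is a minimal sketch of a coding scheme applied to a small sample and rendered as a table of numerical values. Everything in it is an assumption made for illustration: the two codes, their indicator keywords, the two sample texts, and the function names are invented, and a real scheme would be derived from the research question rather than from this example.

```python
import re
from collections import Counter

# Hypothetical coding scheme: each code is defined by indicator keywords.
CODING_SCHEME = {
    "security": {"security", "threat", "border"},
    "economy": {"jobs", "growth", "budget"},
}

# Hypothetical sample: document id -> text.
SAMPLE = {
    "speech_1": "Growth and jobs come first; the budget protects growth.",
    "speech_2": "Border security is a threat we take seriously. Jobs matter too.",
}


def code_document(text, scheme):
    """Count, per code, how many word tokens match that code's keywords."""
    tokens = Counter(re.findall(r"[a-z']+", text.lower()))
    return {
        code: sum(tokens[kw] for kw in keywords)
        for code, keywords in scheme.items()
    }


def code_sample(sample, scheme):
    """Apply the same scheme to every document in the sample."""
    return {doc_id: code_document(text, scheme) for doc_id, text in sample.items()}


def format_table(results, scheme):
    """Render the coded results as a plain-text frequency table."""
    codes = list(scheme)
    header = "document".ljust(12) + "".join(c.rjust(10) for c in codes)
    rows = [
        doc_id.ljust(12) + "".join(str(counts[c]).rjust(10) for c in codes)
        for doc_id, counts in results.items()
    ]
    return "\n".join([header] + rows)


results = code_sample(SAMPLE, CODING_SCHEME)
print(format_table(results, CODING_SCHEME))
```

The scheme is fixed before any text is read by the program, which is what makes the approach deductive; the resulting numbers are only as sound as the choice of codes and keywords behind them.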
A similar approach can be applied to the content analysis of television programmes, newspapers, or policy document-sets whereby the ‘content’ here is an amalgam of texts, scripts, images, audio (for example, in YouTube clips), and user-produced material. The various sorts of content can also be broken down into quantifiable coding schemes.
- However, different sorts of ‘content’ require their respective criteria for selection and collation; for example, a news item on television is comprised not only of the script but also of footage; a talk show likewise but this time there is an element of editing and ‘live’ interaction.
- Audio material is treated as units of sound: rhythm (for example, the ‘amen break’), samples (a melodic line), elements of the mix (separating out the drums from lead guitar), riffs, and so on. When a score is a core element this too is treated as a particular sort of text.
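For audio-visual material, the coded unit is often a timed segment rather than a word. The following sketch assumes, purely for illustration, that a researcher has already watched a television news item and logged each segment by hand with a start time, an end time, and a code; the segment data, code labels, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical coded segments of one television news item:
# (start_seconds, end_seconds, code assigned by the researcher).
SEGMENTS = [
    (0, 12, "anchor_script"),
    (12, 40, "location_footage"),
    (40, 55, "interview"),
    (55, 70, "location_footage"),
    (70, 78, "anchor_script"),
]


def summarise_segments(segments):
    """Return, per code, the number of segments and their total duration."""
    summary = defaultdict(lambda: {"count": 0, "seconds": 0})
    for start, end, code in segments:
        if end < start:
            raise ValueError(f"segment ends before it starts: {(start, end, code)}")
        summary[code]["count"] += 1
        summary[code]["seconds"] += end - start
    return dict(summary)


def share_of_airtime(summary):
    """Express each code's total duration as a share of all coded time."""
    total = sum(entry["seconds"] for entry in summary.values())
    if total == 0:
        return {}
    return {code: round(entry["seconds"] / total, 2) for code, entry in summary.items()}


summary = summarise_segments(SEGMENTS)
print(summary)
print(share_of_airtime(summary))
```

Frequency (how many segments) and duration (how many seconds) can point in different directions, so it is usually worth reporting both.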
In sum, any coding scheme needs to make sense of, and elucidate, the underlying research question and objectives of the inquiry. Once the content has had the coding applied:
- the results need to be studied and analysed according to the theoretical framework governing the inquiry;
- this may see the researcher moving between a deductive approach to the material (where the hypothesis/research question sets the tone) and a more inductive one (where the outcome of the coding is studied for patterns and insights emerging from there);
- the findings are usually presented in the form of tables or graphs whereby the designated coded content is rendered as numerical values of frequency, volume, duration, or spacing.
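One way of moving from a deductive to a more inductive reading is to look at what the coding scheme did not capture. The sketch below is one possible approach rather than an established procedure: it lists the most frequent terms in the sample that are neither in the scheme nor in a short stopword list. The stopword list, the coded terms, the sample texts, and the function name are all assumptions made for this illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "and", "of", "to", "is", "we", "in", "our"}

# Hypothetical terms already covered by the researcher's coding scheme.
CODED_TERMS = {"security", "jobs"}

# Hypothetical sample texts.
TEXTS = [
    "Security and migration. Migration and jobs.",
    "Migration is a threat to our climate. Climate policy.",
]


def uncoded_frequent_terms(texts, coded_terms, top_n=2):
    """List the most frequent terms that the coding scheme does not cover."""
    counts = Counter()
    for text in texts:
        for tok in re.findall(r"[a-z']+", text.lower()):
            if tok not in STOPWORDS and tok not in coded_terms:
                counts[tok] += 1
    return counts.most_common(top_n)


print(uncoded_frequent_terms(TEXTS, CODED_TERMS))
```

Terms that surface in this way are candidates for a revised scheme, not findings in themselves; whether they belong in the analysis remains a judgement governed by the theoretical framework.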
Administration
A coding scheme has to be devised according to the research question. In that respect there is a certain level of trial and error, an aspect often belied by the clarity and graphic elegance of many coding schemes presented as research findings. What exactly comprises a coding scheme is effectively both theory and method (see Chapter 2, theory and method section); the frequency, positioning, and framing of single terms, pairs or phrases are selected based on an inquiry’s respective ‘how’, ‘what’, ‘where’, and ‘why’. In any case, this design phase is crucial and requires time and thought.
- Automated programs do not do this for you.
- Moreover, why rush to software when in fact much coding can be done manually? Coloured pens and other sorts of drawn graphics work if the rationale behind these selections makes sense.
- That said, for larger chunks of digitized material, software tools such as NVivo can facilitate these schematics by performing an analysis based on the criteria preset by the researcher.
- All these tools have their own inbuilt defaults and thereby methodological preferences (see Chapter 5) which may affect the degree to which a manually devised scheme ‘works’.
Once the results emerge, you need to shift up to another level of analysis: what do these findings (ensuring there are no errors in design or execution) tell you? What inferences can be drawn? What do they not tell you?
Advantages and disadvantages
The advantages of content analysis in this form are:
- The material can be confined: limited to a particular genre, timeframe, or provenance. This works well for written archives, official policy document sets or intergovernmental resolutions, as much is now available on the web.
- Note though that interactive forms of content generation (for example, in discussion forums) need to be pinned down in terms of entry and exit dates.
- Once selected, the material can be reshuffled and examined in many ways, manually or digitally.
- For societies based on written literacy this is a major source of evidential material; for example, policy statements provide insight into political processes, speeches into a political leader’s worldview, and news scripts supply an angle on public debates and current events coverage.
- For questions looking into reconstructing important decision-making processes as they emerge as policy, content analysis is very useful for navigating and unpacking public and political statements.
- Similarly for media content: current events and controversies, local and global, can be studied through the media content produced by news outlets.
- The coding and eventual analysis, if the selection is manageable and accessible, can be carried out manually and by a single researcher.
- Written texts, unlike human subjects, don’t talk back, nor do they require informed consent. However – and this is an emerging issue for web-based content and other sorts of analysis – online texts are increasingly falling under conflicting intellectual property rights regimes; for example, on social networking sites, photos and texts are by default the property of the ‘owner’ not the user (currently hotly contested, by the way).
The last point flags some of the disadvantages, indeed issues around which qualitative approaches part company from these more quantitative inflections:
- Focusing only on written texts, including visual data, assumes that social phenomena and relationships are contained, and so observable in the written word.
- In increasingly multimedia-saturated societies, conventional forms of investigating content as the written word produced from one source may not be adequate on their own.
- Strictly speaking content analysis focuses on the output of social actions, and interlocutors, as a sort of ‘scriptural economy’ (Certeau 1991). It is less suited for questions about behaviour, interpersonal interactions, or events, unless these are being reconstructed on the basis of official or eye-witness, informal archival material.
- Effective content analysis is more time-consuming than many realize.
- Issues arising out of selectivity, in terms of what and when, require thought, and access may be assumed rather than possible.
- Whether you develop your own or adopt an existing coding scheme, its application to your selection and your eventual interpretation of the results are not self-explanatory; the graphics produced require more time to consider and come to grips with than many think, often as much as setting up the coding and executing it.
- The focus on the content itself, however it may be approached, may distort other equally important conditions relevant to the inquiry, for example, the conditions of production (e.g. media messages during wartime) or how people (e.g. audiences) receive and respond to the message (e.g. gender-based or class-based differences in responding to news items).
- Meaning, as we will see in the section on textual/visual analysis, does not reside in the manifest, countable content as separated units; meaning resides in the whole rather than in the parts.
- In some political and cultural contexts content may well be misleading, mystifying, or deliberately coded, for example, propaganda, politically sensitive policy documents, or political speeches. Sometimes what is not said, or left out of the official record, is as important as the words on the page.
- Turning complex messages and their meanings into numerical values sacrifices understanding to the method and reduces content to numbers when in fact meaning is an intersubjective process.
This latter observation is where alternative and competing approaches to researching what humans (and their avatars) produce in written and other sorts of texts come in. What we see on the page, or the screen, is not necessarily what we (think we) get.