Morten Hviid and Franco Mariuzzo opened the conference and welcomed the participants. The first session of the day aimed to set the scene for the conference. The broad nature of the issues raised and the attention that digital markets have received recently make this a timely topic.
Danilo Montesi (Department of Computer Science and Engineering, University of Bologna, Italy), in his talk titled “Big Data & Social Science: A Data Driven Society?”, discussed the effect that the large amounts of data generated by the internet have on society. The average American consumes 34 GB of data every day, spending 11 hours a day connected to the internet, watching television or making phone calls – data is gathered on all of this consumption. Such is the importance of data that it may be considered a major factor of production in the economy, alongside labour, capital and land.
Practical techniques include the PageRank tool used by Google to decide the order in which results for a search query should be presented. A consequence of this algorithm is that a few pages tend to dominate search queries, especially given the tendency of users to favour heavily the first search result given. The amount which businesses are willing to pay in order to appear higher in search results indicates the value they put on consumers’ attention. The average “Cost-Per-Click” is around $1 for most search queries but can be as high as $54 for queries such as car insurance or lawyers.
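The tendency of a few pages to dominate can be illustrated with a simplified power-iteration sketch of the published PageRank idea. This is not Google's production ranking: the `pagerank` function, the `toy_web` link graph and the damping value of 0.85 are assumptions chosen for illustration only.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Rank pages by repeatedly passing each page's score along its links.

    `links` maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform score

    for _ in range(iterations):
        # Every page keeps a small baseline score (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical five-page web in which most pages link to "hub".
toy_web = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub", "a"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the heavily linked pages collect most of the total score, while pages that nobody links to keep only the baseline share.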
Certain types of data lend themselves to use by data scientists to provide a unique “fingerprint” for a given user. This is important because if a single user can be associated with multiple accounts across different networks then the resultant combined network provides much more information than any individual network. Photographs taken by people with their smartphones and digital cameras are an excellent source of such data. Research shows that, with a sample of photographs from various social networks, it is possible to identify photographs as belonging to the same user across the networks extremely accurately.
Technical solutions may arise in response to legal barriers. Peer-to-peer file sharing networks first arose to evade laws against the copying and sharing of copyrighted material. The “sharing economy” has now expanded far beyond its illegal roots. Wikipedia is many times larger than other encyclopaedias and research indicates it is no less accurate. Commercial encyclopaedias such as Britannica and Microsoft’s Encarta have left the market because of an inability to compete with Wikipedia. “Social” news services such as Reddit and Huffington Post are expanding greatly at the expense of conventional news sources such as print newspapers.
The implication of all of this may be that an increasing amount of human behaviour and decisions are determined by data. People may choose which hospital they use, which school they send their children to, or where to buy a house on the basis of data which was not available several years before. The structure of employment is also likely to be changed profoundly by the use of big data. Professions such as telemarketers and accountants may be largely eliminated by big data, whereas other professions, such as dentists and personal trainers, are likely to be unaffected.
Paul Bernal @PaulbernalUK (University of East Anglia @uealaw) explored how profiling and ‘personalisation’ can impact on privacy and autonomy in a pernicious way. His talk looked at some current examples, from online shopping and behavioural advertising to the interaction between the big players on the internet, the internet of things and more. He demonstrated that misunderstanding of perceptions and expectations of privacy can lead to disastrous results all around and suggested some potential ways forward and implications for the future.
Paul is investigating, from the perspective of the individual, how to protect our autonomy in the world of big data. The issue arises because people who do the same things online can see different results. For example, a study found that Orbitz identifies the type of computer a user has in order to single out more expensive equipment and, on the basis of its tracking of online activities, to show those users more expensive hotel offers. A new study reveals that Facebook “likes” say a lot more about a person than one might first think: a wide variety of people’s personal attributes can be automatically inferred from their Facebook likes. The analysis shows that such inferences can be used to improve products and services, and that businesses can gain an advantage from using them.
Computational linguistics reveals there is only the illusion of neutrality. For example, Wikipedia articles are biased against women, according to an analysis of six different language versions of the online encyclopaedia. A further experiment on Facebook users provided evidence of massive-scale emotional contagion through social networks. This is significant because emotional states can be transferred between users, leading people to experience the same emotions without their awareness. It creates scope for manipulation to make people use Facebook more often.
Paul demonstrated that both sides – giving and receiving information – matter and argued that the idea that search engines are organic and neutral cannot be true. Actually, they are what we want them to be and they do what we want them to do. It seems that today advertisers are more likely to suffer from further integration than consumers. The European Commission has filed complaints against Google over its alleged anticompetitive behaviour.
That is why we need algorithmic transparency: it should be possible to test by results, rather than by theory, how neutral the results of our searches are. This has many implications: most users are not aware that their results have been tailored to them. This results in issues of consent; problems with both accurate and inaccurate personalization, when wrong decisions have been made on the basis of results; mismatch of needs; misuse of systems; vulnerability of data; and vulnerability of systems.
Finally, people have to be aware of potential risks, otherwise there is a risk of negative implications, because big data techniques can easily be applied to large numbers of people without obtaining their individual consent. People need to be able to find out about these risks; there is a need for an informed judgement from a legal perspective. Today the main question is: Who are big data changes actually serving and in whose interests? People need to be aware that their interests may be different from those of businesses.
Tony Curzon-Price @tonycurzonprice (Competition and Markets Authority @CMAgovUK) looked at price-comparison websites as a competition issue and at solving the search problem, which was the theme of his presentation. He first talked about the risk of becoming a slave to new digital monopolists.
He then explained the difference between the Bertrand world, where people pick the cheapest offer, and the Diamond world, where instead of price undercutting there is overcutting and selling at a much higher price. He explained how search technology has evolved from traditional techniques to the internet in terms of the man-hours saved.
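The contrast between the two worlds can be made concrete with a stylised sketch. It is an illustration under stated assumptions, not a model from the talk: the functions `bertrand_price` and `diamond_price`, the one-unit price step and the example numbers are all hypothetical. In the first function buyers compare prices at no cost, so sellers undercut each other down to cost; in the second, each buyer must pay a search cost to see another quote, so a seller can charge up to that amount above the price buyers expect elsewhere, and prices ratchet up to the monopoly level.

```python
def bertrand_price(cost, start_price, tick=1):
    """Costless comparison: sellers undercut by one tick while price stays at or above cost."""
    price = start_price
    while price - tick >= cost:
        price -= tick  # the rival undercuts to win every buyer
    return price


def diamond_price(cost, monopoly_price, search_cost):
    """Costly search: each seller "overcuts" the expected price by up to the search cost."""
    if search_cost <= 0:
        raise ValueError("search_cost must be positive; with zero search cost use bertrand_price")
    price = cost
    while price < monopoly_price:
        # A buyer will not pay search_cost to save less than search_cost,
        # so charging a little more than the expected price loses no customers.
        price = min(price + search_cost, monopoly_price)
    return price


# Hypothetical numbers: unit cost 10, monopoly price 50, search cost 1.
print("Bertrand price:", bertrand_price(cost=10, start_price=50))
print("Diamond price:", diamond_price(cost=10, monopoly_price=50, search_cost=1))
```

Even a small positive search cost is enough in this sketch to move the outcome from the competitive price to the monopoly price, which is the sense in which search technology matters for competition.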
An overview of the car insurance market in terms of the insurance providers, shopping behaviour and purchasing behaviour was provided. He explained how price comparison websites get consumers’ attention through advertising on Google and concluded that price comparison leaves us with a Bertrand-Diamond model with intense competition among providers but a monopoly price facing the consumer, with the middleman taking all of the surplus.