Category Archives: search

Google’s privacy campaign, and three ways in which Google gets your data

Google is campaigning to reassure us that its Chrome browser is, well, no worse at recording your every move on the web than any other browser.

Using Chrome doesn’t mean sharing more information with Google than using any other browser

says a spokesman in this video, part of a series on Google Chrome & Privacy.


What then follows are links to four other videos describing the various ways in which Google Chrome records your web activity.

If you subtract the spin, the conclusion is that Google retrieves a large amount of data from you, especially if you stick with the default settings. Further, it is not possible, as far as I know, to use the browser without sending any data to your default search provider, most likely Google. The reason is the Omnibox, the combined address and search box. Here’s what Google’s Brian Rakowski says in the video on Google Chrome & Privacy – Browsers search and suggestions:

For combined search and web address to work, input in the Omnibox will need to be sent to your search provider to return suggestions. If you have chosen Google as your search provider, only around 2% of the search input is logged and used to improve Google’s suggestion service. Rest assured that this data is anonymised as soon as possible within 24 hours, and you always have the option of disabling the suggest feature at any time.

However, even if you disable suggestions, what you type in the box still gets sent to your search provider if it is not a valid web address, in other words anything that is not a complete URL (though Chrome will infer the http:// prefix).
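To make the distinction concrete, here is a minimal sketch of the kind of decision a combined address and search box has to make. It is not Chrome's actual logic, which I have not seen; the function name `classify_omnibox_input` and the crude "host contains a dot" rule are my own assumptions for illustration.

```python
from urllib.parse import urlparse


def classify_omnibox_input(text: str) -> str:
    """Return 'navigate' if text looks like a web address, else 'search'.

    Hypothetical rule of thumb, not Chrome's real algorithm.
    """
    text = text.strip()
    # Empty input or anything containing a space is treated as a search.
    if not text or " " in text:
        return "search"
    # Infer the http:// prefix when no scheme was typed.
    candidate = text if "://" in text else "http://" + text
    try:
        host = urlparse(candidate).hostname or ""
    except ValueError:
        return "search"
    # Crude test: a host needs a dot somewhere in the middle, or be localhost.
    looks_like_host = host == "localhost" or (
        "." in host and not host.startswith(".") and not host.endswith(".")
    )
    return "navigate" if looks_like_host else "search"


for typed in ["example.com", "http://example.com/page", "cheap flights", "chrome"]:
    print(typed, "->", classify_omnibox_input(typed))
```

Under this sketch, everything classified as "search" is what would be sent to the search provider, whether or not suggestions are enabled.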

It is also worth noting that Google does not only get your data via browser features. Most web pages today are not served from a single source. They include scripts that serve data from other locations, which means that your browser requests it, which means that these other locations know your IP number, browser version and so on. Two of the most common sources for such scripts are Google AdSense (for advertising) and Google Analytics (for analysing web traffic).
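A rough way to see this for yourself is to list the hosts a page pulls scripts from. The sketch below uses only the Python standard library; the class name, the helper `third_party_script_hosts` and the example hosts are all made up for illustration, and it only looks at script tags, not images, frames or stylesheets.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ScriptHostCollector(HTMLParser):
    """Collects hosts of <script src=...> tags that differ from the page's host."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.third_party_hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if not src:
            return  # inline script, nothing is fetched
        # Resolve relative and protocol-relative URLs against the page URL.
        host = urlparse(urljoin(self.page_url, src)).hostname
        if host and host != self.page_host:
            self.third_party_hosts.add(host)


def third_party_script_hosts(page_url, html):
    collector = ScriptHostCollector(page_url)
    collector.feed(html)
    return sorted(collector.third_party_hosts)


# Hypothetical page with one local script and two scripts served from elsewhere.
sample_html = """
<html><head>
<script src="/js/site.js"></script>
<script src="http://ads.example.net/show.js"></script>
<script src="//stats.example.org/track.js"></script>
</head><body>Hello</body></html>
"""
print(third_party_script_hosts("http://blog.example.com/post", sample_html))
```

Each host in the result receives a request from your browser, and with it your IP number and browser details, every time the page loads.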

Even if you contrive not to tell Google in advance where you are going, it will probably find out when you get there.

It is important to distinguish what Google can do from what it does do. Note the language in Rakowski’s explanation above. When he says input is sent to your search provider, he is describing the technology. When he says that data is anonymised as soon as possible, he is asking us to trust Google.

Note also that if you ask to send in auditors to verify that Google is successfully anonymising your data, it is likely that your request will be refused.

There are ways round all these things, but most of us have to accept that Google is getting more than enough data from us to create a detailed profile. Therefore the secondary question, of how trustworthy the company is, matters more than the first one, about how it gets the data.

Google’s strategy unveiled: a little bit of everything you do

Google CEO Eric Schmidt gave a keynote address at the Mobile World Congress yesterday, which is worth watching if you have an interest in the future of technology or, well, human life.


The talk was an informative and open insight into Google’s future direction. It was centred on mobile; but since Google now regards the mobile phone as the primary device for how we interact with the world, that was no limitation. Google is putting mobile first, said Schmidt, because it is the meeting point for the three things that matter: computing, connectivity and the cloud. He believes that phones will replace credit cards, for example, as they are smarter and more secure for financial transactions.

Google’s strategy is to combine the near-unlimited power of server-side computing with its database of human behaviour, to create devices that are “like magic. All of a sudden there are things you can do that were not previously possible.”

He gave an illuminating example: Google voice search. You speak into your phone, and Google transcribes your voice and performs a search. Voice recognition is nothing new, but the difference in the Google demo is that it works. Here’s how. The problem with voice recognition is that one word sounds very like another, especially since we do not speak with precision and every voice varies. Computers cannot understand exactly what we say, but they can use dictionaries to come up with a set of possibilities for what we said, one of which is likely to be correct.


The next step is the brilliant one. Google takes this set of possible phrases and compares it to recent Google searches. If one of them matches a popular search, then it is likely to be what you said. Bingo. Google now does this in four languages, with German demonstrated for the first time yesterday.
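The idea can be sketched in a few lines. This is only my illustration of the principle Schmidt described, not Google's implementation: the scoring formula, the function name `pick_transcription` and all the numbers are assumptions.

```python
def pick_transcription(candidates, search_counts):
    """Choose the most plausible phrase from speech-recognition candidates.

    candidates: list of (phrase, acoustic_score) pairs from the recogniser.
    search_counts: dict mapping phrase -> how often it was recently searched.
    The combined score is an invented formula for illustration only.
    """
    def combined_score(item):
        phrase, acoustic_score = item
        popularity = search_counts.get(phrase, 0)
        # Popular searches boost a candidate; unseen phrases keep their raw score.
        return acoustic_score * (1 + popularity)

    return max(candidates, key=combined_score)[0]


# Made-up candidates: the recogniser slightly prefers the wrong phrase.
candidates = [
    ("wreck a nice beach", 0.40),
    ("recognise speech", 0.35),
    ("wreck an ice beach", 0.25),
]
# Made-up search log counts.
recent_searches = {"recognise speech": 1200, "wreck a nice beach": 3}

print(pick_transcription(candidates, recent_searches))
```

With an empty search log the sketch falls back to the recogniser's own favourite, which is the point: the database of what other people search for is what tips the balance.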

It works on the assumption that humans are not very original. We tend to do similar things, and to be interested in similar things. Therefore, as Schmidt noted, if you are a tourist walking around a city with your location-aware phone, Google does not only know where you are; it also has a good idea of where you will go next.

Another cool demo is for image recognition. We saw this in two guises. In one, you hold up your phone and do an image search using the camera as input. Result: information about the building you are looking at. [Or maybe the person? Hmm.]

In another demo, you point the camera at your foreign-language menu as you ponder which incomprehensible dish might be one you could eat. Back comes the translation in your own language.

Note that these demonstrations are not really about super-powerful phones, but rather about the other two factors mentioned above, the power of cloud computing combined with a vast database of knowledge.

Schmidt’s blind spot is that he does not really see privacy as an issue. He mentions it from time to time; but he is clear that he regards the trade-off, that we give our personal data to Google in return for these cool services, as worth it. I posted a remarkable quote yesterday. Here’s another one, from late on in the address:

Google will know more about the customer because it benefits the customer if we know more about them.

What Schmidt fails to do is to extrapolate the implications for stuff other than cool services. One is what happens if that huge database is used dishonourably. Another is the huge competitive advantage it gives to Google versus everyone else; Google has this data, but the rest of us do not. A third is how that data could be used in ways that disadvantage us. An example is in insurance. Insurance is about pooling risk. The more data insurance companies have about you, the more accurately they can assess the risk, which means a wider range of premiums. If by some mechanism insurance companies are able to analyse Google’s data to assess risk, they can refuse to insure, or charge high penalties, for the higher risks. We won’t necessarily enjoy that, because it means more of us may find it impossible to get the insurance we want at a price we can afford.
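A toy calculation shows why better data widens the range of premiums. The figures are invented and the model is deliberately naive (premium equals expected loss, no costs or profit); it is a sketch of the arithmetic, not of how any insurer actually prices.

```python
def pooled_premium(expected_losses):
    """With no individual data, everyone pays the average expected loss."""
    return sum(expected_losses) / len(expected_losses)


def premium_spread(premiums):
    """Gap between the highest and lowest premium charged."""
    return max(premiums) - min(premiums)


# Invented expected annual losses for three customers.
expected_losses = [100, 200, 900]

# Pooled: the insurer cannot tell the customers apart.
flat = [pooled_premium(expected_losses)] * len(expected_losses)
# Risk-based: the insurer knows each customer's expected loss.
risk_based = list(expected_losses)

print("pooled premiums:", flat, "spread:", premium_spread(flat))
print("risk-based premiums:", risk_based, "spread:", premium_spread(risk_based))
```

The insurer collects the same total either way, but with individual data the highest-risk customer pays far more than under pooling, and that is the customer who may be priced out.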

Google’s business strategy

That’s the technical side. What are Google’s business plans? Schmidt made some interesting comments here as well, many of them in the question and answer session.

Google does not plan to become a mobile operator. Schmidt received some fairly hostile questions on this topic. Since Google positions operators as dumb pipes, stealing their talk minutes and insisting on an open web for services, who will invest in infrastructure? Schmidt denies positioning operators as dumb pipes, but does not leave them room for much other than infrastructure; he says they might have a role in financial transactions.

How do we (both Google and the rest of us) make money? Two main areas, according to Schmidt. One is advertising. He says that online advertising spend is currently one tenth of the total, and that this proportion must grow since “consumers are moving from offline to online.” In addition, mobile advertising will be huge since you can target location as well as using other data to personalise ads. “The local opportunity is much larger, and largely unexplored,” he says.

The other big opportunity is apps. The number of apps that need to be installed locally is constantly diminishing, he says, leaving great potential for new cloud-based applications and services.

As for Google, Schmidt says it wants to be part of everything you do:

We want to have a little bit of Google in every transaction on the internet

Thought-provoking stuff, and a force that will be hard to resist.

So who can compete with Google? Making equally capable phones is easy; building an equally good database of human intentions not so much, particularly since it is self-perpetuating: the more we all use Google, the better it gets.

No wonder Microsoft is piling money into Bing, with limited success so far. No wonder Apple’s Steve Jobs is concerned:

On Google: We did not enter the search business, Jobs said. They entered the phone business. Make no mistake, they want to kill the iPhone. We won’t let them, he says. Someone else asks something on a different topic, but there’s no getting Jobs off this rant. I want to go back to that other question first and say one more thing, he says. This don’t be evil mantra: "It’s bullshit." Audience roars.

Eric Schmidt: we can literally know everything

I am watching Google CEO Eric Schmidt’s keynote at the Mobile World Congress today. I am only 10 minutes in, but I was struck by these comments, as he talks about improving connectivity across the internet:

Think of it as an opportunity to instrument the world. These networks are now so pervasive that we can literally know everything if we want to. What people are doing, what people care about, information that’s monitored, we can literally know it if we want to, [pauses, lowers voice] and if people want us to know it.

A comment full of resonance. Who is “we”? You and I? Or Google? The enthusiasm for knowing everything about everything, the reluctant-sounding concession to privacy at the end. The sheer bravado of it; the word “literally”, which means in actual fact, without hyperbole; and yet which is obvious hyperbole.

For another view on this, see The Onion’s piece on Google’s opt-out village.

Buzz buzz – Google profile nonsense

Google has launched a new social media service called Buzz (as if you did not know) and I’m on it – here’s my profile.

You had better follow that link too; because whenever I visit the profile when signed into Google I see this not-too-subtle banner:

image

“Your profile is not yet eligible to be featured in Google search results”. This statement with its bold yellow highlighting seems intended to make me anxious, though I’m not sure why I should care about this deliberate defect in Google’s search algorithms. Having said which, it is not actually true, as a quick search verifies:

image

Still, let’s presume that I believe it and want to fix it. I click the link to learn more. Does it tell me how to make my profile “eligible”? Not as such. Without making any promises, Google suggests that I should add more details,

For example, include details such as the name of your hometown, your job title, where you work or go to school.

It also wants a little link exchange:

Link to your profile on another website (for instance, your blog or online photo album)

and finally

Verify your name, and get a "Verified" badge on your profile.

I’ve been round the verify circus before; if you try to do it, you wander round the near-abandoned Knol for a while before discovering that it only works, some of the time, for USA residents.

Frankly, it all seems a bit desperate. My Google profile is just as I want it already, as it happens, though I could do without the big deceitful banner.

That said, this profile nonsense does nothing to allay my sense that Google has designs on me and wants more of my personal data and internet identity than I am inclined to give.

Buzz is a hard sell for me. I like Twitter, because it is single-purpose, works well – in conjunction with one of the many desktop add-ons such as Twhirl – and I never feel that it wants to take over my life.

Still, I am buzzing now, especially since I’ve linked it to Twitter so all my tweets arrive there too. We’ll see.

Fear of Google

Shares in Rightmove, a UK web site for house sales, have dropped by more than 10% over the last couple of days, following a report by the Financial Times that “Google is in talks with British estate agents to launch an online property portal.”

I do not know what chances of success this venture has. Google does not instantly dominate any market into which it moves; the combination of Google Checkout and Google Shopping hasn’t killed off PayPal/eBay or Amazon as yet.

What interests me though is the impact of the rumour. It is not only Google’s size and profitability; it is the fact that Google is for many (most?) people the portal to the Web, giving it huge power to reach any Web-based market.

The restraining factor is that it is difficult for Google to exploit that power without falling foul of legislation that protects free markets. When I search for a product on Google, I don’t see any evidence that it favours its own shopping site, for example; though having said that I am only one click away from the Shopping link at the top of the page that searches only Google’s site.

Still, drawing the line between what is and is not reasonable is hard to do. For example, what if Google had a “search for houses on sale here” link in Google Maps?

If Android and later Chrome OS succeed, Google may become the portal to more than just the Web. Its services will have geo-location data as well. Its data-gathering algorithms might learn our shopping and eating out preferences and guide us with uncanny prescience towards things that we enjoy.

I am not surprised that Google rumours have the power to move markets.

Microsoft – Yahoo search deal: 2+2 makes 5, or 3?

Microsoft’s search deal with Yahoo makes more sense than the attempted full acquisition last year. The 10-year deal provides for Microsoft’s Bing to become the back-end search engine for Yahoo, while Yahoo becomes the exclusive sales force for premium search advertisers on both Bing and Yahoo.

Listening to today’s conference call, the rationale for the deal seems to be like this:

Maintaining a search engine is expensive. Yahoo has no appetite for it. So Yahoo saves some cash (in fact, makes some cash) while no longer having that cost burden.

On Microsoft’s side, it is convinced (probably rightly) that large scale is mandatory in order to compete with Google. As Ballmer put it:

the more searches you serve, the more you learn about what people search for and click on

When I was researching Bing, I was told that some of Bing’s features only work if there is sufficiently high usage. You cannot identify patterns of usage without a certain volume of data, which is easy to get for the most common searches, but not so much for those that are more specialist. The long tail applies – there are lots of niche searches.

The value of the data goes beyond search. Search and browsing patterns must enable some remarkable insights into human behaviour, which can inform product development.

A more humdrum fact is that advertisers like large audiences, and the combined search platform may appeal to some advertisers who would otherwise pass it by.

Microsoft is therefore relying on the combined value of the two companies’ search businesses being more than the sum of their individual values.

There is a risk though, which is that some users who like Yahoo’s current search engine may not like Bing so much. If they perceive Yahoo search as merely Microsoft search rebranded, they might jump ship, most likely to Google.

Still, you have to believe in your product. In theory, both companies could benefit from stronger search results and features.

It is important not to forget the context. Google is utterly dominant in search; this is two smaller players struggling to remain relevant in that market. I hope Yah-Bing succeeds because competition is good but the chances are that Google will sail on unperturbed.


Search for virus help highlights lack of authority in Google, Wikipedia

A contact suffered a trojan infection on his Windows XP machine the other day. He was alerted to the infection by Windows Defender, but the Remove or Quarantine actions offered by Defender did not work. If he removed the trojan, it reappeared on the next reboot. The installed AVG security suite sat there unconcerned.

I am not sure exactly what path he took, but he did some clicking of links and ended up at a site which offered software that promised to fix the issue. The software was called SpyHunter, from Enigma Software. He purchased and installed SpyHunter, which proved no more effective than Defender. At this point he asked me to look at his machine.

A person who has discovered a virus on their PC will be anxious about the attack and its unknown consequences, and will want to fix it urgently. That makes them vulnerable to ill-considered downloads and purchases; and searching the web for assistance with a virus can be like trying to cure alcoholism with drinking. That said, there is good advice to be had; but assessing the authority and reliability of the assistance offered is critical.

My advice in general is only to visit sites that you know to be trusted, such as official Microsoft support, major security software vendors, and only those community sites with which you are already familiar. It is difficult advice to follow though, particularly for non-technical users.

The best course of action after a confirmed infection is to flatten and rebuild the operating system. Larger organizations do this efficiently by restoring a pre-configured image to standardised hardware, but this too is difficult for individuals and SMEs who want to get on with their work.

I digress. My first question: was SpyHunter bona fide, or could it have made the problem worse? The only quick way to find out: back to the search engines, source of all good and all evil. The top entries for SpyHunter on both Google and Bing are the official company site and a Wikipedia entry. Bing has Wikipedia first, while Google puts the company site top.

Note the large role Google (or your favourite search engine) is playing here, both in leading users to possible solutions, and in assessing their value. Although the high placement of the company site is somewhat reassuring, in that Google would probably try not to give a high ranking to known malware, it would be a mistake to rely entirely on a detail like this. Google makes no guarantees concerning the content of the sites it indexes.

Naturally I was more interested in the Wikipedia entry. The entry is annotated with warnings that the article is near-orphaned (though the search engines find it readily enough) and that it reads like an advertisement. There is little detail and it is out-of-date. Further, the language seems strange:

In early 2004, SpyHunter was blamed for producing false positives and using aggressive advertising techniques. This resulted in a lot of bad SpyHunter reviews published. Some of them were harsh, but fair, while others were simply ridiculous. We confirm that SpyHunter was promoted aggressively by some affiliates, but all of them were eventually banned by program makers in late 2004. Early SpyHunter versions had some obvious drawbacks. The product’s version 2.0 resolved all these issues.

This is a quote from a supposedly independent review on a site called 2-software.com. I don’t like the site, which seems (as are so many) dominated by its affiliate links.

SpyHunter is probably harmless, though ineffective. I used the Sophos command-line tool to remove the trojan, and deleted some rogue registry entries; the machine seems OK now though that might just mean that the other trojans are doing a better job of hiding. I also removed SpyHunter of course.

The state of security on the Internet remains lamentable, and security software is a partial solution at best. What interests me here though is the combination of two things:

1. The inadequacy of Wikipedia as an authoritative source, particularly in its less trafficked topics.

2. The high ranking accorded to seemingly any Wikipedia article by the leading search engines.

It is a dangerous combination – not only for virus victims, but for kids doing homework, or anyone researching anything.

Bing’s disappearing search share gain in the US

Web stats site StatCounter caused some excitement last week when it announced that Bing had overtaken Yahoo in search market share, as tracked by its site analysis tools.

I took a look at the figures today, and they make depressing reading for Microsoft:

I’ve annotated the image to show Live Search share on 29 May, compared to Bing share now. They are nearly the same; within the normal daily variation. Yahoo is actually slightly ahead of where it was. Note that all Live Search hits automatically became Bing hits on the day of transition (1st June). As for Google, it is back a little above where it was before.

One odd thing about the StatCounter figures is that at the beginning of this period there was around 5% share for “other”, which has now almost disappeared. Gone to Google? Who knows; and I don’t particularly trust these figures.

There are two organizations with more reliable numbers, one of them Google, because of the number of sites signed up for its Web Analytics, and the other Microsoft, which can count actual hits, but these numbers are not published.

Well, Ballmer said it was a long haul. I’m actually impressed with Bing; the results seem decent, there are some good UI features, and the re-branding is sensible. If StatCounter accurately reflects the market though, the immediate effect of the launch is vanishingly small.

Update: Things look a little better today – Bing is up to 8.52% (note that the figure changes dynamically during each day). A long haul; I’ll be tracking the figures with interest.


Bing, Blind Search and electoral fraud

It’s election fever in the UK: in dramatic results, the incumbent party is being pummelled at the polls. So too for search engines? Microsoft employee Michael Kordahi set up a blind search test. Perform a search, select your favourite from three columns of results. It started well for Bing, but market leader Google soon asserted a lead:

Blind search engine test at http://blindsearch.fejus.com Right now: "Google: 45%, Bing: 33%, Yahoo: 21% | 8,518 votes"

said Mr Google Matt Cutts.

Still, that’s not bad for Bing, considering that its market share is tiny in comparison to Google’s: 5.5% vs 81.5% according to stats I dug up for this Register piece. The real loser is Yahoo, whose second place in search is now under threat from the Microsoft juggernaut.

But can you trust the results? At some point last night Yahoo started an unlikely surge:

Internet search blind test: Google: 34%, Bing: 26%, Yahoo: 40% Try it out! http://blindsearch.fejus.com/

tweeted Bill Hamilton a few hours after Cutts. Someone was gaming the system:

not surprisingly, #blindsearch has been compromised you can still play, but i’m not currently showing results

said Kordahi, as Yahoo hit 57%.

Will Kordahi be able to insulate his test from fraudsters? Who knows; but it is still an interesting experiment.

I tried the test and found the results generally close, with a small edge to Google in my searches. Still, it would be interesting to measure not only which results are best, but also the margin of difference. In the past I’ve found Live Search almost useless, so Bing has made a substantial improvement from my perspective. The UI changes are important too. I’m a minimalist at heart, which again favours Google, but I like some of Bing’s features, especially the site and video previews.
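Measuring the margin as well as the winner is simple enough if you have the raw votes. Here is a minimal sketch; the function name `tally_blind_test` and the vote counts are hypothetical, and I have no knowledge of how Kordahi's site stores or counts its votes.

```python
from collections import Counter


def tally_blind_test(votes):
    """Return (percentage share per engine, lead of first place over second).

    votes: iterable of engine names, one per vote cast.
    """
    counts = Counter(votes)
    total = sum(counts.values())
    shares = {engine: 100 * n / total for engine, n in counts.items()}
    ranked = sorted(shares.values(), reverse=True)
    # Margin in percentage points between the leader and the runner-up.
    margin = ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0]
    return shares, margin


# Hypothetical batch of 100 votes.
votes = ["Google"] * 45 + ["Bing"] * 33 + ["Yahoo"] * 22
shares, margin = tally_blind_test(votes)
print(shares)
print("lead over second place:", margin, "points")
```

A tally like this says nothing about whether the votes are honest, of course, which is exactly the problem the test ran into.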

Google’s Wave is of course more interesting from a technical perspective; but it would be a mistake to downplay the business significance of Microsoft improving its search market share. Search drives advertising income.

It’s also worth noting that in search, quantity drives quality. Program Manager Nathan Buggia explained to me how Bing’s categorisation feature works:

For the categorised results those are driven more off the search behaviour we see on our web site, not actually the semantic information that we infer from their web site. What we’ve done is to take all the queries that come into live search and analysed them to see what user intent those queries have. We take a look at the other search terms that they use to figure out where they go, we aggregate that information and use that to define categories, and we are able to draw on that.

Currently Bing only displays category tabs for around the top 10-20% of searches. The reason it is limited to that, according to Buggia, is insufficient volume of data. Using the Xbox as an example, he told me:

If we have a high enough volume of XBox data and we’ve seen that there are a specific set of intents that people are looking for, then we feel confident enough to show the quick tabs.

In other words, Bing could improve its results simply by more people using it.
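Here is a sketch of why volume matters, based only on Buggia's description. The thresholds, the function name `category_tabs` and the sample log are my own inventions; Bing's real pipeline is certainly more elaborate.

```python
from collections import Counter, defaultdict


def category_tabs(query_log, min_volume=100, min_share=0.1):
    """Decide which queries get category tabs, and which tabs.

    query_log: iterable of (query, inferred_intent) pairs.
    min_volume and min_share are assumed thresholds, chosen for illustration.
    """
    intents_by_query = defaultdict(Counter)
    for query, intent in query_log:
        intents_by_query[query][intent] += 1

    tabs = {}
    for query, intents in intents_by_query.items():
        total = sum(intents.values())
        if total < min_volume:
            continue  # too little data to be confident about intent
        # Keep only intents that account for a meaningful share of the traffic.
        tabs[query] = [
            intent for intent, n in intents.most_common() if n / total >= min_share
        ]
    return tabs


# Invented log: a popular query with clear intents, and a niche one.
query_log = (
    [("xbox", "games")] * 60
    + [("xbox", "accessories")] * 35
    + [("xbox", "repair")] * 5
    + [("rare widget", "buy")] * 3
)
print(category_tabs(query_log))
```

The niche query gets no tabs at all in this sketch, simply because too few people have searched for it, which is the long tail problem in miniature.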

What happens next? The easy prediction is that Bing will make at least small gains in market share, and that Yahoo will likely decline, perhaps to third place. For Microsoft, that would be no small achievement, but would do little to dislodge the big G. Further, if it sees significant traffic moving to Bing, Google will be quick to counter it with its own improvements. Personally I would like to see more competition in search, which for many users forms a portal that controls which sites they see and which they do not see, but a good launch for Bing is not enough to effect real change.

It could be the beginning of a change though, and that possibility makes Bing worth watching.


A few good things about Bing – but where is the webmaster’s guide?

So Bing (Bing Is Not Google?) is Microsoft’s new search brand. A few good things about it:

1. Short memorable name, short memorable url

2. Judging by the official video at http://www.decisionengine.com/ Microsoft realises that it has to do something different than Google; doing the same thing almost as well or even just a little better is not enough.

3. Some of the ideas are interesting – morphing the results and the way they are displayed according to the type of search, for example. In the video we see a search for a digital camera that aggregates user reviews from all over the Internet (supposedly); whereas searching for a flight gets you a list of flight offers with fares highlighted.

This kind of thing should work well with microformats, about which Google and Yahoo have also been talking – see my recent post here. But does Bing use them? That’s unknown at the moment, because the Bing Reviewer’s Guide says little about how Bing derives its results. I don’t expect Microsoft to give away its commercial secrets, but it does have a responsibility to explain how web authors can optimise their sites for Bing – presuming that it has sufficient success to be interesting. Where is the webmaster’s guide?

Some things are troubling. The Bing press material I’ve seen so far is relentlessly commercial, tending to treat users as fodder for ecommerce. While I am sure this is how many businesses perceive them – why else do you want users to come to your site? – it is not a user-centric view. Most searches do not result in a purchase.

There’s a snippet in the reviewer’s guide about why Bing will deliver trustworthy results for medical searches:

Bing Health provides you with access to medical information from nine trusted medical resources, including the Mayo Clinic, the American Cancer Society and MedlinePlus.

No doubt these are trusted names in the USA. Still, reliance on a few trusted brands, while it is good for safety in a sensitive area such as health, is also a route to a dull and sanitized version of the Internet. I am sure there are far more than nine reliable sources of medical information on the Web; and if Bing takes off those others will want to know why they have been excluded.

Back to the introduction in the Reviewer’s Guide:

In a world of excessive choice and too much information, it’s often difficult to make the right decision. What you need is more than just a search engine; you need a decision engine that provides useful tools to help you get what you want fast, rather than simply presenting a list of Web links. Bing is such a decision engine. It provides an easy way to make more informed choices. It organizes popular results by category to help you get the answers you’re looking for without having to guess at the right way to formulate your query. And built right into Bing is a set of intelligent tools to help you accomplish important tasks such as buying a product, planning a trip or finding a local business.

Like many of us, I’ve been searching the web since its earliest days. I found portals and indexes like early Yahoo and dmoz unhelpful: always out of date, never sufficiently thorough. I used DEC’s AltaVista instead, because it seemed to search everywhere. Google came along and did the same thing, but better. Too much categorization and supposed intelligence can work against you, if it hides the result that you really want to see.

Live Search, I’ve come to realise (or theorise), frequently delivers terrible results for me because of faulty localization. It detects that I am in the UK and prioritises what it thinks are UK results, even though for most of my searches I only care about the quality of the information, not where the web sites are located. It’s another example of the search engine trying to be smart, and ending up with worse results than if it had not bothered.

Still, I’ll undoubtedly try Bing extensively as soon as I can; I do like some of its ideas and will judge it with an open mind.
