Category Archives: internet

Knol questions

The internet is buzzing about Knol. Google no longer wishes merely to index the web’s content. Google wishes to host the web’s content. Why? Ad revenue. Once you click away from Google, you might see ads for which Google is not the agent. Perish the thought. Keep web users on Google; keep more ad revenue.

Snag is, there is an obvious conflict of interest. Actually, there is already a conflict of interest at Google. I don’t know how many web pages out there host AdSense content (mine do), but it is a lot. When someone clicks an AdSense ad, the revenue is split between Google and the site owner. It would therefore pay Google to rank AdSense sites above non-AdSense sites in its search results. Would it do such a thing? Noooo, surely not. How can we know? We can’t. Google won’t publish its search algorithms, for obvious reasons. You have to take it on trust.

That question, can we trust Google, is one that will be asked again and again.

Knol increases the conflict of interest. Google says:

Our job in Search Quality will be to rank the knols appropriately when they appear in Google search results. We are quite experienced with ranking web pages, and we feel confident that we will be up to the challenge.

Will Google rank Knol pages higher than equally good content on, say, Wikipedia? Noooo. How will we know? We won’t. We have to take it on trust.

On balance therefore I don’t much like Knol. It is better to separate search from content provision. But Google is already a content provider (YouTube is another example) so this is not really groundbreaking.

I also have some questions about Knol. The example article (about insomnia) fascinates me. It has a named author, and Google’s Udi Manber highlights the importance of this:

We believe that knowing who wrote what will significantly help users make better use of web content.

However, it also has edit buttons, like a wiki. If it is a wiki, it is not clear how the reader will distinguish between what the named author wrote, and what has been edited. In the history tab presumably; but how many readers will look at that? Or will the author get the right to approve edits? When an article has been edited so thoroughly that only a small percentage is original, does the author’s name remain?

Personally, I would not be willing to have my name against an article that could be freely edited by others. It is too risky.

Second, there is ambiguity in Manber’s remark about content ownership:

Google will not ask for any exclusivity on any of this content and will make that content available to any other search engine.

Hang on. When I say, “non-exclusive”, I don’t mean giving other search engines the right to index it. I mean putting it on other sites, with other ads, that are nothing to do with Google. A slip of the keyboard, or does Google’s “non-exclusive” mean something different from what the rest of us mean?

Finally, I suggest we should not be hasty in writing off Wikipedia. First mover has a big advantage. Has Barnes and Noble caught up with Amazon? Did Yahoo Auctions best eBay? Has Microsoft’s MSN Video unseated YouTube? Wikipedia is flawed; but Knol will be equally flawed; at least Wikipedia tries to avoid this kind of thing:

For many topics, there will likely be competing knols on the same subject. Competition of ideas is a good thing.

Then again, Wikipedia knows what it is trying to do. Knol is not yet baked. We’ll see.

Update

Danny Sullivan, who has been briefed by Google, has some answers. Partial answers, anyway. Here’s one:

Google Knol is designed to allow anyone to create a page on any topic, which others can comment on, rate, and contribute to if the primary author allows

The emphasis, on “if the primary author allows”, is mine. Interesting. I wonder what the dynamics would/will be. Will editable pages float to the top?

Second:

The content will be owned by the authors, who can reprint it as they like

You can guess my next question. If as the primary author I have enabled editing, do any contributions become mine? What if I want to include the article in a printed book? The GNU Free Documentation License used by Wikipedia seems a simpler solution.

Fun: Wikipedia already has an article on knol.

Amazon SimpleDB: a database server for the internet

Amazon has announced SimpleDB, the latest addition to what is becoming an extensive suite of web services aimed at developers. It is now in beta.

Why bother with SimpleDB, when seemingly every web server on the planet already has access to a free instance of MySQL? Perhaps the main reason is scalability. If demand spikes, Amazon handles the load. Second, SimpleDB is universally accessible, whereas your MySQL may well be configured for local access on the web server only. If you want an online database to use from a desktop application, this could be suitable. It should work well with Adobe AIR once someone figures out an ActionScript library. That said, MySQL and the like work fine for most web applications, this blog being one example. SimpleDB meets different needs.

This is utility computing, and prices look relatively modest to me, though you pay for three separate things:

Machine Utilization – $0.14 per Amazon SimpleDB Machine Hour consumed.

Data Transfer – $0.10 per GB for all data transfer in; from $0.18 per GB for data transfer out.

Structured Data Storage – $1.50 per GB-month.

In other words, a processing time fee, a data transfer fee, and a data storage fee. That’s reasonable, since each of these incurs a cost. The great thing about Amazon’s services is that there are no minimum costs or standing fees. I get billed pennies for my own usage of Amazon S3, which I use for online backup.
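To make that concrete, here is a quick back-of-envelope sum in Python, using the prices quoted above. The usage figures are invented for the sake of illustration, not a benchmark:

```python
# Illustrative monthly cost estimate for Amazon SimpleDB, using the
# published prices quoted above. The usage figures are made up.
MACHINE_HOUR = 0.14   # $ per SimpleDB machine hour
TRANSFER_IN = 0.10    # $ per GB transferred in
TRANSFER_OUT = 0.18   # $ per GB transferred out (the entry-level rate)
STORAGE = 1.50        # $ per GB-month of structured data

machine_hours = 5     # processing consumed by queries and writes
gb_in = 1.0           # data uploaded
gb_out = 3.0          # data served
gb_stored = 0.5       # average structured data held

cost = (machine_hours * MACHINE_HOUR
        + gb_in * TRANSFER_IN
        + gb_out * TRANSFER_OUT
        + gb_stored * STORAGE)
print(f"Estimated monthly bill: ${cost:.2f}")  # about $2.09
```

Pennies, in other words, for a small application; the meter only starts to matter at scale.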

There are both REST and SOAP APIs, and there are sample libraries for Java, Perl, PHP, C# and VB.NET (what, no Javascript or Python?).
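For the curious, here is a rough sketch of what a hand-rolled REST call might look like, in Python since Amazon did not supply one. The endpoint, parameter names and version 1 request signing are my reading of Amazon’s documentation at the time, and the keys are placeholders, so treat this as a sketch to be checked against the docs rather than working code:

```python
# Sketch of a SimpleDB PutAttributes call over REST, standard library only.
# Endpoint, API version and the version-1 signing scheme are assumptions
# based on Amazon's docs; the access keys are placeholders.
import base64, hashlib, hmac, time, urllib.parse, urllib.request

ACCESS_KEY = "YOUR-ACCESS-KEY"   # placeholder
SECRET_KEY = b"YOUR-SECRET-KEY"  # placeholder

params = {
    "Action": "PutAttributes",
    "AWSAccessKeyId": ACCESS_KEY,
    "DomainName": "MyStore",             # hypothetical domain
    "ItemName": "sku-001",               # hypothetical item
    "Attribute.0.Name": "Category",
    "Attribute.0.Value": "Clothing",
    "Attribute.1.Name": "Category",      # same attribute again: multi-valued
    "Attribute.1.Value": "Gifts",
    "SignatureVersion": "1",
    "Timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "Version": "2007-11-07",
}

# Version 1 signing: sort parameters case-insensitively, concatenate each
# name and value, then HMAC-SHA1 the result with the secret key.
to_sign = "".join(k + v for k, v in
                  sorted(params.items(), key=lambda p: p[0].lower()))
params["Signature"] = base64.b64encode(
    hmac.new(SECRET_KEY, to_sign.encode(), hashlib.sha1).digest()).decode()

url = "https://sdb.amazonaws.com/?" + urllib.parse.urlencode(params)
print(urllib.request.urlopen(url).read())  # XML response
```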

Not relational

Unlike MySQL, Oracle, DB2 or SQL Server, SimpleDB is not a relational database server. It is based on the concept of items and attributes. Two things distinguish it from most relational database managers:

1. Attributes can have more than one value.

2. Each item can have different attributes.

While this may sound disorganized, it actually maps well to the real world. One of the use cases Amazon seems to have in mind is stock for an online store. Maybe every item has a price and a quantity. Garments have a Size attribute, but CDs do not. The Category attribute could have multiple values, for example Clothing and Gifts.
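If you think of a domain as a dictionary of items, each of which maps attribute names to lists of strings, the model is easy to picture. Here is that store example sketched in plain Python, with invented data:

```python
# SimpleDB's data model sketched as ordinary Python structures: a domain
# maps item names to attributes, every attribute holds a list of string
# values, and items need not share the same attributes.
domain = {
    "sku-001": {
        "Price": ["19.99"],
        "Quantity": ["4"],
        "Size": ["M"],                      # garments have a Size...
        "Category": ["Clothing", "Gifts"],  # ...and Category is multi-valued
    },
    "sku-002": {
        "Price": ["9.99"],
        "Quantity": ["12"],
        "Artist": ["Miles Davis"],          # CDs have an Artist, no Size
        "Category": ["Music"],
    },
}

# A query for everything in the Gifts category amounts to:
gifts = [item for item, attrs in domain.items()
         if "Gifts" in attrs.get("Category", [])]
print(gifts)  # ['sku-001']
```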

You can do such things relationally, but it requires multiple tables. Some relational database managers do support multiple values for a field (FileMaker for example), but it is not SQL-friendly.

This kind of semi-structured database is user-friendly for developers. You don’t have to plan a schema in advance. Just start adding items.

A disadvantage is that it is inherently undisciplined. There is nothing to stop you having an attribute called Color, another called Hue, and another called Shade, but it will probably complicate your queries later if you do.

All SimpleDB attribute values are strings. That highlights another disadvantage of SimpleDB – no server-side validation. If a glitch in your system gives an item a Price of “Red”, SimpleDB will happily store the value.
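Any typing discipline therefore has to live in your client code, with a guard like this hypothetical one (my invention, not part of any SimpleDB library) run before each PutAttributes call:

```python
# All validation is the client's job: SimpleDB stores any string.
# A hypothetical guard to run before writing a Price attribute.
def validate_price(value: str) -> str:
    try:
        price = float(value)
    except ValueError:
        raise ValueError(f"Price must be numeric, got {value!r}")
    if price < 0:
        raise ValueError(f"Price must be non-negative, got {value!r}")
    return value

print(validate_price("19.99"))  # fine
validate_price("Red")           # raises before anything reaches SimpleDB
```

A related wrinkle: because everything is a string, comparisons in queries are lexicographic, and as I understand it Amazon recommends zero-padding numbers (storing “00019.99”, say) if you want them to sort and compare correctly.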

Not transactional or consistent

SimpleDB has a feature called “Eventual Consistency”. It is described thus:

Amazon SimpleDB keeps multiple copies of each domain. When data is written or updated (using PutAttributes, DeleteAttributes, CreateDomain or DeleteDomain) and Success is returned, all copies of the data are updated. However, it takes time for the update to propagate to all storage locations. The data will eventually be consistent, but an immediate read might not show the change.

Right, so if you have one item in stock you might sell it twice, to two different customers (though the docs say consistency is usually achieved within seconds). There is also, as far as I can see, no concept of transactions, where a sequence of actions succeeds or fails as a block. Well, it is called SimpleDB.
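Here is that double-sale scenario sketched in Python, with two stale dictionaries standing in for SimpleDB’s multiple copies. It is a deliberately crude model, just to show the interleaving:

```python
# Crude model of eventual consistency: two web front ends each read a
# replica of the same item. Replica B has not yet seen the decrement
# written via replica A, so both sales succeed.
replica_a = {"sku-001": {"Quantity": ["1"]}}
replica_b = {"sku-001": {"Quantity": ["1"]}}   # stale copy

def sell(replica, sku):
    qty = int(replica[sku]["Quantity"][0])
    if qty > 0:
        replica[sku]["Quantity"] = [str(qty - 1)]  # propagates... eventually
        return True
    return False

print(sell(replica_a, "sku-001"))  # True: customer 1 buys the last item
print(sell(replica_b, "sku-001"))  # True: customer 2 buys it too (oops)
```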

This doesn’t make SimpleDB useless. It does limit the number of applications for which it is suitable. In most web applications, read operations are more common than write operations. SimpleDB is fine for reading. Just don’t expect your online bank to be adopting SimpleDB any time soon.

Live Workspace: can someone explain the offline story?

I showed the Asus Eee PC to a friend the other day. She liked it, but won’t be buying. Why? It doesn’t run Microsoft Office (yet – an official Windows version is planned).

It reminded me how important Office is to Microsoft. No wonder it is fighting so hard in the ODF vs OOXML standards war.

Therefore, if anything can boost Microsoft’s Web 2.0 credentials (and market share), it has to be Office. I’ve not yet been able to try out Office Live Workspace, but it strikes me that Microsoft is doing at least some of the right things. As I understand it, you get seamless integration between Office and web storage, plus some extras like document sharing and real-time collaboration.

I still have a question though, which inevitably is not answered in the FAQ. What’s the offline story? In fact, what happens when you are working on a document at the airport, your wi-fi pass expires, and you hit Save? Maybe a beta tester can answer this. Does Word or Excel prompt for a local copy instead? And if you save such a copy, how do you sync up the changes later?

If there’s a good answer, then this is the kind of thing I might use myself. If there is no good answer, I’ll stick with Subversion. Personally I want both the convenience of online storage and the comfort of local copies, with no-fuss synch between the two.

That said, I may be the only one concerned about this. When I Googled for Live Workspace Offline, the top hit was my own earlier post on the subject.

Microsoft Volta: magic, but is it useful magic?

Microsoft has released an experimental preview of Volta, a new development product with some unusual characteristics:

1. You write your application for a single machine, then split it into multiple tiers with a few declarations:

Volta automatically creates communication, serialization, and remoting code. Developers simply write custom attributes on classes or methods to tell Volta the tier on which to run them.

2. Volta seamlessly translates .NET byte code (MSIL) to Javascript, on an as-needed basis, to achieve cross-platform capability:

When no version of the CLR is available on the client, Volta may translate MSIL into semantically-equivalent Javascript code that can be executed in the browser. In effect, Volta offers a best-effort experience in the browser without any changes to the application.

The reasoning behind this is that single-machine applications are easier to write. Therefore, if the compiler can handle the tough job of distributing an application over multiple tiers, it makes the developer’s job easier. Further, if you can move processing between tiers with just a few declarations, then you can easily explore different scenarios.
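Volta’s real mechanism is .NET custom attributes plus MSIL rewriting, but the idea is easy to sketch in any language. Here is a loose Python analogy of my own invention (not Volta’s API): a decorator records which tier a function belongs to, and a Volta-like toolchain would then insert the cross-tier plumbing:

```python
# Loose analogy of Volta's declarative tier-splitting, invented for
# illustration only; Volta itself uses .NET custom attributes and MSIL
# rewriting, not Python decorators.
def run_at(tier):
    """Mark a function with the tier it should execute on."""
    def mark(fn):
        fn.tier = tier
        return fn
    return mark

@run_at("server")
def lookup_orders(customer_id):
    # In a real split, the toolchain would replace direct calls to this
    # function with generated remoting and serialization code.
    return [{"id": 1, "customer": customer_id}]

@run_at("client")
def render(orders):
    for order in orders:
        print(f"Order {order['id']} for {order['customer']}")

# The application is written as if it ran on a single machine:
render(lookup_orders("alice"))
# A Volta-like compiler would notice lookup_orders.tier == "server"
# and insert the tier boundary automatically.
```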

Since the input to Volta is MSIL, you can work in Visual Studio using any .NET language.

Visionary breakthrough, or madness? Personally I’m sceptical, though I have had a head start, since this sounds very like what I discussed with Erik Meijer earlier this year, when it was called LINQ 2.0:

Meijer’s idea is programmers should be able to code for the easiest case, which is an application running directly on the client, and be able to transmute it into a cross-platform, multi-tier application with very little change to the code.

What are my reservations? It seems hit-and-miss not to know whether your app will be executed by the CLR or as Javascript; and leaving it to a compiler to decide how to make an application multi-tier, bearing in mind issues like state management and optimising data flow, sounds like a recipe for inefficiency and strange bugs.

It seems Microsoft is not sure about it either:

Volta is an experiment that enables Microsoft to explore new ways of developing distributed applications and to continue to innovate in this new generation of software+services. It is not currently a goal to fit Volta into a larger product roadmap. Instead, we want feedback from the community of partners and customers to influence other Live Labs technologies and concepts.

Zoho users logging into other accounts by accident

Zoho users beware. There appears to be a nasty bug whereby a user logs in with their own credentials, but finds themselves logged into another user’s account:

I have the last couple of weeks experienced that I get logged on into another account that I do not know!
I can see the other account documents. Just a few minutes ago I tried to use my own logon but was logged in to the account of <…>

says a user on the Zoho forums.

Zoho says it is fixing this urgently:

We have analyzed the logs and found some race conditions that could happen under high load. We have a fix in, and are continuing to monitor it very closely. We have also launched a complete review of security, so that this type of issue does not recur. We are taking it very seriously and apologize profusely.

Food for thought nonetheless. This is the kind of reason people cite for sticking with on-premise applications. I argue that data is often safer in the cloud, but this kind of incident makes you wonder.
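Zoho has not published the technical details, but for illustration, here is one classic way a race under load can cross sessions: per-request state kept on a shared object. This is entirely hypothetical, not Zoho’s code:

```python
# Hypothetical illustration (not Zoho's actual code) of a race that
# leaks one user's session to another: a handler object reused across
# requests keeps per-request state in a shared field.
import threading
import time

class Handler:
    def __init__(self):
        self.current_user = None          # shared mutable state: the bug

    def handle(self, user):
        self.current_user = user          # request A writes its user...
        time.sleep(0.01)                  # ...race window, widened for the demo
        # ...request B may have overwritten current_user by now
        return user, f"documents of {self.current_user}"

shared = Handler()                        # one instance serving all requests
results = []

def request(user):
    results.append(shared.handle(user))

threads = [threading.Thread(target=request, args=(u,))
           for u in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)   # likely includes ('alice', 'documents of bob')
```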


Buzzword gets word count, easier sharing

Adobe’s online, Flash-based word processor has been revised and now includes a word count, handy for journalists and essay writers.


It also has easier sharing: simply copy the URL at the top of the document you are editing to create a link. I intended to demo this here, but cannot, because Buzzword does not yet support public sharing. You can only share with other specified Buzzword users, identified by their email addresses. Buzzword says public sharing is “a feature that we’re actively investigating.”

Amusing: I noticed that my document had a number of flagged words, wiggly underlines that indicate likely spelling errors. Two of these unrecognized terms were “Google” and “Zoho”. “Buzzword” on the other hand is in the dictionary.


Mobile data taking off at last

Excellent article from ARC on mobile data trends, mentioning that Vodafone has just reported a 50% jump in data traffic:

It is the culmination of attractive data pricing, improved usability and mobile demand for Web 2.0 services which is brewing to form the perfect data storm. As data pricing erodes along the same path travelled by voice, operators must now identify ways to tap into revenues from web services or else be left exposed when the data hurricane arrives.

Personally I don’t need convincing; I’ve been a heavy mobile data user for years; I can’t wait for the data price erosion mentioned. Now we are seeing great little apps from Google (Maps, GMail) and others, better mobile web browsers (iPhone etc) and faster data speeds, and the mass market is waking up to the potential for mobile Internet access. It is taking longer than anyone thought it would, but the trend is unmistakable.

Postscript: see Mobile Web- So Close Yet So Far for a more enigmatic view from Michael Fitzgerald at the New York Times:

For now, widespread use of the mobile Web remains both far off and inevitable.

Note that the piece criticizes the iPhone for not supporting Flash.

The dynamics of this are interesting. Flash is sufficiently entrenched that you can say iPhone is bad for not rendering Flash, not that Flash is bad because it does not work on an iPhone. 


CodeRage II: Windows only, login problems

I was surprised to learn that CodeGear’s online conference is apparently closed to Mac users, or anyone not on Windows.


That’s odd, since the company has Java and Ruby development products that run cross-platform.

Further, even Windows users have had problems logging in. The conferencing software CodeGear is using is limited to 1500 attendees per session, but thanks to a glitch, sessions were reported full even when they were not. A message posted to the Borland newsgroup explains:

It turns out the problem was that only the first 1500 people who registered for CodeRage were successfully registered to attend all of the InterWise events, because of a 1500-person limitation for iSeminar events. Unfortunately, this meant that 1500 attendance spots were reserved for those 1500 email addresses even though fewer than that were actually attending. Long story short, I’ve removed all IW registrations from individual events so anyone should be able to get in. You shouldn’t see any more “Exceeded max number of participants” error messages unless we really hit 1500 people for any given session.

I had problems myself – I am not sure if it was this limitation, or just the InterWise conferencing software which, like so much out there, appears to be uncomfortable with Windows Vista/UAC and presented a variety of error messages. I didn’t record all the details, but I was constantly being told I had cancelled the setup when I had done no such thing.

Hmmm, I seem to recall technical problems with previous Borland/CodeGear online events as well. Surely it’s time the company got these things right?


Microsoft vs Mozilla Javascript wars

My comment is here.

Note this debate is not only about the merits of different versions of Javascript/ECMAScript. It is also about power and responsibility. However you spin it, and however far Adobe and/or Microsoft succeed with Flash/Silverlight/AIR, I think we can agree that the browser has an important role for the foreseeable future. It is also likely (though less certain, I guess) that Internet Explorer will continue to have a large market share. Microsoft therefore has a responsibility not to hold back the Web, and that surely includes not obstructing the evolution of a high-performance Javascript runtime.

It is disappointing that Microsoft says so little about IE8, presuming it exists. If the company sticks by its undertaking to leave no more than two years between IE releases, we should expect it no later than October 2008, less than one year away. It would help web developers to know more about what will be in it.

Is CodeRage the future of tech conferences?

CodeRage 2007 starts next week. It’s a technical conference covering CodeGear’s products, including Delphi, JBuilder, C++Builder and 3rdRail, the new Ruby on Rails IDE.

The conference is both free and virtual.

A virtual conference is no substitute for human contact. I’ve learnt this paradox over many years: even if the same content is freely available on the Web, there is substantial benefit in physical attendance. You are more focused, you learn more, you can easily ask questions, and you pick up all those indefinable signals from others who are attending.

Equally, the global fuel crisis and concern about the environmental cost of travel surely mean that virtual conferencing is an idea whose time has come. Another benefit is that it opens the event to people for whom a typical tech conference is just not feasible, for financial or other reasons.

I’d like to see more of these.
