Category Archives: web authoring

Why Internet Explorer users get the worst of the Web

Microsoft’s Chris Wilson has a post on Compatibility and IE8 which introduces yet another compatibility switch. IE8 will apparently have three modes: Quirks, Standards, and Even More Standard.

Here’s the key paragraph:

… developers of many sites had worked around many of the shortcomings or outright errors in IE6, and now expected IE7 to work just like IE6. Web developers expected us, for example, to maintain our model for how content overflows its box, even in “standards mode,” even though it didn’t follow the specification – because they’d already made their content work with our model. In many cases, these sites would have worked better if they had served IE7 the same content and stylesheets they were serving when visited with a non-IE browser, but they had “fixed their content” for IE. Sites didn’t work, and users experienced problems.

In other words, so many web pages have “If IE, do this” coded into them, that pages actually break if IE behaves correctly. Alternative browsers will do a better job, even if IE is equally standards-compliant, because they do not suffer the effects of these workarounds.

Microsoft’s proposed solution is to make the supposed Standards mode a new quirks mode, this time frozen to IE7 compatibility, and to force developers to add a further meta tag to enable the better standards compliance of which IE8 is capable.

It actually goes beyond that. Aaron Gustafson explains the rationale for the new X-UA-Compatible meta tag which enables web developers to specify what browser versions their page supports. The idea seems to be that browsers parse this tag and behave accordingly.
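For illustration, the opt-in looks something like this (syntax as proposed at the time, and subject to change; "IE=edge" is mooted as a way of always requesting the most recent engine):

```html
<!-- Opt this page into IE8's improved standards mode.
     Without it, IE8 would render "standards mode" pages as IE7 did. -->
<meta http-equiv="X-UA-Compatible" content="IE=8" />
```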

This sounds uncomfortable to me. Versioning problems are notoriously intractable – DLL Hell, the Windows GetVersion mess – and this could get equally messy. It also imposes a substantial burden on browser developers.

Has Microsoft made the right decision? Trouble is, there is no right decision, only a least-bad decision. Personally I think it is the wrong decision, if only because it perpetuates the problem. It would be better for IE to do the correct thing by default, and to support meta tags that turn on quirks modes of various kinds, or an option in browser preferences, rather than doing the incorrect thing by default.

Still, Wilson makes a case for the decision and has some supporters. Nevertheless, he is getting a rough ride, in part because the IE team has failed to engage with the community – note for example the long silences on the IE blog. Why is Wilson telling us now about this decision, rather than discussing the options more widely before it was set in stone, as I suspect it now is? Even within the Web Standards Project, some of whose members assisted Microsoft, there is tension because it appears that other members were excluded from the discussion.

Another point which I’m sure won’t go unnoticed is that Wilson makes a good case for using alternative browsers. IE users get inferior markup.


Use HTML not Flash, Silverlight or XUL, says W3C working draft

The W3C has posted its working draft for HTML 5.0. Interesting statement here:

1.1.3. Relationship to XUL, Flash, Silverlight, and other proprietary UI languages

This section is non-normative.

This specification is independent of the various proprietary UI languages that various vendors provide. As an open, vendor-neutral language, HTML provides for a solution to the same problems without the risk of vendor lock-in.

Food for thought as you embark on your Flash or Silverlight project. I would have thought XUL is less proprietary, but still.

The contentious part is this:

HTML provides for a solution to the same problems

I doubt that HTML 5.0 can quite match everything that you can do in Flash or Silverlight, depending of course what is meant by “a solution to the same problems”. The other issue is that Flash is here now, Silverlight is just about here now, and HTML 5.0 will be a while yet.


Escaping the Adobe AIR sandbox

Adobe’s Mike Chambers has an article and sample code for calling native operating system APIs from AIR applications, which use the Flash runtime outside the browser.

I took a look at the native side of the code, which is written in C# and compiled smoothly in Visual Studio 2008. The concept is simple. Instead of launching an AIR application directly, you start the “Command Proxy” application. The Command Proxy launches the AIR application, passing a port number and optionally an authorization string. Next, the Command Proxy creates a TCP socket which listens on the specified port. The AIR application can then use its socket API to send commands to the Command Proxy, which is outside the AIR sandbox.
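Chambers’ sample uses ActionScript and C#, but the proxy side is simple enough to sketch in Python. This is my own illustration, not Adobe’s code: the command names, the one-line “TOKEN command” wire format, and the whitelist are all invented here to show the shape of the idea.

```python
import secrets
import socket
import subprocess

# Whitelist of commands the proxy is willing to launch (hypothetical names).
ALLOWED_COMMANDS = {"open_editor": ["notepad.exe"]}

def parse_request(line, token):
    """Validate a 'TOKEN command' request; return argv to launch, or None."""
    auth, _, command = line.strip().partition(" ")
    if auth != token:
        return None                        # wrong or missing shared secret
    return ALLOWED_COMMANDS.get(command)   # None if not on the whitelist

def run_proxy(air_app_path, port=13000):
    token = secrets.token_hex(16)
    # Launch the AIR application, passing it the port and the shared secret.
    subprocess.Popen([air_app_path, str(port), token])
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", port))       # accept local connections only
    server.listen(1)
    conn, _ = server.accept()
    with conn:
        argv = parse_request(conn.makefile().readline(), token)
        if argv:
            subprocess.Popen(argv)         # run the whitelisted native command
```

The token and the whitelist are the interesting part: without them, anything on the machine that can open a local socket gets a free process launcher.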

It’s a neat idea though Microsoft’s Scott Barnes gave the design a C- on security grounds. He clarified his point thus:

The communication channel between the command proxy and AIR application looks like a potential vulnerability. One of the things application developers should worry about with security is insecure cross-process communication mechanisms hanging around on someone’s machine. For example if a process listens on a named pipe, and that named pipe has no ACLs and no validation of inbound communication, the process is vulnerable to all kinds of attacks when garbage is sent down the pipe. In the example on using the command proxy how do you secure it so that it doesn’t turn into a general purpose process launcher?

Barnes has an obvious incentive to cast doubt on AIR solutions (he’s a Microsoft RIA Silverlight evangelist), but nevertheless this is a good debate to have. How difficult is it to do this in a secure manner? It is also interesting to note the opening remarks in Chambers’ post:

Two of the most requested features for Adobe AIR have been the ability to launch native executables from an AIR application, and the ability to integrate native libraries into an AIR application. Unfortunately, neither feature will be included in Adobe AIR 1.0.

This is really one feature: access to native code. I remain somewhat perplexed by AIR in this respect. Is the inability to call native code a security feature, or a way of promoting cross-platform purity, or simply a feature on the to-do list? I don’t think it is really a security feature, since AIR applications have the same access to the file system as the user. This means they can execute native code, just not immediately. For example, an AIR app could download an executable and pop it into the user’s startup folder on Windows. That being the case, why not follow Java’s lead and provide a clean mechanism for calling native code? Adobe could add the usual obligatory warnings about how this breaks cross-platform compatibility and so on.
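To make the point concrete: once an application has user-level file access, deferred code execution is a few lines away. A hedged sketch of what any such app could do on Windows (the Startup path shown is the Vista-era per-user location and varies by Windows version):

```python
import os

def windows_startup_folder(appdata):
    """Per-user Startup folder on Vista-era Windows; path varies by version."""
    return os.path.join(appdata, "Microsoft", "Windows",
                        "Start Menu", "Programs", "Startup")

# Anything copied into this folder, e.g. with shutil.copy(), runs at the
# user's next logon. No elevation is needed, which is why plain file-system
# access already amounts to native code execution, just deferred:
# windows_startup_folder(os.environ["APPDATA"])
```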

Sun gets a database manager, but Oracle owns its InnoDB engine

Sun now has a database manager. It’s been a long time coming. Oracle has … Oracle, IBM has DB2, Microsoft has SQL Server; it’s been obvious for years that Sun had a gap to fill. Now Sun has MySQL.

This is interesting to me as I was a relatively early user of the product. I didn’t much like it. It was missing important features like transactions, stored procedures and triggers. I still used it though because of a few appealing characteristics:

  • It was free
  • It was very fast
  • It was lightweight
  • It was the M in LAMP

I should expand slightly on the last of these. The great thing about MySQL was that you did not need to think about installation, PHP drivers, or anything like that. It all came pretty much by default. If you could not bear MySQL’s limitations, you could use Postgres instead, but it took more effort and was slower.

The ascent of MySQL is a sort of software development parable. Like PHP, MySQL came about from one person’s desire to fix a problem. That person was Michael “Monty” Widenius. He wanted something a little better than mSQL, a popular small database engine at the time:

We once started off with the intention to use mSQL to connect to our own fast low level (ISAM) tables. However, after some testing we came to the conclusion that mSQL was not fast or flexible enough for our needs. This resulted in a new SQL interface to our database but with almost the same API interface as mSQL. This API was chosen to ease porting of third-party code.

Why did MySQL take off when there were better database engines already out there? It was partly to do with the nature of many LAMP applications in the early days. They were often not mission-critical (mine certainly were not), and they were typically weighted towards reading rather than writing data. If you are building a web site, you want pages served as quickly as possible. MySQL did that, and without consuming too many resources. Many database engines were better, but not many were faster.

MySQL today has grown up in many ways, though transactions are still an issue. To use them you need an alternative back-end storage engine, either InnoDB or BDB. BDB is deprecated, and InnoDB is included by default in current releases of MySQL. InnoDB is owned by Oracle, which could prove interesting given how this deal changes the dynamics of Sun’s relationship with Oracle, though both MySQL and InnoDB are open source and published under the GPL. Will Sun try to find an alternative to InnoDB?
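For anyone who has not run into this: the engine is chosen per table, and transaction statements succeed silently but do nothing on non-transactional tables. A sketch in MySQL 5.x-era syntax:

```sql
-- Transactions only take effect on a transactional engine.
CREATE TABLE orders (
    id INT PRIMARY KEY,
    total DECIMAL(10,2)
) ENGINE=InnoDB;

START TRANSACTION;
INSERT INTO orders VALUES (1, 9.99);
ROLLBACK;  -- undone on InnoDB; silently ignored on a MyISAM table
```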

While I agree with most commentators that this is a good move for Sun, it’s worth noting that MySQL was not originally designed to meet Enterprise needs, which is where most of the money is.

Update: as Barry Carr comments below, there is a planned replacement for InnoDB called Falcon.

Detailed look at a WordPress hack

Angsuman Chakraborty’s technical blog suffered a similar attack to mine – the malicious script was the same, though the detail of the attack was different. In my case WordPress was attacked via Phorum. Chakraborty offers a detailed look at how his site was compromised and makes some suggestions for improving WordPress security.

In both these cases, WordPress was not solely to blame. At least, that is the implication. Chakraborty thinks his attack began with an exploit described by Secunia, which requires the hacker first to obtain access to the WordPress password database, via a stray backup or a SQL injection attack. Nevertheless, Chakraborty says:

One of the challenges with WordPress is that security considerations were mostly an afterthought (feel free to disagree) which were latched on as WordPress became more and more popular.

I have huge respect for WordPress. Nevertheless, I believe its web site could do better with regard to security. The installation instructions say little about it. You really need to find this page on hardening WordPress. It should be more prominent.


Wikia Search is live

You can now perform searches on Wikia, the open source search engine from the founder of Wikipedia.

This is from the about page:

We are aware that the quality of the search results is low.

Wikia’s search engine concept is that of trusted user feedback from a community of users acting together in an open, transparent, public way. Of course, before we start, we have no user feedback data. So the results are pretty bad. But we expect them to improve rapidly in coming weeks, so please bookmark the site and return often.

I tried a few searches for things I know about, and indeed the results were poor. I am going to follow the advice.

Wikia’s Jimmy Wales says there is a moral dimension here:

I believe that search is a fundamental part of the infrastructure of the Internet, and that it can and should therefore be done in an open, objective, accountable way.

There are several issues here. The power of Google to make or break businesses is alarming, particularly as it seeks to extend its business and there are growing potential conflicts of interest between delivering the best search results, and promoting particular sites. Google’s engine is a black box, to protect its commercial secrets. Search ranking has become critical to business success, and much energy is expended on the dubious art of search engine optimization, sometimes to the detriment of the user’s experience.

Another thought to ponder is how Google’s results influence what people think they know about, well, almost anything. Children are growing up with the idea that Google knows everything; it is the closest thing yet to Asimov’s Multivac.

In other words, Wales is right to be concerned. Can Wikia fix the problem? The big question is whether it can be both open and spam-resistant. Some people thought that open source software would be inherently insecure, because the bad guys can see the source. This logic has proven faulty, since the flaw is more than mitigated by the number of people scrutinizing open source code and fixing problems. Can the same theory apply to search? That’s unknown at this point.

It is interesting to note that Wikipedia itself is not immune to manipulation, but works fairly well overall. However, if Wikia Search attracts significant usage, it may prove a bigger target. I guess this could be self-correcting, in that if Wikia returns bad results because of manipulation, its usage will drop.

I don’t expect Wikia to challenge Google in a meaningful way any time soon. Google is too good and too entrenched. Further, Google and Wikipedia have a symbiotic relationship. Google sends huge amounts of traffic to Wikipedia, and that works well for users since it often has the information they are looking for. Win-win.

Knol questions

The internet is buzzing about Knol. Google no longer wishes merely to index the web’s content. Google wishes to host the web’s content. Why? Ad revenue. Once you click away from Google, you might see ads for which Google is not the agent. Perish the thought. Keep web users on Google; keep more ad revenue.

Snag is, there is an obvious conflict of interest. Actually, there is already a conflict of interest at Google. I don’t know how many web pages out there host AdSense content (mine do), but it is a lot. When someone clicks an AdSense ad, revenue is split between Google and the site owner. It would therefore pay Google to rank AdSense sites above non-AdSense sites in its search results. Would it do such a thing? Noooo, surely not. How can we know? We can’t. Google won’t publish its search algorithms, for obvious reasons. You have to take it on trust.

That question, can we trust Google, is one that will be asked again and again.

Knol increases the conflict of interest. Google says:

Our job in Search Quality will be to rank the knols appropriately when they appear in Google search results. We are quite experienced with ranking web pages, and we feel confident that we will be up to the challenge.

Will Google rank Knol pages higher than equally good content on, say, Wikipedia? Noooo. How will we know? We won’t. We have to take it on trust.

On balance therefore I don’t much like Knol. It is better to separate search from content provision. But Google is already a content provider (YouTube is another example) so this is not really groundbreaking.

I also have some questions about Knol. The example article (about insomnia) fascinates me. It has a named author, and Google’s Udi Manber highlights the importance of this:

We believe that knowing who wrote what will significantly help users make better use of web content.

However, it also has edit buttons, like a wiki. If it is a wiki, it is not clear how the reader will distinguish between what the named author wrote, and what has been edited. In the history tab presumably; but how many readers will look at that? Or will the author get the right to approve edits? When an article has been edited so thoroughly that only a small percentage is original, does the author’s name remain?

Personally, I would not be willing to have my name against an article that could be freely edited by others. It is too risky.

Second, there is ambiguity in Manber’s remark about content ownership:

Google will not ask for any exclusivity on any of this content and will make that content available to any other search engine.

Hang on. When I say, “non-exclusive”, I don’t mean giving other search engines the right to index it. I mean putting it on other sites, with other ads, that are nothing to do with Google. A slip of the keyboard, or does Google’s “non-exclusive” mean something different from what the rest of us mean?

Finally, I suggest we should not be hasty in writing off Wikipedia. First mover has a big advantage. Has Barnes and Noble caught up with Amazon? Did Yahoo Auctions best eBay? Has Microsoft’s MSN Video unseated YouTube? Wikipedia is flawed; but Knol will be equally flawed; at least Wikipedia tries to avoid this kind of thing:

For many topics, there will likely be competing knols on the same subject. Competition of ideas is a good thing.

Then again, Wikipedia knows what it is trying to do. Knol is not yet baked. We’ll see.

Update

Danny Sullivan, who has been briefed by Google, has some answers. Partial answers, anyway. Here’s one:

Google Knol is designed to allow anyone to create a page on any topic, which others can comment on, rate, and contribute to if the primary author allows

The highlighting is mine. Interesting. I wonder what the dynamics would/will be. Will editable pages float to the top?

Second:

The content will be owned by the authors, who can reprint it as they like

You can guess my next question. If as the primary author I have enabled editing, do any contributions become mine? What if I want to include the article in a printed book? The GNU Free Documentation License used by Wikipedia seems a simpler solution.

Fun: Wikipedia already has an article on knol.

Amazon SimpleDB: a database server for the internet

Amazon has announced SimpleDB, the latest addition to what is becoming an extensive suite of web services aimed at developers. It is now in beta.

Why bother with SimpleDB, when seemingly every web server on the planet already has access to a free instance of MySQL? Perhaps the main reason is scalability. If demand spikes, Amazon handles the load. Second, SimpleDB is universally accessible, whereas your MySQL may well be configured for local access on the web server only. If you want an online database to use from a desktop application, this could be suitable. It should work well with Adobe AIR once someone figures out an ActionScript library. That said, MySQL and the like work fine for most web applications, this blog being one example. SimpleDB meets different needs.

This is utility computing, and prices look relatively modest to me, though you pay for three separate things:

Machine Utilization – $0.14 per Amazon SimpleDB Machine Hour consumed.

Data Transfer – $0.10 per GB – all data transfer in. From $0.18 per GB – data transfer out.

Structured Data Storage – $1.50 per GB-month.

In other words, a processing time fee, a data transfer fee, and a data storage fee. That’s reasonable, since each of these incurs a cost. The great thing about Amazon’s services is that there are no minimum costs or standing fees. I get billed pennies for my own usage of Amazon S3, which is for online backup.
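Put as arithmetic, a hypothetical small application’s bill is easy to estimate from the quoted prices (this sketch ignores the higher transfer-out tiers):

```python
def simpledb_monthly_cost(machine_hours, gb_in, gb_out, gb_stored):
    """Rough monthly bill in USD, using the beta prices quoted above."""
    return (machine_hours * 0.14    # machine utilization
            + gb_in * 0.10          # data transfer in
            + gb_out * 0.18         # data transfer out, lowest tier
            + gb_stored * 1.50)     # structured data storage per GB-month

# e.g. 10 machine hours, 1 GB in, 5 GB out and 2 GB stored
# comes to roughly $5.40 for the month
```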

There are both REST and SOAP APIs, and there are example libraries for Java, Perl, PHP, C# and VB.NET (what, no JavaScript or Python?).

Not relational

Unlike MySQL, Oracle, DB2 or SQL Server, SimpleDB is not a relational database server. It is based on the concept of items and attributes. Two things distinguish it from most relational database managers:

1. Attributes can have more than one value.

2. Each item can have different attributes.

While this may sound disorganized, it actually maps well to the real world. One of the use cases Amazon seems to have in mind is stock for an online store. Maybe every item has a price and a quantity. Garments have a Size attribute, but CDs do not. The Category attribute could have multiple values, for example Clothing and Gifts.

You can do such things relationally, but it requires multiple tables. Some relational database managers do support multiple values for a field (FileMaker for example), but it is not SQL-friendly.

This kind of semi-structured database is user-friendly for developers. You don’t have to plan a schema in advance. Just start adding items.
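The shape of the data is easy to picture in code. Here is a toy in-memory analogue, not Amazon’s API: each item is a bag of multi-valued string attributes, and no two items need share a schema.

```python
# Each item maps attribute names to sets of string values.
store = {
    "item_shirt": {
        "Price": {"19.99"},
        "Quantity": {"12"},
        "Size": {"M", "L"},                  # multi-valued attribute
        "Category": {"Clothing", "Gifts"},   # ditto
    },
    "item_cd": {
        "Price": {"9.99"},
        "Quantity": {"30"},
        "Category": {"Music"},               # no Size attribute at all
    },
}

def query(store, attribute, value):
    """Items whose attribute contains the value; note everything is a string."""
    return sorted(name for name, attrs in store.items()
                  if value in attrs.get(attribute, set()))

print(query(store, "Category", "Gifts"))   # ['item_shirt']
```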

A disadvantage is that it is inherently undisciplined. There is nothing to stop you having an attribute called Color, another called Hue, and another called Shade, but it will probably complicate your queries later if you do.

All SimpleDB attribute values are strings. That highlights another disadvantage of SimpleDB – no server-side validation. If a glitch in your system gives an item a Price of “Red”, SimpleDB will happily store the value.

Not transactional or consistent

SimpleDB has a feature called “Eventual Consistency”. It is described thus:

Amazon SimpleDB keeps multiple copies of each domain. When data is written or updated (using PutAttributes, DeleteAttributes, CreateDomain or DeleteDomain) and Success is returned, all copies of the data are updated. However, it takes time for the update to propagate to all storage locations. The data will eventually be consistent, but an immediate read might not show the change.

Right, so if you have one item in stock you might sell it twice to two different customers (though the docs say consistency is usually achieved in seconds). There is also no concept of transactions as far as I can see. This is where you want a sequence of actions to succeed or fail as a block. Well, it is called SimpleDB.
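A toy simulation (my own sketch, nothing like SimpleDB’s real implementation) shows the hazard: a read issued between the write and the propagation can return stale data.

```python
import random

class EventuallyConsistentStore:
    """Toy model: a write lands on one replica; the others catch up later."""
    def __init__(self, copies=3):
        self.replicas = [{} for _ in range(copies)]

    def put(self, key, value):
        self.replicas[0][key] = value    # "Success" after one copy is written

    def replicate(self):                 # propagation happens asynchronously
        for replica in self.replicas[1:]:
            replica.update(self.replicas[0])

    def get(self, key):
        return random.choice(self.replicas).get(key)  # any replica may answer

store = EventuallyConsistentStore()
store.put("stock:widget", "1")
# Before propagation, the other replicas still know nothing:
assert any(r.get("stock:widget") is None for r in store.replicas)
store.replicate()
assert store.get("stock:widget") == "1"  # eventually, every read agrees
```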

This doesn’t make SimpleDB useless. It does limit the number of applications for which it is suitable. In most web applications, read operations are more common than write operations. SimpleDB is fine for reading. Just don’t expect your online bank to be adopting SimpleDB any time soon.

Buzzword gets word count, easier sharing

Adobe’s online, Flash-based word processor has been revised and now includes word count, handy for journalists and essay writers:

[Screenshot: Buzzword’s word count display]

It also has easier sharing: simply copy the URL at the top of the document you are editing to create a link. I intended to demo this here, but cannot, because Buzzword does not yet support public sharing. You can only share with other specified Buzzword users, identified by their email addresses. Buzzword says public sharing is “a feature that we’re actively investigating.”

Amusing: I noticed that my document had a number of flagged words, wiggly underlines that indicate likely spelling errors. Two of these unrecognized terms were “Google” and “Zoho”. “Buzzword” on the other hand is in the dictionary.


Mobile data taking off at last

Excellent article from ARC on mobile data trends, mentioning that Vodafone has just reported a 50% jump in data traffic:

It is the culmination of attractive data pricing, improved usability and mobile demand for Web 2.0 services which is brewing to form the perfect data storm. As data pricing erodes along the same path travelled by voice, operators must now identify ways to tap into revenues from web services or else be left exposed when the data hurricane arrives.

Personally I don’t need convincing; I’ve been a heavy mobile data user for years; I can’t wait for the data price erosion mentioned. Now we are seeing great little apps from Google (Maps, GMail) and others, better mobile web browsers (iPhone etc) and faster data speeds, and the mass market is waking up to the potential for mobile Internet access. It is taking longer than anyone thought it would, but the trend is unmistakable.

Postscript: see Mobile Web- So Close Yet So Far for a more enigmatic view from Michael Fitzgerald at the New York Times:

For now, widespread use of the mobile Web remains both far off and inevitable.

Note that the piece criticizes the iPhone for not supporting Flash.

The dynamics of this are interesting. Flash is sufficiently entrenched that you can say iPhone is bad for not rendering Flash, not that Flash is bad because it does not work on an iPhone. 
