Genii Weblog

Another update on ODF vs. OOXML file counts

Mon 13 Aug 2007, 12:01 AM



by Ben Langhinrichs
It is now three months since I first posted the table of ODF and OOXML file types found by Google. I was curious how the numbers have changed, so here is the updated table:


Format        Count (May 10, 2007)          Count (August 12, 2007)
ODT           85,200                        90,400
ODS           20,700                        21,600
ODP           43,400                        50,700
Total ODF     149,300                       162,700

DOCX          516 (12% on Microsoft.com)    1,010 (16% on Microsoft.com)
XLSX          68 (6% on Microsoft.com)      216 (2% on Microsoft.com)
PPTX          80 (13% on Microsoft.com)     767 (47% on Microsoft.com)
Total OOXML   664 (11% on Microsoft.com)    1,993 (26% on Microsoft.com)

I guess a dedicated Microsoftie could spin this in a positive way (rah! rah! 100% growth in DOCX format in three months), but it is getting harder. In the eight months since Office 2007 was released to the general public (10 months since its release to enterprise customers), fewer than 2,000 of these documents have been posted on the web. In three months, 13,400 more ODF documents have been added to the web, but only 1,329 OOXML documents. It is hard to spin ten times as many new ODF documents as new OOXML documents, especially as 451 (34%) of those new OOXML documents were added on Microsoft.com. That isn't what I would call good traction for the overwhelmingly dominant office suite.
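The arithmetic in the paragraph above can be rechecked from the table. Here is a minimal Python sketch (the variable names are illustrative only). Because the table gives only rounded Microsoft.com percentages, reconstructing the number of new Microsoft.com files from them lands near 445 rather than exactly 451; the 451 presumably comes from the raw search counts.

```python
# Counts from the table (Google hit estimates): (May 10, 2007, August 12, 2007).
ODF = {"ODT": (85_200, 90_400), "ODS": (20_700, 21_600), "ODP": (43_400, 50_700)}
OOXML = {"DOCX": (516, 1_010), "XLSX": (68, 216), "PPTX": (80, 767)}


def totals(counts):
    """Return (May total, August total) for one family of formats."""
    may = sum(m for m, _ in counts.values())
    aug = sum(a for _, a in counts.values())
    return may, aug


odf_may, odf_aug = totals(ODF)
ooxml_may, ooxml_aug = totals(OOXML)
odf_added = odf_aug - odf_may
ooxml_added = ooxml_aug - ooxml_may

# The table only gives rounded Microsoft.com percentages (11% and 26% of the
# OOXML totals), so this estimate of new Microsoft.com files is approximate.
ms_added_estimate = round(0.26 * ooxml_aug - 0.11 * ooxml_may)

print(f"ODF:   {odf_may:,} -> {odf_aug:,} (+{odf_added:,})")
print(f"OOXML: {ooxml_may:,} -> {ooxml_aug:,} (+{ooxml_added:,})")
print(f"New ODF per new OOXML document: {odf_added / ooxml_added:.1f}")
print(f"Estimated new OOXML files on Microsoft.com: ~{ms_added_estimate}")
print(f"Share if the post's count of 451 is used: {451 / ooxml_added:.0%}")
```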

And all of this before IBM rolls out Notes 8 with the ODF productivity editors included as part of the package. Since Notes/Domino is enterprise software, and enterprises move slowly, I don't predict an immediate surge, but I would expect a steady increase after six months or so. By that time, if OOXML keeps up its torrid growth rate of 200% every three months, there should be an amazing 7,972 OOXML documents on the web, while ODF with its measly 9% growth rate should have about 193,215 ODF documents. Eventually, of course, OOXML would likely catch up, except that the IT industry tends to reward winners, and the preponderance of ODF documents and ODF-compatible office products is likely to start being noticed. Also, the awkward percentage of OOXML documents on Microsoft.com (especially its jump from 11% to 26%) is not going to fool anyone for long. When ODF documents are all over the place, and Microsoft Office documents are still almost entirely in the older binary formats... I think people will get the message.
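The six-month projection assumes each format keeps its observed three-month growth ratio for two more quarters. Below is a minimal sketch of that naive compounding; the `project` helper is hypothetical and exists only for illustration. It gives roughly 193,216 ODF documents, in line with the 193,215 above. Compounding OOXML's roughly threefold (200%) quarterly growth the same way gives about 17,955. The 7,972 figure corresponds to OOXML doubling (100% growth) each quarter, that is 1,993 × 4. Either way, the projected ODF count stays more than ten times the OOXML count.

```python
def project(before, after, quarters):
    """Naively compound the observed three-month growth ratio.

    Assumes the ratio after/before stays constant for `quarters` more
    three-month periods; real adoption curves rarely behave this way.
    """
    ratio = after / before
    return after * ratio ** quarters


# Totals from the table: May 10 and August 12, 2007.
odf_6mo = project(149_300, 162_700, quarters=2)
ooxml_6mo = project(664, 1_993, quarters=2)
ooxml_doubling = 1_993 * 2 ** 2  # OOXML doubling (100% growth) each quarter instead

print(f"ODF after two more quarters:   {odf_6mo:,.0f}")
print(f"OOXML after two more quarters: {ooxml_6mo:,.0f}")
print(f"OOXML if it only doubles:      {ooxml_doubling:,}")
```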

Copyright © 2007 Genii Software Ltd.

What has been said:


614.1. Wu MingShi
(08/14/2007 10:03 AM)

I see your point but I do not share your optimism for ODF, even though I am a pro-ODF, anti-OOXML fellow.

I do not think Notes 8 will have a big impact on the ODF figures. Enterprises use it, but they are the least likely to put their documents on the web. The same probably explains why the OOXML figure is still so low.

I think the chances of OOXML catching up with and surpassing ODF are higher than the chances of both ending up at the same level in the long run. However, it is true that there is now an opportunity for a non-Microsoft format.


614.2. Peter H
(14/08/2007 22:31)

I wonder what file format Google Desktop works in. Given the uptake I've seen in SMEs, this could have quite an impact if it were ODF (I haven't tried it myself yet).


614.3. Rick Jelliffe
(08/15/2007 01:47 AM)

Or perhaps it means that people understand that ODF is suitable for document interchange use between unknown suites (i.e. putting documents on the web) and Open XML is more suitable for "full-fidelity" uses (baseline format for transformation, archiving).

The anti-Open XML side has been saying that DIS 29500 should not be passed, because people would be confused by two standards, and use them inappropriately. I am sorry to be Mr Glass-half-full, but why isn't this evidence of that?


614.4. Thomas Downing
(08/15/2007 05:20 AM)

Rick,

I admit that I have not read everything you have posted as to ODF vs OOXML, but I have read a fair bit about the issues. So I have a question for you.

I often hear the line 'OOXML is more suitable for "full fidelity"...', or variations on that theme. Can you provide links to objective studies that show this is so?

Let me add that the mere presence of a tag such as "SpaceLikeWord97" (pardon me if that is not the actual tag, but you get my drift) has nothing to do with fidelity per se. Such tags have no definition in OOXML, so they cannot be used by anyone other than MS for the purposes of fidelity, and hence they do not make OOXML better for fidelity when it is viewed as a standard.


614.5. Wu MingShi
(08/15/2007 05:59 AM)

Rick,

[To moderator: A bit off-topic, but this is the first time I have seen Rick specifically mention this line of reasoning. I hope you will tolerate it.]

I like your argument that ODF is suitable for document interchange but OOXML is more suitable for "full-fidelity" use. This statement, in the context of document exchange here and in my small introvert mind, suggests that you are advocating "full-fidelity" as transfer to and from MS Office only. If so, we do not need an ISO standard, just Microsoft publishing everything needed to transfer to and from MS Office.

The truth is that OOXML is positioned in the market as a document exchange + editing format.


614.6. Ben Langhinrichs
(08/15/2007 06:45 AM)

I'll tolerate pretty much anything that isn't spam or abusive to others.

As for Rick's argument, I have seen it frequently stated that OOXML is better at fidelity, but I have yet to see much evidence of that. In any case, I don't see how it has much relevance to becoming an ISO standard, but I guess people can disagree.


614.7. Aaron Trevena
(08/15/2007 10:51 AM)

OOXML is redundant as a full fidelity document standard - PDF fulfils that role well - it's already widely deployed and accepted, with multiple 100% compatible tools from multiple vendors on multiple platforms.

OOXML's sole useful role is extending the useful lifetime of obsolete MS Office documents. That is not really of value to anybody else, and therefore not really worth making an ISO standard, as nobody apart from Microsoft can really implement it.


614.8. Mike Brown
(08/15/2007 05:17 PM)

What search terms are you entering into Google for the Microsoft searches, Ben? I'm using:

filetype:docx site:*.microsoft.*

and getting only 135 docx files on the Microsoft site.

Cheers,

- Mike


614.9. Elroy
(16-08-2007 00:56)

Hmm, lies damned lies and statistics again I guess.

Microsoft Office 2007 is simply not as widespread yet, so companies that distribute documents in a Microsoft format will still opt for .doc over .docx because of the larger user base. For those using OO.o this is not much of an issue, as an upgrade to the latest version is 'free'.

Don't expect your figures to change until the majority uses Office 2007. But also don't make the mistake of calling this a victory for ODF.


614.10. Jan
(16.08.2007 03:40)

I don't think Microsoft really has an interest in either ODF or OOXML replacing the binary formats. Their semi-monopoly is best protected if as many files as possible are converted to neither.

The aim of OOXML mainly seems to be to harm ODF adoption. As long as there is no obvious choice for a file format guaranteed to be useful for everyone, people will stick with the old binary formats.


614.11. Ben Langhinrichs
(08/16/2007 05:20 AM)

Mike - I used "filetype:docx site:microsoft.com" which returns 152 .DOCX files now (15%) instead of the 161 it returned before (16%). It seems to vary a lot.
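The searches discussed here combine Google's `filetype:` and `site:` operators. A minimal sketch that only builds the query strings and search URLs follows; the helper names are hypothetical. It does not fetch pages or read result counts, and, as noted above, Google's reported counts vary from one run to the next.

```python
from urllib.parse import urlencode

# File extensions compared in the post.
FORMATS = ["odt", "ods", "odp", "docx", "xlsx", "pptx"]


def google_query(ext, site=None):
    """Build a query such as 'filetype:docx site:microsoft.com'."""
    query = f"filetype:{ext}"
    if site:
        query += f" site:{site}"
    return query


def search_url(query):
    """Return a Google web-search URL with the query in the q parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


for ext in FORMATS:
    print(google_query(ext), "|", google_query(ext, "microsoft.com"))
print(search_url(google_query("docx", "microsoft.com")))
```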

Elroy - Keep telling yourself that. I remember the WordPerfect people making the same argument.

Jan - I think this is fairly true, or at least started as true. While Microsoft seems to get the value of XML formats, they are rightly very concerned about losing a position of strength (proprietary lock-in). I think Microsoft has been very surprised that their OOXML format didn't immediately swamp ODF by sheer numbers. Unfortunately, they don't really have time to go back, lick their wounds, and make a far better OOXML format (which they could), but have to stick to the standard which they created too hastily. Even now, it would be in their best interest if the standards process DID compel them to improve the specs, but they are probably very worried about the numbers and trying to consolidate first, fix later. I would, in their shoes.


614.12. TemporalBeing
(08/16/2007 06:54 AM)

Ben,

One thing to remember about Office and file formats is that (a) there is usually a long delay before many companies upgrade, and (b) even after upgrading, many companies will still use the older format until the new format has been widely enough accepted. Suffice it to say, it will be 3 or 4 years before OOXML gets up to speed with its predecessor binary formats.

So, ODF does have a great advantage. While it will take Microsoft 3 to 4 years to get its format into usage, ODF will start to be used a lot sooner. Why? Because there is more software that supports it, and that software is seeing a better acceptance rate. Plus, people who switch to that software are more likely to use ODF.

With Microsoft, users know they can use the new formats, but they are afraid of losing the ability to share documents with those who use the older formats. Thus they wait. With software that defaults to ODF, it is generally known that it reads and supports the older formats without a problem, so sharing is not so much of an issue.

So, yeah - I'd expect ODF to have great growth now compared to OOXML. Hopefully, it will be enough growth that by the normal time companies switch over to the new formats, they'll switch to ODF instead. Unfortunately, it will be 3 to 4 years before we find out who the real winner will be.

Also, expect Microsoft to continue its big fight for OOXML acceptance - it has a lot to lose (nearly 1/3 of its revenue & profits).


614.13. e.p.
(08/17/2007 12:00 PM)

Yeah, but how many .doc files have been posted? :)


614.14. Dogboy
(08/17/2007 12:38 PM)

As it is a fairly new format, it's more likely that the reason there aren't as many OOXML documents is because people are saving back to Office 2003 formats for compatibility with people who haven't upgraded yet. This is bound to happen with any new file format, Microsoft or otherwise.

If someone uploads a file that won't need to be edited, they'll probably upload a PDF. If they expect it to be edited, they'll probably save it back to a more common format. That's why the numbers are as they are. In fact, you have no idea how many people are running Office 2007 from this data.

You can say what you want about why people haven't all run out to buy Office 2007, but I think your reasoning that ODF is beating OOXML is pretty far-fetched. And, as has been mentioned by someone else here already, once you take .DOC files into account (for the reason I mention above), your numbers don't hold water at all.

In fact, it's likely that a lot of OpenOffice users are saving to .DOC when they upload files for common use, so your numbers could even be detracting from your argument all the more.


614.15. Ben Langhinrichs
(08/17/2007 12:44 PM)

e.p. - I don't know. I wish I had tracked those as well. Still, even then, I don't know what it would prove, since I don't know how fast that grows normally.

Dogboy - I don't follow your argument at all. I am not arguing that people aren't buying Office 2007. I imagine they are buying it in droves, or upgrading as part of their Enterprise agreements. My argument is only that the OOXML files aren't following the Office 2007 use, and I think you are completely correct as to why. People are more likely to use the binary formats or PDF than OOXML. And yes, I imagine a fair number of OpenOffice users are saving to MS binary formats as well, but it doesn't stop the fact that there are ten times as many ODF files being posted as OOXML files, even though Office 2007 usage may well be much higher than OpenOffice and StarOffice and the rest combined. Maybe that is due to Google Docs or maybe it is a government thing, but it is still true.


614.16. Dogboy
(08/17/2007 01:45 PM)

Yes, there are ten times the number of ODF files posted as OOXML files. But again, how many .DOC files are there? Comparing OOXML and ODF is fine, but it doesn't paint an accurate picture of usage (either of open formats or of office suites in general). Therefore I wonder if it's worth comparing at all.

Personally, I use OpenOffice for some things and Office 2007 for others. I like and hate them both in various ways. And I've done some file generation coding using ODF, and I'll probably end up doing the same with OOXML as its adoption becomes more widespread. So I know the basic ins and outs of the formats (not nearly as well as some, but better than the average joe).

I didn't mean to imply that you had said fewer people were or weren't buying Office 2007. I was the victim of my own lousy phrasing -- sorry about that. What I'm trying to say is that eight months isn't enough time to measure adoption of brand-new, replacement file formats for something as pervasive as the Office binary formats. If it were, say, proprietary label printing software, you could argue that lack of adoption indicates that it's not taking off properly. But it seems to me that something as big and everywhere (and corporate) as Office can't be measured this soon. I suspect home users aren't the ones posting the lion's share of binary (or ODF) files right now; it's corporate folk, churches, schools, and so on, and none of those is likely to jump to roll out a brand-new format if they have something that works and is paid for.

ODF folks use ODF software not only for its function but as a point of pride and advocacy. So it follows that they would proudly post native file formats more regularly than, say, a Word user.

I'm not sure we're arguing or not -- maybe you're not implying as much with your numbers as I inferred (so to speak). If that's the case, I apologize. But really, I just don't think this is a big Friday smackdown on OOXML.


614.17. Me
(08/17/2007 05:31 PM)

How about adding the numbers for DOC, PPT, and XLS? The point being that at some point all of those documents will end up as DOCX, PPTX, and XLSX.


614.18. Me
(08/17/2007 05:32 PM)

XLS 14,600,000

PPT 13,900,000

DOC 45,200,200


614.19. Ben Langhinrichs
(08/17/2007 05:45 PM)

The counts for .doc, .ppt and .xls are beside the point. At that rate, why not list counts for .htm (1,580,000,000) or .html (4,220,000,000) and conclude that .html is a more popular format than .doc by 100 times? The whole point of the original was very focused on ODF and OOXML, not on open source vs. Microsoft. In case you folks have not noticed, Genii Software is a commercial ISV selling software to corporations. I am not an open source zealot, and my major interest in ODF is for its use in Lotus Notes, where it isn't free.

Anyway, it is not a foregone conclusion that all those .doc, .xls and .ppt files will eventually be in OOXML formats. Microsoft is competing with itself, and the binary formats may win. Similarly, somebody else may win. I know people tend to think MS Word is too dominant to ever be replaced, but people thought this of WordPerfect at one point. See many WordPerfect documents recently? They thought Lotus 1-2-3 couldn't be replaced either, but Excel seems a bit more popular now, wouldn't you say? So, Microsoft understands that dominance is not guaranteed. The number of documents in a format on the web is not a good metric, but it is not completely meaningless; it is just a single metric. Take that for what it is worth. I am not arguing that people are not using Office 2007, as I don't know about their adoption rates. I just know that on the web, ODF is doing surprisingly well, and I don't know the reasons, although I can guess a few.


614.20. RMD
(08/17/2007 08:37 PM)

Hmmm. Seems to me that the only thing this suggests is that the whole ODF/OOXML "Debate" is one that only a very small number of people really care about.

Most people still save their documents in the native Office file format, whether that's .doc or .docx. And this works great since virtually everybody uses Office.

It is you who are spinning.


614.21. Ben Langhinrichs
(08/17/2007 09:03 PM)

I am an ISV developing tools for both ODF and OOXML, but I need to figure out which to focus on. Microsoft claims that the whole world is going to OOXML, while others claim that the whole world is (or should be) going to ODF. It is not a matter of spin, it is a matter of finding some independent measure that is not one side or the other saying "It is obvious..."

Of course the majority of the world is still using MS Office 2003 binary formats. That basically goes without saying, but I'd be happy to restate it if you like. I just have no interest in developing tools or solutions in that area, so I am asking the much more focused question: which XML format is worth developing for? If MS Office users are going to keep using binary formats, but others will use ODF, I should develop for ODF, even if it is a small part of the overall market. I am only interested in the market share of ODF vs. OOXML, not the market share of the binary formats or HTML or anything else. Just because people jump on one bandwagon or the other on Slashdot does not mean that they are reflecting my intent.


614.22. Bonzo
(29.08.2007 07:23)

There is an issue that has not been raised on this page so far: MS Office 2007 does not use OOXML, but rather MSOXML, as its XML format. However, MSOXML is a partial implementation of OOXML. Maybe you are measuring MSOXML instead of OOXML?


614.23. Rene Knuvers
(02/19/2009 03:15 AM)

At this moment, OOXML files outnumber ODF files by far (using Google to count them), so back in 2007 the issue was probably that MS Office 2007 was penetrating the market too slowly. Note that web-distributed documents are mostly created by enterprises, and they tend to switch software at the last possible moment. Even today many companies do not use Office 2007!

I think, however, that the whole ODF/OOXML discussion gave a great boost to OpenOffice.org, and with the advantages of version 3.0 over MSO07 it works out well. Now we have ended up with two 'standards', though, neither of which I can use to distribute my documents to the world... Since we use OOo, we'll probably stick to ODF for internal and PDF/A for external documents, and deliver either legacy MSO03 or OOXML/MSO07 files on demand...

Oh well, at least we HAVE a standard now! ;)


614.24. Raymond Didriksen
(2009-04-23 08:58)

I want to know how to find out how many people are opening an Excel document.