Every few months, new statistics about Linux users on the desktop come out. The methodology behind each one varies by provider. However, they all share one thing: being wrong.
Measuring the number of users of a certain operating system on the desktop is a totally different thing from measuring servers or other devices. For web servers, for example, you may have a list of static IP addresses which you can analyze and try to reach. You may check hosting companies or huge enterprises for additional data. Lots of methodologies can be used.
However, for the desktop, it seems that most statistics providers don’t have any scientific methodology to rely on so far. Instead, they partner with some well-known advertising networks (which include thousands of websites) and analyze the visitors of those websites to produce their data.
Such a methodology is far from accurate. This article tells you why.
The latest stats about Linux market share, which put it at around 2.36%, depended on an advertising network of 40,000 publishers (or 160 million visits per month). Whether that figure is true or not, the sample is small compared to actual web usage. We would expect at least billions of visits per month before taking that statistic seriously. The data all comes from a single ad network, as the provider itself states:
We collect data from the browsers of site visitors to our exclusive on-demand network of HitsLink Analytics and SharePost clients.
40K websites is actually nothing. If the numbers were coming from an advertising network like Google AdSense, which includes millions of websites, I would say that the population is fair enough. However, for a small network of possibly local or unknown websites, the data can’t be taken seriously.
For an accurate data analysis, the sample must be large enough to give correct readings. If your sample is too small, you won’t get the real results. The problem is that this number is not coming from the most famous websites on the web (like HuffPost, NYTimes, Google, Facebook, Amazon...) which the normal user would visit each day. It’s coming from a small ad network, which is why it’s biased.
This methodology doesn’t account for offline installations. There are millions of computers out there which use Windows or Linux as a desktop operating system without an Internet connection.
Governments, local shops, schools, universities and many other institutions may not always offer an Internet connection. Even if they do, their machines may never visit the tracked websites; the connection may be used only for updates and official work.
The widely used methodology depends on advertising networks and other tracking networks/scripts to supply visitor data to the market share tracking companies. The data sets coming from those advertising networks can’t be considered a reliable source.
An advertising network’s data set may not be representative. It may include publishers from a certain country or a certain language only. It may include websites about a specific topic or category. It doesn’t represent all users on the web.
Users of a certain website with millions of visits per month may all be Windows users because of the website’s target audience. Linux users may not even be there. So picking some advertising networks, pulling data from them and producing a statistic about Linux users is not that simple.
Anybody working in the field knows that a good data sample must be representative of all the types of entities you want to consider in your work. You can’t have that when you depend on online networks to provide you with data.
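To make the point about representativeness concrete, here is a minimal sketch in Python. Every number in it is made up for illustration: the audience segments, their sizes, their Linux shares and the ad network's traffic mix are all assumptions, not measurements from any real survey.

```python
# Hypothetical numbers, chosen only for illustration; none come from a real survey.
# Each audience segment maps to:
#   (share of all desktop users, Linux share inside that segment)
population = {
    "general": (0.70, 0.02),
    "technical": (0.10, 0.20),
    "office": (0.20, 0.01),
}

# How the hypothetical ad network's traffic is split across the same segments.
# Technical sites are barely present in this imaginary network.
network_mix = {"general": 0.50, "technical": 0.01, "office": 0.49}


def linux_share(segment_weights, population):
    """Weighted average of the per-segment Linux share."""
    return sum(
        weight * population[name][1] for name, weight in segment_weights.items()
    )


# The "true" mix is simply each segment's share of all desktop users.
true_mix = {name: weight for name, (weight, _) in population.items()}

true_share = linux_share(true_mix, population)
measured_share = linux_share(network_mix, population)

print(f"Linux share in the whole population: {true_share:.2%}")
print(f"Linux share seen by the ad network:  {measured_share:.2%}")
```

Note that the number of visits never appears in the calculation. With a skewed mix of websites, collecting more visits from the same network only repeats the same error.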
Windows users are Windows users not because they downloaded Windows from the Internet. Windows users are Windows users because the OS came pre-installed on almost all the computers they have seen in their lives. This is not the case for the Linux desktop. Keep that in mind.
In order to be a Linux user, you usually have to download the OS yourself from a distribution’s website. Of course, there are hundreds of laptops from Dell, System76 and other vendors which come with Linux pre-installed. However, their number is tiny compared to the normal users who have been downloading the OS themselves for the past 20 years.
Because of that, if you are a Linux user, someone who downloaded a complete OS from the Internet and installed it on his/her computer for a lot of different technical reasons, there is a much higher chance that you are using browser add-ons to block the tracking scripts and advertising networks which those statistics depend on for data, or that you are switching your user agent.
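The effect of blocking can be sketched with a little arithmetic. The function below and all of its inputs are hypothetical: nobody knows the real blocking rates, so the figures only show the direction of the error, not its size.

```python
def observed_share(true_share, block_rate_linux, block_rate_other):
    """Share of Linux among the visitors a tracking script can still see.

    true_share       -- assumed real fraction of desktop users on Linux
    block_rate_linux -- assumed fraction of Linux users blocking the tracker
    block_rate_other -- assumed fraction of all other users blocking it
    """
    visible_linux = true_share * (1 - block_rate_linux)
    visible_other = (1 - true_share) * (1 - block_rate_other)
    return visible_linux / (visible_linux + visible_other)


# Hypothetical scenario: 5% real share, 60% of Linux users block trackers,
# 20% of everyone else does.
skewed = observed_share(0.05, 0.60, 0.20)

# Same real share, but both groups block at the same rate.
unskewed = observed_share(0.05, 0.20, 0.20)

print(f"Reported share with unequal blocking: {skewed:.2%}")
print(f"Reported share with equal blocking:   {unskewed:.2%}")
```

When both groups block at the same rate, the reported share matches the real one. The statistic only drifts when one group blocks more than the other, which is exactly the situation described above.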
So far, there don’t exist any accurate (or even close to accurate) statistics about Linux desktop users. None of the methodologies in use rely on sound scientific methods of measurement.
While Linux desktop users are surely increasing, believing the numbers given by those tracking companies from time to time is not recommended. The actual numbers may be much bigger or much smaller. In both cases, asking for sources and questioning the methodology used to come up with such an analysis is a must.
Update: The structure of the article was updated by moving the “Doesn’t measure enough” section to the top and adding an extra clarification about it.