KDE Developer's Journals

Why does Linux need defragmenting?

lubos lunak's picture

This oft-repeated myth is getting old and boring. And it's untrue. "Linux doesn't need defragmenting, because its filesystem handling is not as stupid as that of the decades-old FAT." Yadda yadda, blah blah. Now, the real question is: if Linux really doesn't need defragmenting, why does Windows boot faster, and why does a second startup of KDE need only roughly a quarter of the time the first startup needs?

Ok, first of all, talking about defragmenting is actually wrong. Defragmenting means making sure no file is fragmented, i.e. that every file occupies just one contiguous area of the disk. But do you know any application today that reads just one file? What we should be talking about instead is linearizing, i.e. making sure that related files (files, plural) together occupy one contiguous area of the disk.

Just in case you don't know, let me tell you one thing about the thing busily spinning in your computer: it can very likely read 50 MB or more per second without trouble, if it's a block read of contiguous data. However, as soon as it has to seek in order to access data scattered over various areas of the disk, the read performance can plummet: only bloody fast drives today have an average seek time below 10 ms, and your drive is very likely not one of them. Now do the maths: how many times do 10 ms (or more) fit into one second? Right, at most 100 times. So your drive can on average read at most 100 files a second, and that ignores the fact that reading a file usually means more than a single seek (on the other hand, it also ignores the drive's built-in cache, which can avoid some seeks). Some of the pictures explaining how Linux doesn't need defragmentation actually demonstrate this nicely: with files scattered that much, the disk simply has to seek.
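To put numbers on the argument above, here is a minimal sketch of the arithmetic. The 10 ms seek time and 50 MB/s sequential rate come from the text; the 16 KB average file size is purely an assumption for illustration, and the model charges exactly one seek per file, which the text itself notes is optimistic.

```python
def max_seeks_per_second(avg_seek_ms: float) -> float:
    """Upper bound on seeks (and so on scattered files) per second."""
    return 1000.0 / avg_seek_ms


def effective_throughput_mb_s(avg_file_kb: float,
                              avg_seek_ms: float = 10.0,
                              seq_mb_s: float = 50.0) -> float:
    """Throughput when every file costs one seek plus its own transfer.

    avg_file_kb is a hypothetical average file size, not a measured value.
    """
    file_mb = avg_file_kb / 1024.0
    per_file_s = avg_seek_ms / 1000.0 + file_mb / seq_mb_s
    return file_mb / per_file_s


if __name__ == "__main__":
    print(max_seeks_per_second(10.0))                # 100.0
    print(round(effective_throughput_mb_s(16.0), 2))
```

Under these assumptions a disk capable of 50 MB/s delivers only about 1.5 MB/s when it reads small scattered files, because nearly all of the time goes into seeking.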

Now, again, how many files does an average application open during startup? One? It's usually hundreds, at least. And since the Linux kernel (AFAIK) currently has next to no support for linear reading of several files, you can guess what happens. Kernel developers will undoubtedly tell you that it's the applications' fault and that they shouldn't be using so many files. But then kernel developers often have funny ideas about how userspace should work, and seriously: why do we have filesystems if they're not meant to be used and applications should pack all their data into a single file? For people who don't know about this problem (and most don't, actually) it feels natural to structure data into files.

Nothing is perfect, and just blaming kernel developers for this wouldn't be quite fair, but it can really upset me when I see people "fixing" problems by claiming they don't exist. I am a KDE developer, not a kernel developer, so it may very well be that some of what I've written above is wrong, but the simple fact that the problem exists can easily be proved, even by you:

Boot your computer, log into KDE, wait for the login to finish. Log out. Log in again. Even if you use a recent distribution that may use some kind of preload technique to reduce this problem, there should still be a visible difference. And the only difference is that the second time almost everything is read from the kernel's disk cache instead of from the disk itself, which avoids both reading the data and seeking. And the difference is the seeking, not the reading of the data: KDE is very unlikely to read more than 100 MB of data during startup, and that's 2 seconds with a 50 MB/s disk. Is the difference really only 2 seconds for you? I don't think so.
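The same reasoning as a rough model, splitting a cold start into pure transfer time and seek time. The 100 MB, 50 MB/s and 10 ms figures are the ones used in the text; the file count of 1000 is a hypothetical value chosen only to illustrate the proportions, and one seek per file is again an assumption.

```python
def cold_start_estimate(total_mb: float, n_files: int,
                        seq_mb_s: float = 50.0,
                        seek_ms: float = 10.0) -> tuple[float, float]:
    """Return (transfer_seconds, seek_seconds) for a cold start.

    Assumes one seek per file; real startups may need more or fewer.
    """
    transfer_s = total_mb / seq_mb_s
    seek_s = n_files * seek_ms / 1000.0
    return transfer_s, seek_s


if __name__ == "__main__":
    transfer_s, seek_s = cold_start_estimate(total_mb=100, n_files=1000)
    print(f"transfer: {transfer_s:.1f} s, seeking: {seek_s:.1f} s")
```

With these inputs the transfer accounts for 2 seconds and the seeking for 10 seconds, which is the shape of the difference described above: a warm cache saves mostly the seeks, not the data transfer.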

So, who still believes this myth that everything in the land of Linux filesystems is nice and perfect? Fortunately, some kernel developers have started investigating this problem and possible solutions.

claes's picture

Simple test

It is quite easy to get an idea what files are read during login. Before logging in, wait at least one minute to let the access time expire on the files in your home directory (at least if you were previously logged in). Directly after logging in, execute

find .kde -amin 1

That will list all files accessed in the last minute in the .kde directory.
Right now, when I run it, it lists 254 files and directories.
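For anyone who wants to post-process the list, a rough Python sketch of the same idea follows. It is not a drop-in replacement for find: the function name and the 60-second window are assumptions, the starting directory itself is not listed, and the result depends on the filesystem actually updating access times (mount options such as noatime or relatime can suppress that).

```python
import os
import time


def recently_accessed(root, max_age_s=60.0, now=None):
    """Return sorted paths under root whose atime is within max_age_s of now."""
    now = time.time() if now is None else now
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: inspect the entry itself, do not follow symlinks
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if now - atime <= max_age_s:
                hits.append(path)
    return sorted(hits)
```

Calling recently_accessed(os.path.expanduser("~/.kde")) right after login would give a list that can be counted or grouped by directory.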

zander's picture

Anyone profiled ReiserFS4?

It's one of the least known FSes out there (or should I say, least used?) but there are a lot of optimizations in there that seem to be specifically targeted at this.
I had the pleasure of seeing Hans give a presentation at FOSDEM a couple of years back and have to say he is an excellent speaker. But more importantly, he went into things like packing files into fewer blocks, moving data into the directory blocks to avoid seeks, and otherwise moving things around to get higher data density.

I'd be interested to know how much time is gained by his techniques.

sad eagle's picture

This is very, very, true.

A while back I did a project for my systems class where I profiled the seeks done by the kernel on my machine during KDE startup.
It turned out the file system did a horrible job: the files kbuildsycoca stat'd ended up split between two opposite ends of the disk, with seeks alternating between the two after 5 files each or so...
http://www.cs.cornell.edu/~maksim/trace35.pdf is the picture; in it a cross represents a request and the circle connected to it its completion (so you can see how long it took between the app asking for the data and the drive delivering it). The X axis is time (ms, if I recall), the Y axis has the LBA numbers -- roughly the head position along the disk.

To be fair, the machine was around forever, and being a development machine it had far heavier FS activity than normal, but still...

As a matter of fact, I am not even sure this whole "anti-fragmentation" heuristic is actually a good idea: while it makes fragmentation less likely, it can also spread files widely across the disk, increasing seek times. One would probably find that in a fresh install of $DISTRO on a clean hard drive, the files KDE needs to start are all over the disk.

And there is a further caveat to the whole use-one-file "solution":
the application has no access to information on disk geometry and no control over placement, so it has no way of laying out its indexing structures for good seek locality. And, of course, there is zilch guarantee that the one-big-honking-file, which often needs to change size, would not get badly fragmented, resulting in a whole lot of head ping-pong (though perhaps madvise() with MADV_WILLNEED can help with that...). And there are further complications because a file is a sort of natural unit of atomicity. If one has multiple atomic units of information that used to live in separate files, suddenly one has to do all sorts of concurrency control in user applications --- whaaa?
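As a minimal sketch of what such a hint looks like from userspace, here is the call through Python's mmap module (Python 3.8 or later; MADV_WILLNEED is only exposed on platforms that support it). The function name is hypothetical, and whether the hint actually reduces head movement on a fragmented file is exactly the open question raised above; the sketch only shows how the hint is issued.

```python
import mmap


def map_with_willneed(path):
    """Map path read-only and ask the kernel to start reading it ahead.

    Returns (mapping, hinted); hinted is False where the hint is unavailable.
    The file must be non-empty, since an empty file cannot be mapped.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mmap keeps its own handle,
        # so closing f afterwards is fine.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    hinted = False
    if hasattr(mm, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        mm.madvise(mmap.MADV_WILLNEED)  # advisory only; the kernel may ignore it
        hinted = True
    return mm, hinted
```

The hint is advisory: it tells the kernel the mapped range will be needed soon, but it gives the application no say over where the blocks live on disk.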

The bottom line, in my view, really is that the OS-provided file abstraction cannot provide decent performance.

sapphirecat's picture

Proof seeks are costly

Last October, I did some experimenting with putting Gentoo on flash media. I don't have a snazzy BIOS that can boot off USB-Storage, so I built a kernel, put it in /boot, and ran it with root=/dev/sda1.

I did a little writeup at the time, but to summarize: Linux is slower at initializing USB-storage devices than IDE, so you have to wait 6 seconds for the card reader to come online, and the end result is a tie. Which I consider not too bad for a device with 1/4 of the linear read speed of the hard disk.

superstoned's picture

control

i think the most important reason why windows boots faster than linux is the amount of control microsoft has, and the fact they can simply assign someone a boring task and fix stuff. Windows DOES reordering of files, and linux could do it too - just nobody ever wrote it, and if someone did, you're right: he/she would have a hard time getting it into the kernel.

also in windows you have a base set of libraries, i guess - linux has a more diverse set of em, and so more has to be loaded. a clean KDE/Qt only system could be very fast, i think, but it gets bloated with GTK/Wx/etc stuff.

caglar10ur's picture

What about dog slow init-systems?

One possible solution (as you already know) may be fcache (http://lkml.org/lkml/2006/5/15/46) in kernel space. Currently it works well but also has some limitations (it supports only ext3).

But I think the main problem is init systems, not disk fragmentation or drive seek time.

When we started to investigate what was causing these long boot times, we found tons of really ugly/unmaintained code lying around in nearly every init system. Coldplug/hotplug systems are also really slow for the same reason.

So we decided to give it a try with a high-level language (in our case, Python) instead of awk+sed+etc/bash or C. We started to change our init and cold/hotplug systems, and here are the results:

http://cekirdek.uludag.org.tr/~caglar/blog/?file=mudur.blog

Pardus 1.1 alpha2 boots in about 16 seconds on my Sony laptop (IDE disk, 3400 rpm); our old init took ~1:35. With the help of fcache, logging into KDE takes 2 seconds at most.

So I think there is some progress going on, waiting to be used by distros.

rudd-o's picture

But init-related optimizations are part of the story

and the other part, summarized in one sentence, is this: why do I have to wait up to 60 seconds for KWrite to start on my computer, when I have 768 MB of RAM? Sure, I got a ton of apps that I use and keep open all the time, but why does everything in RAM get pushed to disk if it hasn't been used in a couple minutes? And, more importantly, what is KWrite reading from the frigging disk that it takes so long to start up? It's a DAMN TEXT EDITOR!

vdboor's picture

The linker is the problem here

> why do I have to wait up to 60 seconds for KWrite to start on my computer

Because the GNU linker does a bad job with C++ code, and with code that uses the same prefix for all its functions (e.g. gaim_ in GAIM). For the first few letters it uses a hashing algorithm, but then it falls back to text compares, comparing every function name with the others. Most of the startup time of KDE applications (also OpenOffice, Mozilla and apps like gaim) can be traced back to this problem.

ratta's picture

This has been a big problem for a long time

I have heard this many times; how is it possible that it hasn't been fixed yet?
Is it really that difficult, and if so, why?

[If this post is offtopic please excuse me, probably we should start a separate thread about this problem]

helloworld82's picture

60 seconds to start?

There must be something wrong with your setup. I have 512 MB of RAM:

daniel@HelloWorld82 ~ $ time kwrite

real 0m1.342s
user 0m0.820s
sys 0m0.176s

So starting kwrite is much faster for me, on a system with less RAM. I think your KDE is screwed.
