Showing posts with label Debian. Show all posts

26 September 2012

Worth repeating; Trust and Open Source

I first encountered Mark Shuttleworth in person at an Ottawa Linux Symposium a few years ago, and passed along a reply from Dag, responding to some controversial comment Shuttleworth had made at the time.  I choose not to use Ubuntu or Debian as my primary X desktop, but that said, there are 6 machines running one of those two distributions powered on in my office at the moment, so I am not a stranger there, either
He was being 'up front' about the fact that Amazon search results are being trialled for an upcoming Ubuntu version
He points out, and it bears repeating, the following:
[Question: ] Why are you telling Amazon what I am searching for?
[Answer: ] We are not telling Amazon what you are searching for. Your anonymity is preserved because we handle the query on your behalf. Don’t trust us? Erm, we have root. You do trust us with your data already. You trust us not to screw up on your machine with every update. You trust Debian, and you trust a large swathe of the open source community. And most importantly, you trust us to address it when, being human, we err.

The boldfaced parts are the important ones, but I carry the context here as well.  When you use any computer operating system, you, in the role of user, are implicitly placing trust in the decisions and the commitment of those who put it together to 'do the right thing', or to make it right when things go awry
Do you trust your vendors?  Your actions may be pointing out a dissonance, if you said: no

24 June 2010

Debian mkfs is working again

It's been a long June. I noticed early on that an update in Debian testing had moved mke2fs from one package to another without getting all the library dependencies right. As such I spent June without the ability to lay down a filesystem on a new partition with the 'proper' tool. Part of my series on logfile reading includes a task to review the 'percent full' for each partition (and to relocate or clean out fat ones) to avoid running out of room in a self-inflicted denial of service attack
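
A rough sketch of that 'percent full' review in Python, assuming the standard library's shutil.disk_usage is good enough for the purpose. The function names and the 90% threshold are mine, for illustration only, and the figure will differ a little from what df prints, since df leaves the root-reserved blocks out of its percentage:

```python
import shutil


def percent_full(total_bytes: int, used_bytes: int) -> float:
    """Return used space as a percentage of total (0.0 for an empty device)."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


def fat_partitions(mount_points, threshold=90.0):
    """Return (mount_point, percent) pairs at or above the threshold."""
    report = []
    for mount_point in mount_points:
        usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free
        pct = percent_full(usage.total, usage.used)
        if pct >= threshold:
            report.append((mount_point, round(pct, 1)))
    return report


if __name__ == "__main__":
    # "/" is only an example; list the mount points you care about.
    for mount_point, pct in fat_partitions(["/"], threshold=0.0):
        print(f"{mount_point}: {pct}% full")
```

Run from cron and mailed to the sysadmin, something of this shape turns the daily log reading task into a short list of partitions that need attention.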

I tried the obvious fallback to build a new filesystem: busybox, but the version found in Debian Testing was lacking a needed build time switch. I filed the bug, and considered a local patch, or perhaps a rebuild of part of the chain needed to fork mkfs for a bit, but my need for space to reorganize a host's files was neither that great nor urgent. Just pesky each day to see

I knew from reading the bug reports that the fix had been committed and was 'ageing' in the Debian fashion, on its way from an Unstable 'nightly' to a mildly tested (or at least not black-balled) state and promotion into Testing

nfs2:~# apt-get upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages have been kept back:
ksysguard libdevmapper1.02.1
The following packages will be upgraded:
bsdutils e2fslibs e2fsprogs iptables iso-codes libblkid1 libcomerr2
libenchant1c2a libffcall1 libmime-tools-perl libnetpbm10 libss2 libuuid1
lockfile-progs mount mutt netpbm shared-desktop-ontologies util-linux
19 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
Need to get 9,841kB of archives.
After this operation, 115kB disk space will be freed.
Do you want to continue [Y/n]? y
...
nfs2:~#

I've been running repository data update operations daily .. the Debian approach is more measured in its pace than we use with CentOS, and I think we may have something to learn there. It is a rare package update that cannot wait for a daily repo data update, push and mirror overnight in our space, and it would avoid much confusion to casual sysadmins

Those bolded packages in that clutch of upgrades look promising ...

nfs2:~# mkfs /dev/sda12
mke2fs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
237568 inodes, 949835 blocks
47491 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=973078528
29 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
nfs2:~# date
Thu Jun 24 10:13:17 EDT 2010
nfs2:~#
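
The figures mke2fs prints hang together, and a few lines of Python make the arithmetic visible. This is a cross-check against the output above, not a model of how mke2fs itself computes them: the rounding-down of the 5% reserve is inferred from the printed number, and the backup placement follows the usual sparse superblock rule (group 1 and groups that are powers of 3, 5 and 7), which I am assuming was in effect here:

```python
import math

# Figures copied from the mke2fs output above.
blocks = 949835
block_size = 4096
blocks_per_group = 32768
inodes_per_group = 8192

# The last group is only partly filled, so round up.
block_groups = math.ceil(blocks / blocks_per_group)
inodes = block_groups * inodes_per_group
reserved = blocks * 5 // 100  # 5%, rounded down (inferred from the output)


def backup_groups(n_groups):
    """Groups holding a superblock backup under the sparse superblock rule."""
    groups = {1}
    for base in (3, 5, 7):
        g = base
        while g < n_groups:
            groups.add(g)
            g *= base
    return sorted(groups)


backup_blocks = [g * blocks_per_group for g in backup_groups(block_groups)]

print(block_groups, "block groups")
print(inodes, "inodes")
print(reserved, "blocks reserved for the super user")
print("backups at", backup_blocks)
print(round(blocks * block_size / 2**30, 2), "GiB")
```

Each computed value lines up with the corresponding line of the transcript, which is a cheap way to convince oneself that the new tool is laying down a sane filesystem.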

Lovely; I'm back in business

01 April 2009

I propose that women have 28 teeth

[Image: teeth to count]
Why have men more teeth than women?
By reason of the abundance of heat and blood which is more in men than in women.
  -- "Of the Teeth.", Aristotle

One of the mysteries behind the quote above is why Aristotle did not simply find a near-by woman, and ask her to permit him to count her teeth

How do we know what we 'know' to be true? The difference here is of course that between 'deductive' and 'inductive' analysis

Political 'debate' and flame wars on which Linux distribution (package manager, editor, MTA, and so on ad infinitum) is better, often degenerate to deductive reasoning from a firmly held (perhaps from ideological basis, perhaps from prior experience) 'Theory'. Done properly, one is then to state a testable 'Hypothesis', actually perform field or experimental 'Observation' to validate or disprove that hypothesis, and finally reach a 'Conclusion' that the Theory is supported or not. Aristotle omitted the critical stage of testing his hypothesis, and so fell into error with his assertion. Pure reason led him astray

It is just as easy to fall into error from the inductive reasoning side. I have noted for many years now that in early February, the newspapers report that the groundhog ("Punxsutawney Phil") has seen his shadow (consider the hints from the Bill Murray movies, 'Caddyshack' and 'Groundhog Day'). That he sees his shadow seems to cause Winter to continue for six weeks or so

The cardinal birds also must read the newspaper and observe the shadow sighting report in timing their return to north of the Mason-Dixon Line. When the timing is right, the cardinals return to my town. It takes a week or two, but once the cardinals have reported back to the southern over-wintering havens, the robins follow them

The return of the cardinals also causes the forsythia bush out back to bloom (I suspect there is some needed chemical agent in the bird droppings). This is important because it needs to snow on the forsythia three times before it is safe to plant the vegetable garden to avoid the seedlings being frozen and killed

My chain of 'Observation' is most careful, taken over many years. A 'Pattern' emerged that I could see, and so I formed a 'Hypothesis' as to what was occurring. My 'Theory' seems to explain nature well. The 'inductive' results are of course completely wrong and untestable, and confuse co-incidence (sequentially timed events) with causation

The XKCD website has this:
Correlation
and if you are not reading that site regularly, you should be. We'll be using statistics soon enough here

At the end of all the back and forth about deductive and inductive methods, we have to end up at the conclusion that pure logic is but an organized way of committing error. Nothing can replace putting forth a testable hypothesis, and getting down and dirty in the data testing it to confirmation or refutation

Critical note. — Of a piece with the absurd pedagogical demand for so-called constructive criticism is the doctrine that an iconoclast is a hollow and evil fellow unless he can prove his case. Why, indeed, should he prove it? Is he judge, jury, prosecuting officer, hangman? He proves enough, indeed, when he proves by his blasphemy that this or that idol is defectively convincing — that at least one visitor to the shrine is left full of doubts. The fact is enormously significant; it indicates that instinct has somehow risen superior to the shallowness of logic, the refuge of fools. The pedant and the priest have always been the most expert of logicians — and the most diligent disseminators of nonsense and worse. The liberation of the human mind has never been furthered by such learned dunderheads; it has been furthered by gay fellows who heaved dead cats into sanctuaries and then went roistering down the highways of the world, proving to all men that doubt, after all, was safe — that the god in the sanctuary was finite in his power, and hence a fraud. One horse-laugh is worth ten thousand syllogisms. It is not only more effective; it is also vastly more intelligent.
  — The American Mercury. p. 75., Henry Louis Mencken (1880-1956)

[Image: broken idol]
But then you get a lot of angry letters, from those whose clay idol you have smashed


edit: two typo fixes

26 March 2009

IPv6 eats kittens (and distcc) on Debian Testing

[Image: Flickr, domo and kitten]

This can only end badly

I spent a good 5 hours this week tracking down a problem with distcc hanging up in our Debian Testing build farm. We use distcc to speed up compilation of the C++ sources in the development of the trading shim. Interestingly, our end user community forced us to this decision of developing on Debian testing, as they are using later gcc versions than we were on CentOS, and it was useful to be able to see their errors, BEFORE they reported them to us

On the new compile farm, sometimes we would get a compile in, say, 44 seconds; other times it would drag out for several minutes. This is a problem as we had just slotted a new unit into harness, and expected better results

In checking the logs in the client doing the distribution of compilation tasks, we were seeing a symptom of 'segfaults' in that client's process; other times, the client would stall, seemingly blocked waiting for a compilation result from a remote buildfarm peer that never came back. Checking on the remote build unit, one of the distccd children would die for mysterious reasons, leaving a message in the dmesg record. Once that failed build timed out, the needed file would be built locally, and the build would proceed. Checking the log files, nothing obvious jumped out

The obvious debugging technique is to get a minimal reproducer, and then to partition the problem into smaller and smaller possible causes using that reproducer tool. The issue will manifest on one setup, but not the other, and so one can rule out more and more issues, until the answer is left, staring you in the face

Looking at my Debian helper tool, it had rotted, and was in sore need of removal of some constraints: it did not use distcc when available; it did not use proper -j parallel compiles; it did not use -O3 optimization in the compiles. My test tool was not set up to see what I needed to see

Time to pay down some 'technical debt' (If you've not read Martin Fowler's piece, and viewed Ward Cunningham's video, stop now, and do so). And so I made some payment there. After testing, I got these results:

Master        Clients                          Elapsed time (real)
pippin        nfs2, 10.16.1.231                0m23.281s
nfs2          10.16.1.231, pippin, localhost   0m23.702s
10.16.1.231   pippin, nfs2, localhost          0m22.551s

My first thought looking at this: Well, that pretty conclusively rules out machine specific errors, or network path issues. It must be something different in the setup of the user provoking the issue that my tool does not duplicate. NOTE: This is wrong-headed, of course, as: 'An absence of evidence is not evidence of absence of a problem' but was an easy trap to fall into
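
To put a number on how close those three runs were, here is a small Python sketch that converts the 'real' times from the table into seconds and reports the spread. The host names and times are copied from the table; the helper name to_seconds is my own, for illustration:

```python
import re


def to_seconds(real: str) -> float:
    """Convert a time(1) style 'real' value such as '0m23.281s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", real)
    if match is None:
        raise ValueError(f"unrecognised time format: {real!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Master host -> elapsed time, as reported in the table above.
timings = {
    "pippin": "0m23.281s",
    "nfs2": "0m23.702s",
    "10.16.1.231": "0m22.551s",
}

seconds = {host: to_seconds(value) for host, value in timings.items()}
fastest = min(seconds.values())
spread = max(seconds.values()) - fastest

print(f"spread: {spread:.3f}s ({100 * spread / fastest:.1f}% of the fastest run)")
```

A spread of a little over a second across all three masters is what made the 'nothing wrong here' reading so tempting, and it says nothing at all about the runs that stalled for minutes.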

For every complex problem, there is a solution that is simple, neat, and wrong.

  — H. L. Mencken

For every problem there is a solution which is simple, obvious, and wrong.

  — Albert Einstein

I tossed my results at that user for their thoughts on the results, and went back to work on another issue

Later in the day, doing some thought experiments with the user, we could not pin down where to look yet. But as a team, I had him provoke the issue with his setup, while I watched the logs on the various machines through several consoles. And the error appeared, and then jumped out and tickled my eyeballs. I was watching nothing in particular, until I saw the failure on process 29673, and then traced that back up. A successful and a failed session looked like this, respectively:


distccd[29673] (dcc_check_client) connection from ::ffff:10.16.1.249:41771
distccd[29673] (dcc_r_file_timed) 909179 bytes received in 0.078651s, rate 11289 kB/s
distccd[29627] (dcc_collect_child) cc times: user 1.132070s, system 0.144009s, 23039 minflt, 0 majflt
distccd[29673] (dcc_collect_child) cc times: user 1.092068s, system 0.104006s, 22481 minflt, 0 majflt
distccd[29673] (dcc_check_client) connection from ::ffff:10.16.1.249:41775
distccd[29673] (dcc_r_file_timed) 818437 bytes received in 0.071648s, rate 11155 kB/s
distccd[31248] (dcc_check_client) connection from ::ffff:10.16.1.249:41779
distccd[31248] (dcc_r_file_timed) 886761 bytes received in 0.076688s, rate 11292 kB/s
distccd[29627] (dcc_collect_child) cc times: user 1.068066s, system 0.112007s, 23890 minflt, 0 majflt
distccd[29673] (dcc_collect_child) cc times: user 1.108069s, system 0.112007s, 22012 minflt, 0 majflt
distccd[29673] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 15868 bytes; retrying 344332 @15868
distccd[1995] (dcc_log_child_exited) ERROR: child 29673: signal 11 (no core)

A-ha! Now we know what to look for:


dhcp-231:/var/log# grep dcc_pump_sendfile distccd-transition-log
distccd[29673] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 15868 bytes; retrying 344332 @15868
distccd[31248] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 15868 bytes; retrying 586732 @15868
distccd[30262] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 15868 bytes; retrying 4655916 @15868
distccd[2005] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 16384 bytes; retrying 74824 @16384
distccd[2128] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 16384 bytes; retrying 286560 @16384
distccd[2170] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 16384 bytes; retrying 97440 @16384
distccd[2129] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 16384 bytes; retrying 301000 @16384
dhcp-231:/var/log#
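
The same grep can be turned into a tally, which makes the pattern harder to miss. A minimal Python sketch, using the seven lines from the grep output above as sample data; the regular expression is fitted to those lines only, and is not taken from the distcc sources:

```python
import re
from collections import Counter

# Sample data: the grep output shown above, verbatim.
LOG = """\
distccd[29673] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 15868 bytes; retrying 344332 @15868
distccd[31248] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 15868 bytes; retrying 586732 @15868
distccd[30262] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 15868 bytes; retrying 4655916 @15868
distccd[2005] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 16384 bytes; retrying 74824 @16384
distccd[2128] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 16384 bytes; retrying 286560 @16384
distccd[2170] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 16384 bytes; retrying 97440 @16384
distccd[2129] (dcc_pump_sendfile) Notice: sendfile: partial transmission of 16384 bytes; retrying 301000 @16384
"""

PATTERN = re.compile(
    r"distccd\[(?P<pid>\d+)\] \(dcc_pump_sendfile\) Notice: sendfile: "
    r"partial transmission of (?P<sent>\d+) bytes; "
    r"retrying (?P<left>\d+) @(?P<offset>\d+)"
)


def tally_partial_sizes(text: str) -> Counter:
    """Count how often each partial-transmission size appears in the log."""
    sizes = Counter()
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            sizes[int(match.group("sent"))] += 1
    return sizes


tally = tally_partial_sizes(LOG)
for size, count in sorted(tally.items()):
    print(f"{size} bytes sent before stalling: {count} time(s)")
```

Only two distinct sizes turn up across seven failures, regardless of how large the file being sent was, which points away from the payload and toward the transport.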

The TCP process of shuttling code to compile, and the binary results of such compiles, is failing the same way, over and over again: a partial transmission (of 15868 or 16384 bytes in the grep output above) is present every time. Looking at the log entry again, the form of the connecting hosts is unusual: ::ffff:127.0.0.1 and ::ffff:10.16.1.249. Why is that in IPv6 notation? And I reach back to my logs, as I remember I had an issue like this a year or so ago on a Debian box
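
Those ::ffff: addresses are IPv4-mapped IPv6 addresses, which is how a daemon listening on an IPv6 socket reports a peer that actually connected over IPv4. Python's standard ipaddress module can unwrap them, which makes for a quick sanity check; the helper name unmap is mine:

```python
import ipaddress


def unmap(peer: str):
    """Return the embedded IPv4 address for an IPv4-mapped IPv6 peer.

    Any other address (plain IPv4, or native IPv6) is returned unchanged.
    """
    addr = ipaddress.ip_address(peer)
    if addr.version == 6 and addr.ipv4_mapped is not None:
        return addr.ipv4_mapped
    return addr


for peer in ("::ffff:10.16.1.249", "::ffff:127.0.0.1", "10.16.1.231"):
    print(f"{peer} -> {unmap(peer)}")
```

So the clients were ordinary IPv4 hosts all along; it was distccd on the receiving side that had chosen to listen by way of the IPv6 stack.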

And so, Google with the search argument: debian ipv6 distcc confirms as its first result: 1. #481951 - distcc: zeroconf support broken wrt IPv6 - Debian Bug ... ... and the bug is still open. Killing off IPv6 is the obvious next step, and so, back to Google with: debian disable IPv6 to find: Disabling IPv6 under a 2.6 kernel. Reading the post, there is some back and forth, and the answer seems to be, there is not an 'official Debian answer', but this is what people are doing. Back to Google with: site:debian.org debian disable IPv6 seems to confirm that there is not a single well documented answer which has floated up in Google's searching

Compare: CentOS addresses the matter directly, and as the first Google hit with: site:centos.org disable IPv6
7. How do I disable IPv6?

* Edit /etc/sysconfig/network and set "NETWORKING_IPV6" to "no"
* Add the following to /etc/modprobe.conf :

alias ipv6 off
alias net-pf-10 off

* Run chkconfig ip6tables off to disable the IPv6 firewall
* Reboot the system

Alternative (which might be easier and works on any release with /etc/modprobe.d):
echo "install ipv6 /bin/true" > /etc/modprobe.d/disable-ipv6
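
After the reboot it is worth confirming that IPv6 is really gone. A minimal Python sketch, assuming a Linux host where the kernel lists configured IPv6 addresses in /proc/net/if_inet6 (one line per address, with the interface name in the last column); the function names are illustrative:

```python
def parse_if_inet6(text: str):
    """Return the sorted interface names found in if_inet6-style text."""
    return sorted({line.split()[-1] for line in text.splitlines() if line.strip()})


def ipv6_interfaces(proc_path: str = "/proc/net/if_inet6"):
    """List interfaces that still carry an IPv6 address.

    An empty list means the file is missing (IPv6 not loaded) or empty.
    """
    try:
        with open(proc_path) as handle:
            return parse_if_inet6(handle.read())
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    active = ipv6_interfaces()
    if active:
        print("IPv6 still active on:", ", ".join(active))
    else:
        print("no IPv6 addresses configured")
```

An empty result is what one hopes to see once the module has been kept from loading.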


Sadly, there is something else on Debian testing in play as well, and it is not just an IPv6 issue (although turning off IPv6 has drastically reduced the frequency of the issue). When I look in today to make sure the 'fix' is working:


[74988.951989] distccd[8671]: segfault at 1 ip 7fdd2250e030 sp 7fff2b025da8 error 4 in libc-2.7.so[7fdd22493000+14a000]
[74989.017836] distccd[8651]: segfault at 1 ip 7fdd2250e030 sp 7fff2b025da8 error 4 in libc-2.7.so[7fdd22493000+14a000]
[74989.518050] distccd[8664]: segfault at 1 ip 7fdd2250e030 sp 7fff2b025da8 error 4 in libc-2.7.so[7fdd22493000+14a000]
[74994.152461] distccd[8659]: segfault at 1 ip 7fdd2250e030 sp 7fff2b025da8 error 4 in libc-2.7.so[7fdd22493000+14a000]
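
Those kernel lines carry more than they seem to at first glance. A Python sketch that picks one apart, assuming the usual x86 layout of the message (faulting address, instruction pointer, stack pointer, page-fault error code in hex, then the mapping's base and size); the field names and the regular expression are mine, fitted to the lines above:

```python
import re

# One of the dmesg lines above, verbatim.
LINE = (
    "[74988.951989] distccd[8671]: segfault at 1 ip 7fdd2250e030 "
    "sp 7fff2b025da8 error 4 in libc-2.7.so[7fdd22493000+14a000]"
)

PATTERN = re.compile(
    r"(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_error(code: int):
    """Decode the low three bits of an x86 page-fault error code."""
    return (
        "protection violation" if code & 1 else "page not present",
        "write" if code & 2 else "read",
        "user mode" if code & 4 else "kernel mode",
    )


def parse_segfault(line: str):
    """Extract the interesting fields from one segfault line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a segfault line")
    ip = int(match.group("ip"), 16)
    base = int(match.group("base"), 16)
    return {
        "prog": match.group("prog"),
        "pid": int(match.group("pid")),
        "fault_addr": int(match.group("addr"), 16),
        "lib": match.group("lib"),
        "offset": ip - base,  # where inside the library the fault occurred
        "error": decode_error(int(match.group("err"), 16)),
    }


info = parse_segfault(LINE)
print(info["prog"], info["lib"], hex(info["offset"]), info["error"])
```

All four children died at the same offset inside libc, on a user-mode read of address 1, which smells like a bogus pointer being handed to a library routine. Naming the routine would need the symbols for that exact libc build, which this sketch does not attempt.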

Where is that coffee cup? I knew this would not end well

[Image: domo eating a kitten]