27 September 2010

Unit test shepherds

I read with interest over the weekend this unit testing and TDD blog post from Douglas Hubler. I met him in real life a few weeks ago up in Chicago at the annual ClueCon, and was very impressed

I tracked down his email address and started to write a private email, but then as I re-read my draft and his piece, I noticed that it was a 'talking draft' by him. As such, I decided to surface my thoughts here


Hi -- Russ Herrold (ex CentOS) here -- we met at ClueCon

You put your finger on the problem well here:

"Project Maintainers" were always in fear of holding the bag on contributions that introduced bugs while not advancing their employer's goals

which is the well known 'capture by the employer problem' in FOSS. I am not saying (and would never suggest) that employer sponsorship of an interested 'Project Maintainer' is undesirable -- just the opposite, as it funds getting SOME motion in some cases (i.e., when it suits the employer's goals, or is not a clear 'CLM' -- career limiting move). Of course this path leads to 'freeze ups' similar to what we see in Debian Stable, where nothing short of dynamite (or a working remote exploit) seems to work to pry some forward progress into the main trunk

I put on my 'agile' thinking cap, to scope out the implications of your post

To work, the "Unit Test Shepherds" need to have a global mandate to commit at least unit tests, via a Version Control System, and there needs to be a working Continuous Integration server. If a committed test 'breaks the build', either the test is wrong, or the code is wrong. As the first response to 'breaking the build', the CI server has to revert the test, and file an exception report, to be owned by the UTS in the first instance, with a CC to the PM

This gets a 'heads up' in front of the PM, and a careful UTS will at a minimum either: 1) acknowledge that the test was ill-considered, withdraw it, and close the bug; 2) amend their code to correct the misunderstanding that resulted in a broken test and re-attempt the commit [closing the bug, with the possibility of a 're-entry' of a new bug on the revised test]; or 3) add documentation to the bug filing that indicates why the test is right [perhaps something as simple as pointing to a release target milestone, or part of the Requirements document], and then hand the bug along to the PM's queue (staying on the bug as a CC)

One problem is that when there is only a single PM, there is also only a single point of blockage, and 'real life' intervening, or a work-plan to do a substantial refactoring (perhaps even already partially working in a private tree), or even a non-public agenda on the part of one's employer may prevent the PM from ** wanting ** to respond 'just now' if a well-formed test and bug report gets dropped on them ...

... but if the unit test is 'right', usually it is proper to add it to a test suite. I put to one side whether one should run all unit tests every pass; tests do rot and one may well need to trim obsolete tests away, or refactor old ones to match code reorganizations; clearly one answer when the suite gets 'too big' is to start prioritizing, adding stochastic selection to generally omit tests related to rarely encountered failure modes and so forth
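A minimal sketch of that kind of stochastic selection, in Python. Everything in it is an assumption made for illustration: the function name select_tests, the idea of weighting each test by its count of past failures, and the 'plus one' smoothing that keeps a never-failing test from being starved completely. It is one way to do it, not a description of any particular test runner

```python
import random


def select_tests(failure_counts, budget, rng=None):
    """Pick up to `budget` distinct tests, favoring those that fail more often.

    failure_counts maps a test name to the number of times it has failed
    in the past (hypothetical bookkeeping, kept by whoever runs the suite).
    """
    rng = rng or random.Random()
    # Add one to every count so rarely failing tests are still picked
    # now and then, rather than never.
    pool = {name: count + 1 for name, count in failure_counts.items()}
    chosen = []
    while pool and len(chosen) < budget:
        names = list(pool)
        weights = [pool[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]  # sample without replacement
    return chosen
```

Called as select_tests(history, 50) on an ordinary pass, this generally omits the tests for rarely seen failure modes; the periodic 'run everything' pass simply ignores the selector and runs the whole suite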

But a well written test never fully 'goes away' by default. At some predictable interval, of course, the 'full boat' of ALL tests, as well as more rigorous end to end functional tests are needed. Beck's TDD book glosses over this to some extent as his focus was development, but 'testing' means much more than 'unit testing'

One additional avenue toward a solution would be to convert the single PM 'person' into a trellised PM 'role' or 'team' containing two or more non-affiliated project members

By and large, FOSS works better when there is a consensus approach to management of a resource. It is basic group dynamics that achieving consensus is easier in a small team, able to consult in a 'stand up' five minute meeting, to come to a tactical 'what is the simplest thing that we can do' to conform to a well-formed test, write (or adopt) the unit test, apply it, and move on ;) With only two people in the PM trellis, or a senior and a junior relationship, the group dynamics may result in impasse, which is only visible to the 'outsider' UTS as 'nothing is happening on this bug'

Lots of inter-person political approaches exist here, but ultimately and in most projects, there is an agreed-to Release Manager team with global commits, that has to be willing and able to 'take up the reins', intervene to intentionally 'break the build' in HEAD when an impasse continues 'too long' [I assume here a model of a stable release, and a developmental HEAD], and force the PM to respond (perhaps by relinquishing participation as a co-PM)

I don't have an obvious candidate solution to suggest here, as many approaches are possible, and I've seen the issue as a project lead, as well as a mere participant, and sometimes simply as a concerned onlooker

22 September 2010

lost in the bowels of Google Groups

A post I made earlier today to a mailing list seems to have been held up for an hour, even though I am a subscriber to the mailing list in question, and have proper and meticulously preened DNS A, PTR, MX, and even TXT records, publishing SPF details properly. I set those up because of prior problems with Google's mail service erroneously marking some pieces as 'spammy' in the past ...

Received: by 10.90.14.22 with SMTP id 22mr127029agn.36.1285171616911;
Wed, 22 Sep 2010 09:06:56 -0700 (PDT)
X-BeenThere: puppet-users@googlegroups.com
Received: by 10.91.83.8 with SMTP id k8ls391483agl.0.p; Wed, 22 Sep 2010
09:06:54 -0700 (PDT)
Received: by 10.150.51.21 with SMTP id y21mr255924yby.58.1285171614696;
Wed, 22 Sep 2010 09:06:54 -0700 (PDT)
Received: by 10.229.192.137 with SMTP id dq9mr33711qcb.14.1285167411800;
Wed, 22 Sep 2010 07:56:51 -0700 (PDT)
Received: by 10.229.192.137 with SMTP id dq9mr33709qcb.14.1285167411749;
Wed, 22 Sep 2010 07:56:51 -0700 (PDT)
Received: from bronson.owlriver.com (bronson.owlriver.com [198.49.244.50])
by gmr-mx.google.com with ESMTP id
c41si5677929qcs.12.2010.09.22.07.56.51;
Wed, 22 Sep 2010 07:56:51 -0700 (PDT)
Received-SPF: pass (google.com: best guess record for domain of
herrold@owlriver.com designates 198.49.244.50 as permitted sender)
client-ip=198.49.244.50;
Received: from localhost (localhost.localdomain [127.0.0.1])
by bronson.owlriver.com (8.13.8/8.13.8) with ESMTP id o8MEumOR020433
(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
Wed, 22 Sep 2010 10:56:49 -0400
Date: Wed, 22 Sep 2010 10:56:48 -0400 (EDT)

... anti-spam measures, one assumes. I understand taking such measures, but sure wish the scoring 'downticks' Google was marking were published and findable (compare, to the good: AOL's current practices)
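The size of the hold can be read straight off the Received: timestamps quoted above. A minimal Python sketch, using only the standard library; the helper name hop_delay_seconds is mine, and the three timestamps are copied from the headers: my own sendmail stamp, Google's gmr-mx acceptance stamp, and the stamp at which the piece was finally handed on to the list

```python
from email.utils import parsedate_to_datetime


def hop_delay_seconds(earlier, later):
    """Seconds elapsed between two RFC 2822 style header timestamps."""
    delta = parsedate_to_datetime(later) - parsedate_to_datetime(earlier)
    return delta.total_seconds()


# Timestamps copied from the Received: headers in this entry.
sent_locally = "Wed, 22 Sep 2010 10:56:49 -0400"
accepted_by_google = "Wed, 22 Sep 2010 07:56:51 -0700 (PDT)"
released_to_list = "Wed, 22 Sep 2010 09:06:54 -0700 (PDT)"

handoff = hop_delay_seconds(sent_locally, accepted_by_google)
held = hop_delay_seconds(accepted_by_google, released_to_list)

print(f"handoff to Google took {handoff:.0f} seconds")
print(f"held inside Google for {held / 60:.1f} minutes")
```

The handoff from my server took 2 seconds; the remaining 4203 seconds (a bit over 70 minutes) were spent inside Google's systems, between two Received: lines that both carry their own hop identifiers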

But then, I am told from time to time that my world view and some of my approaches are 'too utopian'. Humph -- a little bit 'utopian' is all right, but one can overdo it? Who knew?

Change control in operations

This crossed the puppet-users mailing list earlier today:

We have an engineering environment of around 200 CentOS servers, plus a production environment of roughly the same size. Currently, when we roll out a new server, we do a 'yum update' so the new server has the latest packages; however this means that just about every server has a different set of package versions - a system rolled out today will have different versions from one rolled out last month, and that will have different versions from one rolled out last year.

...

Has anybody else been faced with this problem, and if so, how did you resolve it?

Let's consider just the problem of 'package version skew' in operations, and come up with a solution for it. [The questioner is also 'starting' with a couple of deployment targets that vary over time because of a poorly considered 'start image' creation ... An obvious approach here is to have a couple of stable base deployment images, and a set of defined transforms to produce a basic engineering workstation or server, per specification; that part is largely uninteresting here]

  1. Set up a local mirror of the CentOS external mirrors, and call it 'incoming'
  2. Optionally, set a sub-mirror of 'incoming' called 'vault', and mirror in a fashion that does NOT delete old content no longer present on 'incoming'
  3. Set a third mirror called 'testing', which 'picks and chooses' selected packages to test, and their dependencies (see the package: yum-utils for some tools to permit confirming that one has 'closure' of those dependencies)
  4. Test on your pre-deployment 'bench' against 'testing' until you have a change-set you wish to deploy throughout the universe of your boxes under management. Obviously, several 'testing' mirrors can be set up, for differing classes of machines
  5. FINALLY, have a master distribution mirror called 'rtm' that has a change-set from a 'testing' mirror deployed to it. Remove the stock repository specification files from
            /etc/yum.repos.d/ 
    and deploy local variants to taste, that point at 'rtm'. Again, several 'rtm' mirrors can be set up, for differing classes of machines

Something like this to ensure coherency of an enterprise-wide deployment is usually mandated by a Change Control Board (explicitly, or implicitly). Obviously, other aspects of an IT policy document will attend to getting the various mirrors properly recoverable in one's backup strategy. [there, the 'testing' mirrors are often NOT covered, as they are ephemeral as to their usefulness, and recoverable out of 'vault' (top down) or from a 'rtm' (bottom up)]
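For the last step of the list above, the local repository specification file that replaces the stock ones can be generated rather than hand edited. A minimal Python sketch follows; the function name rtm_repo_stanza, the repository id 'rtm-servers', and the host mirror.example.lan are all made up for illustration. The option names (name, baseurl, enabled, gpgcheck) and the $releasever and $basearch variables are the usual yum ones

```python
import configparser
import io


def rtm_repo_stanza(repo_id, baseurl):
    """Return the text of a yum .repo stanza that points at an 'rtm' mirror."""
    # interpolation=None so '%' or '$' in a URL is written through untouched.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names exactly as written
    cfg[repo_id] = {
        "name": f"Local {repo_id} mirror",
        "baseurl": baseurl,
        "enabled": "1",
        "gpgcheck": "1",
    }
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)
    return buf.getvalue()


# Hypothetical mirror host and path layout; adjust to taste.
text = rtm_repo_stanza(
    "rtm-servers",
    "http://mirror.example.lan/rtm/servers/$releasever/$basearch/",
)
print(text)
```

One such stanza per class of machine, dropped into /etc/yum.repos.d/ by whatever configuration management tool is in use, keeps every box in the class drawing from the same tested change-set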

21 September 2010

sitting in great connectivity ...

... sure makes a difference, seemingly

I do daily checkouts from the FreeSWITCH project, and run the same build script on a CentOS box inside our local network (which is nominally down a data link that is 3 x T-1 wide), and another that is up at a data center, and has the ability to sustain a 3.5 GByte/sec transfer rate indefinitely (it has been the disaster failover site for the periodic 'Victoria's Secret' soft pr0n 'strut their stuff' webcast)

I synchronized builds on the two boxes yesterday, so they were at the exact same checkout level from upstream's version control system. Today, I opened a couple of consoles, and fired off the build commands within a second of one another. The first part of that script is to check out current to HEAD, and then it is off into the builds. I've marked the two units in alternating colors so the comparisons stand out better

Unit A:

Unpacking objects: 100% (38/38), done.
From git://git.freeswitch.org/freeswitch
184f395..f7d16ec master -> origin/master
Updating 184f395..f7d16ec
Fast-forward
libs/freetdm/src/include/private/ftdm_types.h | 2 +-
src/mod/applications/mod_spandsp/mod_spandsp_fax.c | 6 +-
src/mod/codecs/mod_codec2/Makefile | 14 ++
src/mod/codecs/mod_codec2/mod_codec2.c | 161 ++++++++++++++++++++
src/mod/endpoints/mod_sofia/mod_sofia.c | 23 +++
src/mod/endpoints/mod_sofia/mod_sofia.h | 1 +
src/mod/endpoints/mod_sofia/sofia_glue.c | 21 +++
src/switch_ivr.c | 4 +-
8 files changed, 226 insertions(+), 6 deletions(-)
create mode 100644 src/mod/codecs/mod_codec2/Makefile
create mode 100644 src/mod/codecs/mod_codec2/mod_codec2.c

real 0m1.105s
user 0m0.425s
sys 0m0.090s
/home/herrold/vcs/git/freeswitch

Unit B:

Unpacking objects: 100% (38/38), done.
From git://git.freeswitch.org/freeswitch
184f395..f7d16ec master -> origin/master
Updating 184f395..f7d16ec
Fast-forward
libs/freetdm/src/include/private/ftdm_types.h | 2 +-
src/mod/applications/mod_spandsp/mod_spandsp_fax.c | 6 +-
src/mod/codecs/mod_codec2/Makefile | 14 ++
src/mod/codecs/mod_codec2/mod_codec2.c | 161 ++++++++++++++++++++
src/mod/endpoints/mod_sofia/mod_sofia.c | 23 +++
src/mod/endpoints/mod_sofia/mod_sofia.h | 1 +
src/mod/endpoints/mod_sofia/sofia_glue.c | 21 +++
src/switch_ivr.c | 4 +-
8 files changed, 226 insertions(+), 6 deletions(-)
create mode 100644 src/mod/codecs/mod_codec2/Makefile
create mode 100644 src/mod/codecs/mod_codec2/mod_codec2.c

real 0m15.607s
user 0m0.168s
sys 0m0.096s
/home/herrold/vcs/git/freeswitch

One box is running an i386 kernel, and the other x86_64; memory is somewhat smaller on the x86_64. The 'horsepower' of each is roughly the same

Unit A:

[herrold@centos-5 ~]$ ssh freeswitch.pmman.com uname -a
Linux freeswitch.pmman.com 2.6.18-194.11.3.el5PAE #1 SMP Mon Aug 30 17:02:48 EDT 2010 i686 i686 i386 GNU/Linux
[herrold@centos-5 ~]$ ssh freeswitch.pmman.com free
total used free shared buffers cached
Mem: 6226068 4427212 1798856 0 303156 3936312

Unit B:

[herrold@centos-5 ~]$ uname -a
Linux centos-5.first.lan 2.6.18-194.11.3.el5xen #1 SMP Mon Aug 30 16:55:32 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[herrold@centos-5 ~]$ free
total used free shared buffers cached
Mem: 3072000 3036352 35648 0 291852 1790652

Unit A:

[herrold@centos-5 ~]$  ssh freeswitch.pmman.com dmesg \| grep -i bogo
Calibrating delay loop (skipped), value calculated using timer frequency.. 3990.15 BogoMIPS (lpj=1995079)
Calibrating delay using timer specific routine.. 3990.04 BogoMIPS (lpj=1995020)
Total of 2 processors activated (7980.19 BogoMIPS).

Unit B:

[herrold@centos-5 ~]$ dmesg | grep -i bogo
Calibrating delay using timer specific routine.. 6652.60 BogoMIPS (lpj=13305207)

Unit A:

[herrold@centos-5 ~]$ ssh freeswitch.pmman.com cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping : 6
cpu MHz : 1995.224
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl vmx tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 3990.44

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping : 6
cpu MHz : 1995.224
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl vmx tm2 ssse3 cx16 xtpr lahf_lm
bogomips : 3990.02

Unit B:

[herrold@centos-5 ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6700 @ 2.66GHz
stepping : 6
cpu MHz : 2660.050
cache size : 4096 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu tsc msr pae cx8 apic mtrr cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc pni est ssse3 cx16 lahf_lm
bogomips : 6652.60
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Core(TM)2 CPU 6700 @ 2.66GHz
stepping : 6
cpu MHz : 2660.050
cache size : 4096 KB
physical id : 1
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu tsc msr pae cx8 apic mtrr cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc pni est ssse3 cx16 lahf_lm
bogomips : 6652.60
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

But ...

Unit A:

Wrote: /home/herrold/rpmbuild/RPMS/i386/freeswitch-sounds-0.0.20100921.git-1.i386.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.3898
+ umask 022
+ cd /home/herrold/rpmbuild/BUILD
+ cd freeswitch-20100921
+ '[' /var/tmp/freeswitch-0.0.20100921.git.root '!=' / ']'
+ rm -rf /var/tmp/freeswitch-0.0.20100921.git.root
+ exit 0

real 17m56.699s
user 13m18.982s
sys 3m10.880s

real 24m50.468s
user 18m2.521s
sys 4m56.827s
[herrold@freeswitch freeswitch]$

Unit B:

Wrote: /home/herrold/rpmbuild/RPMS/x86_64/freeswitch-sounds-0.0.20100921.git-1.x86_64.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.90424
+ umask 022
+ cd /home/herrold/rpmbuild/BUILD
+ cd freeswitch-20100921
+ '[' /var/tmp/freeswitch-0.0.20100921.git.root '!=' / ']'
+ rm -rf /var/tmp/freeswitch-0.0.20100921.git.root
+ exit 0

real 27m27.666s
user 8m27.160s
sys 3m25.909s

real 48m25.064s
user 11m34.027s
sys 5m15.264s
[herrold@centos-5 freeswitch]$

That is, the older, 2 GHz Xeon is running away from the newer 2.66 GHz Core 2 Duo. Quite the discrepancy there, but the numbers don't lie. Perhaps it is due to the local load of being an X desktop on 'centos-5' [no local xen domU are presently running on it], and NOT running X on the remote server. Interesting 'food for thought' of a problem to research as to the why's and wherefore's on causation
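One cheap first cut at the 'why' is to ask what share of the wall clock each box actually spent on the CPU. A minimal Python sketch, fed the first 'real / user / sys' triple from each unit's output above; the helper names to_seconds and cpu_share are mine

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '17m56.699s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def cpu_share(real, user, sys_):
    """Fraction of wall-clock time accounted for by user plus sys CPU time."""
    return (to_seconds(user) + to_seconds(sys_)) / to_seconds(real)


# First timing triple from each unit's build output above.
unit_a = cpu_share("17m56.699s", "13m18.982s", "3m10.880s")
unit_b = cpu_share("27m27.666s", "8m27.160s", "3m25.909s")

print(f"Unit A: {unit_a:.2f} of wall clock on CPU")
print(f"Unit B: {unit_b:.2f} of wall clock on CPU")
```

Unit A comes out at about 0.92, and Unit B at about 0.43. Unit B also burned less user CPU time in total than Unit A, so one reading is that its processor is not the laggard at all: the build on 'centos-5' spent more than half its wall clock waiting on something else (disk, memory pressure, or the desktop load). That is a hint at where to look, not a diagnosis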

12 September 2010

What do you discuss?

Great minds discuss ideas
Average minds discuss events
Small minds discuss people
  -- Eleanor Roosevelt

09 September 2010

office background noise

A question in IRC: Do you listen to music online?

17:07 =orc_orc> xmms is playing: Peshay / Pacific atm
17:08 =orc_orc> the library has more to 'rip' than I will ever be able to grow
tired of, for free
17:08 =orc_orc> NFS makes the OGG files available freely, throughout the LAN
17:08 =orc_orc> (through an RO export)

As I recall, I used 'grip' built under CentOS 4 to populate that music archive, which xmms randomly wanders through

07 September 2010

an interesting forgery

It is quite common for an online service provider to suggest that an end user add the provider's 'email sending address' to a whitelist, so that pieces from known senders avoid spam filtering

This piece came in. Here are the headers:

Return-Path: 
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
bronson.owlriver.com
X-Spam-Level:
X-Spam-Status: No, score=-87.1 required=4.0 tests=BAYES_05,
HTML_IMAGE_ONLY_24,
HTML_MESSAGE,MIME_HTML_ONLY,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_PSBL,
SPF_HELO_PASS,
T_SURBL_MULTI1,T_SURBL_MULTI2,T_SURBL_MULTI3,T_URIBL_BLACK_OVERLAP,
URIBL_BLACK,URIBL_DBL_SPAM,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SC_SURBL,
URIBL_WS_SURBL,USER_IN_WHITELIST autolearn=no version=3.3.1
Received: from shadow.apd.hu (shadow.apd.hu [195.70.36.72])
by bronson.owlriver.com (8.13.8/8.13.8) with SMTP id o8224mbp009823
for <rpm@owlriver.com>; Wed, 1 Sep 2010 22:04:50 -0400
Date: Thu, 2 Sep 2010 04:04:49 +0000
From: Twitter <twitter-notification-rpm=owlriver.com@postmaster.twitter.com>
Reply-To: noreply@postmaster.twitter.com
To: rpm@owlriver.com
Message-Id: <6aba5bca4c284_51e06cbd75096ceb8@mx001.twitter.com.tmail>
Subject: You have 5 unread direct messages from Twitter!
Mime-Version: 1.0
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: Quoted-printable
Content-Disposition: inline
X-Campaignid: twitter20100902312977
Errors-To: Twitter
<twitter-notification-rpm=owlriver.com@postmaster.twitter.com>
Bounces-To: Twitter
<twitter-notification-rpm=owlriver.com@postmaster.twitter.com>
X-Envelope-To: rpm@owlriver.com
X-Munge: added X-Envelope-To
X-Orig-Subject: You have 5 unread direct messages from Twitter!
X-Loop: herrold@owlriver.com
X-ORC: antiloop

The body is heavily obfuscated HTML, but the clear text is:

HI, RPM.

You have 5 unread direct messages from Twitter!
http://twitter.com/account/messages/rpm/RKQYA-KU4GO-417167
[medicinete.info]

The Twitter Team

If you received this message in error and did not sign up for a
Twitter account, click not my account [medicinete.info].

Please do not reply to this message; it was sent from an unmonitored
email address. This message is a service email related to your use of
Twitter. For general inquiries or to request support with your
Twitter account, please visit us at Twitter Support
[medicinete.info].

Clever enough -- the "[medicinete.info]" is added by my MUA -- Mail (reading) User Agent, alpine, and so the link to a forged site is obvious. But the use of the forged sender address, and the fact that I have a global 'whitelist' pass rule on that mail server, rather than 'per user' pass rules for the custom spamassassin on this CentOS 5 box, means that the forgery was treated as though it was from a trusted sender and favorably scored 100 points
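The arithmetic is easy to back out of the X-Spam-Status header. A minimal Python sketch, assuming that the 100 point whitelist bonus is the only negative contribution worth separating out (SPF_HELO_PASS and BAYES_05 are lumped in with the rest); the variable names are mine

```python
# Values copied from the X-Spam-Status header above.
reported_score = -87.1
required = 4.0

# The 100 point favorable score granted by USER_IN_WHITELIST.
whitelist_bonus = -100.0

# What the piece would have scored had the whitelist rule not fired.
score_without_whitelist = round(reported_score - whitelist_bonus, 1)

print(f"score without the whitelist rule: {score_without_whitelist}")
print(f"would have been tagged as spam: {score_without_whitelist > required}")
```

Without the whitelist hit, the piece scores 12.9 against a threshold of 4.0, so the blocklist and URIBL rules had done their job; it was the global whitelist entry alone that waved the forgery through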

Of course there IS no such user 'rpm' here sending email, but that was scraped off a web page in the domain, and so it draws content from hopeful spammers

05 September 2010

"Okay, not a problem"

It drives me nuts in a store or when contacting telephone support somewhere, when I offer the social courtesy of thanking the clerk or call center denizen for some service, only to receive in return:

Okay, not a problem

D*mn it -- In such a circumstance, I have usually just made a purchase, or have previously paid good money to get their firm's attention. I couldn't care less if they were pleased not to have had to work hard doing their appointed tasks. I know darn well they are drawing some salary to boot

I rather feel that I am entitled, instead, to:

Thank you

or perhaps,

You are welcome and it was a pleasure

as the back and forth of the interaction suggests

Oh, yes, and "No worries" usually works just about as well with me, except when used as an affirmation that all is well