02 October 2013

About those doughnuts ...

In a recent post, I closed:
Dollars to doughnuts, there is 'more than one roach' lurking.
I'll cover a bet that there are not tested backups in that shop
I had occasion to speak with that person again the other day. No backups, no thought to plan for such by that person's predecessor, and so by definition, no Disaster Recovery Plan
I am looking forward to getting a box of doughnuts -- from the DK would be nice. Please make sure two Apple Fritters are in there, as the other coffee vultures always grab the one I wanted. The Doughnut Kitchen is close to Staufs

20 September 2013

win win habits

PMman was designed for long-lived VMs (contrast: ephemeral AWS or OpenShift type instances), and so our communication needs differ from those of other cloud instance control interfaces.  This applies to the client tenant side views, and also to the sysadmin / devop side view

One of the convenience features of the PMman cloud product we run is the ease of communicating with 'tenants' on a given dom0.  When I came in this morning, I had this notice waiting for me:
From: mdadm monitoring
To: root@kvm-nNNN.pmman.com
Subject: DegradedArray event on /dev/md0:kvm-nNNN.pmman.com

This is an automatically generated mail message from mdadm
running on kvm-nNNN.pmman.com

A DegradedArray event had been detected on md device /dev/md0.

Faithfully yours, etc.
It is not the end of the world -- there are four members in that raid array; all mutable data is backed up nightly, and it is easy enough to fix with a hot drive swap.  BUT, we did a drive swap for a different raid member on that same unit within the last month.  I want to totally remove that unit from production, and put it into 'on the bench' mode, so I can see if there is some deeper hardware issue
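As an aside, the same degraded state that mdadm mails about is also visible in /proc/mdstat, where each array's status line carries a `[expected/active]` member count.  A minimal sketch of checking for it follows; the sample text and the function name are made up for illustration, and real mdstat output varies by kernel and raid level:

```python
import re

# Hypothetical sample of /proc/mdstat content, for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid1] [raid5]
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0](F)
      2929890816 blocks level 5, 64k chunk, algorithm 2 [4/3] [_UUU]
md1 : active raid1 sdb2[1] sda2[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\S+) : ")       # start of an array stanza
COUNT_RE = re.compile(r"\[(\d+)/(\d+)\]")   # [expected/active] member count


def degraded_arrays(mdstat_text):
    """Return (name, expected, active) for each array missing members."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        started = ARRAY_RE.match(line)
        if started:
            current = started.group(1)
            continue
        counts = COUNT_RE.search(line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                degraded.append((current, expected, active))
            current = None  # only the first count line belongs to the array
    return degraded


if __name__ == "__main__":
    for name, expected, active in degraded_arrays(SAMPLE_MDSTAT):
        print(f"{name}: {active} of {expected} members active")
```

On a live host one would pass in the content of /proc/mdstat rather than the sample string.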

The neat part from my point of view is that part of our design included a way for contacting the tenant VM owners on JUST that box alone.  It is as easy as clicking a couple buttons and typing
The following message will be sent to the list of users [on box: kvm-nNNN.pmman.com]. Here is the opportunity to fine tune the list.

From:     support@pmman.com
Subject:     Raid array member maintenance needed
Notice Level:     N
Message:     We have had some raid array member failures on the underlying dom0 of a machine you run at PMman. We replaced a drive hot three weeks ago on this same unit, and now have a new failure.

This may be a portent of a failing drive control subsystem, rather than the drive (although the previously removed drive tested bad and has been RMA'ed)

One new feature of PMMan of which you may not be aware is the 'no extra charge' weekly reboot and Level 0 backup. This is called a 'cyclic' backup with a 7 day repetition interval. To the extent you have NOT enabled this feature, I strongly suggest you test it and take advantage of this enhanced functionality of the interface

It is accessible off the 'Backups' button of the VM control web interface. Feedback is welcome of course. A nice side benefit from OUR point of view of offering this to customers is that it enables us to do invisible migrations from one dom0 to another as the backup and reboot are occurring, and so 'clear out' a dom0 host of running client instance VMs


-- Russ herrold
User list: ... [elided] ...
Emails go out, and a history item gets dropped into each VM's logfile as well as into our admin side logs.  Optionally I can have the tool turn on an 'attention needed' flag in the end tenant's console, which will persist until they acknowledge it.  We already do that for 'too long between updates' and 'too long between reboots' and such

We can of course do invisible 'hot' migrations of machines around, but even safer is to encourage tenant VM owners into the good habit of taking Level zero backups (which we automatically test)


19 June 2013

I am not Harry Truman

I received an email from a customer, followed by a phone call, to the effect that they had received a huge number of email 'return' bounces at a general intake email address.  He and I have had this discussion before

I have written about email sender forgery (There is probably NOT an email account: godzilla@microsoft.com) and its fallout ("Customer: My cousin says that his email to me is not going through") before.  So let's take the time to think it through yet again

Takeaway: He wanted me to stop such pieces from cluttering their email box, but he is unwilling to have 'heavy' spam filtering

As a personal matter, and also wearing my sysadmin hat, I would like to stop seeing this cruft as well

But as a technical matter, it seems that it cannot easily be done without constant 'tuning' of rejection rules, or some other rather serious matching of the 'Message-ID' of pieces sent against the return pieces offered.  An attempt to do so through filtering tools with no prior knowledge of the Message-IDs sent is to always 'play defense' against the spammers, without the ability ever to score a 'win'.  The effort to match Message-IDs in offered return pieces is perhaps more promising

But, so far, no-one in the FOSS community has been sufficiently vexed by it to publish such a tool and to commit to doing the ongoing 'tuning' of message parsers needed.  Perhaps we can design around it with existing tools, by amending our outgoing pieces to add a certification that a given candidate email is truly from us

As a design matter, building a milter, writing some procmail rules, and parsing sendmail logs, probably into a database backend, was my first thought as to how I would approach the matter.  The database constraint is troubling, though.  I have other work that I need to attend to first, but I went through the thought process.  I memorialize that process in part in case someone is interested.  Even more, I will provide webspace, mailing list support, and a VCS gratis, if someone 'feels the itch'.  It would be useful to have, but is not urgent to attain -- Seven Habits Quadrant 2 or 4 stuff.  Absent such a volunteer effort or a paying customer, for me, Quadrant 4
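To make the Message-ID matching idea concrete, here is a minimal sketch, not a finished tool.  It assumes the bounce quotes the original piece as a message/rfc822 or text/rfc822-headers part (common, but by no means guaranteed), and it uses an in-memory set as a stand-in for the database of Message-IDs harvested from the sendmail logs.  The function names and the sample bounce are hypothetical:

```python
import email

# Hypothetical bounce, for illustration only.
SAMPLE_BOUNCE = """\
From: MAILER-DAEMON@mx.example.net
To: intake@example.com
Subject: Undelivered Mail Returned to Sender
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="BOUND"

--BOUND
Content-Type: text/plain

Delivery failed.

--BOUND
Content-Type: message/rfc822

From: intake@example.com
To: nobody@example.org
Message-ID: <20130619.0001@example.com>
Subject: hello

body

--BOUND--
"""


def returned_message_id(raw_bounce):
    """Return the Message-ID of the piece quoted inside a bounce, or None."""
    bounce = email.message_from_string(raw_bounce)
    for part in bounce.walk():
        if part is bounce:
            continue  # the bounce's own Message-ID is not the one we sent
        if part.get_content_type() == "text/rfc822-headers":
            # Headers-only return: parse the quoted header block.
            quoted = email.message_from_string(part.get_payload())
            if quoted["Message-ID"]:
                return quoted["Message-ID"].strip()
        elif part["Message-ID"]:
            # walk() also yields the message nested in a message/rfc822 part.
            return part["Message-ID"].strip()
    return None


def is_genuine_return(raw_bounce, sent_ids):
    """True only if the bounce quotes a Message-ID we actually sent."""
    message_id = returned_message_id(raw_bounce)
    return message_id is not None and message_id in sent_ids


if __name__ == "__main__":
    sent = {"<20130619.0001@example.com>"}  # stand-in for the database
    print(is_genuine_return(SAMPLE_BOUNCE, sent))
```

The set lookup is where the troubling database requirement comes in: every outbound Message-ID has to be recorded somewhere the return-time check can reach it.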

Or, version two, a trusted cohort of outbound mailservers could build a MAC MIME Multipart attachment for each outbound message, and also a second MIME attachment that is a cryptographically verifiable 'clearsign' of that MAC part.  Possibly bundle this up into a Multipart Related set of structured attachments.  Add these two new MIME attachments to every outbound piece.  The first part -- the MAC part -- would be based on a hash of the message body, plus a timestamp of seconds since Epoch or such, and other optional entropy, to avoid forgery and replay attacks

Later then, when a putative return is offered, only accept for further processing those returns that carry a validating pair of MIME attachments: a MAC part that matches a re-hash of the message body in chief plus that MAC section's timestamp, and a clearsign part that we had previously produced over it.  Discard stale stuff, and non-validating content.  This gets rid of the need for the database and simplifies the procmail rules.  A well-formed candidate return piece can carry around all that needs to be known to decide if one will pass a mail return message along to human eyes

Not free, as it will burn up compute cycles on every send, and a few more at return time, but also complete and locally controllable, and so resistant to spammers.  It avoids the database requirement, so it can scale out.  Most of the needed tools already exist as FOSS.  hmmm
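A rough sketch of the stateless check, under loudly stated assumptions: it swaps a keyed HMAC-SHA256 in for the separate MAC-plus-clearsign pair described above, it assumes the cohort of mailservers shares a secret key, and it leaves out the MIME packaging entirely.  The token layout, function names, and the 14 day staleness window are all invented for illustration:

```python
import hashlib
import hmac
import secrets
import time

MAX_AGE_SECONDS = 14 * 24 * 3600  # assumed staleness window, not from any spec


def _digest(secret, body, timestamp, nonce):
    """Keyed MAC over the body hash, the timestamp, and the extra entropy."""
    body_hash = hashlib.sha256(body).hexdigest()
    material = f"{timestamp}:{nonce}:{body_hash}".encode("ascii")
    return hmac.new(secret, material, hashlib.sha256).hexdigest()


def make_token(secret, body, now=None):
    """Build the token to attach to an outbound piece."""
    timestamp = int(time.time() if now is None else now)
    nonce = secrets.token_hex(8)  # the 'other optional entropy'
    return f"{timestamp}:{nonce}:{_digest(secret, body, timestamp, nonce)}"


def verify_token(secret, body, token, now=None, max_age=MAX_AGE_SECONDS):
    """Accept a return only if its token validates and is not stale."""
    try:
        ts_text, nonce, claimed = token.split(":")
        timestamp = int(ts_text)
    except ValueError:
        return False  # malformed token: discard
    current = int(time.time() if now is None else now)
    if not 0 <= current - timestamp <= max_age:
        return False  # stale, or dated in the future: discard
    expected = _digest(secret, body, timestamp, nonce)
    return hmac.compare_digest(expected.encode(), claimed.encode())


if __name__ == "__main__":
    key = b"shared-secret-known-only-to-our-mailservers"  # hypothetical key
    body = b"Hello, this is the message body.\n"
    token = make_token(key, body)
    print(verify_token(key, body, token))          # a genuine return
    print(verify_token(key, b"forged body", token))  # a forged return
```

Nothing is stored between send and return; the piece carries its own proof.  The staleness window only bounds how long a captured token stays useful, and a return that mangles or truncates the quoted body would fail to validate, which a real design would have to account for.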

The protocols governing what constitutes email permit a sender to enter whatever 'return address' and 'sender address' they wish on a piece of email.  It is trivial to find an 'open' relay to accept email to send to any third party.  Consider the analogy:
  • All the while being careful to not leave a fingerprint or other biometric, I use cash to purchase a post card at the corner store, along with a stamp
  • I address it to someone of tender sensibilities, and assert that I noticed that their car was parked outside the local 'adult entertainment' establishment
  • I sign it: Harry S Truman
  • I enter a 'return address' of:

  Harry S Truman
  President Emeritus
  1600 Pennsylvania Avenue
  Washington DC  20500
  • I mail it

  • The recipient is outraged to find such a libelous assertion, visible for their letter carrier to see, and demands that the person who did so be identified, and stopped.  Also, for good measure, they want the Postal Service Inspectors to get on the matter to prevent such heartbreaking assertions from ever happening again

    About all the Postal Service will offer to do in the usual case is to return the piece to its nominal sender. And he no longer receives mail at that address

    (I note parenthetically that the Postal Service DOES seem to scan images of ALL paper mail passing through their system)

    Stopping spam (here: bounce backsplatter and 'joe jobs') is just not going to turn out to have a durable, easy, and comprehensive solution, without re-thinking what the mail we send looks like.  Spammers and legitimate receivers are in an 'arms race', and today's fix will rot if senders can re-engineer around it.  If this state of affairs distresses a person greatly, then until I can get that MIME solution going to test my hypothesis, the choices are: stop reading email; hire a full time, 24x7 secretary to pre-read all email and toss the junk; turn up the filtering and accept the false positives; grow a thick skin

    Or, of course, start coding and beat me to it

    13 June 2013

    Phone call: 'I've got this sick machine ...'

    me:  well, why is it sick?

    them:  yum complains about a missing signing key

    me: so install the key; it is down in /etc/pki/rpm-gpg/, and rpm --import ... will do the trick

    them: that directory is not there

    me: who set up the machine?

    them:  well, I was handed it, and ...

    me: so, take a level zero backup and then clean up the machine before trying to work on it, or deploy a new one

    them: well, I can't

    I just got off that call from a friend in a new employment situation

    The technical fix was outlined by me long ago, and I sent an email with the link along to the person calling

    BUT: Fixing the mindset inside the caller's head (do not try to work in an undefined, here: broken, environment) is harder

    But the caller has a problem in their work-flow process; a fix has to be done; sooner is probably better than later; a broken machine in production is 'technical debt', pure and simple.  Fundamental expectations are not met; binary partition will not work well to isolate problems, as more than one piece is probably broken.  It will break again, and a perception may well form that the caller may be the problem, rather than the broken environment they were handed

    Be sure to make a note to yourself to also address the broken process that permitted that machine to escape into production.  Dollars to doughnuts, there is 'more than one roach' lurking.  I'll cover a bet that there are not tested backups in that shop

    04 January 2013

    Another pet died across the holidays

    I wrote before about un-maintained and orphaned WordPress sites being exploited.  That same frantic user from two months ago called again.  The TL;DR summary is:
    • cPanel administration with multiple accounts in a single host without protections 
    • OS Updates not being run
    • WordPress updates not being run
    • Random add-ons being used without an awareness of security issues
    • No SELinux (disabled)
    An exploit un-gzip-ping a hostile payload from cache was used, and the machine was taken over

    The absence of good sysadmin skills, well packaged content, and updates 'for the loss' ...