11 July 2014
The problem with older packaging managers in two lines ...
09:11 wxzy> Ugh. I think I should have chosen the X from pkgsrc.
09:11 wxyz> The one I have now lacks fonts that applications need.
Yeah. Package 'Managers' that unroll tarballs, but lack a strong Requirements / Dependency system are like that
02 October 2013
About those doughnuts ...
Dollars to doughnuts, there is 'more than one roach' lurking.
I'll cover a bet that there are not tested backups in that shop
I had occasion to speak with that person again the other day. No backups, no thought to plan for such by that person's predecessor, and so by definition, no Disaster Recovery Plan 
I am looking forward to getting a box of doughnuts -- from the DK would be nice. Please make sure two Apple Fritters are in there, as the other coffee vultures always grab the one I wanted. The Doughnut Kitchen is close to Staufs
20 September 2013
win win habits
One of the convenience features of the PMman cloud product we run is the ease of communicating with 'tenants' on a given dom0. When I came in this morning, I had this notice waiting for me:
From: mdadm monitoring
To: root@kvm-nNNN.pmman.com
Subject: DegradedArray event on /dev/md0:kvm-nNNN.pmman.com
This is an automatically generated mail message from mdadm
running on kvm-nNNN.pmman.com
A DegradedArray event had been detected on md device /dev/md0.
Faithfully yours, etc.
It is not the end of the world -- there are four members in that raid array; all mutable data is backed up nightly, and it is easy enough to fix with a hot drive swap. BUT, we did a drive swap for a different raid member on that same unit within the last month. I want to totally remove that unit from production, and put it into 'on the bench' mode, so I can see if there is some deeper hardware issue
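For readers who want to check a box by hand rather than wait for the mail, here is a minimal sketch of spotting a degraded array from the text of /proc/mdstat. It is an illustration only, not how mdadm's monitor works internally. The function name degraded_arrays and the sample text are made up for this example (the post does not state the RAID level), and the parser assumes the common "[n/m] [UU_]" status layout and device names of the form mdN.

```python
import re

# Made-up sample in the usual /proc/mdstat layout: four members expected,
# three active. The RAID level shown is an assumption for illustration.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdd1[3] sdc1[2] sdb1[1]
      1048576 blocks [4/3] [_UUU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status line shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            # Start of a new array stanza, e.g. "md0 : active raid1 ..."
            current = header.group(1)
            continue
        # Status looks like "[4/3] [_UUU]": expected/active counts, then one
        # character per member, where "_" marks a missing member.
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            expected = int(status.group(1))
            active = int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


print(degraded_arrays(SAMPLE_MDSTAT))
```

On a live host one would pass in the contents of /proc/mdstat instead of the sample string. A real deployment should keep relying on mdadm's own monitoring mail; this only shows what the alert is summarizing.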
The neat part from my point of view is that part of our design included a way for contacting the tenant VM owners on JUST that box alone. It is as easy as clicking a couple buttons and typing
The following message will be sent to the list of users [on box: kvm-nNNN.pmman.com]. Here is the opportunity to fine tune the list.
From: support@pmman.com
Subject: Raid array member maintenance needed
Notice Level: N
Message: We have had some raid array member failures on the underlying dom0 of a machine you run at PMman. We replaced a drive hot three weeks ago on this same unit, and now have a new failure.
This may be a portent of a failing drive control subsystem, rather than the drive (although the previously removed drive tested bad and has been RMA'ed)
One new feature of PMMan of which you may not be aware is the 'no extra charge' weekly reboot and Level 0 backup. This is called a 'cyclic' backup with a 7 day repetition interval. To the extent you have NOT enabled this feature, I strongly suggest you test it and take advantage of this enhanced functionality of the interface
It is accessible off the 'Backups' button of the VM control web interface. Feedback is welcome of course. A nice side benefit from OUR point of view of offering this to customers is that it enables us to do invisible migrations from one dom0 to another as the backup and reboot are occurring, and so 'clear out' a dom0 host of running client instance VMs
Thanks
-- Russ herrold
User list: ... [elided] ...
Emails go out, and a history item gets dropped into each VM's logfile as well as in our admin side logs. Optionally I can have the tool turn on an 'attention needed' flag in the end tenant's console, that will persist until they acknowledge it. We already do that as to 'too long between updates' and 'too long between reboots' and such
We can of course do invisible 'hot' migrations of machines around, but even safer is to encourage tenant VM owners in the good habit of taking Level zero backups (which we automatically test)
Win-win
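As a rough illustration of the 'attention needed' idea and the 7 day cyclic backup interval mentioned in this post, here is a minimal sketch of how such flags might be computed. This is not the PMman implementation. The function name, the flag wording for overdue backups, and the update and reboot thresholds are all hypothetical; only the 7 day interval comes from the notice quoted in the post.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds: the post does not state the real values.
MAX_UPDATE_AGE = timedelta(days=30)
MAX_UPTIME = timedelta(days=90)
# The 7 day 'cyclic' repetition interval named in the tenant notice.
BACKUP_INTERVAL = timedelta(days=7)


def attention_flags(now, last_update, last_reboot, last_level0_backup):
    """Return the list of 'attention needed' reasons for one VM."""
    flags = []
    if now - last_update > MAX_UPDATE_AGE:
        flags.append("too long between updates")
    if now - last_reboot > MAX_UPTIME:
        flags.append("too long between reboots")
    # A VM that has never had a Level 0 backup is treated as overdue.
    if last_level0_backup is None or now - last_level0_backup > BACKUP_INTERVAL:
        flags.append("level 0 backup overdue")
    return flags


# Example: updated recently, not rebooted for months, backed up five days ago.
print(attention_flags(
    now=datetime(2013, 9, 20),
    last_update=datetime(2013, 9, 1),
    last_reboot=datetime(2013, 5, 1),
    last_level0_backup=datetime(2013, 9, 15),
))
```

Keeping the reasons as plain strings makes it easy to drop the same text into a console banner and a log entry, which matches the 'history item' habit described in the post.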
13 June 2013
Phone call: 'I've got this sick machine ...'
them: yum complains about a missing signing key
me: so install the key; it is down in /etc/pki/rpm-gpg/, and rpm --import ... will do the trick
them: that directory is not there
me: who set up the machine?
them: well, I was handed it, and ...
me: so, take a level zero backup and then clean up the machine before trying to work on it, or deploy a new one
them: well, I can't
I just got off that call from a friend in a new employment situation
The technical fix was outlined by me long ago, and I sent an email with the link along to the person calling
BUT: Fixing the mindset inside the caller's head, 'do not try to work in an undefined (here: broken) environment', is harder
But the caller has a problem in their work-flow process; a fix has to be done; sooner is probably better than later; a broken machine in production is 'technical debt', pure and simple. Fundamental expectations are not met; binary partition will not work well to isolate problems, as more than one piece is probably broken. It will break again, and a perception may well form that the caller may be the problem, rather than the broken environment they were handed
Be sure to make a note to yourself to also address the broken process that permitted that machine to escape into production. Dollars to doughnuts, there is 'more than one roach' lurking. I'll cover a bet that there are not tested backups in that shop
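For the 'install the key' step from the call, here is a minimal sketch that first checks whether the key directory exists at all (the caller's machine lacked it) and, if so, builds one rpm --import command per file found there. The function name import_commands is made up for this example, and the sketch only prints the commands rather than running them, since on a machine in an undefined state one wants to look before acting.

```python
from pathlib import Path


def import_commands(key_dir="/etc/pki/rpm-gpg"):
    """Build 'rpm --import' commands for each key file in key_dir.

    Returns None when the directory is missing, which is itself a sign
    that the machine was not set up in the usual way.
    """
    directory = Path(key_dir)
    if not directory.is_dir():
        return None
    return [
        ["rpm", "--import", str(path)]
        for path in sorted(directory.iterdir())
        if path.is_file()
    ]


commands = import_commands()
if commands is None:
    print("key directory is missing: stop and find out how this box was built")
else:
    for command in commands:
        print(" ".join(command))
```

Which of the listed keys is the right one for a given repository is left to the operator; the sketch deliberately does not guess.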
04 January 2013
Another pet died across the holidays
- cPanel administration with multiple accounts in a single host without protections
- OS Updates not being run
- WordPress updates not being run
- Random add-ons being used without an awareness of security issues
- No SELinux (disabled)
The absence of good sysadmin skills, well packaged content, and updates 'for the loss' ...
27 September 2012
Feeding the pet
We spent a couple of hours looking into it. And then a couple hours looking into the WordPress security notification system. Perhaps, I should say: non-notification system as to getting subscribed to a formal notification mailing list from the WordPress folks, proper
The WordPress model seems to be: treat your WordPress site as though it is a pet that needs daily feeding. And to be 'put down' when you lose interest in it, move on, or forget about it -- Oops. Log in daily as an administrator, and look for a notification that you need to apply the 'latest and greatest' update. Run the update process manually whenever it appears. Oh yeah, did you remember to take a backup FIRST, and test that you can roll back to it if the 'update' breaks anything? Oops
This of course RULES OUT using a packaged approach to managing such sites, as the lag for stabilizing a new RPM package, accounting for potential database changes, and the like 'take too long'. Just unroll a tarball, and trust that it will not break any local customizations
I see fourteen tabs still open in my browser panel, related to trying to track down a central and formal notification feed containing only 'Security' notifications, to which I (or any person seeking to get 'push' notification) might subscribe. Weeding through the tabs, ...
- The 'Famous 5-Minute Install' for WordPress -- Nope, no useful outlink for hardening, nor to subscribe to notifications, beyond a pointer to a third-party Ubuntu appliance with 'automatic security updates'. That appliance's page has pointers to a tool to enable taking database backups, adding PHPMyAdmin, and Webmin. Not good choices for a person caring about security
- Perhaps FAQ items tagged with: Security -- Nope, clearly incomplete, as for example a Google search turns up this third-party alert for version 3.3.2, but the Release Notice does not get titled with: Security
- This bug (#10253) lingered for three years with a Security tag in their Trac issue tracker as to the current release series (3.4), and was amended ten days ago; but the latest release (for 3.4.2) was twenty days ago when this is written. Should an update have been released? Who knows?
- Perhaps their FAQ Security -- Nope, no push notification link suggested there, but lots of clutter as to copyright infringement notification handling, and miscellaneous topics
- Perhaps watch the Releases News in an RSS reader -- Oops, no sub-tag feed offered, and there has not been an "Important" Security release since December 2010, if one used that approach
- Run a Google search daily, and look for third-party commentary -- Nope. Although nuggets may be found, it is not viable: it is not authoritative, it is irregular and partial as to updates, and wading through search engine hits or RSS feed clutter will kill your productivity
A quick Google search for: turns up lots of vulnerable candidate installations, and a handy, dandy code fragment for parsing information out of potential victims so found, to automate take-overs. No criticism of the author of that code publishing his work; a knife can heal (as a scalpel), prepare dinner, or injure, depending on the intent of its holder
I see an official recovery outline suggestion, anyway
19 July 2012
Right, Left, down the middle
A couple weeks ago, a 'Derecho' blew through Columbus, on its way to the metro DC area. Amazon had some failures that cascaded through to people who did not have site redundancy. People know that the East Coast was hit hard, but as we are out in 'fly-over' country they did not perhaps realize that we had several hundred thousand people around here without electricity for a couple weeks as well
I've mentioned before that the primary datacenter that we run our PMman product out of is at the Tier IV level -- multiply redundant cooling, power grid, power backup, fiber entrances, carriers. The owner, a friend, is just a fiend about it: he does not HAVE outages
Me, too. In our after-event review, I see that one of our secondary sites here in town fell back to its generators, but the rest were all fine. But all sites we use are well covered, all fiber, all multi-homed. Planning for failure was in our deployment planning checklist; we pay for (and we charge for) that coverage; and I consider it worth it
A national footprint customer based in Canada agrees. And their lead technical person reports that our connectivity is faster than their datacenter eighty miles from their home office. Not surprising, as our main DC is on a 'main line' fiber route between Chicago, NYC, DC, and Atlanta -- financial markets and federal government presences can help, that way
If the availability of your online presence matters to you, feel free to ask for a quote
13 April 2012
LOPSA at the PMman DC
I went up to a meeting at our North datacenter for PMMan, where a local group of system admins held a meeting, starting up a local LOPSA chapter. Food and soda were provided by the DC operator, along with salad ... since when did sysadmins start eating healthier food, rather than a diet of high sugar, high caffeine junk food?
The presentation slide deck was fine, and the presenter (a 'long timer' at a local credit-card clearance operation) ran through his bullet list of what to look for in the 'build vs. buy (lease space)' decision, and then a number of siting concerns.
Now I am familiar with his firm's site from prior visits, and it is adjacent to a major highway with regular closures for accidents; adjacent to a major rail yard where chemical spills have caused evacuations; and has only a single service connection into the power company grid
Our North site was chosen after a survey of all offerings within a radius we were willing to drive to for 'end of the world' 'hands on' intervention; is jacked into two independent power grids along with the on-site generators; and is a premier demonstration location for the former Liebert (now owned by Emerson) power and site conditioning gear
I happen to drink coffee daily at Staufs with Liebert's representative here in town. My evaluation team suggested the location as a finalist, and when I checked, it turned out that I already knew the owner / developer from long, long ago telephony days, and when I have time, I'll go up and 'shoot the bull' with him on Saturday mornings at the DC
We have had a grand total of ZERO power related outages, and only one network connectivity issue in the last three years, that lasting less than 15 minutes, and that, due to human error in not handling a BGP fail-across migration properly [the cut-over protocol was changed, as I noticed the drop from my monitoring and called the owner's cell phone at once ;) ]. Well suited to our 'enterprise' customers
It is 'carrier neutral' and hugely connected -- multiple entrances of up to 88 x 100 G fiber spread across six or seven principal carriers. Native IPv6 to all drops we run through multiple carriers, along with the IPv4. I helped with the IPv6 design and cut-over some 18 months ago, and it has been seamless. The facility, and our services, just do not have outages except those that human error causes
It is not the cheapest in town ... but it is fairly priced for the value we have received
I had not sat down and reflected on how satisfied I was with that shift of our center of operations to the DC, but as I think about it, I am well pleased