Reposted with kind permission from Robert Moir
… I wrote ages ago back when I could still be bothered to blog.
This question keeps being asked by victims of hackers who have broken into their web servers. The answers very rarely change, but people keep asking the question. I’m not sure why. Perhaps people just don’t like the answers they’ve seen when searching for help, or they can’t find someone they trust to give them advice. Or perhaps people read an answer to this question, focus too much on the 5% that makes their case special and different from the answers they can find online, and miss the 95% where their case is near enough the same as the one they read about.
That brings me to the first important nugget of information. I really do appreciate that you are a special unique snowflake. I appreciate that your website is too, as it’s a reflection of you and your business or at the very least, your hard work on behalf of an employer. But to someone on the outside looking in, whether a computer security person looking at the problem to try and help you or even the attacker himself, it is very likely that your problem will be at least 95% identical to every other case they’ve ever looked at.
Don’t take the attack personally, and don’t take the recommendations that follow here or that you get from other people personally. If you are reading this after just becoming the victim of a website hack then I really am sorry, and I really hope you can find something helpful here, but this is not the time to let your ego get in the way of what you need to do.
Do not panic. Absolutely do not act in haste, and absolutely do not try and pretend things never happened and not act at all.
First: understand that the disaster has already happened. This is not the time for denial; it is the time to accept what has happened, to be realistic about it, and to take steps to manage the consequences of the impact.
Some of these steps are going to hurt, and (unless your website holds a copy of my details) I really don’t care if you ignore all or some of them, but following them will make things better in the end. The medicine might taste awful, but sometimes you have to overlook that if you really want the cure to work.
Still hesitating to take this last step? I understand, I do. But look at it like this:
In some places you might well have a legal requirement to inform the authorities and/or the victims of this kind of privacy breach. However annoyed your customers might be to have you tell them about a problem, they’ll be far more annoyed if you don’t tell them, and they only find out for themselves after someone charges $8,000 worth of goods using the credit card details they stole from your site.
Remember what I said previously? The bad thing has already happened. The only question now is how well you deal with it.
Nobody wants to be offline for longer than they have to be. That’s a given. If this website is a revenue-generating mechanism, then the pressure to bring it back online quickly will be intense. Even if the only thing at stake is your (or your company’s) reputation, this is still going to generate a lot of pressure to put things back up quickly.
However, don’t give in to the temptation to go back online too quickly. Instead, move as fast as possible to understand what caused the problem and to fix it before you go back online, or else you will almost certainly fall victim to an intrusion once again. Remember: “to get hacked once can be classed as misfortune; to get hacked again straight afterward looks like carelessness” (with apologies to Oscar Wilde).
The first thing you need to understand is that security is a process that you have to apply throughout the entire life-cycle of designing, deploying and maintaining an Internet-facing system, not something you can slap over your code afterwards in a few layers, like cheap paint. To be properly secure, a service and an application need to be designed from the start with this in mind as one of the major goals of the project. I realise that’s boring and you’ve heard it all before and that I “just don’t realise the pressure man” of getting your beta web2.0 (beta) service into beta status on the web, but the fact is that this keeps getting repeated because it was true the first time it was said and it hasn’t yet become a lie.
You can’t eliminate risk. You shouldn’t even try to do that. What you should do, however, is understand which security risks are important to you, and understand how to manage and reduce both the impact of each risk and the probability that it will occur.
For example:
If you decide that the “risk” of the lower floor of your home flooding is high, but not high enough to warrant moving, you should at least move the irreplaceable family heirlooms upstairs. Right?
I’ve probably left out no end of stuff that others consider important, but the steps above should at least help you start sorting things out if you are unlucky enough to fall victim to hackers.
Above all: Don’t panic. Think before you act. Act firmly once you’ve made a decision, and leave a comment below if you have something to add to my list of steps.
You’re a knowledge worker.
A fancy term that just means you use your computer for actual honest real creative work. Not talking about time-sheets and a contact list here. That spreadsheet that serves as your company’s ERP. The irreplaceable original files making up your portfolio. The curriculum for the class you’re teaching. The knowledge that you’ve gained and reified into something communicable. These things have value. But only as long as they exist.
This feels like a good time to point out that your hard drive is probably going to die this year. “Oh.”
Okay, maybe not this year. Really, it’s only a 5% chance or so. But a 5% chance of an unrecoverable loss of data is enough to keep me up at night. There are many things that can cause you grief in this department.
The point is to acknowledge that the universe tends towards maximum irony and to start acting like failures are expected, so that when they happen you’re not left with the choice of redoing a month’s work, or a $5,000 recovery bill, or simply being forced out of business because that information was both truly irreplaceable and irrecoverable. All you need is a minimum standard of care. You wear your seatbelt when driving. You have working smoke detectors where you sleep. And you have automatic nightly backups of your data.
Well, you will, soon enough. :)
The common wisdom: “You need backups! What if the building burns down?”
This is not why you should have backups. The common wisdom is really just a ready-made statement to show that you think about Big Problems, while giving you an excuse to procrastinate and generally ignore the issue. Yes, it’s a problem that should be addressed, but our goal today is a minimum standard of hygiene, and worrying about redundancy and geographic distribution and backup windows is just going to overwhelm you and give you an excuse to give up on the whole thing. And we’re not giving up today.
No, you’re going to get your backup situation figured out because your hard drive is going to die this year. Yes, really. Hard drives have lifetimes measured in years, not decades. And your machine isn’t exactly new, is it?
We’re going to do a daily backup. Backups are annoying because they take a long time to copy everything, and the machine is slow due to the extra load on the disk while they’re running. But if you back up every day, then you only ever have one day’s worth of extra data to move.
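Here is one rough way to see how small a day’s worth of changes usually is: count the files modified in the last 24 hours. This is only a proxy. By default, rsync skips a file when its size and modification time match the copy already in the backup, which is close to this count but not the same. `changed_last_day` is a hypothetical helper, not part of the script below:

```bash
#!/bin/bash
# changed_last_day DIR: count regular files under DIR whose contents were
# modified within the last 24 hours, without leaving DIR's filesystem
# (-xdev is find's counterpart to rsync's -x).
changed_last_day() {
  find "$1" -xdev -type f -mtime -1 2>/dev/null | wc -l | tr -d ' '
}
```

For example, `changed_last_day "$HOME"` gives a feel for how much a nightly run actually has to move, compared with a full copy.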
We’re going to back up everything. Every day. Getting selective and only backing up particular files is a very good way to ensure that you’re missing only the most vital files. Trust me, you don’t want there to be any question about whether the work you did last week, for the first time in a new program, is being included.
And, we’re going to back everything up every day, automatically. It’s vitally important that the backup happens whether or not you remember to start it. Running it by hand will tempt you into changing the process by hand, and for this task inconsistency is your absolute enemy.
We want an automatic process that backs everything up every day.
The common approach is to use a nice point-and-click tool to run the backups. You should not use one of these tools.
Transparency is your ally in this task. You need to understand each link in your backup process in as much detail as you can, and this means minimizing the number of links in the chain. Point-and-click tools excel at creating intricate setups that are not the simplest thing that could possibly work.
You’re going to need three things. You already have two of them.
You need an external drive that plugs into your computer using a USB cable or similar. This shouldn’t set you back more than $100-$150, but it’s not optional.
Backing up to CDs or DVDs practically guarantees that you won’t perform the backups on a regular basis, and makes the whole process far more painful than it needs to be.
On any modern unix (Linux, BSD, Apple’s OS X, and so on), the scheduler will be cron. Commonly, there will be a folder called /etc/cron.daily, and any script placed in that folder will be run once per day at a suitable time. Exactly what we need.
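On Debian-derived systems, /etc/cron.daily is run by run-parts, which is fussy about file names. A script saved as backup.sh is silently skipped. Below is a small sketch of an install helper that copies a script in as an executable and refuses names that run-parts would ignore. `valid_cron_name` and `install_daily` are hypothetical helpers, and other distributions’ run-parts may apply different rules.

```bash
#!/bin/bash
# Sketch: put a script where cron's daily run will pick it up.
# Assumption: a Debian-style run-parts, which skips any file whose name
# contains characters other than letters, digits, '_' and '-'.

# valid_cron_name NAME: succeed if run-parts would accept NAME.
valid_cron_name() {
  [[ "$1" =~ ^[A-Za-z0-9_-]+$ ]]
}

# install_daily SCRIPT [DIR]: copy SCRIPT into DIR (default /etc/cron.daily)
# with mode 0755, refusing names that run-parts would ignore.
install_daily() {
  local script="$1" dir="${2:-/etc/cron.daily}" name
  name=$(basename "$script")
  if ! valid_cron_name "$name"; then
    echo "install_daily: run-parts would skip '$name'; drop the extension" >&2
    return 1
  fi
  install -m 0755 "$script" "$dir/$name"
}
```

Run as root, `install_daily backup` installs the script. On Debian-style systems, `run-parts --test /etc/cron.daily` then lists the scripts that will actually run, without running them.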
On Windows, the built-in Task Scheduler is adequate for the task.
Again, the tool we need is already available on any modern unix. The rsync tool will reliably copy everything, automatically checking that everything was written correctly, and preserving metadata (permissions, ownership, timestamps, symbolic links) that simpler copy tools may drop.
On Windows, I’d strongly recommend grabbing a copy of rsync from Cygwin or wherever.
What we want is a very simple script, so simple that you can understand it.
#!/bin/bash
# -v Print the names of the files to the screen as we back them up.
# -a Archive mode: recurse into directories and preserve permissions,
#    ownership, timestamps, symlinks and device files.
# -x Don't cross into other filesystems (mount points) that we run across.
# --delete Delete files from the backup if they're no longer found in the source.
# / The source: copy everything from the root drive.
# /media/disk/backups/root The target: copy everything to here.
rsync -vax --delete / /media/disk/backups/root
Save this in a file called “backup”, make it executable (chmod +x backup), and add it to your /etc/cron.daily folder. Tomorrow morning, check that your external drive has a copy of all your data on it, and bask in a warm glow knowing you’re doing better than 90% of your peers.
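One failure mode is worth guarding against. If the external drive isn’t plugged in, /media/disk may just be an ordinary directory on the internal disk, and the script above would then copy the whole system onto the very disk it is meant to protect. Below is a sketch of a guarded variant. It assumes you create a marker file named BACKUP_DRIVE on the external drive once, by hand. The marker name and the `safe_backup` function are hypothetical and not part of the original script.

```bash
#!/bin/bash
# Sketch: the nightly backup, refusing to run unless the external drive is
# really there. BACKUP_DRIVE is a hypothetical marker file created once on
# the external drive (e.g. touch /media/disk/BACKUP_DRIVE).

# safe_backup SRC DRIVE: mirror SRC into DRIVE/backups/root.
safe_backup() {
  local src="$1" drive="$2"
  # Without the marker, DRIVE may be a plain directory on the internal disk.
  if [[ ! -f "$drive/BACKUP_DRIVE" ]]; then
    echo "backup: $drive/BACKUP_DRIVE not found; is the drive plugged in?" >&2
    return 1
  fi
  mkdir -p "$drive/backups/root" || return 1
  # Same flags as the original script; the trailing slash on SRC copies its
  # contents (for SRC=/ this is just "/").
  rsync -vax --delete "${src%/}/" "$drive/backups/root"
}
```

In /etc/cron.daily/backup, the function definition would be followed by a single line such as `safe_backup / /media/disk`.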
Much better, right?
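For the morning check, here is a sketch that compares one directory tree against its copy in the backup using `diff -rq`. That command lists only the names of files that differ or exist on only one side. `check_backup` is a hypothetical helper. Files you have edited since the nightly run will legitimately show up as differences, and checking a subtree such as /home is far quicker than checking the whole drive.

```bash
#!/bin/bash
# check_backup SRC_DIR BACKUP_ROOT: compare SRC_DIR with its mirror at
# BACKUP_ROOT/SRC_DIR (e.g. /home vs /media/disk/backups/root/home).
check_backup() {
  local src="${1%/}" root="${2%/}"
  if diff -rq "$src" "$root$src"; then
    echo "backup of $src matches"
  else
    echo "backup of $src differs (see above)" >&2
    return 1
  fi
}
```

Usage: `check_backup /home /media/disk/backups/root`.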
I’ve been playing with duplicating some of the results on the emergence of cooperation in variations of iterated prisoner’s dilemma.
An interesting (if somewhat trivial) thing to note is how easily behaviour similar to, but distinct from, the Game of Life emerges. A small grid randomly populated only with Defectors and Cooperators, for instance, often produces glider-like patterns.
My goal is to implement some dynamic programming approach to generating strategies, and the attached source betrays that goal with some complexity that is unnecessary for the behaviour I’m talking about, but anyways.
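The attached source isn’t reproduced here, so the following is only an illustration. It is a sketch assuming a Nowak and May style spatial game. Each cell plays the one-shot dilemma against itself and its eight neighbours on a wrap-around grid. A cooperating pair each scores 1, a defector scores B against each cooperator, and everything else scores 0. Then every cell copies the strategy of the highest scorer in its neighbourhood. `pd_run` and its input format are hypothetical.

```bash
#!/bin/bash
# Sketch of a Nowak-May style spatial prisoner's dilemma (an assumption, not
# the post's attached source).
# pd_run B GENS < grid
#   grid : lines of C (cooperator) and D (defector), all the same length
#   B    : payoff a defector gets from each cooperating neighbour
#   GENS : generations to simulate; the final grid is printed
pd_run() {
  awk -v b="$1" -v gens="$2" '
    # Payoff to a player using strategy x against an opponent using y.
    function payoff(x, y) {
      if (x == "C" && y == "C") return 1
      if (x == "D" && y == "C") return b
      return 0
    }
    # Wrap an index into 1..n so the grid behaves like a torus.
    function wrap(i, n) { return (i + n - 1) % n + 1 }
    {
      rows = NR; cols = length($0)
      for (c = 1; c <= cols; c++) s[NR, c] = substr($0, c, 1)
    }
    END {
      for (g = 0; g < gens; g++) {
        # 1) Each cell plays itself and its 8 neighbours and sums the payoffs.
        for (r = 1; r <= rows; r++)
          for (c = 1; c <= cols; c++) {
            score[r, c] = 0
            for (dr = -1; dr <= 1; dr++)
              for (dc = -1; dc <= 1; dc++)
                score[r, c] += payoff(s[r, c], s[wrap(r + dr, rows), wrap(c + dc, cols)])
          }
        # 2) Each cell adopts the strategy of the best scorer nearby
        #    (ties keep the current strategy).
        for (r = 1; r <= rows; r++)
          for (c = 1; c <= cols; c++) {
            best = score[r, c]; nxt[r, c] = s[r, c]
            for (dr = -1; dr <= 1; dr++)
              for (dc = -1; dc <= 1; dc++) {
                rr = wrap(r + dr, rows); cc = wrap(c + dc, cols)
                if (score[rr, cc] > best) { best = score[rr, cc]; nxt[r, c] = s[rr, cc] }
              }
          }
        for (r = 1; r <= rows; r++)
          for (c = 1; c <= cols; c++) s[r, c] = nxt[r, c]
      }
      for (r = 1; r <= rows; r++) {
        line = ""
        for (c = 1; c <= cols; c++) line = line s[r, c]
        print line
      }
    }'
}
```

With a single D in a 7x7 field of C and B = 1.9, one generation grows it into a 3x3 block of defectors. For watching moving patterns, feed it a random start such as `awk 'BEGIN { srand(); for (r = 0; r < 20; r++) { l = ""; for (c = 0; c < 40; c++) l = l (rand() < 0.1 ? "D" : "C"); print l } }' | pd_run 1.9 10` and vary B.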
Or, “How to resolve revocation with an immutable capability-secure world.”
A path is stored not by minting a new reference to the target (a hardlink), but rather by storing the path itself (a symlink). Each segment of a path represents a node that serves as a proxy for the rest of the path (as in onion routing).
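The hardlink/symlink distinction can be seen directly on a Unix filesystem. This is only an analogy: the capability system itself is not shown, and `link_demo` is a hypothetical helper. Removing the target’s path “revokes” the symlink while the hardlink keeps working. Whatever is later placed at that path is what the symlink resolves to, because the path is only looked up at use time.

```bash
#!/bin/bash
# link_demo DIR: compare a hardlink (new reference to the same file) with a
# symlink (stored path) when the original path is removed and then reused.
link_demo() {
  local d="$1"
  echo "secret" > "$d/target"
  ln "$d/target" "$d/hard"       # hardlink: a second name for the same inode
  ln -s "$d/target" "$d/soft"    # symlink: stores only the path string
  rm "$d/target"                 # "revoke" by removing the path
  echo "hard: $(cat "$d/hard")"
  if cat "$d/soft" >/dev/null 2>&1; then
    echo "soft: $(cat "$d/soft")"
  else
    echo "soft: revoked"         # dangling link: the path no longer resolves
  fi
  echo "replaced" > "$d/target"  # a new file at the same path
  echo "soft: $(cat "$d/soft")"  # the symlink follows the path
  echo "hard: $(cat "$d/hard")"  # the hardlink still holds the old file
}
```

Usage: `link_demo "$(mktemp -d)"`.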
Now, where does this leave me if I want permissions to be baked into a capability, and generally immutable?
On the one hand, I can now manage mutability in a sane way in the UI layer, because the size of the local neighborhood is now under the control of the node. This is a bit of a return to the heavier-weight approach to linking that I was originally considering, although it still only requires write access to one side of the link, plus mint access (which could be a limited form of write in the mutable case, but it doesn’t have to be).
On the other hand, immutability is really really nice. Specifically, being able to decode the permissions and determine if an operation is allowable offline is a big win, as it allows for some fairly aggressive caching even in the worst case, and in the best case may actually reduce the time-complexity through memoization.
Publishing is currently a three-step process consisting of writing an entry perfectly, publishing it to a staging point, and then committing the contents of that staging point to the real blog. This is causing me a small amount of grief:
Approaches I’ve considered to fix this:
However, understand that when I say “replace blogger”, I’m not starting from square one. Some time ago, I actually used a framework of my own invention based on a capability security model inspired by Richard Kulisz (don’t ever tell me I didn’t give credit where credit is due :p). The problem was that I wasn’t following my own advice, and didn’t have a backup of the material in a usable form when the inevitable happened. Ah well, live and learn, it wasn’t that fun to work with anyway.
The idea here is to do it again, but with a focus on building something on top of that architecture that ends up feeling more like a blog than a wiki.
So, how do you create a blog on top of a CSM architecture? Why, I’m glad you asked…
A question about PyPy’s JIT
Although I’m sure this is already obvious to the PyPy people, I’m quite interested to see how close they are to a system that would be capable of efficiently executing interpreters written on top of the existing system.
PyPy is a Python implementation written in Python. The translation and JIT architecture (as I understand it) uses manually inserted hints to indicate which variables belong to the interpreter and which to the interpreted program, so that the JIT can accurately determine when the interpreted program has looped (as opposed to the interpreter itself). This is important because, in general, optimizing the interpreter’s own code has fairly limited gains: you get a faster interpreter, but execution is still interpreted. The hints allow the system to distinguish the accidental work of interpretation from the essence of what is being interpreted.
But it’s not recursive. Even though the runtime has the required logic, it is missing the hints, and so an interpreter running on top of this stack will run faster, but it won’t make the jump to direct execution.
The question that intrigues me is this:
Would it be possible to generate those hints dynamically to make the gains available to higher level interpreters?
Interesting service being launched today by Wolfram Research
Any teenager is capable of making a machine from scratch that can seriously injure a person. The difference between a motor that you can stop with a bare hand and one that will simply take your hand off can be remarkably incremental.
I wonder how close we have to get to that threshold in order to have enough experience to see how close we are.
No, I don’t think the imminent launch implies an imminent hard takeoff. I just find myself wondering if we’ll recognize the potential of the technique, or whether we’ll just make two big lumps of fissile uranium by accident (“Oooo! Shiny!”) and get on with the business of bashing them together.
Aside from the obvious issue of compensating for errors, I don’t have a good intuitive understanding of why one wouldn’t want to maintain a bet as close to the Kelly bet as possible.
The usual complaint seems to be that you want to minimize your downside risk in the short term, and a Kelly bet is concerned only with maximizing the long-term gains.
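For reference, here is the standard setup behind the Kelly bet, with illustrative numbers chosen here rather than taken from anywhere. A bet that pays b-to-1 is won with probability p and lost (the stake forfeited) with probability q = 1 - p. It is repeated with a fixed fraction f of the bankroll staked each round. The expected log-growth per round, and the fraction that maximises it, are:

```latex
G(f) = p \ln(1 + bf) + q \ln(1 - f), \qquad
G'(f) = \frac{pb}{1 + bf} - \frac{q}{1 - f} = 0
\;\Longrightarrow\;
f^{*} = \frac{bp - q}{b} = p - \frac{q}{b}

% Worked example: p = 0.6, b = 1, so f^{*} = 0.2
G(0.2) = 0.6 \ln 1.2 + 0.4 \ln 0.8 \approx 0.0201, \qquad
G(0.1) = 0.6 \ln 1.1 + 0.4 \ln 0.9 \approx 0.0150
```

So in this example, half Kelly keeps about 75% of the growth rate. Meanwhile, the spread of the per-round log-return, sqrt(pq) * |ln(1 + bf) - ln(1 - f)|, drops from about 0.20 to about 0.10, roughly halving. That trade is the usual quantitative form of the short-term downside complaint.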
What doesn’t sit well:
I need to think about this more.
I’m kinda surprised that there haven’t been more projects taking a translation/compilation approach to working around IE’s rendering deficiencies. We have a good specification of how things are supposed to work, and many years of experience with how they actually work in various browsers.
Is it that the folks interested in compilers and such just aren’t interested in web technology? (Well, when I put it that way…)
IE6 CSS Fixer is an example of the sort of approach that I think could yield big benefits, especially with guidance from someone with some compiler-writing experience. Splitting the general problems of graphic design and application development from the tinkering needed for cross-browser issues would mean one less rather annoying pebble in many folks’ shoes.
I’m a bad blackjack player. Bad enough that I refrain from playing anywhere except a friendly home situation with no money at stake, where I definitively demonstrate how bad I actually am.
That said, I find myself intrigued:
Given a population of players based on existing counting techniques, with crossover and mutation, and a fitness function that included optimizing for function size and minimum worst-case runtime memory usage, what sort of counting technique might we turn up?
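As a concrete starting point, here is a sketch assuming the well-known Hi-Lo count: 2 to 6 count +1, 7 to 9 count 0, and tens and aces count -1. A counting system is represented as a list of 13 per-rank tags, the sort of vector that crossover and mutation could plausibly operate on. The representation and the function names are hypothetical.

```bash
#!/bin/bash
# Sketch: a card-counting system as a vector of per-rank tags (a hypothetical
# "genome"), with the standard Hi-Lo count as the example.
RANKS=(2 3 4 5 6 7 8 9 10 J Q K A)
HI_LO=(1 1 1 1 1 0 0 0 -1 -1 -1 -1 -1)   # tag for each entry in RANKS

# rank_index CARD: print CARD's position in RANKS, or fail if unknown.
rank_index() {
  local i
  for i in "${!RANKS[@]}"; do
    if [[ "${RANKS[$i]}" == "$1" ]]; then
      echo "$i"
      return 0
    fi
  done
  return 1
}

# running_count "TAGS" CARD...: sum the tags of the cards seen so far.
# TAGS is a space-separated list of 13 integers in RANKS order.
running_count() {
  local -a tags
  read -r -a tags <<< "$1"
  shift
  local total=0 card idx
  for card in "$@"; do
    idx=$(rank_index "$card") || { echo "unknown card: $card" >&2; return 1; }
    total=$(( total + tags[idx] ))
  done
  echo "$total"
}
```

For example, `running_count "${HI_LO[*]}" 2 3 K A 7 5` prints 1. Hi-Lo is balanced, so one card of each rank sums to 0. That is one property a fitness function might want to preserve or deliberately trade away.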