zondag 16 november 2014

For the Sake of Pleasure Alone

In the past few years I have made the acquaintance of several people who, despite being both smart and educated, continue to worship the false god of preference utilitarianism.  I too consider myself both smart and educated, but worship hedonistic utilitarianism instead.  Sometimes I wonder whether our differing views aren't merely different representations of the same thing.

I have been wanting to explore this for a long time, but never knew where to start.  Today, less is more, and I will just take an excerpt from Not for the Sake of Happiness (Alone) and go from there:
It is an undeniable fact that we tend to do things that make us happy, but this doesn't mean we should regard the happiness as the only reason for so acting.  First, this would make it difficult to explain how we could care about anyone else's happiness - how we could treat people as ends in themselves, rather than instrumental means of obtaining a warm glow of satisfaction.

Second, just because something is a consequence of my action doesn't mean it was the sole justification.  If I'm writing a blog post, and I get a headache, I may take an ibuprofen.  One of the consequences of my action is that I experience less pain, but this doesn't mean it was the only consequence, or even the most important reason for my decision.  I do value the state of not having a headache.  But I can value something for its own sake and also value it as a means to an end.

For all value to be reducible to happiness, it's not enough to show that happiness is involved in most of our decisions - it's not even enough to show that happiness is the most important consequent in all of our decisions - it must be the only consequent.  That's a tough standard to meet.
Yes, there are multiple reasons for taking the ibuprofen, and it would be a stretch to say that the other reasons boil down to maximizing expected happiness.  But just because they are reasons for acting one way or another does not mean there is something valuable there.  This strikes me as a confusion of terminal and instrumental values.

I can spend hours and hours aimlessly clicking around on Facebook or aimlessly walking around the grocery store.  Anyone with a similar affliction can confirm that these activities yield more dolors than hedons; why then do I keep doing them?  Whatever the cause, I don't feel at all compelled to elevate it or any aspect of its outcomes to Terminal Value.

This feels like a strawman because writing a blog post has higher expected value than aimlessly clicking around on Facebook.  But if that's the crux, then it's a confusion over instrumental and terminal values.

The more substantial disagreement is that I don't see why our conception of terminal values should explain our behaviour.

---

In Terminal Values and Instrumental Values, Yudkowsky writes:
Consider the philosopher who asserts, "All of us are ultimately selfish; we care only about our own states of mind.  The mother who claims to care about her son's welfare, really wants to believe that her son is doing well - this belief is what makes the mother happy.  She helps him for the sake of her own happiness, not his."  You say, "Well, suppose the mother sacrifices her life to push her son out of the path of an oncoming truck.  That's not going to make her happy, just dead."  The philosopher stammers for a few moments, then replies, "But she still did it because she valued that choice above others - because of the feeling of importance she attached to that decision."

[...]

When we object that people sometimes do sacrifice their lives, the philosopher's reply shifts to discussing Expected Utilities over Actions:  "The feeling of importance she attached to that decision."  This is a drastic jump that should make us leap out of our chairs in indignation.
And this is basically how I see it, except that I don't try to be cool and cynical and deep by making accusations of selfishness, and that my stammering wouldn't be relevant to the argument.  The mother is an adaptation-executor just like the rest of us; endowed with reinforcement-learned genes that make no distinction between expected utility and actual utility or between instrumental and terminal values.  Worse, her adaptations have been optimized with respect to an irrelevant goal: inclusive genetic fitness.

If the terminal value function we are looking for should explain our behaviour, then there you have it: inclusive genetic fitness.

(As I am writing this, it occurs to me that I may see it differently from the paraphrased philosopher after all.  I would not have to make the jump from talking about utility to talking about expected utility because I would be talking about expected utility from the start.)

---

Years ago when I first started reading Yudkowsky's writings, I came upon
How to Convince Me That 2 + 2 = 3:
I find it quite easy to imagine a situation which would convince me that 2 + 2 = 3.  Suppose I got up one morning, and took out two earplugs, and set them down next to two other earplugs on my nighttable, and noticed that there were now three earplugs, without any earplugs having appeared or disappeared [...].
(Note: I have omitted essential parts of his argument because I'm not going to argue with it anyway.)

I remember reading over that second sentence several times.  You have two earplugs, you add two earplugs, and you end up with three earplugs.  That's impossible!  What on Earth is he not telling me?  I think it took close to a dozen fruitless re-readings of the first few paragraphs of the post before it dawned on me.

It's unimaginable, and that's the point.  The hypothesis that two plus two equals four is tested possibly more often than any other, and has never had evidence -- no matter how unexpected -- cast doubt on it.  We simply can't see things being any other way.

And that is how I feel about pain and pleasure being the only terminally valuable things.  I can't even fathom how something could possibly be terminally valuable if it were neither pleasurable nor painful.  But I would know it if it happened, and then I would be convinced.

---

I feel like I should taboo "terminally valuable", but I am not sure what to replace it with.  I terminally value good sensations because they are good to me.  I terminally disvalue bad sensations because they are bad to me.

I could go on for a while, rephrasing it in different ways, all conveying simply "I like what I like and I dislike what I dislike".

That statement is tautological but not useless.

There is no objective morality.  Even if there were, there would be nothing to enforce it.  If there were something to enforce it, this whole discussion would be pointless.

On the flip side, there is plenty of subjective morality.  Different people like different things.  Different people want different things.

The nice thing about subjective morality is that it's nobody's business but your own.  But it's a leaky abstraction.  We share an objective reality, and what happens there violates most people's subjective moralities.

That much is obvious.

Now suppose we want to do something about this.  Ideally, we'd like for everyone to get what they want.  Failing that, we can make them want what they get.

That last option is one that Yudkowsky seems to be heavily opposed to.  If he wants chocolate ice cream, then he doesn't want to want vanilla ice cream, because he doesn't want vanilla ice cream.  He shudders at the thought of eating vanilla ice cream.  Eww!

But I hope that it is uncontroversial to say that there is nothing valuable about chocolate or vanilla ice cream per se.  The value comes from the experience of eating it and liking it.

There is no objective morality, and so you should consider your attachment to your current wants (whatever they are) to be instrumentally irrational.  In fact, your wants are worse than useless: there is an opportunity cost to insisting on chocolate ice cream when vanilla ice cream is much more abundant and your wants are easily changed.

The extent to which you insist on keeping your wants the way they are determines your... say... preferential footprint.  If, as I assumed earlier, we want for everyone to get what they want, then we want to minimize our preferential footprints.

Doing so will leave only that which we cannot help but value: pleasures and pains.  If it turns out we can do away with those as well, then great, but just as I don't expect two plus two to stop equalling four, so I don't expect pain to stop hurting.

---

So far I have assumed that we want to cooperate so that everyone gets what they want.  Given that the conception of value presented here is in some sense selfish, it is a whole 'nother minefield whether and how this cooperation can occur.  I believe I have written plenty of disagreeable things already; let's leave it at this for now.

zondag 6 april 2014

Hyperbola Quintessence for rooks along ranks

Hyperbola Quintessence is a quick and simple alternative to Magic Bitboards, but it does not work for rooks or queens moving along ranks (assuming LERF, i.e. little-endian rank-file, mapping).  It applies the o^(o-2r) trick to generate attacks in the positive ray direction (for rooks: east or north).  The attacks in the negative ray direction are generated by applying the same trick to the bit-reversed board.

There is no cheap (i.e. branchless, single-instruction) way to reverse the bits of a 64-bit word on mainstream CPUs, but in many cases (attacks along a diagonal or along a file), the relevant bits are in separate bytes.  In those cases, reversing the bytes -- a cheap byte swap -- is sufficient.

In the case of a rook moving along a rank, the relevant bits are all in the same byte.  The same trick can't be exploited here because the byte swap will have no effect on the relative order of the bits.

I'm new to the world of computer chess, so I don't know what people have been falling back to.  I guess look-up tables are an okay solution, with 8 * 256 entries covering all combinations of rook position on the rank and presence of other pieces on the rank.

In any case, here's a way to apply Hyperbola Quintessence in the rook-along-rank case.  The crux is that we don't need to reverse all the bits of the bitboard, only the relevant ones.  Since there are as many bits in the byte (representing the rank) as there are bytes in the board, we can map the bits to their own consecutive bytes and swap the bytes as usual.

Here are the steps (dots are zeros):
  1. Mask the occupancy and the piece position to keep only the relevant rank.
  2. Map the relevant rank of the occupancy and the piece position to the a1-h8 diagonal:
    occupancy         piece
    . . . . . . . .   . . . . . . . . 
    . . . . . . . .   . . . . . . . . 
    . . . . . . . .   . . . . . . . . 
    . . . . . . . .   . . . . . . . . 
    0 0 1 0 0 1 1 0   0 0 1 0 0 0 0 0 
    . . . . . . . .   . . . . . . . . 
    . . . . . . . .   . . . . . . . . 
    . . . . . . . .   . . . . . . . . 
    
    >> rank_index * 8
    
    occupancy         piece
    . . . . . . . .   . . . . . . . . 
    . . . . . . . .   . . . . . . . . 
    . . . . . . . .   . . . . . . . . 
    . . . . . . . .   . . . . . . . . 
    . . . . . . . .   . . . . . . . . 
    . . . . . . . .   . . . . . . . . 
    . . . . . . . .   . . . . . . . . 
    0 0 1 0 0 1 1 0   0 0 1 0 0 0 0 0 
    
    * 0x0101010101010101
    
    occupancy         piece
    0 0 1 0 0 1 1 0   0 0 1 0 0 0 0 0  
    0 0 1 0 0 1 1 0   0 0 1 0 0 0 0 0  
    0 0 1 0 0 1 1 0   0 0 1 0 0 0 0 0  
    0 0 1 0 0 1 1 0   0 0 1 0 0 0 0 0  
    0 0 1 0 0 1 1 0   0 0 1 0 0 0 0 0  
    0 0 1 0 0 1 1 0   0 0 1 0 0 0 0 0  
    0 0 1 0 0 1 1 0   0 0 1 0 0 0 0 0  
    0 0 1 0 0 1 1 0   0 0 1 0 0 0 0 0 
     
    & 0x8040201008040201
    
    occupancy         piece
    . . . . . . . 0    . . . . . . . 0 
    . . . . . . 1 .    . . . . . . 0 . 
    . . . . . 1 . .    . . . . . 0 . . 
    . . . . 0 . . .    . . . . 0 . . . 
    . . . 0 . . . .    . . . 0 . . . . 
    . . 1 . . . . .    . . 1 . . . . . 
    . 0 . . . . . .    . 0 . . . . . . 
    0 . . . . . . .    0 . . . . . . . 
  3. Apply Hyperbola Quintessence to get the attacks along the rank as if it were the a1-h8 diagonal.
  4. Map the attacks back onto the relevant rank:
    attacks
    . . . . . . . 0
    . . . . . . 0 .
    . . . . . 1 . .
    . . . . 1 . . .
    . . . 1 . . . .
    . . 0 . . . . .
    . 1 . . . . . .
    1 . . . . . . .
    
    * 0x0101010101010101
    
    attacks
    1 1 0 1 1 1 0 0
    1 1 0 1 1 1 0 0
    1 1 0 1 1 1 0 0
    1 1 0 1 1 0 0 0
    1 1 0 1 0 0 0 0
    1 1 0 0 0 0 0 0
    1 1 0 0 0 0 0 0
    1 0 0 0 0 0 0 0
    
    (The lower bytes hold only partial sums; the top byte is the only one that collects all the diagonal bits, and it is the only byte we keep.)
    
    / 0x0100000000000000
    
    attacks
    . . . . . . . .
    . . . . . . . .
    . . . . . . . .
    . . . . . . . .
    . . . . . . . .
    . . . . . . . .
    . . . . . . . .
    1 1 0 1 1 1 0 0
    
    << rank_index * 8
    
    attacks
    . . . . . . . .
    . . . . . . . .
    . . . . . . . .
    . . . . . . . .
    1 1 0 1 1 1 0 0
    . . . . . . . .
    . . . . . . . .
    . . . . . . . .
Note that this only works for one rook at a time.
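For concreteness, here is the whole procedure sketched in Python.  Arbitrary-precision ints stand in for 64-bit words, so results are masked by hand; all names are my own, not taken from any engine library.

```python
MASK64 = (1 << 64) - 1
FILE_A = 0x0101010101010101      # one bit per byte (file a)
DIAG_A1H8 = 0x8040201008040201   # the a1-h8 diagonal

def bswap(b):
    # Reverse the byte order of a 64-bit word (ranks 1..8 become 8..1).
    return int.from_bytes(b.to_bytes(8, 'little'), 'big')

def hyperbola(occ, piece, mask):
    # Hyperbola Quintessence: o^(o-2r) forward, byte-swapped backward.
    # 'piece' must be a single-bit bitboard lying on 'mask'; the mask's
    # bits must sit in separate bytes for the byte swap to work.
    o = occ & mask
    forward = (o - 2 * piece) & MASK64
    backward = bswap((bswap(o) - 2 * bswap(piece)) & MASK64)
    return (forward ^ backward) & mask

def rank_attacks(occ, sq):
    # Steps 1-2: isolate the rank byte and spread it onto the a1-h8 diagonal.
    shift = (sq // 8) * 8
    o = (occ >> shift) & 0xFF
    p = 1 << (sq % 8)
    occ_diag = (o * FILE_A) & DIAG_A1H8
    piece_diag = (p * FILE_A) & DIAG_A1H8
    # Step 3: attacks along the diagonal.
    att_diag = hyperbola(occ_diag, piece_diag, DIAG_A1H8)
    # Step 4: collect the diagonal bits in the top byte and shift back.
    att_rank = ((att_diag * FILE_A) & MASK64) >> 56
    return att_rank << shift

# Example from the diagrams above: pieces on files c, f, g of the rank
# (bits 2, 5, 6 of the byte), rook on file c.
occ = 0b01100100 << 24
print(bin(rank_attacks(occ, 26) >> 24))  # 0b111011: files a, b, d, e, f
```

The division by 0x0100000000000000 from step 4 appears here as the equivalent right shift by 56.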

vrijdag 20 december 2013

How to create a logical array in numpy

This took me a while to find: the numpy equivalents of Matlab's true and false constructors.

To create a 3x3 array of False: np.zeros((3, 3), dtype=bool)
To create a 3x3 array of True: np.ones((3, 3), dtype=bool)
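In code, including np.full as a third option (available in newer NumPy versions; it infers the bool dtype from the fill value):

```python
import numpy as np

falses = np.zeros((3, 3), dtype=bool)  # 3x3 array of False
trues = np.ones((3, 3), dtype=bool)    # 3x3 array of True
trues2 = np.full((3, 3), True)         # also a 3x3 array of True
```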

dinsdag 29 oktober 2013

Playing HexDame, 1: Introduction

For one of my university courses, I will be implementing an automated HexDame player.  This post is the first in a series that will document the process in a way that is hopefully easy to understand.

HexDame is a variant of draughts/checkers on a six-sided board with six-sided tiles:

Because the board shape differs from that used in checkers, some of the rules of checkers need to be adapted.  In HexDame, pieces can move in six directions rather than four: north-east, north, north-west, south-west, south, south-east.  Like in checkers where men (as opposed to kings) can only move forward, in HexDame men can only move away from the player's initial corner.  For example, white's pieces can only move away from the tile labeled "a1" in the picture.  A man gets promoted to king when it reaches "the other side", which in HexDame is the two sides furthest away from the player's initial corner.

Now let's think about how to play this game as a computer.  You might start brainstorming about how to set up a trap, or how to develop a strong position, or that promoting a man should have highest priority if it can be done in one move.  You could do this, but you would end up with a bunch of isolated ideas with no guidance as to how to combine them.  Worse, the ideas you come up with are tailored to this particular game.

That's right, there are ways to automate the playing of games that are general, in that they work for many games.  The methods that I am thinking of rely on the concept of a game tree.

As an example, here's an image of part of the game tree for tic-tac-toe:


At the top ("root") of the tree, there is the empty tic-tac-toe board.  Below it are all the possibilities for how the tic-tac-toe board will look after one move.  Actually there are more possibilities, but each of the ones not shown can be rotated to match one of those shown.  The rotated variants lead to the same game results, and so do not need to be considered separately.

Below the possible boards after the first move are the possible boards after the second move.  Again, not all are shown.  The image stops there, but you can imagine how the game tree continues further down.

Game trees are a useful idea in many ways.  They model the whole game, revealing all the possible consequences of every possible move.  If you know the game tree, you know what to play.  So what's the catch?  Let's digress...

Lay conversations about the game of chess often touch on game trees.  Wild claims are routinely made about "how many moves ahead" a master looks.  It is a good rule of thumb that stronger players look further ahead than weaker ones, but looking ahead is not the only thing in chess.

Well, actually it is, but it takes a lot of time to consider every possible move and countermove.  In fact, every additional step looked ahead costs roughly 35 times as much time!  So if it takes you on average one minute to look two steps ahead, it will take you over half an hour to look three steps ahead, and the better part of a day to look four steps ahead.  The magic number 35 here is the average number of moves you have to choose from in chess.  In some situations there are fewer possible moves, in others there are more, but the average is around 35.
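As a sanity check on that growth rate (my own back-of-the-envelope numbers, assuming an average branching factor of 35 and one minute for two steps ahead):

```python
BRANCHING = 35   # average number of legal moves in a chess position
base_minutes = 1 # assumed cost of looking two steps ahead

three_steps = base_minutes * BRANCHING       # minutes for three steps
four_steps = base_minutes * BRANCHING ** 2   # minutes for four steps
print(three_steps)       # 35 minutes
print(four_steps / 60)   # over 20 hours
```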

So what do masters do?  They don't consider every possible move, because some moves are obviously stupid.  They consider forcing moves first (moves that put the king in check or attack the queen), because forcing moves constrain the opponent, which reduces the number of moves the opponent has to choose from, which reduces the time spent thinking ahead.  In the beginning of the game, they follow opening theory, which determines their early moves much like a phone script determines a telemarketer's sequence of questions.  Toward the end of the game, they similarly follow endgame theory.  For the rest of the game, they use their highly developed intuition for chess, and a large set of explicit principles, such as the ones listed at http://www.geocities.com/siliconvalley/lab/7378/principl.htm.

In principle there is no difference between humans and computers.  Both have to comply with the laws of physics, and both are subject to the tyranny of computational complexity, which measures the amount of thinking time and the amount of memory required to compute some result.  The practical difference is that out of the box, humans have the ability to recognize patterns.  This is a complicated ability that developed over evolutionary time.  In the context of games, it is good for two things (no, not de-greasing engines and killing brain cells): developing intuition and applying existing knowledge.

Intuition in games is a cool thing.  A decent chess player can look at a position and instantly see in whose favor it is.  Not with perfect accuracy, but perfect accuracy would take more time than has passed since time began.  It is better to lose the occasional game due to inaccuracy than to lose every game by running out of time.

The other way in which human brains have a headstart for playing this kind of game is that they can apply existing knowledge to it.  Knowledge that has been learned for one task can sometimes be reused for another.  For instance, if you're good at playing soccer, it will be easier for you to pick up table soccer.  Wait.. let me find a better example.  How about: if you're good at playing badminton, it will be easier to learn tennis.  If you know how to play chess, you have knowledge that can be repurposed for the game of checkers.  The rules are different, but the games are pretty similar.  Both have the concepts of capturing and promoting pieces, and, less superficially, both have positional and tactical aspects.

Artificial Intelligence researchers are working on mathematical analogues to these abilities.  The general field is reinforcement learning, in which an agent (such as a robot) learns to perform a task by trying things and being told how well it is doing.  In the subfield of transfer reinforcement learning, the agent tries to reuse knowledge from previously learned tasks.  A lot of progress has also been made recently on neural networks, which could substitute for human intuition.  These topics will probably pop up again in future blog posts; I'm not going to go into them here.

For the beginnings of our automated HexDame player, let's forget about these complications and just create a program that looks ahead.  The game tree provides a nice, systematic way for doing this.  More later.
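As a taste of what such a look-ahead program can look like, here is a minimal negamax sketch in Python.  The game interface (moves, apply, score) and the toy TakeAway game are my own illustrations, not HexDame code:

```python
def negamax(state, depth, game):
    """Best achievable score for the player to move, looking `depth` plies ahead."""
    moves = game.moves(state)
    if depth == 0 or not moves:
        return game.score(state)
    # A position is worth the best reply, assuming the opponent then plays
    # their own best reply -- hence the negation at each level of the tree.
    return max(-negamax(game.apply(state, m), depth - 1, game) for m in moves)

class TakeAway:
    """Toy game: players alternately remove 1 or 2 sticks; taking the last stick wins."""
    def moves(self, n):
        return [m for m in (1, 2) if m <= n]
    def apply(self, n, m):
        return n - m
    def score(self, n):
        # Only reached at leaves here; n == 0 means the player to move has lost.
        return -1 if n == 0 else 0

print(negamax(4, 10, TakeAway()))  # 1: four sticks is a win for the mover
print(negamax(3, 10, TakeAway()))  # -1: three sticks is a loss
```

With a depth at least as large as the number of sticks, the search is exact; for bigger games, depth is what you pay 35-fold (or however large the branching factor is) to increase.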