1. the players move alternately, assume the opening player is a girl, why not?
2. each player chooses a nonempty pit and empties it by distributing one pebble into each pit including her kalah moving in an anticlockwise direction, noting that if the player holds sufficient pebbles pebbles are placed in the opponent's pits but never into the opponent's kalah
3 if the last pebble lands in her kalah then the player has another turn and this process can continue many times
4 if the last pebble lands in an empty pit on her side opposite a nonempty pit on the opponent's side then the move comes to an end but all the pebbles in these two pits are placed in her kalah
5 if either player finds that all their pits are empty then the other player can place all the pebbles in their pit into their kalah
6 the winner is the player with most pebbles in their kalah.
CHOICE OF A PROBLEM
1. The activity must not be deterministic in the practical sense. There exists no human algorithm to guarantee a win or draw and the complete game involves a large number of choices.
2. A definite goal must exist and at least one criterion or intermediate goal must exist which has a bearing on the final goal e.g. to stop the opponent moving.
3. The rules of the activity must be definite and known. Games satisfy this criterion.
4. There should be a background of knowledge concerning the activity against which the learning process can be tested.
5. The activity should be familiar to a substantial body of people so that the behaviour of the program can be made understandable to them.
Action of a General Computer Program in Game Playing
a. The human inserts an arbitrary position in the game. Goto b or e according to whether it is human's or program's move.
b. The human inserts his move.
c. Generate the next position.
d. If the game is over print the result and stop otherwise goto e.
e. Generate all or some of the successors of the top computer position.
f. Evaluate each position. For some programs each evaluation involves elaborate searching with look-ahead.
g. Move to the successor.
h. Print the Computer's move.
j. If the game is over print the result stop otherwise gotob
One of the simplest ways for a program to learn is to save all the information it has gained or deduced about its previous games. If the program can refer to this information at any stage a great deal of computing time can be saved. Playing at ply n it is possible to achieve look-ahead at a level considerably greater . The program saves the board positions that have actually occurred in playing the game with their associated scores. When a terminal node is reached at look-ahead level n its actual value is found in a look-up table giving an effective look-ahead of many times n.
As storage is limited and search times can soon become disastrous systems must be devised to
1. catalogue boards that are saved.
2. delete redundancies,
3. discard board positions of little value.
4. standardise all board positions so that black is to move factor=2,
5. can reduce the number of positions if all pieces are kings by symmetry, factor=4,
6. store positions on the basis of number of pieces on the board, the presence of piece advantage, the side with advantage, the presence of kings, the first moments of pieces about the diagonals.
To limit the number of records held an interesting feature has been introduced ageing, and refreshing. All board positions are given an age of 1 and after every turn the age is increased however when any board position is consulted its age is halved. Those positions which are too old are archived. i.e. forgetting.
The important parameters are
1. piece advantage
2. denial of occupancy
4. control of centre.
Played on an IBM 704 which could handle a milliion 36-bit words and up to 100 references per minute.
A possible evaluation function for checkers is
6k + 4m + u
where k is the king advantage m is the man advantage and u is the undisputed mobility, which means the number of moves that can be made without leading to an immediate capture.
If we consider the position shown in the attached diagram then BLACK has a man on 16 and a king on 28 whereas WHITE has kings on 15 and 32. The computer is white and the score is
u = 4 - 2, k = 2 - 1, m = 0 - 1. Thus score is 6k + 4m + u or 6 - 4 + 2 = 4.
WHITE is in play and has to make a move ; there are five choices A,B,C,D or E. Using 1-ply we can score the moves as
A 15-10 score 4
B 15-11 score 5
C 15-18 score 4
D 15-19,16-23 score -6 this initial move leads to an immediate capture 16-23
E 32-27 score 6
The best move is actually 15-11 from a 3-ply analysis but E appears best so far.
From 1946 Samuel has devoted a considerable amount of computer and human effort in developing a checkers computer program to beat human opponents. The work has been of such success that the program can now beat most opponents and has reduced the excitement in other programmers attempting this particular problem.
Samuel introduced the topic of machine learning and his program improved its performance up to a limit against human opponents. Basically it uses the concept of rote learning. Suppose a tree is terminated at 3-ply then each of the nodes is evaluated statically using the evaluation function which consists of about 38 parameters. However if the root node that is encountered is stored with its backed up value re occurs then it will have a firmer value. Provided enough games are played then most if not all of the nodes in the future game trees will have been seen before and correctly valued. Thus the root node will have a virtual value of 6-ply and as the process continues and the program learns so the values given to each node will improve. An additional learning technique is to play two programs against one another and the program that wins has one set of parameters and the loser another. The new program must have parameters that emphasise the good points of each in some way. The storage of previous games was catalogued as follows;
the number of pieces on the board, the presence or absence of piece advantage, the presence or absence of kings, the side with advantage. Positions of little value, those that are not referred to very often, are deleted and symmetry is used to cut down the number of stored positions by various factors of 2. In 1959 the program used 60,000 board positions and an evaluation choosing 16 parameters from 38, whilst in 1967 alpha beta pruning had been introduced and only the most promising nodes were investigated. Book moves were also incorporated.