**Symbols, Connections, and Optimizations by Trial and Error**

This chapter begins by looking into symbols in trees and networks. The first example of the first philosophies regarding intelligence: symbol processing. This was the first approach that was used for artificial intelligence. This methodology relies on symbols to be clearly categorized. They discussed the grounding problem for tree structures as well. The second assumption they look into is fuzzy logic. Which means that connections are not either true or false. They are based on a degree of accuracy. Another method that was discussed was constraint-satisfaction models.

Commonly in traditional symbol processing, symbols are arranged in tree structures. With this structure, each step leads to another decision. However, since fuzzy logic does not rely on clear-cut decision making, it may look erratic, where it is strong in some places, and week in others. Trees cannot depict feedback. So, network matrices are used to depict feedback relationships. There is further discussion about representing networks using matrices (include mapping of trees on matrices). There is also discussion about graph patterns that cause feedback. Paul Smolensky has an equation to find the optimal state of such networks. There is discussion about maintaining harmony of a network. Particularly, there is discussion about a method of optimization formulated by John Hopfield.

The discussion of discovering patterns of connections among elements (learning) is covered. They begin by taking a look at what it means for two things to be correlated. They then begin to talk about how this is implemented using correlation matrices. The two types of organization they talk about are symmetrical and asymmetrical connections. They begin to talk about how you can use complex logical interactions to achieve different results. They focused on creating feed forward networks, and used them to create logical propositions. Interpretation is then talked about; the sigmoid function is used to manipulate information into a tolerable range.

Neural networks are then discussed. These networks are used to try to provide a theory of cognition referred to as connectionism. This method assumes that cognitive elements are made up as patterns distributed through a network of connections. This model more closely resembles human thinking, since it does not require elements to be separated for categorization.

The next section begins to take a look at problem solving, and optimization. They begin by trying to establish that characteristics of problems can be used to generate a degree of goodness by which an answer can be estimated. In this example they look at methodologies for selecting a solution to satisfy an algebra equation. Then they begin to look at the three spaces of optimization. The first is parameter space, which is the legal value of all elements. The function space contains the results of operations on the parameters. The fitness space contains the degrees of success by which patterns of parameters are used in function space. This is measured as a degree of goodness, or badness. They then begin to talk about evaluating solutions by establishing a fitness landscape. The idea is to reach the highest point on the graph (which can be multi-dimensional).

High dimensional cognitive space and word meanings are examined in the next section. This section begins by talking about computer algorithms used for word meaning recognition. The first approach is called semantic differential (Osgood, Suci, and Tannenbaum – 1950). They gave people words, and had extensive questionnaires about their feelings pertaining to the word (57). They linked words into groups by “halo effect,” which results in the categories: evaluation, potency, and activity. Another group created a database of word relations based on Usenet samplings in order to remove the factor of tester bias (Colorado 1990). They organized word associations into a matrix so they could establish more relationships, such ad Euclidean distance. They talk about how this closely relates to the way human’s process word definitions by understanding context rather than consulting a dictionary.

NK Landscapes, and factors of complexity are discussed in the following section. This section deals with interdependent variables and problem space. The foundations of complexity lie in N, the size of the problem, and K, the amount of interconnectedness of the variables (Stuart Kauffman). Increasing N results in combinational explosion, and increasing K results in epistasis. There are extensive examples using bit strings to illustrate various problem complexities.

Combinational optimization is the focal point of the next section. The idea here is to either minimize or maximize a result. First simple permutations are talked about. Then, they talk about breadth-first and depth-first search of simple permutation diagrams (trees). Heuristics are used as shortcuts to reduce the search space. The whole idea appears to boil down to trying to find the best way to guess where you are going to find the right answer.

The next section deals with developing binary optimizations. They first discuss how various things can be encoded in binary so they can be used for binary optimizations. Then there is discussion about how binary search space doubles for every additional element, and how you represent binary strings in various dimensions. There is also discussion about the meaning of hamming distance. Since binary strings can become intractable it becomes necessary to determine which bits are the most important in order to narrow the search space. They then go into discussion of various searches such as random and greedy, hill climbing, and simulated annealing. Additionally, they talk about various implementation concepts such as binary vs. gray coding, and examining step sizes and granularity.

The final section is a brief overview of optimization using real numbers. Essentially these problems appear to be similar to combination, and binary optimization methods. However, distance (include step size) is no longer determined in Hamming distance. Typically they are represented using Euclidean distance as calculated in n-dimensional space.

hi how ru