We may think of Google as a “big eye”, not only because of the images provided by Google Maps and Google Earth but also because it seems “to see” everything that is written on the web. In fact, though, the way Google’s indexing process works is more like a huge brain.
Google and the PageRank
The details of the processes Google uses to produce such good search results are not public; moreover, as Google says, those processes are continuously updated. Nevertheless, we do know that an essential part of them is the PageRank algorithm, which assigns a numeric value to public Internet pages; pages with a higher PageRank appear earlier in the result pages, although many other “non public” factors go into computing those results (for example, the frequency of the query terms in the pages). Leaving those details aside, the point here is that the PageRank algorithm works in a way similar to neurons in the brain. Let’s see why.
Every page indexed by Google gets a PageRank between 0 and 10. Those values are on a logarithmic scale (i.e., there’s much less difference between 3 and 4 than between 4 and 5), and they are computed from the number of incoming links from other pages; linking to an external page is like voting for it.
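To make the logarithmic scale concrete, here is a small Python sketch. The base of the logarithm is not public, so the value 8 below is an arbitrary assumption chosen only for illustration, and the function names are hypothetical. It shows that, whatever the base, each step up the scale covers a wider range of underlying “raw” score than the step before.

```python
# Assumption: the real base is not public; 8 is an arbitrary illustrative value.
BASE = 8


def raw_threshold(level: int) -> int:
    """Smallest hypothetical raw score that would map to the given 0-10 level."""
    return BASE ** level


def gap(level: int) -> int:
    """Raw-score distance between a level and the next one up."""
    return raw_threshold(level + 1) - raw_threshold(level)


# Compare the step from 3 to 4 with the step from 4 to 5.
for level in (3, 4):
    print(f"{level} -> {level + 1}: gap of {gap(level)} raw units")
```

With any base greater than 1 the second gap is larger than the first by exactly that base, which is what “logarithmic scale” means here.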
Unlike in democracies, not every vote has the same value: a link from a page with a high PageRank is worth more than a link from a page with a lower PageRank. In a sense, PageRank is a measure of the prestige of a page: you gain prestige by being referenced by other pages with high prestige (in fact, a similar algorithm was proposed to assess the prestige of scientific papers).
And prestige is distributed: a link from a page that links to many other pages is worth less than a link from a page with the same PageRank but fewer outgoing links.
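The voting rules described above can be sketched in a few lines of Python. This is only a simplified version of the publicly described PageRank iteration, not Google’s actual process: the damping factor of 0.85, the number of iterations, and the tiny five-page web are all assumptions made for illustration. The values it produces are raw scores that sum to 1, not the 0 to 10 scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate a raw PageRank score for every page.

    `links` maps each page to the list of pages it links to.
    Every linked page must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal prestige

    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its vote over all pages.
            targets = outgoing if outgoing else pages
            # Prestige is distributed: the vote is split among the targets.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical miniature web: every page links back to "hub".
links = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["hub"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web, "hub" ends up with the highest score because every other page votes for it, while "d" scores lowest: its only vote comes from "c", which splits its prestige between two links.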
The brain as a network of neurons
Simplifying greatly, we can see the central nervous system (and specifically the brain) as a network of a huge number of interconnected cells known as neurons. Every neuron has a cell body (or soma), which is the starting point of multiple branches called dendrites; there’s also a long nerve fiber called the axon, which has its own ramifications at its far end.
Every neuron’s dendrites meet the endings of other neurons’ axons in a zone called the synapse; they are not actually in direct contact, but they are extremely close.
Electrochemical signals travel in neurons from the cell body through the axon towards its ramifications, where they trigger the release of substances called neurotransmitters, which are detected by the dendrites of nearby neurons.
So each neuron receives through its dendrites different stimuli of different strengths coming from other neurons; some of them are positive and “encourage” the neuron to propagate the signal; others are negative and “discourage” the neuron. Eventually the stimuli are added up and the neuron “fires” only if the result is greater than a given threshold; in that case the signal goes through the axon and propagates to neighboring neurons.
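This “add up and compare with a threshold” behavior is easy to express in code. The following Python sketch is a deliberately crude model of the simplified neuron described above, not of real neurophysiology; the function name, the stimulus values, and the threshold are hypothetical.

```python
def neuron_fires(stimuli, threshold):
    """Return True if the summed stimuli exceed the firing threshold.

    Positive values "encourage" the neuron, negative values "discourage" it.
    """
    return sum(stimuli) > threshold


# Two encouraging signals and one mildly discouraging signal: the neuron fires.
print(neuron_fires([0.6, 0.5, -0.3], threshold=0.5))

# The same encouraging signals with a stronger discouraging one: it stays silent.
print(neuron_fires([0.6, 0.5, -0.8], threshold=0.5))
```

The output is strictly on or off: the neuron does not report how far above or below the threshold the sum was.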
The association of ideas is clear: every web page can be seen as a neuron and its PageRank as its activation value (on or off); incoming links would be the connections from other neurons through the dendrites, and outgoing links would be the connections through the end of the axon. So the PageRank calculation (the activation value of the neuron) depends on the incoming links (signals received from other neurons) and, at the same time, that value is communicated to the linked pages (signals transmitted to other neurons). Moreover, every page carries a different weight in the PageRank calculation, in the same way that signals received from different neurons have different strengths.
In both cases the information in the system resides not in the individual elements but in the interconnections between them. More importantly, the connections are not fixed: links between pages change, and so do connections between neurons (in fact, some theories state that those changes are the basis of memory and/or learning).
Let’s see some aspects in which both systems are different:
- The activation level of a neuron is on/off, whereas PageRank is a numeric value between 0 and 10.
- The algorithm for PageRank calculation does not include links with negative value (which would be equivalent to “discouraging signals”) although it is known that Google penalizes link farms, which are pages created just to artificially increase PageRank.
- Activation between neurons propagates continuously (“in real-time”), whereas the PageRank calculation is performed by Google approximately once a month (this is sometimes called the “Google Dance”).
- Related to the previous point, information is physically distributed among neurons, whereas the PageRank value is in fact centralized in Google’s servers (although it is calculated from links between pages).
Extending the metaphor
Let’s speculate about the results of applying the characteristics of neural networks to the PageRank algorithm (we cannot do it the other way round, since we still cannot modify how neurons work!).
- Reducing the range of PageRank values from a 0-10 scale to on/off (like neurons) doesn’t seem to give any advantage; rather the opposite, it would make the PageRank calculation less flexible.
- Including links with negative value (similar to “discouraging signals”) would be dangerous, since any punished website could easily harm other sites just by linking to them.
- Calculating PageRank values every time someone launches a search would be impossible, since the computing resources and network speed required would be enormous; that’s why it’s done periodically and the results are stored on servers (although we could say that the more often they are computed, the better).
- Theoretically every server could store the PageRank of its own pages to optimize search processes, but this would require trusting every server’s capability and honesty, which is really hard to do.