A DIARY OF THE COMPUTER ANALYSIS BY ALLAN WEBBER OF 340,000 WORDS FOUND IN WELL KNOWN TEXTS.
This series of two papers records my efforts to establish a set of measures for gauging the reality of anagrammatic coding. Hopefully it will prove useful to all those interested in coding.
The first stage examines the statistical evidence when comparing a variety of ancient texts.
The second stage examines patterns between the larger anagrams found in the texts.
This exercise was designed with the aim of definitively testing Nostradamus' text for coding.
Tuesday 2nd November 2004 - The US presidential campaign ends tonight, the Melbourne Cup is run this afternoon, and it's raining in what has been a dry and hot spring.
The analysis arises out of my scepticism, a scepticism that many would consider at odds with the efforts I have put into my Nostradamus research.
I can see no convincing evidence that any living person can see into the future or is psychic. If such effects were possible then in this populous world there should be some evidence that did not wilt when placed under scrutiny. Instead there is only folk wisdom that relies on its delicate nature to explain its inability to be confirmed. Yet if there was really any person with such insight they would be the subject of great respect especially in the fields of investment and military adventurism.
I am a sceptic but only where reason and evidence support this view. I don't like to take a view based on lack of evidence for that seems a mentally lazy course.
And the present offers ample evidence, testable in a way that the past is not. Purported prophets of the past usually gain strength not from the evidence but from the clouding of reality that tends to come with time.
Having due regard to what is apparent in the present world I cannot believe in the ability of individuals to prophesy, despite what others might make of my writings. But in this one case, that of Nostradamus, I have found it much more difficult to find the flaw. Each of my writings published on the internet serves its purpose in my ongoing attempt to find evidence that Nostradamus could or could not see into the future, to resolve one way or the other this enigmatic writing. I have no doubt Nostradamus' writing is coded, but was it, is it, prophetic? If Nostradamus was able to truly prophesy it would run counter to normality and reason.
For over thirty years I have established ground rules to place Nostradamus' work in a testable modern setting. This should have let me relinquish any thought that Nostradamus could see into the future. Yet on every occasion I have been left with a bigger mountain of coincidence to overcome.
My work has led me into a view that there is an unusual and inexplicable patterning in Nostradamus' writings. It seems to hold too many beautifully arranged English anagrams.
For example there is a verse where the anagrams "American" and "Disaster" occur complete and alongside each other. To what extent is this coincidence? Can it be considered 'proof' of coding?
Only recently I had cause to use my programs to search for the complete anagram of "acronyms", a form of coding using the first letters of lines to generate messages. One occurrence appears, and in the same line are clearly seen anagrams saying "My
How many such coincidences add up to proof? At what point can I say "There is a code in Nostradamus"? And at what point would I be forced to link this with the much more unnatural statement "Nostradamus could really see into the future"?
Of course it is possible to prove that a person has incorporated a code without it supporting any other claims as to their abilities. But in this case my method of proof makes it difficult to distinguish the two for the codes appear to be in Modern English and Nostradamus was a 16th century Frenchman. Is this premise wrong and the code really in French? To what extent is this my delusion, my falsely seeing what I seek?
Over the last year I have concentrated on finding whole anagrams (see ANAGRAMS to clarify the term). It has been somewhat of a surprise to me to find the superabundance of anagrams in Nostradamus' ancient text. They are superabundant, highly complex and often seemingly inexplicably connected with each other and with the original text.
The question arises as to how important is this data. It would seem at a casual glance that the finding of words such as "American" and "Disaster" alongside each other is very significant when there are only 4 anagrams of "American" and 2 of "Disaster" in all of the 3742 lines of text of Nostradamus. But is it?
It is easy to be misled about the nature of chance and to build false probabilities into our assumptions. The classic example of this is the probability of two people having the same birthday in any class of 30 people. The real odds turn out to be about 70 per cent, not the roughly one-in-twelve chance (30/365) that people mostly assume. The false calculation occurs when the question is approached from the point of view of the match occurring rather than from the view of it not occurring.
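The correct calculation works through the complement, as described above: multiply together the chances that each successive person misses every birthday already taken. A short sketch of that arithmetic:

```python
# Birthday problem: chance that at least two people in a group of n
# share a birthday. Work from the complement (nobody shares), rather
# than trying to count the ways a match could occur.
def shared_birthday_probability(n, days=365):
    p_all_distinct = 1.0
    for i in range(n):
        # person i+1 must miss all i birthdays already taken
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

print(round(shared_birthday_probability(30), 3))  # about 0.706, not ~1/12
```

For a class of 30 this gives roughly 0.71, far likelier than the naive 30/365 estimate.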
So over time I have experienced increasing dissatisfaction at the ease with which I have found material in Nostradamus' text that supports my hypothesis. It seems too easy to get, much easier than people might assume.
I knew I could analyse Nostradamus' work in a much more scientific way than the ad hoc approach I had been using, so I set out to test the texts of Nostradamus to their fullest extent. My gut feeling was that these anagrams in Nostradamus were more than chance, but the left-hand side of my brain chided me that disciplined testing of this proposition would show my feelings to be falsely based.
In order to settle this conflict within myself I set out to gain the approval of my brain's left side. Firstly I gathered from various dictionaries on the internet a set of source words, all at least five letters long, that I could use for testing various texts. I ended up with a base of over 340,000 words.
Now testing this amount of data is no simple task. It is only in the last few years that such a database could be applied to the analysis of Nostradamus' work. There are 3,742 lines of text to be analysed and each word has to be tested against each line. This means over 1,200 million tests have to be done.
Ten years ago, with the technology available to me then, it would have taken over 8 months just to run the tests on Nostradamus' text. And more than 20 years ago the task would have been impossible. It seems fair to say that this is the first time since Nostradamus wrote his work that such a test has been possible.
Today, using 3 machines that are not the fastest around I can accomplish the analysis task on all my sources in about 14 days.
In order to achieve something useful I needed to test other sources, and I resolved to use short sections from a range of texts that compared fairly to Nostradamus' work and allowed some greater understanding of anagrams within a wider context. I ended up choosing poetry totalling some 1,800 lines, which is about half that of the Nostradamus text. I chose two old French classics, an Italian classic and two old English classics in order to test various aspects of anagrams and the potential of my methods to uncover hidden coding. I also included the English translation of the Italian classic to act as a control. See TEXT SELECTION for further details.
Each text (and the word list) was converted into a standard form while ensuring that the integrity of the sources remained intact. For instance the Nostradamus text is the format I started using 10 years ago and hasn't been changed since. See TEXT CONVERSION PRINCIPLES for details. I believe the comparisons made possible by these processes are fair and free of distortion.
The rules I apply to the anagrams are also critical, for they are designed to ensure rigour in this particular analysis. Imperfect anagrams are not allowed: the program used DOESN'T look for letter substitutions, composite words or anagrams in an incomplete state (although I have programs that allow these aspects to be tested).
One important variable aspect is the way the program treats end-of-line lettering. Each line is considered as a circle, so anagrams can be found right up to the last letter (with the rest at the beginning of the line). A second important aspect is that every letter in the line is considered to be a potential starting point for an anagram, not just the letters that begin words. See RULES FOR FINDING ANAGRAMS for principles. There are other rules that could be applied, but I have chosen these on which to base the analysis. The important issue here is that these rules are mandatory and mechanical and not able to be chosen at will; there is no discretion to choose that which benefits or detracts from a particular result.
The program for this whole anagram analysis is fairly simple (compared to that for split-anagrams) and for those who have an interest in building a routine for themselves I set out the basics in HOW THE ANALYSIS PROGRAM WORKS
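For orientation, here is a minimal Python sketch of a routine that applies the two mandatory rules described above: the line is treated as a circle of letters, and every letter is a potential starting point. The example line is invented purely for illustration; this is only a sketch of the idea, not the actual program described in HOW THE ANALYSIS PROGRAM WORKS.

```python
def find_anagram(line, word):
    """Return the 0-based letter positions at which a complete anagram of
    `word` starts in `line`, treating the line's letters as a circle."""
    letters = [c for c in line.lower() if c.isalpha()]  # strip spaces/punctuation
    n, k = len(letters), len(word)
    target = sorted(word.lower())
    hits = []
    if k > n:
        return hits
    for start in range(n):  # every letter is a potential starting point
        # window may wrap past the end of the line back to its beginning
        window = [letters[(start + i) % n] for i in range(k)]
        if sorted(window) == target:
            hits.append(start)
    return hits

# Invented example line: two complete anagrams of "american" are found,
# one of which wraps around the end of the line.
print(find_anagram("in America", "american"))  # -> [1, 7]
```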
Tuesday 9th November 2004- There is a difficulty in any research designed to test for a negative result in that the desire for the research to come up with something unexpected can conflict with the need for a disciplined outcome. This has been my dilemma over the last week because the analysis turned up a result that showed Nostradamus' text is different to the other texts analysed.
Although I felt my past work suggested otherwise, I expected the results of disciplined analysis to show that Nostradamus' text was no different to the other texts examined. Yet this is not the case, so I was forced to re-examine every element of the research before I could proceed further.
I re-wrote the program using a completely different paradigm (see SECOND PROGRAM) and ran it against the first. The new program was far faster than the earlier one (for the new one allowed me to use a faster technology, SQL), and because of the slowness of progress in the earlier program there was a great deal of merging of output from different computers, which gave a potential for error. By checking the output of one program against the other I could eliminate any error of this kind. A small number of omissions had occurred, and the program was rerun to redress this problem. The data from both analyses now matches and I am confident that the final product is free of similar errors.
There is a difference in the size of data from each source and this had to be normalised against a standard for the results to be meaningful. This was done by counting the letters in each text (for this gave the number of starting points for anagrams) and then using this number I proportioned each to match the count in the text of Nostradamus. This letter-count data is gathered very easily in a database such as Access (Summing length of lines) and is not prone to error. After reviewing this aspect I am confident it is a valid approach and not the source of any bias in the results.
(11th November - I realised that I had not checked the validity of the entry on which I was basing the count. This turned out to be an error, for I had used the lines with blanks and other characters in them, whereas it should have been the version where all these are removed. This produced a significant change in the ratio, which increases the count for the non-Nostradamus texts and increases the probability that his text is not different from the rest. My left-hand brain (the rational side) is confident that this will cause the analysis to properly show no difference between Nostradamus and the other texts. My right-hand side's smug confidence that the results were going in favour of Nostradamus lies in tatters, and now it is more in hope than anticipation that it awaits the completion of the analysis. Final values used for letter count are Nostradamus: 116,871, Spenser: 19,092, Roland: 18,763, Moll: 7,089, Shake: 5,682, Dante-Eng: 4,630, Dante-It: 3,794.)
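The proportioning step reduces to a single ratio against the Nostradamus letter count. A minimal sketch using the final totals quoted above; the raw count of 100 is an invented placeholder, shown only to make the arithmetic explicit.

```python
# Final letter counts quoted in the diary entry above.
letter_counts = {
    "Nostradamus": 116871, "Spenser": 19092, "Roland": 18763,
    "Moll": 7089, "Shake": 5682, "Dante-Eng": 4630, "Dante-It": 3794,
}

def normalise(raw_count, source):
    """Scale a source's raw anagram count to what it would be if the
    source had as many letters as the Nostradamus text."""
    return raw_count * letter_counts["Nostradamus"] / letter_counts[source]

# A hypothetical 100 anagrams found in Spenser scale up to roughly 612
# when normalised to the Nostradamus letter count.
print(round(normalise(100, "Spenser")))  # -> 612
```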
The word base is another potential source of error but only if it had inbuilt influence from me. The sources for my word base are completely independent of my work. They can be downloaded from the following sites for those wishing to verify the independent worthiness of my sources.
There is no doubt there will be a masking effect produced by the nonsensical structure of many words in this lexicon, but as their number is not vast and should be equally distributed, this should have no significant bearing on the assessment of the possible use of anagrams in any text.
Some of the texts are in English (or an English translation) and this would be a source of bias if the redundant anagrams (those that are identical to the test word) were included. To ensure fair comparison, redundant anagrams were marked as such, so that results both including and excluding these words were possible for all the different sources.
Monday 15th November 2004- Both the emotional and rational minds are astounded by the result. The expectation of each was that after the adjustments described earlier there would be nothing of significance in a comparison of the Nostradamus' text with the others. Below is the chart of the results of the analysis.
The right-hand, emotional side of the brain is pleased but apprehensive at this result, for although it seems a clear victory, it knows the left-hand side too well: this result will be pursued to try to find its logical flaw.
And the emotional side is also disappointed, for it aches for the issue to be resolved, and a negative result would have allowed this quest to end. Nothing is proven by this result; it just continues the pattern whereby Nostradamus' text continues to intrigue and puzzle me.
Already the computers are running once more, running logical tests to try and resolve what it means when an analysis of a French text using English words leads to this surprising result.
This process of understanding begins with the graph below, which uses a logarithmic scale to display the result. This shows much more clearly that as the length of words increases there is a distinct increase in the ratio of Nostradamus count compared to that of the Other texts.
This graph suggests that a straight line could be drawn across the tops of each series. Each line has a distinctive slope and they come together at some point to the left of the graph.
It would not be surprising to see such a result if there was deliberate coding, for coding is less masked where randomly generated anagrams are less frequent (i.e. among longer words).
Potential for Code in the Other Texts
The reason for choosing several texts in different languages was to try to quantify the extent of random anagram generation. Dante in modern English was expected to be particularly significant in this regard, for it is nigh impossible for it to contain coding. Several of the texts were from a period in which coding in text was very fashionable. It has been implied that Shakespeare's work may hide code in the form of anagrams, but the selection of sonnets makes this selection unlikely to be coded (for the nature of the word is paramount in sonnets). The same can be said for Moliere, who was writing a play that needed to be attuned to the ear, not the mind. Sagas are different, for in telling a story elegance is less important. So Dante's Italian and Roland's Chanson hold greater potential for concealing code.
The sources have unequal numbers of letters in them and it is important to test whether my equalisation factor distorts the result. The nature of the data is such that a simple test can be provided. The sum of the Other Source letters is very nearly equal to half the letters in Nostradamus. By using two-line pairings of Nostradamus' four-line quatrains as a basis for comparison it can be shown the results still hold.
Amongst the significant questions raised by these results is why the texts written in foreign languages show a tendency to imply code when examined with an English word base. Even if they are in code it would seem unlikely they are coded in their own language. Simple logic suggests that testing with an English base should not reveal a coding pattern in another language. However, simple logic often leads to wrong conclusions (as in the question "Which of a lead, a wooden and a glass ball of the same size will fall fastest when dropped?" They all fall at the same rate).
The simple logic suggesting English (or another foreign base) would not work is undermined by the connections of profiles in different languages.
In this series of analyses I am not actually looking for English words in other language texts, I am looking for anagrams of English words.
In preparing for this analysis I have drawn upon the Moby reference source to construct both a French and an Italian word base. I have then examined them using my anagram profile functions and MS Access query tools. These foreign word bases are not as big as the English word base but they show that between 16% and 20% of the anagram profiles in the smaller bases are common to each other and the English word base.
This is a high percentage, and if to this is added the very significant number that have very similar profiles, then we will always get a large number of anagrams no matter what language is used. However, the expectation would be that the variance between texts with and without coding should increase when analysed with the appropriate language base (that in which the code is written).
It is evident from this that I should at least run a full test using the French Word base to see if the results hold and to see what is revealed.
This research has shown that a huge number of anagrams of varying lengths occur by chance. (The number of occurrences is certainly much larger than I expected.)
I consequently believe that the finding of anagrammatic structures in any text is extremely likely, and that the chance of finding sequences that suggest a meaningful relationship is also high.
Yet we know code does exist in many texts and in order to assess it we need a more rigid understanding of what is abnormal and what can occur by chance.
In order to go beyond this issue of probability and create a firmer base, an understanding of the findings in the earliest part of the research is essential. To pursue this I have created two different functions to randomise the letter order in the search lines. This means the lines contain the same letters but their syllabic structures are destroyed. This has then been used to analyse 1 in every 9 of the search lines against each word base. Exact correlations can therefore be created for each line that is examined in its original and randomised states.
Tuesday 16th November 2004- The French analyses are complete and I have the computers running once more. They are testing randomisation when the word base is in English.
Since there are different numbers of words in the English and French databases, it needs to be highlighted that my comparison across languages is of form rather than quantity. It is difficult to know at this stage what impact trebling the word base has; it is highly unlikely to produce three times as many anagrams (per 100k letters) and is unlikely to have a definable factor. However, in the two bases I have used, this factor for non-redundant anagrams seems to be between 1.40 (unique word-forms only) and 1.55 (non-unique forms included, e.g. repetitions such as rat, art, tar).
The English word base I have used is so large because it includes a vast number of technical names and terms from a range of sciences and human endeavours. These are not part of the French word-base although many (such as chemical names) would be identical in each.
The reason for running the French word base is that a pattern seems to have emerged from the English analysis - it shows a distinct difference in the sources that is consistent over the full range of word-lengths.
The graphs below, comparing the new French analysis to the earlier English one, use a slightly different format from that used earlier (logarithmic columns, where each interval is ten times the one below it). They are divided into two groups (Count 1 and Count 2) and the results for the English and French word bases are shown alongside each other.
Significantly, when looking at the new results using a French lexicon base, there is no change in the order of the sources and there is no change in the consistency of the result over the range of word-lengths.
One reason for running the program using the French data was to test whether the low result for the English sources was due to something built into my program that prejudiced the result against the host language. These graphs show that this is not the case: the English sources still perform poorly while all the French sources perform very well. Dante's English version performs worst under both word bases. This English version of Dante was tested as a form of control (it is highly improbable that it is in code) and its placement provides no reason to dismiss the theory that these anagrams are revealing the potential for hidden code. And Shakespeare's Sonnets support this view: they give a better comparative performance under English analysis than under French. We can therefore conclude that on the sample tested my methodology produces consistent outcomes across various languages.
There are still important issues to be resolved because at this point we have observed a difference but not offered any reason as to why the English and French sources differ in the counts found in both word bases. However English is built on a historical basis and offers many more alternative word forms than French. These alternatives are derived from a whole host of invader-based sources and therefore English is likely to perform well across many languages.
In the next section of this paper I look at the classes of words in Nostradamus' text in order to discover whether there are imbalances or themes that support the idea of coding. In the rest of this paper I look at the impact of randomising letters, for those more interested in coding methodology.
Sunday 21st November 2004- I have spent the last few days trying to understand and then present the results of my random-letter analyses.
My hope was to produce a formula for the measurement of coding in any text. Although I cannot provide a totally definitive measure, I believe the following provides a fair basis for comparison: Adjusted Anagram Count = Anagram Count x Word-Base-Size Factor (1.55 for French, 1.0 for English).
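The formula above can be written out directly; the raw count of 200 here is an invented figure, included purely to show the adjustment.

```python
# Word-base-size factors from the formula above: the French base is
# smaller, so its counts are scaled up by 1.55 to match the English base.
WORD_BASE_FACTOR = {"French": 1.55, "English": 1.0}

def adjusted_anagram_count(anagram_count, base_language):
    """Adjusted Anagram Count = Anagram Count x Word-Base-Size Factor."""
    return anagram_count * WORD_BASE_FACTOR[base_language]

# A hypothetical raw count of 200 under the French base adjusts to 310.
print(adjusted_anagram_count(200, "French"))
```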
I believe that it is essential in any measurement for any data to be assessed in the following ways.
The first two allow a measure to be made of the randomised and non-randomised generation of anagrams.
My results from randomising lettering.
The purpose of my randomised-lettering test was to establish a base level for anagrams generated from a randomly ordered letter set.
The first method I applied was to randomise the order of the lettering in the line using a random number generator. This caused a marked drop in the count for every source. However, I realised this form of generator might well preserve some syllabic structures (just as random shuffling of cards can preserve flushes, etc.). In order to break these down I used a second method.
The second method involved dividing the letters into four groups (letters 1, 5, 9 and so on; then 2, 6, 10; and so on) and placing these groups one after the other. The result of this was a further significant fall in the number of anagrams.
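The two randomisation methods can be sketched as follows; the example string is invented, and this is only my reading of the description above, not the exact functions used.

```python
import random

def shuffle_letters(line):
    """Method 1: randomise the letter order with a random number generator.
    This can still preserve some syllabic structures by chance."""
    letters = list(line)
    random.shuffle(letters)
    return "".join(letters)

def four_way_split(line):
    """Method 2: take letters 1, 5, 9, ... then 2, 6, 10, ... and so on,
    concatenating the four groups. This systematically breaks up syllables."""
    return "".join(line[i::4] for i in range(4))

print(four_way_split("abcdefgh"))  # -> "aebfcgdh"
```

Both methods keep exactly the same letters in each line, so any fall in the anagram count is due solely to the destroyed letter order.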
This analysis has an immediate bearing on the relevance of Nostradamus' text. The text I have used for the analysis is derived from Erika Cheetham's "The Final Prophecies of Nostradamus" (Warner Books, 1993). In her preface she says: "Note to Reader: The French text of the quatrains reproduces as closely as possible that of the original edition of 1568 (Benoist Rigaud, Lyon)."
Now this text is most peculiar, replacing u's with v's and v's with u's and using many spellings no one has been able to attribute to normal patterns, e.g. the fourth line says "Fait psperer q n'est a croire vain." It would be expected, on the basis of this oddity throughout the text, that Nostradamus' text should fall below the other texts, not above them, when any anagram count is taken. I can conclude on the basis of the randomisation effect that the high count for Nostradamus' text is likely to be understated.
There is also a significant pattern to be found when comparing the results for same-language text and word base against those where the languages differ. This is much easier to interpret after a fair allowance is made for the difference in word-base size (I applied a factor of 1.55). A consistent fall applies when a text is analysed with a word base in a language different from the source, and this fall persists even when the line of text is randomised. The results using my two methods of randomisation lead to similar values and imply that there is a 1 in 6 decrease in count level whenever measuring across English and French. This is independent of the direction (i.e. English text to French base, or French text to English base). This has to be due to structural constraints between the two languages, and in particular to the frequencies of letter usage.
[Chart: count of words with more than 6 letters, non-redundant anagrams per 100k letters, shown for text in original order and for text randomised by the 4-way split.]
Although there is a smaller difference in this ratio for the Nostradamus' data it isn't enough to conclude that Nostradamus' text is coded in any language other than French (if at all).
However, this fall in count across languages also implies that the count for Nostradamus' text is understated. His text is known to include Provencal, Latin and a variety of language variations. At best their impact should be neutral, but in all probability they would mean the count given is understated.
There is a distinct difference in the results for the English sources and the French sources. Part of this arises because the English base is less English than might be expected: it incorporates many French words that are used in English. There are 13,870 words common to the French and English databases.
When these words are removed and each database shows its true language characteristics, there is still a bias against anagrams in the English sources. A significant shift is, however, seen in the ratios of same-language to different-language anagram counts. Nostradamus' text falls further behind, implying that it is less pure in its use of the French language.
The difference between the English and French counts may still include a structural component but even so the results show that once more Nostradamus' text is different to the other French texts. Once more through its high anagram counts it consistently points to the possibility of anagrammatic code.
At this point I would conclude that contrary to expectations Nostradamus' text has withstood my analysis. It consistently outperforms the other texts, across all categories. Further the nature of the text implies that this result is made more relevant because there are valid reasons for believing the count to be understated.
These conclusions imply that the analysis should be taken a stage further, since there is still the possibility that Nostradamus' text is not in French and may be multilingual.
In this part of the analysis I have looked solely at the statistical nature of the anagrams, devoid of any relevance. The next step is to analyse the words that have been uncovered and see if they hold some measurable significance. The next stages will perform the following:
|Author: Allan Webber||Email: (Click Here)||nostradamusdecoded.com||Blog: Nostratextor|