October 2016 – Luke Starnes

As part of a recent PyData talk, I ran some timing tests comparing the performance of several methods for converting a large number of geographic locations from ECEF (Earth-centered, Earth-fixed) XYZ coordinates to latitude / longitude / altitude.
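For reference, here is a minimal NumPy sketch of the conversion being timed. It assumes the WGS84 ellipsoid and refines latitude with a short fixed-point iteration; the function names `lla_to_ecef` and `ecef_to_lla` and the iteration count are illustrative choices, not taken from the notebook, and the implementations in the talk may use a different (e.g. closed-form) formula.

```python
import numpy as np

# WGS84 ellipsoid constants (assumed; the original notebook may differ)
A = 6378137.0              # semi-major axis, meters
F = 1 / 298.257223563      # flattening
E2 = F * (2 - F)           # first eccentricity squared


def lla_to_ecef(lat_deg, lon_deg, alt_m):
    """Geodetic lat/lon (degrees) and altitude (m) -> ECEF X, Y, Z (m)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    alt = np.asarray(alt_m, dtype=float)
    n = A / np.sqrt(1 - E2 * np.sin(lat) ** 2)  # prime vertical radius of curvature
    x = (n + alt) * np.cos(lat) * np.cos(lon)
    y = (n + alt) * np.cos(lat) * np.sin(lon)
    z = (n * (1 - E2) + alt) * np.sin(lat)
    return x, y, z


def ecef_to_lla(x, y, z, iterations=5):
    """ECEF X, Y, Z (m) -> geodetic lat/lon (degrees) and altitude (m).

    Vectorized over NumPy arrays. Latitude is refined by fixed-point
    iteration on tan(lat) = (z + e^2 * N * sin(lat)) / p.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    lon = np.arctan2(y, x)
    p = np.hypot(x, y)                       # distance from the polar axis
    lat = np.arctan2(z, p * (1 - E2))        # initial guess (exact when alt == 0)
    for _ in range(iterations):
        n = A / np.sqrt(1 - E2 * np.sin(lat) ** 2)
        lat = np.arctan2(z + E2 * n * np.sin(lat), p)
    # This altitude formula stays well-defined at the poles, where p == 0
    alt = p * np.cos(lat) + z * np.sin(lat) - A * np.sqrt(1 - E2 * np.sin(lat) ** 2)
    return np.degrees(lat), np.degrees(lon), alt


# Round-trip 10,000 random points through both conversions
rng = np.random.default_rng(0)
lat0 = rng.uniform(-90, 90, 10_000)
lon0 = rng.uniform(-180, 180, 10_000)
alt0 = rng.uniform(0, 10_000, 10_000)
lat1, lon1, alt1 = ecef_to_lla(*lla_to_ecef(lat0, lon0, alt0))
```

Each iteration shrinks the latitude error by a factor of at most e² (about 0.0067), so a handful of iterations reaches double precision for near-surface points.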

I ran this for both 10,000 points and 1,000,000 points using the following methods:

Native Python (using built-in lists)
Numpy
Pandas
Numba (first with a Numpy array, then with a Pandas DataFrame)
Numexpr (first with a Numpy array, then with a Pandas DataFrame)
Cython (with Numpy array)
Cython Parallel (with Numpy array)
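A minimal sketch of how one such comparison can be timed with the standard `timeit` module. To keep it short, it times only the first step of the conversion (the distance from the polar axis, p = sqrt(x² + y²)) using built-in lists versus NumPy; the helper name `best_time` and the random test data are hypothetical, not from the notebook.

```python
import math
import random
import timeit

import numpy as np


def best_time(func, *args, repeat=5, number=1):
    """Return the fastest of `repeat` runs of func(*args), in seconds per call."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def p_python(xs, ys):
    # Native Python: loop over built-in lists
    return [math.hypot(x, y) for x, y in zip(xs, ys)]


def p_numpy(xs, ys):
    # NumPy: a single vectorized call over whole arrays
    return np.hypot(xs, ys)


n_points = 10_000  # set to 1_000_000 for the larger run
random.seed(0)
xs_list = [random.uniform(-7e6, 7e6) for _ in range(n_points)]
ys_list = [random.uniform(-7e6, 7e6) for _ in range(n_points)]
xs_arr, ys_arr = np.array(xs_list), np.array(ys_list)

t_py = best_time(p_python, xs_list, ys_list)
t_np = best_time(p_numpy, xs_arr, ys_arr)
print(f"lists: {t_py:.5f} s  numpy: {t_np:.5f} s  speedup: {t_py / t_np:.1f}x")
```

Taking the minimum of several runs, rather than the mean, reduces the influence of background noise on the machine.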

You can find the code here in a Jupyter notebook.

The results are below. One issue is worth noting, which I haven’t had time to run to ground: the parallel Cython implementation is clearly not correct, since its timing is virtually identical to that of the non-parallel implementation.

The Rise and Fall of the Third Reich

William Shirer was a foreign correspondent with UPI and CBS, stationed in Europe before and during World War II. After the war he wrote The Rise and Fall of the Third Reich, first published in 1960. As the title suggests, the book follows Hitler and the Nazis from their rise to power through their downfall at the end of World War II. It is a brutal and agonizing journey.

Here is Shirer’s closing:

The guns in Europe ceased firing and the bombs ceased dropping at midnight on May 8-9, 1945, and a strange but welcome silence settled over the Continent for the first time since September 1, 1939. In the intervening five years, eight months and seven days millions of men and women had been slaughtered on a hundred battlefields and in a thousand bombed towns, and millions more done to death in the Nazi gas chambers or on the edge of the S.S. Einsatzgruppen pits in Russia and Poland – as the result of Adolf Hitler’s lust for German conquest. A greater part of most of Europe’s ancient cities lay in ruins, and from their rubble, as the weather warmed, there was the stench of the countless unburied dead.
No more would the streets of Germany echo to the jack boot of the goosestepping storm troopers or the lusty yells of the brown-shirted masses or the shouts of the Fuehrer blaring from the loudspeakers.
After twelve years, four months and eight days, an Age of Darkness to all but a multitude of Germans and now ending in a bleak night for them too, the Thousand-Year Reich had come to an end. It had raised, as we have seen, this great nation and this resourceful but so easily misled people to heights of power and conquest they had never before experienced and now it had dissolved with a suddenness and a completeness that had few, if any, parallels in history.

A painful part of our collective history.

To state the obvious, this book is filled with characters, and I thought it would be interesting to see who plays large roles and, since the book (for the most part) proceeds chronologically, when each character comes on and off the stage. That led me to parse and analyze the text. I used the 1990 edition, which runs 1,029 pages. Below is a textual analysis of the book.
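A minimal sketch of the "on and off the stage" idea, using only the standard library: split the token stream into roughly equal slices (a rough proxy for chronology, since the book mostly proceeds in order) and count mentions of each name per slice. The tokenizer regex, the `mentions_by_chunk` helper, and the toy text are illustrative assumptions; the actual analysis was done with pandas.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; an internal apostrophe stays part of the word."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def mentions_by_chunk(tokens, names, n_chunks=10):
    """Count each (lower-case) name in consecutive slices of the token list.

    Slices have ceil(len(tokens) / n_chunks) tokens, so there are at most
    n_chunks of them; the last one may be shorter.
    """
    size = max(1, -(-len(tokens) // n_chunks))  # ceiling division
    rows = []
    for start in range(0, len(tokens), size):
        counts = Counter(tokens[start:start + size])
        rows.append({name: counts[name] for name in names})
    return rows


# Toy text standing in for the full book
text = "Hitler spoke in Munich. Hitler and Goering met. Goering spoke. Himmler listened."
timeline = mentions_by_chunk(tokenize(text), ["hitler", "goering"], n_chunks=3)
# [{'hitler': 1, 'goering': 0}, {'hitler': 1, 'goering': 1}, {'hitler': 0, 'goering': 1}]
```

Each row is one slice of the book in reading order, so plotting a name's column against the row index shows when that character appears and fades.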

I parsed and analyzed the text using Python and pandas, with a superset of the stop words from here and here. The graphs were made with Plotly, and the word clouds with Andreas Mueller’s generator. All of the code is on GitHub.

Some basics about the text:

Number of words: 571,387
Number of words (sans stop words): 244,881
Number of unique words: 22,748
Number of unique words (sans stop words): 22,266
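A minimal sketch of how counts like these can be computed with the standard library. The regex tokenizer and the tiny stop-word set are assumptions for illustration, so the exact figures above depend on the tokenizer and stop-word lists actually used. Under these definitions, the gap between the two unique counts is the number of distinct stop words that occur in the text (22,748 − 22,266 = 482).

```python
import re
from collections import Counter


def text_stats(text, stop_words):
    """Total and unique word counts, with and without (lower-case) stop words."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    kept = [w for w in words if w not in stop_words]
    return {
        "words": len(words),
        "words_sans_stop": len(kept),
        "unique_words": len(set(words)),
        "unique_sans_stop": len(set(kept)),
    }, Counter(kept)


# Hypothetical stop-word set; the real analysis combined two published lists
stats, freq = text_stats(
    "The guns ceased firing and the bombs ceased dropping.", {"the", "and"}
)
# stats -> {'words': 9, 'words_sans_stop': 6, 'unique_words': 7, 'unique_sans_stop': 5}
# freq.most_common(1) -> [('ceased', 2)]
```

The returned `Counter` holds the stop-word-free frequencies, which is the kind of input a word cloud generator takes.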
