Added work from my other class repositories before deletion

2017-11-29 10:28:24 -08:00
parent cb0b5f4d25
commit 5ea24c81b5
198 changed files with 739603 additions and 0 deletions

@@ -0,0 +1,26 @@
From an architectural point of view, very little is different between the fast version of kwic and the baseline
implementation. However, the one architectural difference that does exist matters a great deal for speed, which was the
point of creating this version in the first place. In the baseline kwic, the code acts almost entirely like a single
black box: data in, data out, all done in pure Python. The fast version's biggest gains came from changing how some of
the loops are run, so that the iteration is now driven by compiled C code rather than by the Python interpreter. This
change was the use of the built-in map function, an alternative to the traditional for loop in Python.
While this is an architecturally significant change, as we're now expanding the number of boxes AND languages, it also
came with the speed benefits of running code in C. This speed jump was most noticeable in the function that is passed
as the "key" argument in the alphabetization sort calls. Another place where changes were made that helped
increase speed, though they didn't necessarily change the code architecturally, was in loops that called methods on an
object. For example, I originally had many loops that called "array_name.append()". This is slow because every time
the interpreter reaches that line in the loop, it has to look up "append" on "array_name" before it can call it.
By aliasing "array_name.append" to a new variable, like so: "aliased = array_name.append", the
interpreter only has to perform that lookup once. Then, by replacing the calls in the loop with my alias, I save that
lookup time on each iteration. While this didn't result in a massive increase like the use of map did, it did help
enough to mention. A few other minor changes were also made to help shave a couple tenths of a second off here and
there. There were many places where I was needlessly making copies of large arrays of data simply to make the code
more readable, which cost both time and RAM. By streamlining the use of existing variables, I
again managed to save little bits of time here and there. When it came to enabling listPairs, the other very slow
part of the code, I decided to change the way the words were sanitized, as that's where the most time was being
lost. In the original version, I used a join on an empty string to strip out unwanted characters, which was
essentially brute force. For the fast version, I learned of a way to perform the same task using Python's
built-in translate method. This change nearly halved the time it took for just the listPairs section of the code to
run. To test all of this, I used Python's built-in cProfile tool, which times every user-written function call and
separately aggregates the times of Python's built-in ones. This made it very easy to see which parts
of the code needed to be focused on for speed increases.
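
The optimizations described above can be sketched roughly as follows. This is a minimal illustration under stated
assumptions, not the code from this commit: it assumes Python 3, and every name in it (sanitize_join,
sanitize_translate, clean_loop, clean_aliased, clean_map, profile), the set of characters being stripped, and the
sample data are hypothetical.

```python
import cProfile
import io
import pstats
import string

# Hypothetical set of characters to strip; the original's set is not given.
UNWANTED = string.punctuation
STRIP_TABLE = str.maketrans("", "", UNWANTED)


def sanitize_join(word):
    """Brute-force cleanup: rebuild the word one character at a time."""
    return "".join(ch for ch in word if ch not in UNWANTED)


def sanitize_translate(word):
    """Same cleanup, with the per-character work done inside str.translate."""
    return word.translate(STRIP_TABLE)


def clean_loop(words):
    """Plain loop: looks up cleaned.append again on every iteration."""
    cleaned = []
    for word in words:
        cleaned.append(sanitize_join(word))
    return cleaned


def clean_aliased(words):
    """Same loop, but the bound method is looked up once and reused."""
    cleaned = []
    aliased = cleaned.append
    for word in words:
        aliased(sanitize_join(word))
    return cleaned


def clean_map(words):
    """Let map drive the iteration instead of a Python-level for loop."""
    return list(map(sanitize_translate, words))


def profile(func, words):
    """Run func(words) under cProfile and return the report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(words)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return out.getvalue()


if __name__ == "__main__":
    sample = ["Hello,", "world!", "it's", "(kwic)"] * 20000
    for func in (clean_loop, clean_aliased, clean_map):
        print(func.__name__)
        print(profile(func, sample))
```

All three clean_* functions return the same list, so the cProfile reports can be compared directly. The actual
timings depend on the machine and the input, so none are quoted here.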

@@ -0,0 +1,14 @@
Compared to the original kwic, the testing version of kwic is quite different. The focus of this version was to pull out
much of the code from the main into functions so that sensitive portions could be easily tested separately, as well as
be changed more easily. From an architectural point of view, the original kwic is very much a traditional black box
approach: data and flags go in, one tiny function is called for alphabetization (as I had trouble with it and needed to
pull it out), and the final data comes out. The testing version, on the other hand, is approximately 25% longer in
terms of pure code length, and is split into ten separate functions rather than the two in the original kwic. Splitting
the core features of the kwic system into these functions made testing the code during development that much easier.
Rather than having to run through all the code up to the point I wanted to test, I could call only the functions
for the features I was actively testing. Of course, adding all these extra function calls did affect performance slightly,
though not as much as I was expecting. This version is only marginally slower than the baseline implementation. I also
considered adding a flag to kwic that would enable debugging print statements, but I decided against it as (at least
in my personal experience) adding tons of print statements isn't always helpful. Generally, you only need
printing for a particular section of code, which can easily be added manually now that the important parts of the code
are broken out.
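
As a rough illustration of why breaking the code into functions helps testing, here is a minimal sketch of a
kwic-style pipeline split into small steps. It is not the code from this commit: the function names
(circular_shifts, sort_key, alphabetize, kwic) and the exact behaviour of each step are assumptions made for the
example, based on the usual keyword-in-context exercise.

```python
def circular_shifts(words):
    """Return every rotation of a list of words."""
    return [words[i:] + words[:i] for i in range(len(words))]


def sort_key(shift):
    """Case-insensitive key used when alphabetizing shifts."""
    return [word.lower() for word in shift]


def alphabetize(shifts):
    """Sort shifts alphabetically, ignoring case."""
    return sorted(shifts, key=sort_key)


def kwic(text):
    """Top-level pipeline; each step above can be tested on its own."""
    shifts = []
    for line in text.splitlines():
        words = line.split()
        if words:
            shifts.extend(circular_shifts(words))
    return alphabetize(shifts)


if __name__ == "__main__":
    # Only the step under test is called; the rest of the pipeline is not run.
    assert circular_shifts(["a", "b", "c"]) == [
        ["a", "b", "c"],
        ["b", "c", "a"],
        ["c", "a", "b"],
    ]
    print(kwic("The quick fox\nA lazy dog"))
```

With this layout, a problem in alphabetization can be reproduced by calling alphabetize directly on a small
hand-written list, without reading input or generating shifts first.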