mirror of
https://github.com/caperren/school_archives.git
synced 2025-11-09 21:51:15 +00:00
# eventkwic.txt by Corwin Perren

###########################################
########## Specification Writeup ##########
###########################################
My specification for the Kwic class is as follows. Once a Kwic object is constructed, it waits for a method call. If
index or listPairs is called before any text has been added, each returns an empty array. When text is added using
addText, it is appended to a class variable as one long string. This ensures that a line split across multiple addText
calls is still handled correctly. Once there is text to process, a call to index processes the stored string and returns
the proper kwic index. If listPairs is called instead, it runs index internally (discarding the returned index) so that
any new text is processed, then creates the pairs and returns them as an array. Index works semi-incrementally. Rather
than keeping track of every line of text ever added to the class, index converts only the new input into an array of
tuples, appends it to the class's stored array, and then re-sorts the result. This avoids recomputing sentences that
have already been processed; the only repeated work is the re-alphabetizing. I chose not to recompute the sentences on
every call to addText, as that seemed like a waste of CPU cycles. It is essentially a trade-off between a fast call to
index and a fast call to addText, and this design favors addText. One option I considered was using multi-threading to
continually process the input text for both indexing and listing pairs. While this would have greatly increased the
complexity (especially where EventSpec is concerned), it could also have resulted in a faster implementation.
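
The semi-incremental behavior described above can be sketched roughly as follows. This is only an illustration under
stated assumptions, not the actual Kwic code: the class name IncrementalKwic, the method name add_text, the
(shifted words, line number) tuple layout, and the case-insensitive sort key are all hypothetical, and periodsToBreaks,
ignoreWords, and listPairs are left out.

```python
class IncrementalKwic:
    """Hypothetical stand-in for the Kwic class; not the author's code."""

    def __init__(self):
        self._pending = ""    # text added since the last call to index()
        self._entries = []    # every (shifted_words, line_number) tuple so far
        self._lines_seen = 0  # number of lines already indexed

    def add_text(self, text):
        # Cheap on purpose: only concatenate, so a line that arrives in
        # pieces across several calls is rejoined before it is processed.
        self._pending += text

    def index(self):
        # Only the text added since the previous call is split and shifted.
        new_lines = [ln for ln in self._pending.split("\n") if ln.strip()]
        self._pending = ""
        for line in new_lines:
            words = line.split()
            for i in range(len(words)):
                # One circular shift per word position.
                shifted = words[i:] + words[:i]
                self._entries.append((shifted, self._lines_seen))
            self._lines_seen += 1
        # Old tuples are reused untouched; only the ordering is recomputed.
        self._entries.sort(key=lambda e: ([w.lower() for w in e[0]], e[1]))
        return list(self._entries)


kc = IncrementalKwic()
kc.add_text("This pair? is good.\n So is this")
kc.add_text(" pair and that pair")  # completes the second line
for words, line_no in kc.index():
    print(line_no, " ".join(words))
```

In this sketch add_text stays constant-time apart from the string concatenation, while index pays for the circular
shifts of the new lines plus one sort of the whole array.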

I used the EventSpec class and the kwic.fsm state machine to verify that all of the above steps happen correctly,
including handling of the optional constructor arguments periodsToBreaks and ignoreWords when they are used. Printing
the steps taken during processing makes it easy to see that the program follows the correct path. If a catastrophic
state logic error occurs, the EventSpec class stops execution with a trace statement, which makes it easy to identify
the incorrect step and correct it. I decided not to place event calls inside potentially large loops, to keep the log of
steps clear and concise. Had I added events inside these loops, there could be hundreds, thousands, or more logged
steps, which would make debugging difficult. In a sense, EventSpec serves a purpose similar to unit testing: if you
change your code in a way that causes it to skip a step or produce fatally incorrect output, the class tells you roughly
where the offending code is, and therefore gives you a good idea of what went wrong.
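
The kind of check described above can be illustrated with a small transition-table checker. This is a sketch only, not
the EventSpec class or the contents of kwic.fsm: EventChecker, TransitionError, and the "start" state are hypothetical
names, and the table contains only the transitions that can be inferred from the two traces in this file.

```python
class TransitionError(Exception):
    """Raised when an event is not allowed in the current state."""


# (current state, event) -> next state. Inferred from the traces in this
# file; the real kwic.fsm may define more states and events than these.
TRANSITIONS = {
    ("start", "callConstructor"): "idle",  # "start" is an assumed initial state
    ("idle", "callAddText"): "idle",
    ("idle", "callReset"): "idle",
    ("idle", "callIndex"): "processIndex",
    ("idle", "callListPairs"): "processListPairs",
    ("idle", "callCreateListPairs"): "idle",
    ("processIndex", "callProcessIndex"): "checkIfText",
    ("processListPairs", "callProcessListPairs"): "listPairsIndexOrList",
    ("listPairsIndexOrList", "callProcessIndex"): "checkIfText",
    ("checkIfText", "callSplitPeriods"): "splitIntoTuples",
    ("checkIfText", "processNewlineSplit"): "splitIntoTuples",
    ("splitIntoTuples", "callSplitAsTuples"): "fillCircular",
    ("fillCircular", "callFillCircular"): "checkIgnoreOrAlpha",
    ("checkIgnoreOrAlpha", "callRemoveWords"): "removingWords",
    ("checkIgnoreOrAlpha", "callAlphabetize"): "idle",
    ("removingWords", "callAlphabetize"): "idle",
}


class EventChecker:
    """Logs each step and stops with a trace on an illegal transition."""

    def __init__(self, transitions, start="start"):
        self._transitions = transitions
        self.state = start
        self.log = []

    def event(self, name):
        key = (self.state, name)
        if key not in self._transitions:
            # Include the steps taken so far so the bad step is easy to find.
            trace = "\n".join(self.log)
            raise TransitionError(
                f"event {name!r} not allowed in state {self.state!r}\n{trace}"
            )
        self.state = self._transitions[key]
        self.log.append(f"STEP #{len(self.log)}: {name} --> {self.state}")


# Replay the events of Trace Output #2 and print the resulting log.
TRACE_2 = [
    "callConstructor", "callAddText", "callListPairs", "callProcessListPairs",
    "callProcessIndex", "processNewlineSplit", "callSplitAsTuples",
    "callFillCircular", "callRemoveWords", "callAlphabetize",
    "callCreateListPairs",
]
checker = EventChecker(TRANSITIONS)
for ev in TRACE_2:
    checker.event(ev)
print("\n".join(checker.log))
```

Skipping a step (for example, sending callAlphabetize straight after callConstructor) raises TransitionError together
with the steps logged so far, which mirrors the "stop with a trace" behavior described above.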

#####################################
########## Trace Output #1 ##########
#####################################
# Code run
kc = Kwic(periodsToBreaks=True)
kc.addText("This pair? is good.\n So is this pair and that pair")
kc.index()
kc.reset()
kc.addText("This pair? is good.\n So is this pair and that pair")
kc.listPairs()
kc.print_eventspec_log()

# Trace Output
STEP #0: callConstructor --> idle
STEP #1: callAddText --> idle
STEP #2: callIndex --> processIndex
STEP #3: callProcessIndex --> checkIfText
STEP #4: callSplitPeriods --> splitIntoTuples
STEP #5: callSplitAsTuples --> fillCircular
STEP #6: callFillCircular --> checkIgnoreOrAlpha
STEP #7: callAlphabetize --> idle
STEP #8: callReset --> idle
STEP #9: callAddText --> idle
STEP #10: callListPairs --> processListPairs
STEP #11: callProcessListPairs --> listPairsIndexOrList
STEP #12: callProcessIndex --> checkIfText
STEP #13: callSplitPeriods --> splitIntoTuples
STEP #14: callSplitAsTuples --> fillCircular
STEP #15: callFillCircular --> checkIgnoreOrAlpha
STEP #16: callAlphabetize --> idle
STEP #17: callCreateListPairs --> idle

#####################################
########## Trace Output #2 ##########
#####################################
# Code run
kc = Kwic(ignoreWords=["and", "So"])
kc.addText("This pair? is good.\n So is this pair and that pair")
kc.listPairs()
kc.print_eventspec_log()

# Trace Output
STEP #0: callConstructor --> idle
STEP #1: callAddText --> idle
STEP #2: callListPairs --> processListPairs
STEP #3: callProcessListPairs --> listPairsIndexOrList
STEP #4: callProcessIndex --> checkIfText
STEP #5: processNewlineSplit --> splitIntoTuples
STEP #6: callSplitAsTuples --> fillCircular
STEP #7: callFillCircular --> checkIgnoreOrAlpha
STEP #8: callRemoveWords --> removingWords
STEP #9: callAlphabetize --> idle
STEP #10: callCreateListPairs --> idle