Problem #4: Non-Deterministic FA
Problem Summary:
Write the required functions and script that solve, for a Non-Deterministic Finite Automaton, the same problem that was solved for
a Deterministic Finite Automaton in Problem #3 (above). Read about the differences between these two automata (below). Hint:
Adapt your code for the FA problem to solve the more general NDFA problem.
A non-deterministic finite automaton (NDFA) is a machine described by its states and its transitions: each transition for a state
specifies an input and a set of states (more than one is allowed) that the input can lead to; sets with more than one state are what
make it non-deterministic. We can illustrate an NDFA as a graph with state labels in circles and edge labels for transitions (see
below). The critical difference between an FA and an NDFA is that an NDFA can have multiple edges with the same label going to
different states (we'll see how to represent and simulate such transitions below).
Input and Output:
Read a file that describes a NDFA: each line contains a state and an arbitrary number of input->state transitions. Build a dictionary
such that each key is a str state and whose associated value is another dictionary specifying all the transitions from that state: this
second dictionary has keys that are str inputs and associated values that are sets of str states: all the states a particular input can
lead to. The first token on each line is the str state and the remaining tokens (always coming in pairs) are str inputs and states. All
tokens (which can comprise any number of characters) are separated by one semicolon character. We annotate this dictionary as
{str:{str:{str}}}.
For example, the input file ndfaendin01.txt contains the following lines (which could appear in this order, or any other and still
specify the same NDFA):
start;0;start;1;start;0;near
near;1;end
end
Here is a picture of the endin01 NDFA. It graphically illustrates the three states (start, near, and end) and their transitions, using
inputs (0 and 1).
Here, the state start is a key in the main dictionary. Its value is a dictionary with two key/value pairs: 0 mapping to the set
containing start and near, and 1 mapping to the set containing just start. It means that in the start state, if the input is a 0 the
NDFA can stay in the start state or it can go to the near state; if the input is a 1 the NDFA must stay in the start state. And
similarly the next line means that in the near state, if the input is a 1 the NDFA must go into the end state. The last line means that
the end state has no transitions out of it.
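For concreteness, here is that dictionary written out as a Python literal (a sketch of the {str:{str:{str}}} data structure only, not of any required function):

# The ndfaendin01 NDFA from the file above, as a {str:{str:{str}}} dictionary
ndfa = {'start': {'0': {'start', 'near'}, '1': {'start'}},
        'near':  {'1': {'end'}},
        'end':   {}}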
Print the NDFA, one state (and its transitions) per line; the states are printed alphabetically and the transition dictionary for each
state is printed as a list of input/set of state items (2-tuples) such that these are printed alphabetically by the inputs, and the set of
states for each input is printed as an alphabetically sorted list (e.g., near comes before start). Note that the state end is a key in the
main dictionary, whose associated transitions are an empty dictionary.
For example, the file above would produce:
Non-Deterministic Finite Automaton Dump: state: list of transitions
end transitions: []
near transitions: [('1', ['end'])]
start transitions: [('0', ['near', 'start']), ('1', ['start'])]
Note that there are multiple data files for this program: ndfaendin01.txt, ndfatrain.txt, and ndfare.txt; test/debug your
program on the first file; when you are done, test it on the last file. Draw the NDFA represented by each file to ensure that your code
correctly prints and computes with it.
Next, repeatedly read and process lines from a second input file, computing the results of the non-deterministic finite automaton on
the specified start-state with its inputs; then print out the results in a special form. Each line in the file contains a start-state
followed by a sequence of inputs (all separated by semicolons). The start-state will be a state in the NDFA (it is a key in the outer
dictionary); the inputs may specify legal or illegal transitions (they may or may not be keys in some inner dictionary).
For example, the input file ndfainputendin01.txt contains the following two lines:
start;1;0;1;1;0;1
start;1;0;1;1;0;0
For example, the first line means the start-state is start and the inputs are 1, 0, 1, 1, 0, and 1.
The result of processing each line is to print the start-state, and then each input and the new states (plural) it could transition to (the
could is what makes it non-deterministic), and finally print the stop-states. For the ndfaendin01 NDFA and the first line in this file,
it should print
Start state = start
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 1; new possible states = ['end', 'start']
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 1; new possible states = ['end', 'start']
Stop state(s) = ['end', 'start']
Note that the set of states it might be in is printed as an alphabetized list. Also note especially that in the start state, if the input is
a 0, then the NDFA can either remain in the start state or go into the near state. For this program, we keep track of all states that
the NDFA can be in, using a set of new possible states. For the next input, 1, we can be either in the start state (from the start
state; an input of 1 allows us to stay in the start state) or the end state (from the near state; an input of 1 allows us to transition to
the end state). Thus, we keep track of the set of states the NDFA can be in, and the new set of states the NDFA can be in after
processing the next input. In this example, because 'end' is included in the stop-states, this input does end in 01.
For any state that does not have a transition specifying the current input, ignore that input for that state. For example, if near is one
of the possible states and 0 is the input, ignore the 0 for the near state.
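As a minimal sketch (the name step and the variable names are only illustrative), one input can be processed against the current set of possible states like this, assuming ndfa is the {str:{str:{str}}} dictionary described above:

# Advance a set of possible states by one input; states with no transition
# for this input contribute nothing (the input is ignored for them).
def step(ndfa, current_states, inp):
    new_states = set()
    for state in current_states:
        new_states |= ndfa[state].get(inp, set())
    return new_states

For the ndfaendin01 example, step(ndfa, {'start'}, '0') produces {'near', 'start'}.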
Functions and Script:
Write the following functions and script. I am providing line counts for these function bodies not as requirements, but to indicate
the lengths of well-written Pythonic code.
read_ndfa has an open (file) parameter; it returns the dictionary representing the non-deterministic finite automaton; hint: I
used slicing and the zip function to build the inner dictionaries, and I called the setdefault function for the inner dict;
alternatively I could have built it as defaultdicts from the standard collections module (body is 9 lines; see the sketch after this list).
ndfa_as_str has a dictionary parameter (representing the NDFA); it returns a multi-line string (each line is ended by '\n'), which
when printed shows the contents of the NDFA in the appropriate textual form: sorted alphabetically by state, with a state's
transitions sorted by their input values, and sorted by states if an input results in multiple states (body is 4 lines; can you do it
in 1?).
process has a dictionary parameter (representing the NDFA), a str parameter (representing the start-state), and a list
parameter (representing a list of str inputs); it returns a list that contains the start-state followed by tuples that show the
input and resulting set of states after each transition. For the example shown above, process returns the following list.
['start', ('1', {'start'}), ('0', {'near', 'start'}), ('1', {'end', 'start'}), ('1', {'start'}),
('0', {'near', 'start'}), ('1', {'end', 'start'})]
Finally, remember that if an input is illegal for the current state (is not a key in some transition for the current state), just
ignore it. But if the input leads to no possible states (the empty set of states), terminate processing there (body is 12 lines; a sketch also appears after this list).
interpret has a list parameter (the list result produced by process); it returns a multi-line string (each line is ended by '\n'),
which when printed illustrates the results of processing an NDFA on an input in the appropriate textual form. Note that in
this output the sets computed in process appear as lists sorted alphabetically by state. See how it prints the example list
argument shown above in the Sample Interaction below (body is 5 lines).
Write a script at the bottom of this module (in if __name__ == '__main__':) that prompts the user to enter the file describing
the NDFA, prints it, prompts the user to enter the file containing lines of start-states and input, and simulates the NDFA on
each line, printing the results in the appropriate textual form (body is 7 lines).
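Here is a sketch of read_ndfa following the hints above (slicing plus zip to pair up inputs and target states, setdefault for the inner dictionaries); the exact handling of the line format is an assumption:

# Sketch of read_ndfa: build a {str:{str:{str}}} dictionary from an open file.
def read_ndfa(file):
    ndfa = {}
    for line in file:
        tokens = line.rstrip().split(';')        # state;input;state;input;state;...
        transitions = ndfa.setdefault(tokens[0], {})
        for inp, target in zip(tokens[1::2], tokens[2::2]):
            transitions.setdefault(inp, set()).add(target)
    return ndfa

And a sketch of process, reusing the one-step idea shown earlier; whether the final empty set is appended before terminating is an assumption to check against the batch self-check:

# Sketch of process: the start-state followed by (input, set-of-possible-states) tuples.
def process(ndfa, start_state, inputs):
    result = [start_state]
    current = {start_state}
    for inp in inputs:
        current = {t for s in current for t in ndfa[s].get(inp, set())}
        result.append((inp, current))
        if not current:                          # no possible states left: stop here
            break
    return result

For the ndfaendin01 example and the inputs from the first line of ndfainputendin01.txt, this returns the list shown above.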
Sample Interaction:
The program, as specified, will have the following interaction: user-typed information appears in italics. Your output should
"match" this one.
Provide the file name storing a Non-Deterministic Finite Automaton: ndfaendin01.txt
Non-Deterministic Finite Automaton Dump: state: list of transitions
end transitions: []
near transitions: [('1', ['end'])]
start transitions: [('0', ['near', 'start']), ('1', ['start'])]
Provide the file name storing a sequence of start-states and their subsequent inputs: ndfainputendin01.txt
Tracing NDFA (from a start-state)
Start state = start
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 1; new possible states = ['end', 'start']
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 1; new possible states = ['end', 'start']
Stop state(s) = ['end', 'start']

Tracing NDFA (from a start-state)
Start state = start
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 1; new possible states = ['end', 'start']
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 0; new possible states = ['near', 'start']
Stop state(s) = ['near', 'start']
In Week #2 of this course we will cover EBNF and regular expressions, which relate to the files below. You can run these files on
your code to ensure they produce the correct results.
The ndfatrain.txt file is a non-deterministic finite automaton that determines whether or not a train (a sequence of characters
representing different kinds of cars) is a legal train according to Chapter Exercise #7 in the EBNF lecture. Its input file is
ndfainputtrain.txt, which starts with a legal train (one that ends with the state done as one possible state) followed by an illegal
train (one that does not end with the state done as one possible state).
The ndfare.txt file is a non-deterministic finite automaton translation of the regular expression ((a*|b)cd)+. Its input file is
ndfainputre.txt, which starts with a match (one that ends with the state last as one possible state) followed by a non-match (one
that does not end with the state last as one possible state).
Problem #5:
Synonym Finder
Problem Summary:
Write the required functions and script that prompts the user to enter the names of any number of text files; reads these text
files (creating one special semantic dictionary); optionally prints the dictionary in a special form; prompts the user to enter the
name of a file containing a series of synonym problems (given a word, which of a series of words is its best synonym); then uses
the semantic dictionary to try to solve each synonym problem, printing the results of each problem in a special form, and finally
printing the percentage of problems that it solved correctly.
The program will read the text file(s) to learn which words are similar to which others: each word in the semantic dictionary is
associated with a context dictionary storing the words that appear in the same sentences (and how often they appear in the same
sentences). The program will use a metric function to determine how similar any two words are by comparing their context
dictionaries. By this process, the program can determine whether two words are more or less likely to be synonyms.
Input and Output:
After repeatedly prompting for the text file names (ensuring each one can be read: try to open it and handle any exceptions) read
each file, a sentence at a time, building a semantic dictionary of the form {str:{str:int}}. The outer keys of this dictionary are all
the words in the text files; each word is associated with an inner context dictionary whose keys are all the other words that
appear in sentences with the outer word; each inner key is associated with the number of times it appears in sentences with the
outer key. All words in the sentences should be converted to lower-case letters and all punctuation should be removed; certain
common English function words should also be removed as "noise words".
An easy way to read the files, one sentence at a time, is to iterate over the result returned by repeatedly calling the helper
function sentence_at_a_time (which I have supplied at the top of synonym.py). When passed an open file to read from and a
set of words to ignore, this function returns an object that we can iterate over -one sentence at a time- using a for loop. For
example, if a file named trivial.txt contained the text
I went to the gym this
morning. Later in the
morning I rested; I was tired!
then executing the code
for s in sentence_at_a_time(open('trivial.txt'), {'in', 'the', 'to'}):
    print(s)
would print
['i', 'went', 'gym', 'this', 'morning']
['later', 'morning', 'i', 'rested']
['i', 'was', 'tired']
Note that sentence_at_a_time will correctly return full sentences (each as a list of lower-case words), whether they appear on
one or multiple lines. For now, just assume that this function works correctly (you can certainly experiment with it on small text
files); later in the quarter we will be able to understand its body: when we learn about regular expressions (in week 2) and
implementing iterators with generator functions (in week 4).
Print all the associations in the semantic dictionary, one per line in standard lexical order (with the inner dictionaries also printed
in standard lexical order); after printing all associations, print the length of the smallest and largest inner dictionaries, using the
following form.
For example, the file trivial.txt shown above (ignoring the words in, the, and to) would print as:
Semantic Dictionary
context for gym = i@1, morning@1, this@1, went@1
context for i = gym@1, later@1, morning@2, rested@1, this@1, tired@1, was@1, went@1
context for later = i@1, morning@1, rested@1
context for morning = gym@1, i@2, later@1, rested@1, this@1, went@1
context for rested = i@1, later@1, morning@1
context for this = gym@1, i@1, morning@1, went@1
context for tired = i@1, was@1
context for was = i@1, tired@1
context for went = gym@1, i@1, morning@1, this@1
min/max context lengths = 2/8
Most of the words in these sentences are unique, but the words i and morning appeared in two sentences, so they have the
largest inner dictionaries: morning appears in 2 sentences with i, so i also appears in 2 sentences with morning. Also note that
the context for any word will not contain that same word.
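As a data-structure sketch, a few entries of that dump correspond to the following {str:{str:int}} dictionary (the remaining entries are elided):

# A few entries of the semantic dictionary for trivial.txt (sketch; remaining keys elided)
semantic = {
    'gym':   {'i': 1, 'morning': 1, 'this': 1, 'went': 1},
    'tired': {'i': 1, 'was': 1},
    'was':   {'i': 1, 'tired': 1},
    # ... one entry per unignored word, as in the dump above
}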
Read a file that contains a series of synonym problems (one per line) with each line in the following form:
A word
Any number of other words separated by spaces: the choices for its possible synonyms
The correct choice of the synonym
For example, one line in this kind of input file might look like
draw walk eat paint paint
indicating the problem is to find the synonym of draw from the choices walk, eat, and paint, where the correct answer is paint.
For each line in the file of synonym problems, it will print either that (a) it found the correct synonym; (b) it failed to find the
correct synonym, because it chose incorrectly; or (c) it failed to find the correct synonym because the metric function raised an
exception on one of the possible synonyms.
In these cases it will display a message like one of the following (note that here the strings appear in quotes; Hint: what is the
difference between the str and repr functions? Which does print use by default?):
Correct: 'picture' is most like 'painting' from ['painting', 'chair']
Incorrect: 'duty' is most like 'task', not 'example' from ['task', 'example']
Metric failure: could not choose synonym for 'earnest' from ['serious', 'amusing']
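A quick illustration of that hint (the messages above use the quoted, repr-style form of each word):

# str gives the plain text; repr adds the quotes; print uses str by default.
word = 'picture'
print(str(word))     # picture
print(repr(word))    # 'picture'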
Functions and Script:
Write the following functions and script. I am providing line counts for these function bodies not as requirements, but to indicate
the lengths of well-written Pythonic code.
build_semantic_dictionary has a list of open files parameter (the training file(s)), and an open file parameter (words to
ignore file); it returns a semantic dictionary ({str:{str:int}}), in which each unignored word is associated with a context
dictionary whose keys are all the unignored words that appear in the same sentences with it, and whose associated value is
the number of times each key appears in the same sentences as the word: in the sentence I really really like you., the
context dictionary for each word in this sentence will count really twice (except the context dictionary for the word really
doesn't contain that word). Hints: Process each training file by reading it (using sentence_at_a_time, which is passed a set
of all the words to be ignored: remember to rstrip them when reading the file); process each word in a sentence by
incrementing the count in its context dictionary of every other word in the sentence. Recall that a word that is a key in the
semantic dictionary should not be a key in its own context dictionary (body is 9 lines containing four nested loops; see the sketch after this list).
dict_as_str has a dictionary parameter (representing the semantic dictionary); it returns a multi-line string (each line is
ended by '\n'), which when printed shows the contents of the semantic dictionary followed by the max/min context
dictionary lengths in the appropriate textual form shown above (body is 5 lines).
cosine_metric has two context dictionaries as parameters; it returns a float indicating how close the context
dictionaries are (a higher number is better); a sketch of the formula for computing the cosine_metric appears after this list.
The formula will always have a value between 0 and 1. Note that if either context dictionary is empty, the denominator
will be 0 and Python will raise a ZeroDivisionError exception, which this function should propagate, not handle with a
try/except.
For example, if cd1 = {'a':1, 'b':2, 'c':3} and cd2 = {'a':5, 'c':7, 'd':8}, then we would calculate the cosine_metric as
shown in the worked example after this list.
Hint: recall the get method on dictionaries can supply a default value to return if the key is not in the dictionary.
most_similar has a str parameter (the target word, to find the synonym of), a list of str parameter (the synonym
candidates), a semantic dictionary parameter (of the context dictionaries for words), and a metric function parameter (of
which cosine_metric is one example); it returns the synonym candidate whose metric is largest when compared with the
target word. This function should raise the ZeroDivisionError exception if any word has no associated context dictionary
in the semantic dictionary.
similarity_test has an open file parameter (the file of synonym problems), a semantic dictionary parameter, and a metric
function parameter; it returns a descriptive str showing the results of all its attempts to solve synonym problems (whose
last line is the percentage of problems it solved correctly). See the three types of results illustrated in the section above
(body is 16 lines).
Write a script at the bottom of this module (in if __name__ == '__main__':) that prompts the user for (a) the names of the
training files (the default answer should be no-more; and unopenable files should be rejected) then builds the semantic
dictionary from these files, (b) whether to print the semantic dictionary, (c) the name of the synonym problem file
(rejecting any unopenable file - here use goody.safe_open) then attempts to solve all these problems and prints all the
results.
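Here is a sketch of build_semantic_dictionary following the hints above; the helper sentence_at_a_time is the one supplied in synonym.py, and the exact handling of the ignore-words file is an assumption:

# Sketch of build_semantic_dictionary: a {str:{str:int}} dictionary from training files.
def build_semantic_dictionary(training_files, ignore_file):
    ignore = {line.rstrip() for line in ignore_file}     # the set of words to ignore
    semantic = {}
    for file in training_files:                          # four nested loops, as hinted
        for sentence in sentence_at_a_time(file, ignore):
            for word in sentence:
                context = semantic.setdefault(word, {})
                for other in sentence:
                    if other != word:                    # a word never counts itself
                        context[other] = context.get(other, 0) + 1
    return semantic

The formula for cosine_metric is not reproduced in the text above; assuming it is the standard cosine similarity of the two count vectors (which is consistent with the 0-to-1 range and the ZeroDivisionError note), a sketch is:

from math import sqrt

# Sketch of cosine_metric, assuming the standard cosine-similarity formula:
#   sum over words w of cd1[w]*cd2[w], divided by sqrt(sum of cd1[w]**2) * sqrt(sum of cd2[w]**2)
def cosine_metric(cd1, cd2):
    numerator   = sum(count * cd2.get(word, 0) for word, count in cd1.items())
    denominator = sqrt(sum(c*c for c in cd1.values())) * sqrt(sum(c*c for c in cd2.values()))
    return numerator / denominator    # propagates ZeroDivisionError if either dictionary is empty

Under that assumption, the cd1/cd2 example above works out to (1*5 + 3*7) / (sqrt(1+4+9) * sqrt(25+49+64)) = 26 / (sqrt(14)*sqrt(138)), which is approximately 0.59.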
Sample Interaction:
The program, as specified, will have the following interaction: user-typed information appears in italics. Your output should
match the form of this one. It took my code about 5 seconds to execute build_semantic_dictionary.
Provide the name of a text file for training \(or type no-more\)[no-more]: bible.txt
Provide the name of a text file for training \(or type no-more\)[no-more]: archie_comics.txt
file named archie_comics.txt rejected: cannot be opened
Provide the name of a text file for training \(or type no-more\)[no-more]: war_and_peace.txt
Provide the name of a text file for training \(or type no-more\)[no-more]:
Dump (True or False) Semantic dictionary?[False]:
Enter name of problem file[synonym_problems.txt]: simple_problems.txt
Correct: 'earnest' is most like 'serious' from ['serious', 'amusing']
Incorrect: 'picture' is most like 'painting', not 'table' from ['house', 'painting', 'table', 'chair']
Incorrect: 'vexed' is most like 'annoyed', not 'amused' from ['amused', 'annoyed']
Correct: 'watch' is most like 'see' from ['hear', 'see', 'smell']
Metric failure: could not choose synonym for 'thief' from ['banker', 'robber', 'postman']
40.0% correct
It took my code about 8 seconds to execute the batch self-check file for this assignment. The "slow" lines were 13 (a method you
write) and 23 (loading a huge file that I have precomputed for you, used in many later tests).
