Specification
SYSTEMS AND METHODS FOR SEMANTIC KNOWLEDGE ASSESSMENT, INSTRUCTION, AND ACQUISITION
CROSS-REFERENCE TO RELATED APPLICATION
[0001] This application claims priority to pending U.S. Provisional Application No. 60/668,764, filed April 5, 2005, and incorporated by reference (Attorney Docket No. 581458001 US).
TECHNICAL FIELD
[0002] The following disclosure relates generally to systems and methods for semantic knowledge assessment and instruction.
BACKGROUND
[0003] The field of linguistics includes numerous pedagogical theories and methods related to language acquisition. Many of the conventional theories and methods are directed to rule-based grammatical concepts or processes. The standard grammar-translation method, for example, focuses on learning the syntax and structure of sentences. This method assumes that once students have sufficiently learned the grammatical rules for constructing sentences, they will be able to slot in appropriate vocabulary as needed to generate meaningful language. As another example, the audio-lingual method (based on habit formation) focuses primarily on syntactic structures, and vocabulary words are taught only as they would occur within the various structures. More recent research has focused on other grammatical features, such as the developmental sequence, the role of input, and/or the role of instruction in language acquisition.
[0004] Lexical concepts and vocabulary learning and instructional methods have historically been viewed as ancillary to mainstream language acquisition theories. However, while mainstream linguists remain primarily focused on grammatical concepts and approaches, a small subset of linguistic researchers and practitioners has focused on language acquisition from a predominantly lexical perspective.
[0005] Early lexical research, for example, attempted to develop an understanding of the number of words people know. This required defining both (a) what constitutes a word, and (b) what it means to know a word. Based on one predominant definition of what constitutes a word, there are about 180,000 words in the English language. The following chart, for example, outlines the relationship of frequency of English words to the coverage of running text in the Brown Corpus:
[0006] As shown in the chart above, about a quarter (24%) of all the words in English text are one of the 10 most frequent English words. The chart further demonstrates that as words become less frequent, their contribution to the text coverage decreases. In fact, the 100 most frequent English words account for almost half (49%) of all the words in written English text. For example, the most common word in the English language, "the," occurs about 6 times in every 100 words of general text.
[0007] While most research and findings primarily focused on first language acquisition, there are implications for second language acquisition as well. For example, early research suggested that native speakers have vocabularies of well over 150,000 words and, therefore, the direct study of words did not offer a practical route to language acquisition. Later research, however, determined that native vocabularies likely range from only about 10,000 to 20,000 words. Thereafter, the notion that benefits could be derived from the direct study of words gained credibility. Other researchers have looked into which vocabulary words English-as-a-second-language students should learn and how the vocabulary words might best be ranked in order of importance.
[0008] Some conventional lexical systems, for example, include organizing vocabulary words by frequency within a corpus or sub-domain thereof. A corpus can consist of millions of pages of text of a given language. A sub-domain is a special purpose lexical item subset within a given language (e.g., American road signs, vocabulary and terms used in finance professions, vocabulary and terms used by information technology workers, etc.). Conventional lexical systems rely predominantly on word frequency in a corpus in making determinations as to what constitutes level-appropriate study material for a given language or sub-domain thereof. For example, publishers have issued (a) level-adjusted graded readers that include only the first 1000 most frequent English words from a general corpus, and (b) word list books that present all of several thousand English words that might occur on a typical TOEIC English language proficiency examination.
[0009] Conventional lexical systems, however, include a number of drawbacks. One drawback with many conventional systems, for example, is that the published word lists do not take into account words that particular individuals or groups of individuals may already know. As such, the word lists can include many hundreds, if not thousands, of words that a learner is already familiar with and, therefore, the lists are only marginally helpful in language acquisition because there is little or no advantage in studying known words. Rather, it is the study and acquisition of unknown lexical items that is most beneficial to attaining higher levels of communication ability and overall language ability. This same phenomenon holds true for other types of lexical items, for example, sounds, utterances, multi-word-units, idiomatic expressions, images, signs, symbols, multi-symbol-units, and programming code, each of which symbolizes, or serves to convey, a meaning within a language or sub-domain thereof.
[0010] Another drawback with conventional lexical systems is that there is no way to quickly and accurately identify the specific lexical items within a given language or language sub-domain that are recognizable and/or unrecognizable to an individual. For example, there are many hundreds of high frequency English words that have low probabilities of recognition by individuals, demographic segments, and/or populations. Conversely, there are many hundreds of low frequency English words that have high probabilities of recognition by individuals, demographic segments, and/or populations. Conventional systems, however, cannot identify and separate the recognizable items from the unrecognizable items.
[0011] Conventional lexical systems also include a number of other drawbacks. For example, conventional systems generally do not measure and assess (a) the relative importance of each individual's unrecognizable lexical items, and (b) the lexical depth of knowledge of individuals, demographic segments, and/or populations. Further, most conventional systems do not include suitable processes to organize ability-appropriate reading materials based on each individual learner's assessed lexical ability. Additionally, most conventional approaches do not include suitable processes to assess retention ability for newly learned lexical items. Accordingly, there is a need to improve lexical systems and methods for language acquisition and study.
[0012] This background section summarizes various existing theories, methods, and systems related to language acquisition and, more specifically, language acquisition from a predominantly lexical perspective. It also includes discussion of insights and observations made by the inventors about prior art lexical systems that are helpful to understanding the subsequently described invention, but that were not necessarily appreciated by persons skilled in the art or disclosed in the prior art. Thus, the inclusion of these insights and observations in this background section, including the discussion of various drawbacks associated with conventional lexical systems, should not be interpreted as an indication that such insights and observations were part of the prior art.
BRIEF DESCRIPTION OF THE DRAWINGS
[0013] Figure 1 is a block diagram illustrating a language assessment and instruction system for testing, compiling, assessing, and delivering ability-appropriate language instruction material in accordance with an embodiment of the invention.
[0014] Figure 2 is a block diagram illustrating various components of the system of Figure 1 configured to process a standard recognition ogive by demographic segment using cumulative individual test responses and respondent data in accordance with an embodiment of the invention.
[0015] Figure 3 is a graph illustrating a cumulative ogive of the recognizability of the 6000 most frequent British National Corpus ("BNC") English words.
[0016] Figure 4 is a block diagram illustrating various components of the system of Figure 1 configured to assess the lexical ability of an individual in accordance with an embodiment of the invention.
[0017] Figure 5 is a display diagram illustrating particular examples of Yes/No lexical decision questions for establishing the probability of recognition of each lexical item in accordance with an embodiment of the invention.
[0018] Figure 6A is a display diagram illustrating a lexical item depth of knowledge scale with specific aspects of lexical item depth of knowledge in accordance with an embodiment of the invention.
[0019] Figure 6B is a display diagram illustrating several examples of lexical depth of knowledge decision type questions in accordance with an embodiment of the invention.
[0020] Figure 7 is a display diagram illustrating a particular example of a graph and a written description of an individual respondent's score sheet report in accordance with an embodiment of the invention.
[0021] Figure 8A is a scatterplot graph illustrating the recognition ability of each of the 6000 most frequent BNC English words.
[0022] Figure 8B is a scatterplot graph illustrating a hypothetical student's estimated vocabulary size in relationship to frequency and word recognizability.
[0023] Figure 8C is a bar chart illustrating the word recognition probability data illustrated in Figure 8B.
[0024] Figure 8D is a scatterplot graph illustrating the correlation between BNC frequency data and actual assessed BNC word recognition.
[0025] Figure 9 is a block diagram illustrating various components of the system of Figure 1 configured to prioritize lexical items based on an individual's assessed lexical ability in accordance with an embodiment of the invention.
[0026] Figure 10 is a block diagram illustrating various components of the system of Figure 1 configured to prepare and deliver ability-appropriate text material based on an individual's assessed lexical ability in accordance with an embodiment of the invention.
[0027] Figure 11A is a display diagram illustrating an example of English language text filtered in accordance with a particular individual's assessed lexical ability in accordance with an embodiment of the invention.
[0028] Figure 11B is a display diagram illustrating the text of Figure 11A after further processing in accordance with an embodiment of the invention.
[0029] Figure 11C is a display diagram illustrating the text of Figures 11A and 11B after completion of ability-appropriate filtering and editing in accordance with an embodiment of the invention.
[0030] Figure 12 is a block diagram of a basic and suitable computer and database system that may employ aspects of the invention.
[0031] Figure 13A is a block diagram illustrating a simple, yet suitable system in which aspects of the invention may operate in a networked computer environment.
[0032] Figure 13B is a block diagram illustrating an alternative system to that of Figure 13A.
DETAILED DESCRIPTION
A. General Overview
[0033] The following disclosure is directed generally to systems and methods for testing, compiling, assessing, and delivering ability-appropriate language instruction material. The language training systems described herein can assess an individual's lexical ability in any given language or lexicon (or any given special purpose sub-domain of a language or lexicon) and, using such assessments, establish a pedagogically optimal course of instruction to efficiently and quickly improve the individual's language and communication ability. More specifically, the disclosed systems and methods can provide a quantification of each individual's lexical ability and generate statistically derived lexical recognition ability assessments and depth of knowledge assessments for individuals, demographic segments, and/or populations. The disclosed systems and methods can also generate a personalized language learning sequence of unrecognized lexical items specifically tailored for each individual based on that individual's assessed lexical ability and needs. Thus, the disclosed systems and methods can provide for direct study of lexical items organized by lexical importance and delivered by various passive and interactive means to each individual learner.
[0034] The disclosed systems further include the generation and delivery of various types of personalized language ability reports to users, and the further organization and conveyance of such reports and related data to others. The system can identify and adjust for any significant differences in specific lexical item recognizability between different demographic segments within the same population and, in particular, between different ages. Furthermore, the system can identify and adjust for any significant differences in lexical item recognizability for any given language or sub-domain thereof that exist between the populations of two or more different countries.
[0035] The system further includes the reorganization and presentation of text materials (on any given topic) such that the lexicon of the reorganized text will include a pre-determined percentage of lexical items that are unrecognizable to the learner. The inclusion of a limited number of unrecognizable lexical items in running text thus permits a reader to assign meaning to the unrecognized lexical items through their usage in context among known items.
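By way of illustration only, the text filtering described above might be sketched in Python as follows. The function names, the simple whitespace tokenization, and the 5% default threshold are assumptions made for this example and are not taken from the disclosure.

```python
def tokenize(text):
    """Split on whitespace and strip surrounding punctuation (a simplification)."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


def unknown_ratio(text, known_items):
    """Fraction of running-text tokens that are not in the learner's known set."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    unknown = sum(1 for t in tokens if t not in known_items)
    return unknown / len(tokens)


def filter_passages(passages, known_items, max_unknown_ratio=0.05):
    """Keep passages whose share of unknown items does not exceed the
    pre-determined percentage (hypothetical default of 5%)."""
    return [p for p in passages
            if unknown_ratio(p, known_items) <= max_unknown_ratio]
```

A fuller treatment would presumably also rewrite or gloss passages that exceed the threshold rather than simply discard them.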
[0036] Aspects of the invention can be characterized in a number of different ways. For example, one aspect can include a method for compiling and maintaining the importance of lexical items within a given language corpus or sub-domain thereof. As used herein, the term "importance" can refer to any one or more of the frequency of item occurrence, scale of item consequence, number of item citations, item value, and any other item specific quantifiable variable. Another aspect of the invention can include a method for testing individual users for recognition of a series of select lexical items drawn from among a general language's lexicon, or the lexicon of a language sub-domain. The selected lexical items can include both real lexical items and pseudo-lexical items. Pseudo-lexical items generally appear to be plausible, but do not have meaning in the given language or lexicon. The method can include, for example, displaying the items using an interactive "Yes/No" lexical decision-type question testing process.
[0037] Still another aspect of the invention can include a method for displaying lexical items in an interactive sequence such that the first item presented is randomly selected from among items having a predetermined recognizability for the demographic segment to which the user belongs. A suitable algorithmic process can be used to guide the random selection of each subsequent lexical item, from up and down a recognizability scale, until the user has identified at least one real lexical item as being recognized, and also has identified at least one real lexical item as being unrecognized. Pseudo-lexical items can be randomly dispersed within the presentation of real lexical items to control for the individual conjecturing behavior of a user.
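By way of illustration only, one possible form of such an interactive sequence is sketched below in Python. The disclosure does not specify the algorithmic process; the fixed step size, the pseudo-item rate, the starting recognizability of 0.5, and all function and parameter names are assumptions made for this example.

```python
import random


def run_yes_no_session(real_items, pseudo_items, respond, start_p=0.5,
                       step=0.1, pseudo_rate=0.25, max_questions=50, rng=None):
    """Present Yes/No lexical decision questions.

    real_items: dict mapping item -> recognizability (probability in [0, 1])
                for the user's demographic segment.
    pseudo_items: list of plausible-looking items with no meaning.
    respond: callable(item) -> bool, True when the user answers "Yes".
    Returns a log of (item, is_real, answered_yes) tuples.
    """
    rng = rng or random.Random()
    remaining = dict(real_items)
    target = start_p
    seen_yes = seen_no = False
    log = []
    for _ in range(max_questions):
        # Stop once at least one real item was recognized and one was not.
        if (seen_yes and seen_no) or not remaining:
            break
        # Randomly intersperse pseudo-items to control for guessing.
        if pseudo_items and rng.random() < pseudo_rate:
            item = rng.choice(pseudo_items)
            log.append((item, False, respond(item)))
            continue
        # Pick at random among the real items nearest the target recognizability.
        best = min(abs(p - target) for p in remaining.values())
        candidates = [w for w, p in remaining.items()
                      if abs(p - target) - best < 1e-9]
        item = rng.choice(candidates)
        del remaining[item]
        yes = respond(item)
        log.append((item, True, yes))
        if yes:
            seen_yes = True
            target = max(0.0, target - step)  # move toward less recognizable items
        else:
            seen_no = True
            target = min(1.0, target + step)  # move toward more recognizable items
    return log
```

"Yes" answers to pseudo-items recorded in the log could then be used to discount a respondent's tendency to guess.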
[0038] Yet another particular aspect of the invention can include a method for storing (e.g., in a database) demographic information for each test respondent and data regarding each respondent's responses and interactions with respect to the lexical item questions presented during the testing process. Another aspect of the invention can include a method for determining (for particular respondents, demographic segments, and populations) the ability to retain newly learned lexical item knowledge. Retention ability can be based on depth of knowledge, time of retention, or other suitable factors.
[0039] Further aspects of the invention can include (a) a method for aggregating response data from all respondents and determining a standard recognizability measure for each lexical item by demographic segment, (b) a method for establishing a cumulative lexical recognition ogive for one or more particular demographic segments or populations, (c) a method for including each individual respondent's demographic data and lexical item recognition response data in a cumulative lexical recognition ogive, and (d) a method for determining each respondent's lexical recognition ability along a cumulative lexical recognition ogive and, in this way, determining the corresponding respondent's recognized and unrecognized lexical items.
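By way of illustration only, the aggregation and ordering steps might be sketched in Python as follows. Here recognizability is approximated as the raw proportion of "Yes" responses per item within a segment, and the ogive is represented simply as items ordered from most to least recognizable; both are simplifying assumptions made for this example, as are all names used.

```python
from collections import defaultdict


def recognizability_by_segment(responses):
    """responses: iterable of (segment, item, recognized) tuples.
    Returns {segment: {item: proportion of respondents who recognized it}}."""
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for segment, item, recognized in responses:
        tally = tallies[segment][item]
        tally[0] += int(recognized)  # count of "Yes" responses
        tally[1] += 1                # count of all responses
    return {seg: {item: yes / n for item, (yes, n) in items.items()}
            for seg, items in tallies.items()}


def build_ogive(recognizability):
    """Order items from most to least recognizable."""
    return sorted(recognizability.items(), key=lambda kv: kv[1], reverse=True)


def split_at_ability(ogive, threshold):
    """Split the ordered items at a respondent's position on the scale:
    items at or above the threshold are treated as likely recognized."""
    known = [item for item, p in ogive if p >= threshold]
    unknown = [item for item, p in ogive if p < threshold]
    return known, unknown
```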
[0040] Another aspect of the invention is directed to a method for testing each respondent's lexical item depth of knowledge using an interactive display of lexical item depth of knowledge questions (e.g., multiple-choice and/or Yes/No decision-type questions). In one embodiment, for example, the first displayed depth of knowledge item is at the estimated level of ability based on the respondent's assessed ability for lexical item recognition. Subsequent depth of knowledge questions are algorithmically selected to provide the maximum amount of information at the estimate of ability. With each response, the maximum likelihood, test information, and standard error of the estimate are recalculated and, accordingly, subsequent depth of knowledge questions can be selected at the revised estimate of ability and presented to the respondent. The process can be repeated until various levels of lexical item depth of knowledge ability at desired levels of accuracy are achieved.
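By way of illustration only, the recalculation of the maximum likelihood estimate, test information, and standard error might be sketched in Python as follows. The disclosure does not name a particular response model; this sketch assumes a one-parameter (Rasch) model with known item difficulties, Newton-Raphson updates, and a bounded ability scale, and all names are hypothetical.

```python
import math


def p_correct(theta, b):
    """Rasch model: probability of a correct response at ability theta
    for an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, theta=0.0, iterations=25, bound=6.0):
    """responses: list of (difficulty, answered_correctly) pairs.
    Returns (theta, test_information, standard_error)."""
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b, _ in responses]
        gradient = sum(int(u) - p for (_, u), p in zip(responses, probs))
        info = sum(p * (1.0 - p) for p in probs)
        if info < 1e-12:
            break
        # Newton-Raphson step, clamped because the estimate diverges when
        # every response is correct or every response is incorrect.
        theta = max(-bound, min(bound, theta + gradient / info))
    info = sum(p_correct(theta, b) * (1.0 - p_correct(theta, b))
               for b, _ in responses)
    se = 1.0 / math.sqrt(info) if info > 0 else math.inf
    return theta, info, se


def next_item(theta, item_bank, administered):
    """Under the Rasch model an item is most informative when its difficulty
    equals theta, so choose the unused item whose difficulty is closest."""
    candidates = {k: b for k, b in item_bank.items() if k not in administered}
    if not candidates:
        return None
    return min(candidates, key=lambda k: abs(candidates[k] - theta))
```

In such a sketch, testing would continue by alternating `next_item` and `estimate_ability` until the standard error falls below a chosen accuracy threshold.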
[0041] Still another particular aspect of the invention is directed to a method for determining each of the following to generate a pedagogically optimal personal language learning sequence of unrecognized, unfamiliar, and likely to be forgotten lexical items for study by each individual—
(a) lexical item importance within a given corpus, or sub-domain thereof;
(b) a cumulative lexical recognition ogive for a demographic segment or population;
(c) multiple cumulative lexical depth of knowledge ogives for a demographic segment or population;
(d) a cumulative lexical retention ogive for a demographic segment or population;
(e) an individual respondent's lexical recognition ability;
(f) an individual respondent's lexical depth of knowledge ability; and
(g) an individual respondent's lexical retention abilities.
[0042] Another aspect of the invention includes a method for interactively exchanging each learner's personal language learning sequence between a suitable database system and any variety of learning programs or computer systems equipped to interface with such database system. Interactive exchange of data between learning programs and the database system can generate revisions and maintenance to the language learning sequence and the database system can repeatedly deliver an updated and current language learning sequence to the connected learning programs or computer systems.
[0043] Still another aspect of the invention is directed to a method for generating learning materials including variations of one or more lexical items in a personal language learning sequence for each individual learner via a personalized electronic mail service. The electronic mail service can utilize various pedagogical strategies to assist subscribers to learn and retain knowledge of lexical items. For example, the personalized electronic mail service can request and provide various means for confirmation of subscriber interactions, thereby allowing appropriate updates to be made to the language learning sequence database system.
[0044] Yet another aspect of the invention is directed to a method for generating and delivering various ability-appropriate graded materials including reading, listening and video materials and other level-appropriate contextual language materials. Such ability-appropriate materials can request and provide various means for confirmation of subscriber interactions, thereby allowing appropriate updates to be made to the language learning sequence stored in a suitable data storage device.
[0045] Still yet another aspect of the invention is directed to a method for generating and delivering personalized interactive lexical language learning games. The language learning games, for example, can deliver batches of lexical items and present lexical items as appropriate to the personal language learning sequence. The language learning games can also deliver and present other forms of level-appropriate learning materials. The language learning games can deliver and present lexical items and other level-appropriate learning materials via mobile communication devices, personal computers, portable electronic devices, and/or other suitable electronic devices. The language learning games can utilize various pedagogical strategies and graphical formats to help subscribers rapidly learn and retain knowledge of a large number of lexical items and other level-appropriate learning materials. The language learning games can also include automatic means to acknowledge and record subscriber interactions, thereby allowing appropriate updates to be made to a database system.
[0046] Another aspect of the invention is directed to a method for generating and delivering various types of personalized, cumulative, and/or comparative lexical ability reports to individuals, teachers, and/or program administrators. Reported findings can include, for example, (a) graphic and text descriptions of how many total items are known, (b) how many items in a given corpus or given sub-domain are known/unknown, (c) how many items within different frequency bands of a corpus or given sub-domain are known/unknown, (d) how well lexical items are known by various aspects of depth of knowledge, (e) how rapidly new lexical items are being acquired through interaction with learning programs, (f) how many items remain before a specific ability goal is achieved, (g) estimates of time required to achieve specific ability goals, and (h) comparisons of any aspects of an individual's ability to equivalent aspects of the cumulative ability of a demographic segment or population.
[0047] Still another aspect of the invention can include a method for quickly and precisely identifying how many words a user knows, the exact words the user knows, and which words the user needs to learn in order to reach his or her language learning goal. For example, the system can include a lexical engine configured to determine the words each individual knows. In one embodiment, the lexical engine can display a series of words or other lexical items to the user on the screen of a computer or portable electronic device (e.g., cellular phone, PDA, etc.). The user can choose or click "Yes" if he or she recognizes the word or item, or "No" if he or she does not. Based on the responses, the lexical engine can determine the exact words or items a person knows within a given lexicon. The lexical engine can then rank the remaining unknown words in terms of priority to that individual, and these unknown words will become the user's personal target list.
[0048] The invention will now be described with respect to various embodiments. The following description provides specific details for a thorough understanding of, and enabling description for, these embodiments of the invention. However, one skilled in the art will understand that the invention may be practiced without these details. In other instances, well-known structures and functions have not been shown or described in detail to avoid unnecessarily obscuring the description of the embodiments of the invention.
[0049] The terminology used in the description presented below is intended to be interpreted in its broadest reasonable manner, even though it is being used in conjunction with a detailed description of certain specific embodiments of the invention. Certain terms may even be emphasized below; however, any terminology intended to be interpreted in any restricted manner will be overtly and specifically defined as such in this Detailed Description section. Illustrative descriptions in this patent application generally refer to the English language; however, the systems and methods described herein can be applied equally to any language or semantic knowledge domain.
[0050] Although not required, aspects and embodiments of the present invention will be described in the general context of computer-executable instructions, such as routines executed by a general-purpose computer (e.g., a server or a personal computer). Examples of such systems are described in more detail below with reference to Figures 12-13B.
B. Embodiments of Systems and Methods for Language Knowledge Assessment and Instruction
[0051] Figure 1 is a block diagram illustrating a language assessment and instruction system 100 configured in accordance with an embodiment of the invention. The system 100 can include testing components 124, compiling components 122, 126, 128, 130, and 132, assessing components 122, 124, and 132, and delivery components 116 configured to deliver ability-appropriate language instruction material to users.
[0052] The system 100 can include one or more corpus and sub-domains databases 110 (only one is shown) configured to store any desired number of corpora and corresponding sub-domains. The system 100 also includes a corpus program or module 112 for compiling importance of lexical item data. More specifically, within each corpus and sub-domain there is a set number of lexical items. The collective total of all lexical items in each corpus or sub-domain is called a lexicon. As used herein, the term "lexical item" refers to any symbol, multisymbol unit, sound, utterance, word, multiword unit, or idiomatic expression that symbolizes a meaning. The term "lexicon" refers to all of the lexical items within a particular language. The lexical items in a given lexicon may be ranked in terms of importance in the corpus or sub-domain. The corpus program 112, for example, can scan corpora and sub-domains and generate item importance data by corpus and sub-domain. An item importance database 114 can store lexical item importance data by corpus or sub-domain. One advantage of this feature is that lexical items are organized by relative importance with respect to each lexicon and, therefore, it contributes to the most logical and efficient sequencing of unknown and unfamiliar lexical items into a personal language learning sequence for each user.
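By way of illustration only, a corpus scan that generates item importance data might be sketched in Python as follows. This sketch treats importance as relative frequency of single words only, which is just one of the importance measures contemplated above; the regular-expression tokenizer and the function name are assumptions made for this example.

```python
import re
from collections import Counter


def item_importance(corpus_texts):
    """Scan the texts of a corpus or sub-domain and return a dict mapping each
    lexical item to its relative frequency, ordered from most to least frequent."""
    counts = Counter()
    for text in corpus_texts:
        # Simplified tokenization: lowercase runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {item: n / total for item, n in counts.most_common()}
```

Multiword units, symbols, and other non-word lexical items would require a different segmentation step than the one assumed here.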
[0053] The system 100 further includes a calibration program or method 130 to estimate lexical item recognizability among a large sample 128, and apply the findings to both generate true ability estimates for each individual respondent and contribute to the generation of a personal language learning sequence 116 of target items for learning. This process can include, for example, using item response theory ("IRT") to construct a statistical model that establishes the probabilistic relationship between each item and each respondent, demographic segment, and/or population. One advantage of this feature is that it enables the system 100 to precisely determine and report the particular lexical items an individual respondent is not likely to know and, therefore, should study.
[0054] The personal language learning sequence compiler 116 is configured to take item importance data from a given corpus or sub-domain thereof, lexical item recognizability data 122, data from one or more aspects of lexical item depth of knowledge 122, and data from lexical item retention ability 120, and combine them in one or more algorithmic processes to generate and maintain a unique personal language learning sequence of likely unrecognized lexical items. The process is informed by each user's assessed lexical abilities and needs. Accordingly, each user's likely unrecognized yet important lexical items will be prioritized. Additionally, the organization of each user's language learning sequence can be further updated based on his or her ongoing expressions of lexical depth of knowledge and newly learned item retention data.
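By way of illustration only, one way such data might be combined is sketched below in Python. The disclosure does not specify the algorithmic process; the multiplicative weighting used here (importance multiplied by one minus an estimated mastery) and all class, field, and function names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ItemEvidence:
    importance: float      # e.g., relative corpus frequency (higher = more important)
    p_recognize: float     # estimated probability the user recognizes the item
    depth: float = 1.0     # depth-of-knowledge score in [0, 1]
    p_retain: float = 1.0  # estimated probability a learned item is retained


def priority(evidence):
    """Hypothetical weighting: important items with low estimated mastery
    (unrecognized, shallowly known, or likely to be forgotten) rank highest."""
    mastery = evidence.p_recognize * evidence.depth * evidence.p_retain
    return evidence.importance * (1.0 - mastery)


def learning_sequence(items):
    """items: dict mapping lexical item -> ItemEvidence.
    Returns items ordered from highest to lowest study priority."""
    return sorted(items, key=lambda item: priority(items[item]), reverse=True)
```

Re-running `learning_sequence` whenever new depth of knowledge or retention data arrives would keep the ordering current for each user.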
[0055] The system 100 also enables interactive exchange of personal language learning sequences 116 between an individual user database 126 and various learning programs 118 and/or other suitable environments. As the learner interacts with one or more of the learning programs 118, the data can be obtained and compiled by an interactions and retention compiler 120. The interactions and retention compiler 120 can inform the learning sequence compiler 116 as progress is made by a particular user to ensure that each user's language learning sequence remains constantly informed and updated as to the user's current lexical ability based on the interactions. More specifically, the interactions and retention compiler 120 can recognize and compile information as to each user's capacity for learning and ability to retain knowledge of newly acquired lexical items over time. In this way, the learning sequence compiler 116 can make adjustments to each user's language learning sequence based on the information received from the interactions and retention compiler 120. Information regarding each user's interaction with learning programs and/or retention of newly learned items can also be stored in the individual user database 126 and made available (as needed) to the learning sequence compiler 116 and/or the reports module 134 (via the compiler 116). The system can also be configured, based on the personal language learning sequences 116, to create and deliver various ability-appropriate materials, in written or aural formats, including materials on topics selected by the learner. This process is described in greater detail below with reference to Figures 11A-11C.
[0056] The system 100 can also include a computer adaptive test ("CAT") component 124 as an example of one interface between a user and the system 100. For example, the CAT 124 can be configured to administer tests (e.g., interactive IRT tests) to users via personal computers, mobile phones, PDAs, or using other suitable devices and/or processes. In this way, the CAT 124 can be used to calculate each user's lexical item recognition ability and depth of knowledge abilities. The CAT 124 can also obtain appropriate item recognizability and depth of knowledge data for one or more demographic segments and populations from an item recognizability and DOK database 122.
[0057] Each user's ability assessment and demographic details can be stored in the individual user database 126, and each user's raw item response data can be stored in a cumulative response by demographic segments database 128. The cumulative responses database 128 can also be configured to allow the response data from all individual test takers to be periodically aggregated and compiled for use by the calibration program 130. The calibration program 130 can establish recognizability for each lexical item and process related depth of knowledge analysis for populations and demographic segments. The calibration program's findings can be stored in the item recognizability and DOK database 122. The recognition and DOK ogives compiler 132 can be configured to assemble the data from the database 122 into ogives of recognition sorted by population, demographic segment, or another desired element. The ogives compiler 132 can provide each user's relevant ogive to both the reports module 134 and the learning sequence compiler 116.
[0058] In one embodiment, the individual user database 126 can inform the personal language learning sequence compiler 116 as to the ability of the individual user. The recognition and depth of knowledge ogives compiler 132 can organize recognizability and DOK ability measures for each demographic segment and population. The ogives compiler 132 can accordingly permit each user's assessment to be made relative to known and unknown words by rank order of recognizability (as described below with respect to Figure 3). The learning sequence compiler 116 obtains importance of lexical item data from the item importance database 114 for both general language and any desired sub-domains thereof. The learning sequence compiler 116 can rank each user's unknown, unfamiliar, and likely to be forgotten lexical items in terms of priority based on the user's abilities and needs. The most important (but as yet unrecognized) lexical items are prioritized for study by the learning sequence compiler 116.
[0059] In one embodiment, the learning sequence compiler 116 can also be configured to provide the user's personal item sequence to various learning programs 118 including, but not limited to, e-mail services, interactive language learning games or activities, and ability-appropriate text materials. Users can interact with various learning games 118 employing suitable pedagogical strategies and formats designed to help each user study his or her personal language learning sequence. Users may interact with the learning programs via personal computers, mobile phones, PDAs, or using other suitable devices and/or processes.
[0060] The reports module 134 can be configured to generate individual graphic and written scores for each user and make them available to the user or other personnel (e.g., teachers, etc.) via personal computers, mobile phones, PDAs, or other suitable devices and/or processes. The reports module 134 can also be configured to generate aggregate-type reports with analysis and/or comparisons of multiple dimensions of lexical ability and learning progress for teachers and/or program administrators. Each report generally includes the number of words known to the user, the location and size of the user's high importance, or high-frequency, word knowledge gaps, and the number of words the user needs to acquire in order to reach his or her next important lexical goal. Important lexical goals vary from language to language and from sub-domain to sub-domain. In the general English language, for example, it is estimated that knowledge of the first 3000 most frequent words generally permits a person to read typical English reading materials without the assistance of a dictionary. Accordingly, an important goal for users studying English will be to learn the first 3000 most frequent English words. In other embodiments, the reports can include different data and/or have different features.
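The report quantities named above (words known, location and size of high-frequency gaps, and words still needed for the next goal) can be sketched as follows. The function name, the 1000-word band size, and the sample data are assumptions made for illustration. Only the 3000-word goal comes from the text.

```python
def lexical_report(known_ranks, goal=3000, band_size=1000):
    """Summarize lexical knowledge from the frequency ranks a user is estimated to know.

    known_ranks: iterable of frequency ranks (1 = most frequent word).
    band_size is an assumed reporting granularity.
    """
    known = set(known_ranks)
    gaps_by_band = {}
    for start in range(1, goal + 1, band_size):
        end = min(start + band_size - 1, goal)
        # Size of the gap in this band = ranks in the band the user does not know.
        gaps_by_band[(start, end)] = sum(
            1 for rank in range(start, end + 1) if rank not in known
        )
    return {
        "words_known": len(known),
        "words_needed_for_goal": sum(gaps_by_band.values()),
        "gaps_by_band": gaps_by_band,
    }


# Hypothetical user: knows ranks 1-2500 except 101-150, plus 100 rarer words.
known = (set(range(1, 2501)) - set(range(101, 151))) | set(range(3500, 3600))
report = lexical_report(known)
print(report)
```

For this hypothetical user the sketch reports 2550 known words and 550 words still needed to reach the 3000-word goal, 50 of them in the first band and 500 in the third.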
[0061] In the illustrated embodiment, the components of the language training system 100 each include a separate component (e.g., a single database or a single processing component). In other embodiments, however, two or more of the above-described components can be within the same device. In further embodiments, the language training system 100 can include a different number of components and/or the components can have a different arrangement. Additionally, it will be appreciated that one or more of the components of the language training system 100 can have separate utility operating alone or as subsystems within the overall system. For example, various components of the system can be used merely for assessing a user's lexical knowledge. In other embodiments, the components can have other arrangements to perform other functions.
[0062] Figure 2 is a block diagram illustrating various components of the system 100 configured to process a standard recognition ogive by demographic segment using cumulative individual test responses and respondent data in accordance with an embodiment of the invention. More specifically, the cumulative user response database 128 can be analyzed by the lexical item calibration program 130 (utilizing item response theory) at desired intervals. The calibration program 130, for example, can utilize Joint Maximum Likelihood Estimation, a statistical procedure that jointly estimates the maximum likelihood of a vector of item responses. The program begins by making an initial estimate of the respondents' abilities, then treats these estimates as fixed and maximizes the likelihood of the vector of item responses conditioned on the ability estimates to obtain estimates of the recognizability of the lexical items. The results of this step are then treated as fixed, and the likelihood of the vector of item responses is maximized conditioned on the lexical item recognizability to obtain new estimates of ability. This process continues until the estimates converge on set criteria.
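The alternating procedure described above can be sketched in a few lines. This is an illustrative sketch only, under several assumptions not stated in the text: a one-parameter logistic (Rasch) response model, initial abilities taken as the log-odds of each raw score, one clamped Newton-Raphson step per parameter per pass, item parameters centered at zero logits to fix the scale, and a tiny hypothetical response matrix.

```python
import math


def p_recognize(theta, b):
    """Rasch probability that ability theta recognizes an item with un-recognizability b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def clamp(x, limit=1.0):
    """Limit the size of a Newton-Raphson step for numerical stability."""
    return max(-limit, min(limit, x))


def jmle(responses, max_iter=500, tol=1e-6):
    """responses[j][i] is 1 if respondent j recognized item i, else 0.

    Assumes no respondent and no item has an all-0 or all-1 pattern,
    because maximum likelihood estimates are not finite in those cases.
    """
    n_persons, n_items = len(responses), len(responses[0])
    # Initial ability estimates: log-odds of each respondent's raw score.
    theta = []
    for row in responses:
        score = sum(row)
        theta.append(math.log(score / (n_items - score)))
    b = [0.0] * n_items

    for _ in range(max_iter):
        max_change = 0.0
        # Abilities held fixed: update each item parameter.
        for i in range(n_items):
            probs = [p_recognize(theta[j], b[i]) for j in range(n_persons)]
            observed = sum(responses[j][i] for j in range(n_persons))
            info = sum(p * (1.0 - p) for p in probs)
            step = clamp((sum(probs) - observed) / info)
            b[i] += step
            max_change = max(max_change, abs(step))
        # Anchor the scale by centering item parameters at zero logits.
        mean_b = sum(b) / n_items
        b = [value - mean_b for value in b]
        # Item parameters held fixed: update each ability.
        for j in range(n_persons):
            probs = [p_recognize(theta[j], b[i]) for i in range(n_items)]
            info = sum(p * (1.0 - p) for p in probs)
            step = clamp((sum(responses[j]) - sum(probs)) / info)
            theta[j] += step
            max_change = max(max_change, abs(step))
        if max_change < tol:  # convergence criterion
            break
    recognizability = [-value for value in b]  # r_i = b_i * -1.0
    return theta, recognizability


# Hypothetical responses: 5 respondents (rows) by 4 lexical items (columns).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]
abilities, recognizability = jmle(responses)
print([round(a, 2) for a in abilities])
print([round(r, 2) for r in recognizability])
```

Because abilities and recognizability are estimated together from the same responses, both sets of values land on one logit scale. A production calibration would also need to handle extreme response patterns and the much larger, sparse response sets the system accumulates.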
[0063] In one embodiment, for example, each respondent can respond to a series of items displayed before them in an interactive IRT online test. A suitable number of the lexical items displayed to any one respondent can also have been displayed to other respondents. The calibration program 130 can manage, organize, and periodically compile all respondents' answers as if they were a subset of one overall pool of items to one aggregate test. In one embodiment, respondents' inputs may be organized by any specific demographic segmentation and/or by any language or sub-domain thereof. Because the recognizability measures of each lexical item and the individual ability measures of each respondent are simultaneously estimated by the calibration program 130, all estimates will be on the same scale. Provided the cumulative number of responses to each lexical item is sufficient to stabilize an item's recognizability measure, the system can accurately determine an individual's ability assessment in any specific language sub-domain.
[0064] By way of example, in one particular embodiment of the system (and for a demographic segment consisting of 18-year-old Japanese males) the specific recognizability of each lexical item in the Japanese language sub-domain for heavy metal music may be determined. The lexical items for the testing process would be generated through analysis of a corpus sub-domain specifically related to heavy metal music ("HMM"). The sub-domain will be scanned by the corpus program 112 and organized into a lexicon of important items, in this example ranked by frequency of occurrence within the corpus. As a first step, HMM lexical items will be tested with a beta-test group of approximately 1000 respondents among the target demographic segment. The beta testing can enable initial calibration of the recognizability of HMM lexical items among 18-year-old Japanese males. The test will then be capable of producing provisional estimates of HMM lexical knowledge for each subsequent 18-year-old male respondent. Provisional scores may also be retroactively sent to the initial 1000 beta-test respondents. Thereafter, as the cumulative number of respondents grows, with each subsequent run of the calibration program 130 on the cumulative response data 128, the accuracy of the individual ability estimation sharpens. The nature of lexical statistical probabilities is one of diminishing returns. In other words, after a certain point, it generally does not matter how many more people respond to each lexical item; the item's measure of recognizability remains generally stable.
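The diminishing returns noted above can be made concrete with a rough sketch. It assumes a one-parameter logistic (Rasch) model, which the text does not specify here, and uses the standard large-sample approximation that the standard error of an item parameter is the inverse square root of the summed response information p(1 - p). Setting every respondent's recognition probability to 0.5 is a simplifying, best-case assumption.

```python
import math


def item_standard_error(n_respondents, p=0.5):
    """Approximate standard error (in logits) of one item's recognizability.

    Assumes every respondent recognizes the item with the same probability p,
    which is a simplification for illustration only.
    """
    information = n_respondents * p * (1.0 - p)
    return 1.0 / math.sqrt(information)


for n in (100, 1000, 10000, 100000):
    print(f"{n:>6} respondents: SE = {item_standard_error(n):.4f} logits")
```

Under these assumptions the error shrinks with the square root of the number of respondents, so each tenfold increase in responses buys a smaller absolute improvement than the one before it.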
[0065] The probabilities of a given response are expressed mathematically through a number of different IRT formulas, depending upon the variables and the purpose of the application. In one embodiment, the probability of a random respondent j with ability θ_j answering a random item i with recognizability r_i correctly is conditioned upon the ability of the respondent and the recognizability of the item. In other words, if a respondent has a high ability in a particular domain, he or she will probably recognize an item having high recognizability to the respondent's demographic segment and population. Conversely, if a respondent has a low ability and the item has low recognizability, the respondent will probably not recognize the item.
[0066] In one embodiment, a probability of item recognition can be calculated using the following equation:

$$P_i(\theta) = \frac{1}{1 + e^{-(\theta - b_i)}}$$

where P_i(θ) is the probability of a random respondent with ability θ recognizing item i, e is the base of natural logarithms (approximately 2.718), θ is the respondent's ability measured in logits, b_i is the un-recognizability parameter of the item measured in logits, and r_i is the recognizability parameter, or (b_i × -1.0).
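The symbols defined above (an ability, a single item parameter in logits, and the base e) match a one-parameter logistic (Rasch) model. Assuming that form, a short sketch shows how the recognition probability changes with ability for one item. The function names and the sample value b_i = -1.5 are hypothetical.

```python
import math


def recognition_probability(theta, b_i):
    """P_i(theta) under an assumed one-parameter logistic model.

    theta: respondent ability in logits.
    b_i:   un-recognizability of item i in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b_i)))


def recognizability(b_i):
    """r_i, defined in the text as b_i multiplied by -1.0."""
    return -1.0 * b_i


b_i = -1.5  # hypothetical, fairly recognizable item (r_i = 1.5)
for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    p = recognition_probability(theta, b_i)
    print(f"theta = {theta:+.1f} logits -> P = {p:.3f}")
```

When a respondent's ability equals the item's un-recognizability, the probability is exactly 0.5. It rises toward 1 as ability increases and falls toward 0 as ability decreases, tracing the S-shaped ogive referred to elsewhere in the description.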
[0067] The higher the value of the estimate of ability θ, the greater the respondent's ability. The estimate of ability θ can range from -∞ <