4 The research design

The tagged words of the SWECCL

Tagging is a method of making human language processable by machine. The text that we humans can easily read is raw text, which is not suitable for computer analysis. The most common tagging format is the “word_tag” mode, in which each word is followed by an underscore and its part-of-speech tag. Here we adopt the CLAWS4 tagging scheme to tag the raw text. After tagging, the text can be processed by Colligator, a program developed by Professor Liang Maocheng of Beijing Foreign Studies University (Liang Maocheng: 2008), which analyzes text that has been tagged in advance.
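To make the “word_tag” mode concrete, the following minimal sketch splits a tagged line into (word, tag) pairs. The sample sentence and its tags are assumptions in the style of the CLAWS C7 tagset, not data taken from SWECCL, and the function name `parse_tagged` is hypothetical.

```python
from typing import List, Tuple


def parse_tagged(text: str) -> List[Tuple[str, str]]:
    """Split whitespace-separated 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST underscore, so the tag is always
        # whatever follows the final "_" in the token.
        word, sep, tag = token.rpartition("_")
        if sep:  # ignore tokens that carry no tag at all
            pairs.append((word, tag))
    return pairs


# Assumed C7-style tags: TO = infinitive marker, VVI = infinitive verb.
sample = "I_PPIS1 want_VV0 to_TO go_VVI home_RL ._."
pairs = parse_tagged(sample)
for word, tag in pairs:
    print(f"{word}\t{tag}")
```

Once the text is in this form, the tag of every word is available to a program, which is what makes pattern-based retrieval possible.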

The regular expression

In order to retrieve as many qualifying results as possible from the corpus, we need search items that cover the wide range of conditions relevant to our research aim.

To do that, we first draw up a list of all the colligation items we are looking for. Here we focus mainly on seven colligations of the infinitive particle and what precedes it. They are listed as follows:

infinitive as subject preceded by a full stop (IAS)

infinitive as direct object preceded by a verb (VI)

infinitive preceded by a noun (NI)

infinitive preceded by an adjective (AI)

infinitive preceded by a present “ING” participle (INGI)

infinitive preceded by a past “ED” participle (EDI)

infinitive preceded by an adverb (ADI)

The reason this research chooses these seven colligation items is that what precedes the TO infinitive comes from the major lexical categories in linguistics: noun, verb, adjective and adverb. These categories account for the majority of English vocabulary, so they allow the greatest variety of possible combinations. For convenience, this article refers to the items above by the abbreviations given in parentheses.

In addition, we have to use regular expressions to retrieve the targeted information from the corpus. “The regular expression is a kind of special character string which is applied to describing and matching string with same or similar property.” (Jurafsky & Martin: 2009) Following the tag format and the rules for forming regular expressions (Liang Maocheng: 2009), we can turn some of our human-language grammatical devices into regular expressions that the machine can interpret. The seven abbreviations mentioned above can be rewritten as regular expressions. They are listed as follows (for more details, please see the appendix):
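As a rough illustration of the idea only, and not the expressions actually used in this study (those are in the appendix), the seven colligations could be approximated in Python as follows. Every tag name here is an assumption based on the CLAWS C7 tagset (TO for the infinitive marker, VVI/VBI/VDI/VHI for infinitives, NN for nouns, JJ for adjectives, R for adverbs); the mapping of VI to finite lexical verbs and of EDI to past participles is likewise an assumption, and the sample text is invented.

```python
import re
from collections import Counter

# "to" tagged as infinitive marker, followed by an infinitive verb form.
TO_INF = r"[Tt]o_TO\s+\S+_V[BDHV]I\b"

# Hypothetical left-hand contexts, one per abbreviation (assumed C7 tags).
PRECEDING = {
    "IAS": r"\._\.",           # full stop tagged as "."
    "VI": r"\S+_VV[0DZ]",      # finite or base lexical verb
    "NI": r"\S+_NN\w*",        # common noun
    "AI": r"\S+_JJ[RT]?",      # adjective (base, comparative, superlative)
    "INGI": r"\S+_V[BDHV]G",   # -ing participle
    "EDI": r"\S+_V[BDHV]N",    # past participle
    "ADI": r"\S+_R\w+",        # adverb
}

PATTERNS = {
    label: re.compile(left + r"\s+" + TO_INF)
    for label, left in PRECEDING.items()
}


def count_colligations(tagged: str) -> Counter:
    """Count matches of each colligation pattern in word_TAG text."""
    counts = Counter()
    for label, pattern in PATTERNS.items():
        counts[label] = len(pattern.findall(tagged))
    return counts


# Invented sample, tagged by hand in the assumed C7 style.
sample = (
    "I_PPIS1 want_VV0 to_TO go_VVI ._. "
    "To_TO err_VVI is_VBZ human_JJ ._. "
    "She_PPHS1 has_VHZ a_AT1 plan_NN1 to_TO leave_VVI ._. "
    "He_PPHS1 is_VBZ happy_JJ to_TO help_VVI ._."
)
counts = count_colligations(sample)
for label in PRECEDING:
    print(label, counts[label])
```

Keeping the shared “to + infinitive” part in one string and varying only the left-hand context mirrors the way the seven items differ solely in what precedes the infinitive particle.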
