Applications of Web Data Mining: English Literature and Chinese Translation (3)

The above process is carried out for our entire testing data set. The partition into training and testing data sets is given in the result analysis section. Fig. 3 depicts the entire training and testing process, together with the knowledge base.
4. Experimentation and Result analysis
In this section, we conduct extensive experiments on our proposed model and evaluate the obtained results using accuracy measures such as Precision, Recall and F-measure. For evaluation purposes, our data set has been split in three Training:Testing ratios: 70:30, 60:40 and 50:50.
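This part of the paper does not restate how the measures are defined. Assuming the standard definitions in terms of true/false positives and negatives (TP, TN, FP, FN), which are the counts reported in the result tables, they can be written as:

```latex
\begin{align*}
\text{Precision} &= \frac{TP}{TP + FP}, &
\text{Recall} &= \frac{TP}{TP + FN}, \\
\text{F-measure} &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}, &
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}.
\end{align*}
```

Accuracy is reported as a percentage in the tables, i.e. the ratio above multiplied by 100.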
4.1. Accuracy Analysis
The training dataset is the set of data used to train the system. The testing dataset is the set of data used to validate the system that has been trained on the training dataset. Both notions are used in various areas of information science. Theoretically, 20% of the data is used for training the system and the remaining 80% is used to validate it [15]. In practice, however, this is not feasible.
Hence, we have considered three splits of the data, viz. 70:30, 60:40 and 50:50, where 70, 60 and 50 refer to the percentage of URLs used to train the system, and 30, 40 and 50 refer to the percentage of URLs used to validate the trained system. The results obtained after the training and testing processes are discussed in the following sections.
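The three splits can be illustrated with a minimal Python sketch. The paper does not say how URLs are assigned to each side (randomly, in order, or per category), so the seeded shuffle below is an assumption, and `split_urls` and the example URLs are hypothetical names used only for illustration.

```python
import random


def split_urls(urls, train_percent, seed=0):
    """Split URLs into (training, testing) lists by percentage.

    Assumption: a seeded random shuffle decides membership; the paper
    does not describe its actual partitioning procedure.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be strictly between 0 and 100")
    shuffled = list(urls)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic keeps the cut point exact for whole percentages.
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Hypothetical URL list standing in for the real data set.
example_urls = [f"http://example.com/doc/{i}.xml" for i in range(200)]

for train_percent in (70, 60, 50):
    train, test = split_urls(example_urls, train_percent)
    print(f"{train_percent}:{100 - train_percent} -> "
          f"{len(train)} training URLs, {len(test)} testing URLs")
```

Fixing the seed makes the three splits reproducible, which matters when the results of different ratios are compared against each other.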
4.1.1. Results obtained from 70:30 dataset
XML URLs    True Positive    True Negative    False Positive    False Negative    Precision    Recall    F-Measure    Accuracy (%)
CODE        21               286              24                0                 0.4665       1.0000    0.6300       92.7
HTML        150              161              0                 20                1.0000       0.8800    0.9370       93.9
PURE        4                319              0                 8                 1.0000       0.33..    0.5000       97.6
RSS         134              197              0                 0                 1.0000       1.0000    1.0000       100
Avg                                                                                                      0.7667       96.4
Table 1 Results obtained for 70:30 dataset
In Table 1, 70% of the data is used as the training set and the remaining 30% as testing data. With this split, we achieved an average accuracy of 96.4% and an average F-measure of 0.7667. The obtained F-Measure and Accuracy are plotted in Fig. 4.
For a few categories, our proposed algorithm achieves lower recall and precision values: because their tags are similar to those of other categories of XML URLs, misclassification occurs.
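As a sanity check on Table 1, the measures can be recomputed from the reported confusion counts. The sketch below assumes the standard definitions of Precision, Recall, F-measure and Accuracy; the names `Counts`, `TABLE_1` and the helper functions are hypothetical, and returning 0.0 for an empty denominator is an assumed convention. Recomputed values land close to the reported ones, with small differences that presumably come from rounding in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion counts for one category (one row of the table)."""
    tp: int
    tn: int
    fp: int
    fn: int


def _ratio(numerator, denominator):
    # Assumed convention: a measure with an empty denominator is 0.0.
    return numerator / denominator if denominator else 0.0


def precision(c):
    return _ratio(c.tp, c.tp + c.fp)


def recall(c):
    return _ratio(c.tp, c.tp + c.fn)


def f_measure(c):
    p, r = precision(c), recall(c)
    return _ratio(2 * p * r, p + r)


def accuracy(c):
    return _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn)


# Counts copied from Table 1 (70:30 split).
TABLE_1 = {
    "CODE": Counts(tp=21, tn=286, fp=24, fn=0),
    "HTML": Counts(tp=150, tn=161, fp=0, fn=20),
    "PURE": Counts(tp=4, tn=319, fp=0, fn=8),
    "RSS": Counts(tp=134, tn=197, fp=0, fn=0),
}

for name, c in TABLE_1.items():
    print(f"{name}: P={precision(c):.4f} R={recall(c):.4f} "
          f"F={f_measure(c):.4f} Acc={100 * accuracy(c):.1f}%")
```

The counts make the weak rows easy to read: CODE loses precision to its 24 false positives, while PURE loses recall because 8 of its 12 positive URLs are missed.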

Fig. 4 Accuracy analysis for 70:30
4.1.2. Results obtained from 60:40 Dataset
In Table 2, 60% of the data is used as the training set and the remaining 40% as testing data. With this split, we achieved an average accuracy of 97.35% and an F-measure of 0.8731.
Table 2 Results obtained for 60:40 dataset
XML URLs    True Positive    True Negative    False Positive    False Negative    Precision    Recall    F-Measure    Accuracy (%)
CODE        26               378              11                1                 0.7021       0.9622    0.8120       97.11
