Machine Learning: English Literature with Chinese Translation (6)

3.3.1 Reconstruction Based Models

In models based on reconstruction (including Autoencoders, Sparse Coding, RBMs, k-Means), it is often preferable to set epsilon to a value such that low-pass filtering is achieved. One way to check this is to set a value for epsilon, run ZCA whitening, and then visualize the data before and after whitening. If epsilon is set too low, the data will look very noisy; conversely, if epsilon is set too high, you will see a "blurred" version of the original data. A good way to get a feel for the magnitude of epsilon to try is to plot the eigenvalues of the data covariance matrix on a graph. As visible in the example graph below, you may get a "long tail" corresponding to the high frequency noise components. You will want to choose epsilon such that most of the "long tail" is filtered out, i.e., choose epsilon such that it is greater than most of the small eigenvalues corresponding to the noise.
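The eigenvalue inspection described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the tutorial's own code: the function names `covariance_eigenvalues` and `fraction_filtered` and the synthetic data are hypothetical, and the data is assumed to be stored as one example per row.

```python
import numpy as np

def covariance_eigenvalues(X):
    """Return the eigenvalues of the covariance of X (n_samples, n_features), largest first."""
    Xc = X - X.mean(axis=0)              # zero-mean each feature
    cov = Xc.T @ Xc / Xc.shape[0]        # sample covariance matrix
    return np.linalg.eigvalsh(cov)[::-1] # eigvalsh returns ascending order, so reverse it

def fraction_filtered(eigvals, epsilon):
    """Fraction of eigenvalues smaller than epsilon (the part of the tail that gets damped)."""
    return float(np.mean(eigvals < epsilon))

# Hypothetical data: 5 strong directions embedded in 20 dimensions, plus weak noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(2000, 5)) @ rng.normal(size=(5, 20))
X = signal + 0.01 * rng.normal(size=(2000, 20))

eigvals = covariance_eigenvalues(X)
for epsilon in (1e-6, 1e-2):
    print(epsilon, fraction_filtered(eigvals, epsilon))
```

Plotting `eigvals` (for example on a logarithmic axis) shows where the long tail begins; a candidate epsilon sits just above the bulk of that tail.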

In reconstruction based models, the loss function includes a term that penalizes reconstructions that are far from the original inputs. If epsilon is set too low, the data will contain a lot of noise, which the model will then need to reconstruct well. As a result, it is very important for reconstruction based models to have data that has been low-pass filtered.

Tip: If your data has been scaled reasonably (e.g., to [0,1]), start with epsilon = 0.01 or epsilon = 0.1.
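To see the effect of epsilon concretely, here is a minimal sketch of ZCA whitening with the regularizer added to the eigenvalues before taking the inverse square root. The function name `zca_whiten`, the row-per-example layout, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def zca_whiten(X, epsilon):
    """ZCA-whiten X (n_samples, n_features); returns whitened data, mean, and whitening matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # zero-mean each feature
    cov = Xc.T @ Xc / Xc.shape[0]                   # sample covariance
    eigvals, U = np.linalg.eigh(cov)                # cov = U diag(eigvals) U^T
    eigvals = np.clip(eigvals, 0.0, None)           # guard against tiny negative round-off
    W = U @ np.diag(1.0 / np.sqrt(eigvals + epsilon)) @ U.T
    return Xc @ W, mean, W                          # W is symmetric, so Xc @ W == Xc @ W.T

# Hypothetical correlated data with features on different scales.
rng = np.random.default_rng(0)
mixing = np.eye(4) + 0.5
X = (rng.normal(size=(1000, 4)) * np.array([3.0, 2.0, 1.0, 0.5])) @ mixing

for epsilon in (1e-8, 0.1):
    X_white, _, _ = zca_whiten(X, epsilon)
    cov_white = X_white.T @ X_white / X_white.shape[0]
    print(epsilon, np.round(np.diag(cov_white), 3))
```

With a tiny epsilon the whitened covariance is essentially the identity. With a larger epsilon each direction's variance becomes eigenvalue / (eigenvalue + epsilon), so low-variance directions are damped the most, which is the low-pass effect described above.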

3.3.2 ICA-based Models (with orthogonalization)

For ICA-based models with orthogonalization, it is very important for the data to be as close to white (identity covariance) as possible. This is a side-effect of using orthogonalization to decorrelate the learned features (more details in ICA). Hence, in this case, you will want to use an epsilon that is as small as possible (e.g., epsilon = 1e-6).

Tip: In PCA whitening, one also has the option of performing dimension reduction while whitening the data. This is usually an excellent idea since it can greatly speed up the algorithms (less computation and fewer parameters). A simple rule of thumb for choosing how many principal components to retain is to keep enough components to retain 99% of the variance (more details at PCA).
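The 99% rule of thumb can be sketched as follows. The helper names `components_to_retain` and `pca_whiten_reduce`, the default epsilon, and the example eigenvalues are hypothetical choices for illustration.

```python
import numpy as np

def components_to_retain(eigvals, retain=0.99):
    """Smallest k such that the top-k eigenvalues hold at least `retain` of the total variance."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, retain)) + 1   # first index reaching the threshold
    return min(k, eigvals.size)

def pca_whiten_reduce(X, retain=0.99, epsilon=1e-5):
    """PCA-whiten X (n_samples, n_features), keeping only the top-k principal components."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, U = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]                  # sort components by decreasing variance
    eigvals, U = np.clip(eigvals[order], 0.0, None), U[:, order]
    k = components_to_retain(eigvals, retain)
    return (Xc @ U[:, :k]) / np.sqrt(eigvals[:k] + epsilon), k

# Cumulative variance fractions here are 0.8, 0.95, 0.995, 0.999, 1.0.
print(components_to_retain([8.0, 1.5, 0.45, 0.04, 0.01]))
```

The reduced, whitened data has k columns instead of the original feature count, which is where the savings in computation and parameters come from.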

Note: When working in a classification framework, one should compute the PCA/ZCA whitening matrices based only on the training set. The following parameters should be saved for use with the test set: (a) the average vector that was used to zero-mean the data, (b) the whitening matrices. The test set should undergo the same preprocessing steps using these saved values.
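A minimal sketch of this fit-on-train, apply-to-test pattern is shown below. The names `fit_whitener` and `apply_whitener` and the random data are hypothetical; the point is only that the mean vector and whitening matrix come from the training set and are reused unchanged.

```python
import numpy as np

def fit_whitener(X_train, epsilon=0.1):
    """Compute the mean vector and ZCA whitening matrix from the training set only."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    cov = Xc.T @ Xc / Xc.shape[0]
    eigvals, U = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)
    W = U @ np.diag(1.0 / np.sqrt(eigvals + epsilon)) @ U.T
    return mean, W

def apply_whitener(X, mean, W):
    """Apply saved preprocessing parameters to any data set (train or test)."""
    return (X - mean) @ W

# Hypothetical train/test split with features already scaled to [0, 1].
rng = np.random.default_rng(0)
X_train = rng.random(size=(500, 6))
X_test = rng.random(size=(100, 6))

mean, W = fit_whitener(X_train, epsilon=0.01)     # (a) mean vector, (b) whitening matrix
X_train_white = apply_whitener(X_train, mean, W)
X_test_white = apply_whitener(X_test, mean, W)    # test set reuses the saved values
```

Note that the test set is not re-centered with its own mean, so its whitened mean is generally close to, but not exactly, zero.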

3.4 Large Images

For large images, PCA/ZCA based whitening methods are impractical as the covariance matrix is too large. For these cases, we defer to 1/f-whitening methods. (more details to come)

3.5 Standard Pipelines

In this section, we describe several "standard pipelines" that have worked well for some datasets:

3.5.1 Natural Grey-scale Images

Since grey-scale images have the stationarity property, we usually first remove the mean-component from each data example separately (remove DC). After this step, PCA/ZCA whitening is often employed with a value of epsilon set large enough to low-pass filter the data.
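The grey-scale pipeline can be sketched as below, assuming flattened image patches stored one per row. The names `remove_dc` and `zca_whiten`, the 8x8 patch size, and the random stand-in data are hypothetical.

```python
import numpy as np

def remove_dc(patches):
    """Subtract each example's own mean intensity (per-row mean), removing the DC component."""
    return patches - patches.mean(axis=1, keepdims=True)

def zca_whiten(X, epsilon):
    """ZCA-whiten X (n_samples, n_features) with regularizer epsilon."""
    Xc = X - X.mean(axis=0)
    eigvals, U = np.linalg.eigh(Xc.T @ Xc / Xc.shape[0])
    eigvals = np.clip(eigvals, 0.0, None)
    return Xc @ (U @ np.diag(1.0 / np.sqrt(eigvals + epsilon)) @ U.T)

# Hypothetical stand-in for 8x8 grey-scale patches, flattened to 64 values in [0, 1].
rng = np.random.default_rng(0)
patches = rng.random(size=(1000, 64))

patches_dc = remove_dc(patches)
# Every row now sums to zero, so the covariance has a (near-)zero eigenvalue along the
# all-ones direction; epsilon > 0 keeps the inverse square root finite there.
patches_white = zca_whiten(patches_dc, epsilon=0.1)
```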

3.5.2 Color Images

For color images, the stationarity property does not hold across color channels. Hence, we usually start by rescaling the data (making sure it is in [0,1]) and then applying PCA/ZCA with a sufficiently large epsilon. Note that it is important to perform feature mean-normalization before computing the PCA transformation.
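A minimal sketch of the color pipeline follows, assuming 8-bit images in an array of shape (n_images, height, width, 3). The function name `preprocess_color`, the array layout, and the random stand-in images are hypothetical.

```python
import numpy as np

def preprocess_color(images_uint8, epsilon=0.1):
    """Rescale 8-bit color images to [0, 1], mean-normalize each feature, then ZCA-whiten."""
    n = images_uint8.shape[0]
    X = images_uint8.reshape(n, -1).astype(np.float64) / 255.0   # rescale to [0, 1]
    mean = X.mean(axis=0)                # per-feature mean (one value per pixel and channel)
    Xc = X - mean                        # feature mean-normalization, not per-example
    eigvals, U = np.linalg.eigh(Xc.T @ Xc / n)
    eigvals = np.clip(eigvals, 0.0, None)
    W = U @ np.diag(1.0 / np.sqrt(eigvals + epsilon)) @ U.T
    return Xc @ W, mean, W

# Hypothetical stand-in for a batch of 4x4 RGB patches.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(200, 4, 4, 3), dtype=np.uint8)
X_white, mean, W = preprocess_color(images, epsilon=0.1)
```

In contrast to the grey-scale case, the mean is taken across examples for each feature rather than within each example.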

3.5.3 Audio (MFCC/Spectrograms)

For audio data (MFCC and Spectrograms), each dimension usually has a different scale (variance); the first component of MFCCs, for example, is the DC component and usually has a larger magnitude than the other components. This is especially so when one includes the temporal derivatives (a common practice in audio processing). As a result, the preprocessing usually starts with simple data standardization (zero-mean, unit-variance per data dimension), followed by PCA/ZCA whitening (with an appropriate epsilon).
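The audio pipeline can be sketched as standardization followed by whitening. The names `standardize` and `zca_whiten`, the small constant guarding against zero variance, and the synthetic feature scales are assumptions for illustration, not values from the text.

```python
import numpy as np

def standardize(X, tiny=1e-8):
    """Zero-mean, unit-variance scaling per data dimension; `tiny` avoids division by zero."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / (std + tiny), mean, std

def zca_whiten(X, epsilon):
    """ZCA-whiten X (n_samples, n_features) with regularizer epsilon."""
    Xc = X - X.mean(axis=0)
    eigvals, U = np.linalg.eigh(Xc.T @ Xc / Xc.shape[0])
    eigvals = np.clip(eigvals, 0.0, None)
    return Xc @ (U @ np.diag(1.0 / np.sqrt(eigvals + epsilon)) @ U.T)

# Hypothetical feature frames whose dimensions differ widely in scale,
# mimicking a large first coefficient followed by smaller ones.
rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 4)) * np.array([50.0, 5.0, 1.0, 0.1])

frames_std, mean, std = standardize(frames)
frames_white = zca_whiten(frames_std, epsilon=0.01)
```

Standardizing first puts all dimensions on a comparable scale, so a single epsilon is meaningful across them. As with the whitening parameters, `mean` and `std` would be saved from the training set and reused on test data.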
