Analysing Experimental Results With R
May 27, 2014
In this post, I will briefly show an analysis of the results of the experimental design I created earlier:
Creating an Experimental Design in R
The design concerns the interactions between storage protocol, iops, read%, rand% and block size for IO in VMWare. The effect under analysis is the CPU utilization of the ESX kernel. The motivation for this is discussed here:
NFS vs Fibre Channel: Comparing CPU Utilization in VMWare
I load the previous experimental design, having added a response column:
library(DoE.base)  # provides add.response()
load("V:/Doe/Design.1.rda")
Design.1.withresp <- add.response(Design.1,
    "V:/Doe/Design.1.with_response.csv", replace=FALSE)
Now, apply linear regression and summarize the results:
LinearModel.1 <- lm(cpu ~ (read + rand + blk_sz + protocol + iops)^2,
data=Design.1.withresp)
summary(LinearModel.1)
This produces the following table:
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)        3.256812   0.018130 179.632  < 2e-16 ***
read1             -0.074656   0.018130  -4.118 4.34e-05 ***
rand1             -0.001125   0.018130  -0.062  0.95054
blk_sz1            0.040906   0.018130   2.256  0.02440 *
protocol1          0.608219   0.018130  33.547  < 2e-16 ***
iops1              1.032375   0.018130  56.942  < 2e-16 ***
read1:rand1       -0.016969   0.018130  -0.936  0.34967
read1:blk_sz1     -0.018875   0.018130  -1.041  0.29825
read1:protocol1   -0.006219   0.018130  -0.343  0.73171
read1:iops1       -0.110219   0.018130  -6.079 2.10e-09 ***
rand1:blk_sz1     -0.017750   0.018130  -0.979  0.32795
rand1:protocol1   -0.002937   0.018130  -0.162  0.87134
rand1:iops1        0.005656   0.018130   0.312  0.75516
blk_sz1:protocol1  0.062063   0.018130   3.423  0.00066 ***
blk_sz1:iops1      0.026219   0.018130   1.446  0.14865
protocol1:iops1    0.369719   0.018130  20.392  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4587 on 624 degrees of freedom
Multiple R-squared: 0.8862, Adjusted R-squared: 0.8835
F-statistic: 324 on 15 and 624 DF, p-value: < 2.2e-16
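The most immediately useful column is the estimate of each coefficient. If we assume that the CPU utilization is given by an equation of the form:
$$cpu = \beta_0 + \sum_i \beta_i x_i + \sum_{i<j} \beta_{ij} x_i x_j$$
where the $x_i$ are the five factors (read, rand, blk_sz, protocol and iops), then in this case we see that $\beta_0 \approx 3.257$, $\beta_{read} \approx -0.075$,
and so on for all the coefficients.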
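The estimates can also be read out of a fitted model with `coef()` rather than copied from the printed table. The sketch below is only an illustration on synthetic data: the replicated 2^5 factorial, the noise level and the "true" coefficients are assumptions chosen to resemble the table, not the measurements from the experiment. It also uses plain numeric -1/+1 columns, so the coefficient names lack the trailing `1` seen above.

```r
# Synthetic stand-in for the real design: a 2^5 full factorial in -1/+1 coding.
set.seed(1)
design <- expand.grid(read = c(-1, 1), rand = c(-1, 1), blk_sz = c(-1, 1),
                      protocol = c(-1, 1), iops = c(-1, 1))
# Hypothetical replication: 20 copies of the 32 runs gives 640 rows,
# which matches 624 residual degrees of freedom plus 16 coefficients.
design <- design[rep(seq_len(nrow(design)), times = 20), ]

# Hypothetical response built from a few assumed coefficients plus noise.
design$cpu <- 3.25 + 0.6 * design$protocol + 1.0 * design$iops +
  0.37 * design$protocol * design$iops +
  rnorm(nrow(design), sd = 0.45)

# Same model formula as in the post: main effects and two-way interactions.
fit <- lm(cpu ~ (read + rand + blk_sz + protocol + iops)^2, data = design)

# Named vector of the 16 estimates (the "Estimate" column of summary()).
est <- coef(fit)
round(est[c("(Intercept)", "protocol", "iops", "protocol:iops")], 3)
```

Because every factor is coded -1/+1, each estimate is directly the change in the response per unit of the coded factor, which is what makes the column easy to read.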
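Meanwhile, each of the factors is normalized so that the low value corresponds to -1, and the high value to +1. So, substituting the estimates from the table (rounded to three decimal places), we have:
$$\begin{aligned}
cpu \approx{} & 3.257 - 0.075\,\text{read} - 0.001\,\text{rand} + 0.041\,\text{blk\_sz} + 0.608\,\text{protocol} + 1.032\,\text{iops} \\
& - 0.017\,\text{read}\cdot\text{rand} - 0.019\,\text{read}\cdot\text{blk\_sz} - 0.006\,\text{read}\cdot\text{protocol} - 0.110\,\text{read}\cdot\text{iops} \\
& - 0.018\,\text{rand}\cdot\text{blk\_sz} - 0.003\,\text{rand}\cdot\text{protocol} + 0.006\,\text{rand}\cdot\text{iops} \\
& + 0.062\,\text{blk\_sz}\cdot\text{protocol} + 0.026\,\text{blk\_sz}\cdot\text{iops} + 0.370\,\text{protocol}\cdot\text{iops}
\end{aligned}$$
Where protocol is -1 in the case of fibre channel and +1 in the case of NFS.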
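To make the coding concrete, here is a minimal sketch that maps a natural value onto the -1/+1 scale and evaluates the fitted equation at a coded point, using the estimates from the table. The helper names `to_coded` and `predict_cpu` are my own, and the 4 and 64 used as low and high block sizes are purely hypothetical, since the actual levels are not given in this post.

```r
# Estimates copied from the summary table, in the order lm() reports them:
# intercept, five main effects, then the ten two-way interactions.
b <- c(3.256812, -0.074656, -0.001125, 0.040906, 0.608219, 1.032375,
       -0.016969, -0.018875, -0.006219, -0.110219,
       -0.017750, -0.002937, 0.005656,
       0.062063, 0.026219, 0.369719)

# Map a natural value onto the coded scale: low -> -1, high -> +1.
to_coded <- function(x, low, high) 2 * (x - low) / (high - low) - 1

# Evaluate the fitted equation at coded factor settings.
predict_cpu <- function(read, rand, blk_sz, protocol, iops) {
  point <- data.frame(read, rand, blk_sz, protocol, iops)
  # model.matrix() builds the same 16 columns, in the same order, as the model.
  X <- model.matrix(~ (read + rand + blk_sz + protocol + iops)^2, data = point)
  as.numeric(X %*% b)
}

# Hypothetical low and high levels (not from the post), coded to -1 and +1.
to_coded(c(4, 64), low = 4, high = 64)

# All factors at their low level, with fibre channel (protocol = -1).
predict_cpu(read = -1, rand = -1, blk_sz = -1, protocol = -1, iops = -1)
```

At the centre of the design (all coded factors equal to 0) the prediction reduces to the intercept, which is why the intercept can be read as the average response over the design.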
As load increases, we expect the CPU utilization to be dominated by the terms involving IOPS. That is to say, the terms not involving IOPS become comparatively small.
So, we can simplify things by ignoring all effects not involving IOPS, giving the following formula for CPU utilization:
$$cpu \approx \text{iops} \times (1.032 - 0.110\,\text{read} + 0.006\,\text{rand} + 0.026\,\text{blk\_sz} + 0.370\,\text{protocol})$$
The terms involving interactions with rand and block size clearly have relatively little impact, so I discard them with minimal loss in precision.
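A quick way to check how little the rand and block size interactions matter is to compare each IOPS term with the IOPS main effect. The sketch below uses the estimates from the table; the names `iops_terms`, `share` and `cpu_simplified` are my own, and treating iops as a continuous multiplier is an assumption that goes beyond the two levels actually tested.

```r
# Estimates of every term that involves iops, copied from the summary table.
iops_terms <- c("iops"          =  1.032375,
                "read:iops"     = -0.110219,
                "rand:iops"     =  0.005656,
                "blk_sz:iops"   =  0.026219,
                "protocol:iops" =  0.369719)

# With -1/+1 coding each interaction can shift the iops slope by at most
# |coefficient|, so compare that to the iops main effect.
share <- abs(iops_terms) / abs(iops_terms[["iops"]])
round(share, 3)

# Simplified formula keeping only the read and protocol interactions.
cpu_simplified <- function(read, protocol, iops) {
  iops * (iops_terms[["iops"]] +
          iops_terms[["read:iops"]] * read +
          iops_terms[["protocol:iops"]] * protocol)
}

# Example: write IO (read = -1) over NFS (protocol = +1) at the high iops level.
cpu_simplified(read = -1, protocol = 1, iops = 1)
```

The rand and block size interactions each come out at under 3% of the IOPS main effect, against roughly 11% for read and 36% for protocol.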
In the original experiment, the CPU utilization was calculated for 8 cores. Normalizing for a single core, we have:
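It is now also possible to approximate the CPU cost of NFS over fibre channel. Taking the ratio of the simplified formula at protocol = +1 and protocol = -1, the iops factor cancels:
$$\frac{cpu_{NFS}}{cpu_{FC}} \approx \frac{1.032 - 0.110\,\text{read} + 0.370}{1.032 - 0.110\,\text{read} - 0.370}$$
With the minimum difference for write IO (about 1.96, taking read = -1), and the maximum for read (about 2.34, taking read = +1).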
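The arithmetic behind this comparison can be sketched in a few lines. This assumes the cost is expressed as the ratio of the IOPS-driven CPU terms for NFS (protocol = +1) to those for fibre channel (protocol = -1), and that read = +1 is the read-heavy level and read = -1 the write-heavy one; the function name `nfs_over_fc` is hypothetical.

```r
# Ratio of IOPS-driven CPU cost, NFS over fibre channel, at a coded read level.
# Defaults are the iops, read:iops and protocol:iops estimates from the table.
nfs_over_fc <- function(read,
                        b_iops = 1.032375,
                        b_read_iops = -0.110219,
                        b_protocol_iops = 0.369719) {
  nfs <- b_iops + b_read_iops * read + b_protocol_iops  # protocol = +1
  fc  <- b_iops + b_read_iops * read - b_protocol_iops  # protocol = -1
  nfs / fc                                              # the iops factor cancels
}

# Assumed coding: -1 is write IO, +1 is read IO.
ratios <- nfs_over_fc(read = c(write = -1, read = 1))
round(ratios, 2)
```

Under these assumptions the ratio is smallest for write IO and largest for read IO, and both values sit close to 2.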
So, in this experiment, NFS is found to be of the order of twice as expensive as fibre channel.