scale in R: How to Use scale() Function in R

Scaling makes it possible to compare data that are not measured on the same scale. In R, scaling usually means standardizing a dataset: subtracting the mean and dividing by the standard deviation. It is most often applied to vectors or to the columns of a data frame.

Scaling is especially helpful in regression analysis, where variables can differ widely in magnitude. This type of analysis often requires the columns of a data frame to be scaled before the results can be interpreted meaningfully. Without normalizing the vectors or columns you are using, you will often get results that are hard to interpret.

scale in R

scale() is a built-in generic R function whose default method centers and/or scales the columns of a numeric matrix. When centering, if the center argument is numeric, scale() subtracts the corresponding ‘center’ value from every column.

If center is the logical value TRUE, the mean of each column is subtracted from that column. If it is FALSE, no centering is done.

Syntax

scale(x, center = TRUE, scale = TRUE)

Arguments

x: A numeric matrix (or matrix-like object).

center: Either a logical value or a numeric-alike vector of length equal to the number of columns of x, where ‘numeric-alike’ means that as.numeric() will be applied successfully if is.numeric() is not TRUE.

scale: Either a logical value or a numeric-alike vector of length equal to the number of columns of x.

Explanation

The value of center determines how column centering is performed. If center is a numeric-alike vector with a length equal to the number of columns of x, then each column of x has the corresponding value from center subtracted from it.

To perform scaling, scale() divides the values of every column by the corresponding ‘scale’ value from the argument if that value is numeric. If scale is TRUE, each column is divided by its standard deviation when center is TRUE, and by its root-mean-square otherwise. If scale is FALSE, no scaling is done.

The root-mean-square for a (possibly centered) column is defined as \(\sqrt{\sum(x^2)/(n-1)}\), where \(x\) is a vector of the non-missing values and \(n\) is the number of non-missing values. When center = TRUE, this is the same as the standard deviation; in general, it is not.
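The difference between the two divisors can be checked directly. The following is a minimal sketch, where rms() is a hypothetical helper written here to mirror the formula above; it is not part of base R.

```r
mat <- matrix(1:9, ncol = 3)

# Hypothetical helper: root-mean-square as defined above,
# sqrt(sum(x^2) / (n - 1)) over the non-missing values
rms <- function(x) {
  x <- x[!is.na(x)]
  sqrt(sum(x^2) / (length(x) - 1))
}

# Without centering, scale = TRUE divides each column by its root-mean-square
uncentered <- scale(mat, center = FALSE)
rms_values <- apply(mat, 2, rms)
print(attr(uncentered, "scaled:scale"))
print(rms_values)

# With centering, the divisor is the column standard deviation
centered <- scale(mat)
sd_values <- apply(mat, 2, sd)
print(attr(centered, "scaled:scale"))
print(sd_values)
```

The two pairs of printed vectors should match each other, while the root-mean-square values differ from the standard deviations because the columns of mat do not have mean zero.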

Implementing the scale() function in R

To create a matrix in R, use the matrix() function.

mat <- matrix(1:9, ncol = 3)

print(mat)

Output

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

We created a 3 X 3 matrix.

To scale the matrix, use the scale() function.

mat <- matrix(1:9, ncol = 3)

scale(mat)

Output

     [,1] [,2] [,3]
[1,]   -1   -1   -1
[2,]    0    0    0
[3,]    1    1    1
attr(,"scaled:center")
[1] 2 5 8
attr(,"scaled:scale")
[1] 1 1 1

Centering subtracts the column mean, so values below the mean become negative. Standardizing removes the effect of different units and magnitudes when comparing vectors: each column ends up with mean 0 and standard deviation 1. It does not change the shape of the distribution, so it does not make the data more normally distributed. This type of normalization is helpful when you want to compare data from different measurements.
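As a quick check of what the scaled matrix looks like, the sketch below verifies the column means and standard deviations, and then uses the two attributes to reverse the transformation. Using sweep() for the reversal is one possible approach chosen for this illustration.

```r
mat <- matrix(1:9, ncol = 3)
scaled <- scale(mat)

# After default scaling, every column has mean 0 and standard deviation 1
col_means <- colMeans(scaled)
col_sds <- apply(scaled, 2, sd)
print(col_means)
print(col_sds)

# The attributes record what was subtracted and what was divided by,
# so the original values can be recovered
centers <- attr(scaled, "scaled:center")
scales <- attr(scaled, "scaled:scale")
restored <- sweep(sweep(scaled, 2, scales, "*"), 2, centers, "+")

# Compare values only; restored still carries the scaling attributes
print(all.equal(restored, mat, check.attributes = FALSE))
```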

If we set scale = FALSE, the scaling step of scale() is turned off and only centering occurs. In the following code, center is also given as a numeric vector, so 1, 2, and 3 are subtracted from the first, second, and third columns instead of the column means.

mat <- matrix(1:9, ncol = 3)

scale(mat, center = c(1, 2, 3), scale = FALSE)

Output

     [,1] [,2] [,3]
[1,]    0    2    4
[2,]    1    3    5
[3,]    2    4    6
attr(,"scaled:center")
[1] 1 2 3

With default settings, scale() calculates the mean and standard deviation of a vector (or of each column of a matrix), then standardizes each element by subtracting the mean and dividing by the standard deviation.

If you use scale(x, scale = FALSE), it only subtracts the mean and does not divide by the standard deviation.

set.seed(1)
x <- runif(5)

# Manually scaling
print("Manually Scaling")
data <- (x - mean(x)) / sd(x)
print(data)

print("---------------------")

# Using scale() function
print("Using scale() function")
scale(x)

Output

[1] "Manually Scaling"
[1] -0.6957397 -0.3221799  0.3811385  1.5561576 -0.9193766

[1] "---------------------"

[1] "Using scale() function"
           [,1]
[1,] -0.6957397
[2,] -0.3221799
[3,]  0.3811385
[4,]  1.5561576
[5,] -0.9193766
attr(,"scaled:center")
[1] 0.4640751
attr(,"scaled:scale")
[1] 0.2854034

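You can see that both approaches produce the same values. The only difference is that scale() returns them as a one-column matrix, with the center and scale stored as attributes.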
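Rather than comparing the printed numbers by eye, the equality can also be tested in code. This is a small sketch that reuses the same seed and vector as the example above; as.vector() is used to drop the matrix dimensions and attributes before comparing.

```r
set.seed(1)
x <- runif(5)

manual <- (x - mean(x)) / sd(x)   # plain numeric vector
scaled <- scale(x)                # one-column matrix with attributes

print(is.matrix(scaled))
print(dim(scaled))

# as.vector() strips dimensions and attributes, leaving only the values
print(all.equal(manual, as.vector(scaled)))
```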

R scale() function with center = FALSE

We can scale the values of a matrix without centering them by passing center = FALSE to scale(). Here, scale is a numeric vector, so the columns are divided by 1, 2, and 3, respectively.

mat <- matrix(1:9, ncol = 3)

scale(mat, center = FALSE, scale = c(1, 2, 3))

Output

     [,1] [,2]     [,3]
[1,]    1  2.0 2.333333
[2,]    2  2.5 2.666667
[3,]    3  3.0 3.000000
attr(,"scaled:scale")
[1] 1 2 3
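Dividing each column by its own value is the same operation that sweep() performs along the columns. The sketch below compares the two; the variable names divisors and by_hand are introduced here only for illustration.

```r
mat <- matrix(1:9, ncol = 3)
divisors <- c(1, 2, 3)

# scale() with a numeric scale vector and no centering
scaled <- scale(mat, center = FALSE, scale = divisors)

# The same column-wise division written with sweep()
by_hand <- sweep(mat, 2, divisors, "/")

# Compare values only, since scaled carries the "scaled:scale" attribute
print(all.equal(scaled, by_hand, check.attributes = FALSE))
```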

Using the scale() Function Without Actually Scaling

In this instance, no actual scaling occurs, so it does not help when comparing values measured in different units.

Instead, it centers the data on zero (center is TRUE, scale is FALSE). This can be helpful in showing how individual data points compare to the average value.

mat <- matrix(1:9, ncol = 3)

scale(mat, scale = FALSE)

Output

     [,1] [,2] [,3]
[1,]   -1   -1   -1
[2,]    0    0    0
[3,]    1    1    1
attr(,"scaled:center")
[1] 2 5 8

Because scale = FALSE, no division occurs and the values stay in their original units. However, the output shows that the values are balanced on both sides of the center. In such a simple case, the printed figures are sufficient to see this, but with larger data sets, a graph would be needed to analyze the results.

Using the scale() Function Without Scales or Centers

In this case, neither centering nor scaling is applied. At first glance, this may seem entirely useless. However, it does have the effect of turning a vector into a single-column matrix. A matrix, as in the example below, is returned unchanged.

mat <- matrix(1:9, ncol = 3)

scale(mat, center = FALSE, scale = FALSE)

Output

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
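The matrix above comes back unchanged, so the vector-to-matrix conversion is easier to see with a plain vector. The following sketch uses a made-up vector v for illustration.

```r
# Hypothetical input: a plain numeric vector
v <- c(10, 20, 30)

result <- scale(v, center = FALSE, scale = FALSE)

print(is.matrix(v))        # the input is not a matrix
print(is.matrix(result))   # the result is
print(dim(result))
print(result)
```

The values themselves are untouched; only the shape changes, from a vector of length 3 to a matrix with 3 rows and 1 column.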

Conclusion

The scale() function is most useful when you work with multiple variables measured on different scales. For example, one variable may be of the order of magnitude 100 while another is of magnitude 100000.
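To illustrate this situation, the sketch below standardizes a small data frame whose two columns differ by several orders of magnitude. The data and the column names height_cm and income are invented for this example.

```r
# Hypothetical data: two variables on very different scales
df <- data.frame(
  height_cm = c(150, 160, 170, 180, 190),
  income    = c(20000, 35000, 50000, 65000, 80000)
)

# scale() accepts a data frame with numeric columns and returns a matrix
scaled <- scale(df)
print(scaled)

# Both columns now have mean 0 and standard deviation 1,
# so they can be compared on an equal footing
col_means <- colMeans(scaled)
col_sds <- apply(scaled, 2, sd)
print(col_means)
print(col_sds)
```

Because both columns in this made-up data increase in equal steps, their standardized values come out identical, even though the raw magnitudes are very different.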

With its default arguments, scale() does nothing more than standardize the data. The values it creates are known under several names, one of them being z-scores.

That is it for the scale() function in R.
