
Everything You Need to Know About read.table() Function in R

Most real-world data lives outside R, in sources such as CSV files, Excel workbooks, plain-text files, or databases. Before we can analyze and manipulate that data, we need a function that brings it into the R environment. For delimited text files, that's where a base function like read.table() comes into the picture.

read.table()

The read.table() function reads tabular data from a text file and returns it as a data frame, ready for analysis and manipulation. It ships with base R (in the utils package) and needs no extra packages, which makes it a natural starting point for many data analysis projects.
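As a minimal sketch of that idea, the example below passes a small table as a string through read.table()'s text argument instead of a file and relies on the defaults (header = FALSE, whitespace as the separator). The inline data is made up for illustration; the point is only that the result is an ordinary data frame.

```r
# A minimal sketch: read a small table passed as a string through the `text`
# argument instead of a file. The data is made up for illustration.
raw <- "Krunal 31 Perth
Jane   30 London"

# Defaults: header = FALSE, sep = "" (any run of spaces or tabs separates fields)
df <- read.table(text = raw)

print(class(df))  # a data frame
print(names(df))  # V1, V2, V3: default names, since there is no header row
str(df)           # V2 is integer; V1 and V3 are character in R >= 4.0.0
```

The same call works with a file path as the first argument; text is just convenient for quick experiments.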

Syntax

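# General form, showing only the arguments covered in this article (with their defaults)
read.table(file, header = FALSE, sep = "", colClasses = NA, na.strings = "NA")

# Reading a CSV file with a header row, explicit column classes, and empty fields treated as missing
df <- read.table("data.csv", header = TRUE, sep = ",", colClasses = c("character", "integer", "character"), na.strings = "")

# Reading a tab-separated text file without a header
df <- read.table("data.txt", header = FALSE, sep = "\t")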

Parameters

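file: The name of the file to read, given as a character string (a connection can also be used).
header: A logical value. If TRUE, the first line of the file is used as the column names. The default is FALSE.
sep: The character that separates the fields (columns) on each line. The default, "", means any amount of white space.
colClasses: A character vector giving the class of each column, such as "character", "numeric", or "logical". The default, NA, lets read.table() guess each column's type.
na.strings: A character vector of strings that should be read as missing values (NA). The default is "NA".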
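If you want to confirm these defaults on your own installation, one option is to inspect the function's formal arguments with base R's formals(). This is just a quick check, not something you need in everyday code.

```r
# Inspect the default values of the arguments described above.
# formals() returns a function's formal argument list, including defaults.
defaults <- formals(utils::read.table)

print(defaults$header)      # FALSE: the first line is treated as data unless you say otherwise
print(defaults$sep)         # "": fields are separated by any amount of white space
print(defaults$colClasses)  # NA: column types are guessed from the data
print(defaults$na.strings)  # "NA": only the literal string NA is read as missing
```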

Sample dataset

Before proceeding, we need an external data file to read. You can skip this step if you already have one.

We can create a CSV file from R with the following command:

cat("Name,Age,City\nKrunal,31,Perth\nJane,30,London\nSunita,35,Ahmedabad\n", file = "data.csv")

It creates a “data.csv” file in your working directory with the following contents:

Name,Age,City
Krunal,31,Perth
Jane,30,London
Sunita,35,Ahmedabad

Basic CSV Import with Header

Let's read the sample “data.csv” file, which has a header row, into R:

# Reading a CSV file
df <- read.table("data.csv", header = TRUE, sep = ",")

print(df)

Output

Because we passed header = TRUE, read.table() used the first line of the file (Name, Age, City) as the column names rather than as a data row. The sep = "," argument tells it that the columns are separated by commas.
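To make the effect of header concrete, the sketch below writes the same contents as “data.csv” to a temporary file (tempfile() is used only so the example does not touch your working directory) and reads it both ways.

```r
# Write the same three-row CSV used above to a temporary file.
csv_path <- tempfile(fileext = ".csv")
cat("Name,Age,City\nKrunal,31,Perth\nJane,30,London\nSunita,35,Ahmedabad\n", file = csv_path)

with_header    <- read.table(csv_path, header = TRUE,  sep = ",")
without_header <- read.table(csv_path, header = FALSE, sep = ",")

print(names(with_header))        # "Name" "Age" "City"
print(names(without_header))     # "V1" "V2" "V3"
print(nrow(without_header))      # 4: the header line is now an ordinary data row
print(class(without_header$V2))  # "character", because "Age" is not a number
```

Forgetting header = TRUE therefore does more than rename the columns: the column names end up as a data row, and numeric columns are read as text.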

Tab-Separated Values (TSV) File without Header

Let's create a tab-separated values (TSV) file without a header row and import it with read.table().

# Creating a sample TSV file (data.tsv)
cat("John\t25\tNew York\nYogita\t30\tDelhi\nPeter\t22\tParis\n", file = "data.tsv")

# Read the TSV file
df <- read.table("data.tsv", header = FALSE, sep = "\t")

print(df)

Output

In this code, we created a TSV file on the fly, imported it with read.table(), and printed the result.

Because the file has no header row and we passed header = FALSE, read.table() assigns the default column names V1, V2, and V3. The sep = "\t" argument sets the tab character as the separator.
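If the default V1, V2, V3 names are not what you want, read.table() also accepts a col.names argument. Here is a minimal sketch that writes the same TSV rows to a temporary file and supplies names at import time; the names Name, Age, and City are our own choice, not part of the file.

```r
# Write the same headerless TSV rows used above to a temporary file.
tsv_path <- tempfile(fileext = ".tsv")
cat("John\t25\tNew York\nYogita\t30\tDelhi\nPeter\t22\tParis\n", file = tsv_path)

# Supply column names ourselves, since the file has no header row.
df <- read.table(tsv_path, header = FALSE, sep = "\t",
                 col.names = c("Name", "Age", "City"))
print(df)
```

Note that sep = "\t" matters here: with the default whitespace separator, "New York" would be split at the space, and the rows would no longer have the same number of fields.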

Specifying Column Classes

Let's create another CSV file, this time with columns of different data types. After importing it as a data frame, we will inspect its structure with str().

# Create a sample CSV file with mixed data types (data_types.csv)
cat("Name,Age,Salary,IsActive\nJohn,25,50000,TRUE\nJane,30,60000,FALSE\n", file = "data_types.csv")

# Read the CSV with specific column classes
data <- read.table("data_types.csv", header = TRUE, sep = ",",
                    colClasses = c("character", "numeric", "numeric", "logical"))

str(data) # Check the structure of the data frame

Output

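Here we passed the colClasses argument, a character vector that gives the class of each column in order: Name is read as character, Age and Salary as numeric, and IsActive as logical. Without colClasses, read.table() guesses each column's type from its values. The read.table() help page also notes that less memory is used when colClasses is specified, which helps with large files.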
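To see what colClasses changes, the sketch below reads the same data twice, once letting read.table() guess and once with the classes declared. A temporary file stands in for “data_types.csv”.

```r
# Write the same mixed-type data used above to a temporary file.
path <- tempfile(fileext = ".csv")
cat("Name,Age,Salary,IsActive\nJohn,25,50000,TRUE\nJane,30,60000,FALSE\n", file = path)

# Let read.table() guess the types from the values.
guessed <- read.table(path, header = TRUE, sep = ",")

# Declare the types explicitly, one class per column.
declared <- read.table(path, header = TRUE, sep = ",",
                       colClasses = c("character", "numeric", "numeric", "logical"))

print(sapply(guessed, class))   # Age and Salary are guessed as integer
print(sapply(declared, class))  # Age and Salary are numeric (double) as requested
```

For these values the guess is reasonable, but colClasses gives you control when, for example, you want whole numbers stored as doubles or codes with leading zeros kept as character.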

Handling Missing Values

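By default, read.table() treats only the literal string "NA" as missing (na.strings = "NA"). Empty fields in numeric or logical columns become NA anyway, but an empty field in a character column is kept as an empty string "". To have those empty strings read as NA as well, pass na.strings = "" as an argument.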
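To see why na.strings is needed at all, here is a minimal sketch that writes the same three rows used in the example below to a temporary file and reads them with the default settings only.

```r
# Write the same rows as missing_data.csv to a temporary file.
path <- tempfile(fileext = ".csv")
cat("Name,Age,City\nJohn,25,\nJane,,London\nPeter,22,Paris\n", file = path)

# Read with the default na.strings = "NA".
df_default <- read.table(path, header = TRUE, sep = ",")

print(is.na(df_default$Age))  # FALSE TRUE FALSE: the blank numeric field is already NA
print(df_default$City)        # "" "London" "Paris": the blank character field stays ""
```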

# Creating a sample CSV file with missing values (missing_data.csv)
cat("Name,Age,City\nJohn,25,\nJane,,London\nPeter,22,Paris\n", file = "missing_data.csv")

# Reading the CSV, specifying the missing value representation
df <- read.table("missing_data.csv", header = TRUE, sep = ",", na.strings = "") # na.strings = c("","NA","?") for multiple missing values

print(df)

Output

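In this code, na.strings = "" tells read.table() to treat empty fields as missing values. John's empty City is therefore read as NA, which print() displays as <NA> in a character column, and Jane's empty Age appears as NA. Keep in mind that setting na.strings replaces the default "NA", so if a file marks missing values in more than one way, list them all, for example na.strings = c("", "NA", "?").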
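As a sketch of the multiple-marker case, the example below uses a small made-up file in which missing values appear as an empty field, the string NA, and a question mark, and counts the missing values per column.

```r
# Made-up data that marks missing values in three different ways.
path <- tempfile(fileext = ".csv")
cat("Name,Age,City\nJohn,25,?\nJane,NA,London\nPeter,,Paris\n", file = path)

# Treat "", "NA", and "?" as missing.
df <- read.table(path, header = TRUE, sep = ",", na.strings = c("", "NA", "?"))

print(df)
print(colSums(is.na(df)))  # missing values per column: Name 0, Age 2, City 1
```

Because "?" is declared as missing, the City column stays a clean character column and Age is still read as integer.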

That’s all for today!
