Interacting with CSV files using Python

CSV (Comma Separated Values) files are one of the most used forms of data storage. This is due to their flexibility and cross language support. Every major programming language has support for CSV file I/O (input/output). In this article, we will be using Python’s CSV module to illustrate some of the most common operations on CSV files.

Before jumping to the operations, let’s know a bit about the CSV file that we will be playing with. We have a CSV file, “user.csv“. Contents of the file are shown below.

user.csv

It’s in a folder along with some python scripts to perform I/O operations on them.

There are 3 main modes in which the files can be opened: Read, Write and Append mode.

Read mode (denoted by ‘r’): It reads contents from a file. The file should exist else it gives an error.

Write mode (denoted by ‘w’): It writes to a CSV file. If the file exists, it clears it and starts writing to it from the first row. Else, if the file does not exist, it is created and then the data is written to it.

Append mode (denoted by ‘a’): It writes rows to a pre-existing file. It is different from write mode as it does not clear the existing rows of the CSV file, it just adds the rows below them.

Reading from CSV files

The below code reads from a CSV file, row by row.

The first line imports the csv package so that we can use the methods provided by it for easy csv I/O. The 2nd line opens the file user.csv in read mode. We have used a file object called userFile, which points to the file contents. csv.reader is the object which iterates over the lines of the CSV file through the File object userFile.

The above code gives the following output:

Please note that the ‘with’ statement opens the file, and then the processing is done on the file, following which it closes the file. In short, it remembers to close the file, so that we don’t have to. Hence, there is no need to call any close function when we are using the ‘with’ statement.

Writing to CSV files

Now, suppose we want to write a new file “beneficiary.csv” containing the data of the beneficiaries of the users, we can do that using the following code:

In the above code, first we open the file (it is automatically created since it does not exist) using open function in write mode. Then we create the csv writer object using the file object ‘newFile’. The writerow method, as the name suggests, writes a row to the csv file. Please note that writerow method accepts only a single argument which we have given as a list here (notice the ‘[‘ and ‘]’ around the contents to be written) . Make sure you give a single argument only, as giving multiple arguments will result in the following error:

After writing the contents to the file, let us see if they have been correctly written or not by reading the contents of “beneficiary.csv” using the same code we used before.

The above code displays the contents of beneficiary.csv as shown below:

 Appending to CSV files

Suppose we want to add a few lines to the “beneficairy.csv” file. How can we do that? Yes, you guessed it right, the Append mode can be of help. The below code appends 3 rows to the existing “beneficiary.csv” file.

We can read the contents of beneficiary.csv by using the same code as earlier. The contents are updated, which was verified by the following output.

Now, we are done with the simple operations. Let’s do something complex so that we learn how to combine the reading and writing methods together to create efficient data-flow between files.

Task 1: Copy data from 1 file to another

Suppose we want to copy the contents of user.csv to a new file out.csv. The below code can help you with that.

In the above code, first we open the file (in write mode) to which we want to copy the contents, i.e. destination file. Then, we create the writer object. Further, we open the file (in read mode) from which we need to copy the contents from, i.e. the source file (user.csv). Then we create

the reader object which uses the file object inFile of user.csv.

After that we simply iterate row by row and copy the content from source to destination.First, the line from source is stored in ‘row’ variable, which is a list. Then, the list is written to the destination file using the writerow method.

Task 2: Combine data of 2 files

Suppose we want to combine the details of users in the two files “user.csv ” and “beneficiary.csv” into a single file.

We need to map the beneficiary detail to the appropriate user_id. In other words, we need to match the “user_id” attribute from both the files and merge the details of each user into a single file. Since each row of a csv file is a group of properties for a certain user_id, we can imagine them as a list in Python. Therefore, a CSV file is nothing but a list of lists i.e. a nested list. We will be using the concept of nested lists in the following code in order to combine the data of the 2 CSV files.

First, let us declare 2 lists which will store the content of the 2 CSV files, user.csv  and beneficiary.csv.

Now, we will iterate over each row of user.csv and append the row as a list to userList.

Doing the same again, we have benList, populated with the content of beneficiary.csv.

Displaying the contents of both the lists, just to verify whether we have correctly stored the content or not.

userList

 

benList

Now, we need to combine the 2 lists according to user_id, and add the rows to the new file row_wise.csv. It’s done using the code below:

In this code, first we open the output file, row_wise.csv in write mode.

Then, we create its file object(rowFile) and writer object (rowFileWriter).

Further, we create a temporary list called combined_row, which will contain the merged row for each iteration of the loop, if there’s a match between the user_id attribute among the 2 lists. Else, the combined_row will have only userList’s row in it.

Finally, each row is written to the output file.

Don’t worry if you didn’t get what happened in the above lines. Below is an example which explains the above process in more detail.

Iteration 1

1. The iteration starts, combined row is empty.

2. combined_row is loaded with the first row of userList.

3. Now, the first element of the first row of both userList and benList is compared. Since both are the same (“user_id”), the 2nd element of the first row of benList, i.e. the string “beneficiary” is appended to combined_row.

4. This makes our header row for the output file. combined row is written to row_wise.csv.

Iteration 2

1. The iteration starts, combined row is empty.

2. combined_row is loaded with the 2nd row of userList.

3. Now, the first element of the 2nd row of both userList and benList is compared. Since both are the same (“1”), the 2nd element of the 2nd row of benList, i.e. the string “xyz” is appended to combined_row.

4. This makes our header row for the output file. combined row is written to row_wise.csv.

This goes on till the 5th iteration. After that there is no match as the file ‘beneficiary.csv’ had ids only from 1 to 5.

The final content of the combined file, ‘row_wise.csv‘ is:

Complete code used in this article can be found at Github.

If you are still reading, I would like to tell you that you just learned some basic and  complex interactions with CSV files using Python CSV package. If you want to dig deeper into the subject, you can read more about CSV files in the official Python documentation, here.

Thanks for reading. Your comments/improvements are welcome in the comments section below. Please share this article with your friends and colleagues through the sharing buttons below.

Tags:,
Latest Comments
  1. sunitha

    Excellent presentation.

Leave a Reply

%d bloggers like this: