Programming for Data Science¶

File Handling¶

Dr. Bhargavi R

SCOPE, VIT Chennai

Need for File Handling¶

  • We've learned data structures and I/O mechanisms suitable for small amounts of data. But this data is not persistent: it is lost when the program ends.
  • Most data science applications work with volumes of data that are difficult, if not impossible, to handle with these mechanisms alone.
  • Python's file-handling constructs help solve this problem.

Opening a file¶

Signature¶

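file_handle = open(file_name [, access_mode])

access_mode | Description
--- | ---
r | Reading only; the file must exist (this is the default)
r+ | Reading and writing; the file must exist and is not truncated
w | Writing only; creates the file or truncates an existing one
w+ | Reading and writing; creates the file or truncates an existing one
a | Appending; creates the file if it does not exist
a+ | Appending and reading; creates the file if it does not exist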
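
The practical difference between these modes is easiest to see by applying them to the same file. The following is only an illustrative sketch: it works in a temporary directory, and the file name `modes_demo.txt` is a made-up placeholder, not a file used elsewhere in these notes.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

# "w" creates the file (or empties an existing one) before writing.
with open(path, "w") as f:
    f.write("first\n")

# "a" keeps the existing content and writes at the end.
with open(path, "a") as f:
    f.write("second\n")

# "r+" needs an existing file and does not truncate it.
with open(path, "r+") as f:
    after_append = f.read()

# "w+" also allows reading, but it truncates the file on open.
with open(path, "w+") as f:
    after_truncate = f.read()

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_truncate))  # ''
```

Note that `r+` and `w+` both allow reading and writing; what differs is whether the existing content survives the call to `open`.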

File Object Attributes¶

attribute | return value
--- | ---
file.closed | True if the file is closed, else False
file.mode | access_mode with which the file was opened
file.name | name of the file

Examples¶

In [1]:
f_handle = open("file_to_write.txt", "w+")  # Both write and read

print("Name of the file is -", f_handle.name)
print(f'File is opened in {f_handle.mode} mode')
f_handle.close()  # Release the file once we are done with it
Name of the file is - file_to_write.txt
File is opened in w+ mode
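
The `closed` attribute from the table is not exercised in the cell above. A minimal sketch of how it changes, assuming a scratch file in a temporary directory (the name `attributes_demo.txt` is hypothetical):

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "attributes_demo.txt")

f_handle = open(path, "w")
closed_before = f_handle.closed   # the file is still open here
f_handle.close()
closed_after = f_handle.closed    # close() has now been called

print(closed_before, closed_after)  # False True
print(f_handle.mode)                # w (still readable after closing)
```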

Reading files¶

read()      # Reads the entire file and returns it as one big string
readline()  # Reads and returns one line at a time.
            # Also keeps track of where you are!
readlines() # Reads all remaining lines and returns them as a list;
            # afterwards there is nothing more to read!
In [2]:
f_handle = open("CaptaincyData.csv", "r+")

string1  = f_handle.read()

print(string1)
print(repr(string1))

f_handle.close()
"names","Y","played","won","lost","victory"
"Mahi",2012,45,22,12,0.488888888888889
"Sourav",2004,49,21,13,0.428571428571429
"Azhar",2000,47,14,14,0.297872340425532
"Sunny",1980,47,9,8,0.191489361702128
"Pataudi",1965,40,9,19,0.225
"Dravid",2008,25,8,6,0.32

'"names","Y","played","won","lost","victory"\n"Mahi",2012,45,22,12,0.488888888888889\n"Sourav",2004,49,21,13,0.428571428571429\n"Azhar",2000,47,14,14,0.297872340425532\n"Sunny",1980,47,9,8,0.191489361702128\n"Pataudi",1965,40,9,19,0.225\n"Dravid",2008,25,8,6,0.32\n'

Files (Cont...)¶

Wrap the open call in try: ... except: so that a failure (for example, a missing file) is handled instead of crashing the program.
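
Catching a specific exception rather than every possible error keeps unrelated bugs visible. The sketch below is one way to do this; the path points into a fresh temporary directory and the name `missing.csv` is hypothetical, chosen so that the file is certain not to exist.

```python
import os
import tempfile

# A path that cannot exist: a new, empty temporary directory plus a made-up name.
missing_path = os.path.join(tempfile.mkdtemp(), "missing.csv")

try:
    f_handle = open(missing_path, "r")
except FileNotFoundError as err:
    # err.filename holds the path that could not be opened
    message = f"Could not open {err.filename!r}"
else:
    # Runs only when open() succeeded
    message = f_handle.read()
    f_handle.close()

print(message)
```

`FileNotFoundError` is a subclass of `OSError`, so catching `OSError` instead would also cover permission problems and similar I/O failures.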

In [3]:
try:
    f_handle = open("CaptaincyData.csv", "r+")
    string1 = f_handle.read()
    print(string1)
    f_handle.close()
except OSError:  # e.g. FileNotFoundError or PermissionError
    print('File could not be opened')
"names","Y","played","won","lost","victory"
"Mahi",2012,45,22,12,0.488888888888889
"Sourav",2004,49,21,13,0.428571428571429
"Azhar",2000,47,14,14,0.297872340425532
"Sunny",1980,47,9,8,0.191489361702128
"Pataudi",1965,40,9,19,0.225
"Dravid",2008,25,8,6,0.32

In [4]:
# Using the with statement: the file is closed automatically when the block ends
with open("file_with.txt", 'w') as file:
    file.write('Hello world !')
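
The main benefit of `with` is that the file is closed even when the block fails partway through. A small sketch of that behaviour, using a hypothetical scratch file `with_demo.txt` and a deliberately raised error:

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

try:
    with open(path, "w") as file:
        file.write("Hello world !")
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass  # the error is expected; only the state of the file matters here

print(file.closed)  # True

# The text written before the error was flushed when the file was closed.
with open(path) as check:
    saved = check.read()
print(saved)        # Hello world !
```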
In [5]:
f_handle = open("CaptaincyData.csv", "r+")

line1 = f_handle.readline()
line2 = f_handle.readline()
# line3 = f_handle.readline()
# line4 = f_handle.readline()

print(line1)
print(line2)
# print(line3)
# print(line4)

f_handle.close()
"names","Y","played","won","lost","victory"

"Mahi",2012,45,22,12,0.488888888888889

In [6]:
f_handle = open("CaptaincyData.csv", "r+")

for line in f_handle:
    print(line)
f_handle.close()
"names","Y","played","won","lost","victory"

"Mahi",2012,45,22,12,0.488888888888889

"Sourav",2004,49,21,13,0.428571428571429

"Azhar",2000,47,14,14,0.297872340425532

"Sunny",1980,47,9,8,0.191489361702128

"Pataudi",1965,40,9,19,0.225

"Dravid",2008,25,8,6,0.32

In [7]:
f_handle = open("CaptaincyData.csv", "r+")

string2 = f_handle.readlines()
for line in string2:
    print(line)
print(len(string2))

f_handle.close()
"names","Y","played","won","lost","victory"

"Mahi",2012,45,22,12,0.488888888888889

"Sourav",2004,49,21,13,0.428571428571429

"Azhar",2000,47,14,14,0.297872340425532

"Sunny",1980,47,9,8,0.191489361702128

"Pataudi",1965,40,9,19,0.225

"Dravid",2008,25,8,6,0.32

7
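
The three read methods share one position in the file, so each call continues where the previous one stopped. A minimal sketch, assuming a small three-line scratch file (the name `read_demo.txt` is hypothetical):

```python
import os
import tempfile

# Hypothetical scratch file with three short lines.
path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")
with open(path, "w") as f:
    f.write("line 1\nline 2\nline 3\n")

with open(path, "r") as f:
    first = f.readline()    # consumes only the first line
    rest = f.readlines()    # returns whatever is left, as a list
    leftover = f.read()     # nothing remains, so this is an empty string

print(repr(first))     # 'line 1\n'
print(rest)            # ['line 2\n', 'line 3\n']
print(repr(leftover))  # ''
```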

Writing files¶

write()      # Writes a single string to a file
writelines() # Writes a list of strings; no newlines are added between them
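
Because `writelines()` does not insert line breaks, each string has to carry its own `"\n"`. A short sketch, using a hypothetical scratch file and a few sample values in the style of the captaincy data:

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "writelines_demo.txt")
rows = ["Mahi,2012", "Sourav,2004", "Azhar,2000"]  # illustrative sample values

with open(path, "w") as f:
    # The strings are written back to back, so append "\n" to each one.
    f.writelines(row + "\n" for row in rows)

with open(path) as f:
    lines = f.readlines()

print(lines)  # ['Mahi,2012\n', 'Sourav,2004\n', 'Azhar,2000\n']
```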
In [8]:
f_read  = open("CaptaincyData.csv", "r+")
f_write = open("file_to_write.txt", "w+")

string1 = f_read.read()
f_read.close()           # Close the source file as well

f_write.write(string1)
f_write.close()
In [9]:
f_write = open("file_to_write.txt", "r+")
string2 = f_write.read()
f_write.close()

print(string2)
"names","Y","played","won","lost","victory"
"Mahi",2012,45,22,12,0.488888888888889
"Sourav",2004,49,21,13,0.428571428571429
"Azhar",2000,47,14,14,0.297872340425532
"Sunny",1980,47,9,8,0.191489361702128
"Pataudi",1965,40,9,19,0.225
"Dravid",2008,25,8,6,0.32

Files (Cont...)¶

method | description
--- | ---
tell() | Returns the current position within a file
seek(offset) | Moves to a new file position; offset is a byte count from the start of the file
In [10]:
f = open('free_text.txt', 'w+')
f.write('This is my first line!\nThis is my second line!\nThis is my third line\n')
f.close()
In [11]:
f = open('free_text.txt', 'r')
str_end_with = '\n'+'-'*50+'\n'

print('Initially the cursor is at', f.tell(), end=str_end_with)

char = f.read(1)
print(f"After reading '{char}', the cursor is at {f.tell()}", end=str_end_with)

word = f.read(2)
print(f"After read '{word}', the cursor is at {f.tell()}", end=str_end_with)

text = f.read()
print(f"Read all from current cursor: {text}", end=str_end_with)
print(f"Now cursor is at: {f.tell()}")  # WHY?
f.close()
Initially the cursor is at 0
--------------------------------------------------
After reading 'T', the cursor is at 1
--------------------------------------------------
After read 'hi', the cursor is at 3
--------------------------------------------------
Read all from current cursor: s is my first line!
This is my second line!
This is my third line

--------------------------------------------------
Now cursor is at: 69
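
The cell above only uses `tell()`. As a complement, here is a minimal sketch of `seek()` rewinding a file so the same characters can be read twice; the scratch file name `seek_demo.txt` is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
with open(path, "w") as f:
    f.write("This is my first line!\n")

with open(path, "r") as f:
    first_read = f.read(4)       # reads 'This' and advances the cursor
    f.seek(0)                    # jump back to the start of the file
    pos_after_seek = f.tell()
    second_read = f.read(4)      # the same four characters again

print(first_read, second_read)   # This This
print(pos_after_seek)            # 0
```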
In [12]:
f_handle = open("CaptaincyData.csv", "r+")
print(f_handle.tell())

string1  = f_handle.readline()
print(len(string1))
 
print(f_handle.tell())
f_handle.close()
0
44
45
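
In the output above the line has 44 characters, yet the position advances to 45. One plausible explanation, which is an assumption about `CaptaincyData.csv` rather than something these notes state, is that the file uses Windows-style `\r\n` line endings: text mode translates them to a single `\n`, while the position counts bytes on disk. The sketch below reproduces the effect on a hypothetical scratch file.

```python
import os
import tempfile

# Hypothetical scratch file; written as raw bytes so it is guaranteed
# to contain Windows-style "\r\n" line endings on every platform.
path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")
with open(path, "wb") as f:
    f.write(b"ab\r\ncd\r\n")

# Text mode translates "\r\n" to "\n" while reading.
with open(path, "r") as f:
    text_line = f.readline()

# Binary mode returns the bytes exactly as stored.
with open(path, "rb") as f:
    raw_line = f.readline()
    raw_pos = f.tell()

print(len(text_line))          # 3
print(len(raw_line), raw_pos)  # 4 4
```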
In [13]:
# Find the number of lines that start with S
f_handle           = open("CaptaincyData.csv")
num_lines          = 0
line_starts_with_s = 0
text               = ''

for line in f_handle:
    text      += line
    num_lines += 1
    # Drop surrounding whitespace and the CSV quote characters
    line       = line.strip().replace('"', '')

    if line.startswith('S'):
        line_starts_with_s += 1

f_handle.close()

# Both strings need the f prefix for their placeholders to be filled in
print(f'Number of lines: {num_lines}',
      f'Number of lines that start with S: {line_starts_with_s}', sep='\n')
Number of lines: 7
Number of lines that start with S: 2
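
The manual quote stripping above can also be delegated to Python's standard `csv` module, which removes the quotes while parsing. This is offered only as an alternative sketch, not as the approach used in these notes; it reads a few rows copied from the data shown earlier out of an in-memory buffer instead of the real file.

```python
import csv
import io

# A few rows copied from the data above, held in memory for the demo.
sample = io.StringIO(
    '"names","Y","played","won","lost","victory"\n'
    '"Mahi",2012,45,22,12,0.488888888888889\n'
    '"Sourav",2004,49,21,13,0.428571428571429\n'
    '"Sunny",1980,47,9,8,0.191489361702128\n'
)

reader = csv.reader(sample)
header = next(reader)  # the first row holds the column names
starts_with_s = sum(1 for row in reader if row[0].startswith("S"))

print(header[0])       # names
print(starts_with_s)   # 2
```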
In [14]:
print(text)
"names","Y","played","won","lost","victory"
"Mahi",2012,45,22,12,0.488888888888889
"Sourav",2004,49,21,13,0.428571428571429
"Azhar",2000,47,14,14,0.297872340425532
"Sunny",1980,47,9,8,0.191489361702128
"Pataudi",1965,40,9,19,0.225
"Dravid",2008,25,8,6,0.32

In [15]:
f2   = open('bug.txt', 'w')
text = 'This is my Bug.'
print(text, "Second Word", 'Second Line', file=f2, sep='\n', end=str_end_with)
f2.close()
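
To see exactly what `print` sends to a file when `sep` and `end` are set, the same call can be pointed at an in-memory buffer. In this sketch `io.StringIO` stands in for the open file, and the `end` value is a shortened, made-up marker rather than the `str_end_with` string defined earlier.

```python
import io

buffer = io.StringIO()  # in-memory stand-in for an open text file

# sep goes between the arguments; end is written once after the last one.
print("This is my Bug.", "Second Word", "Second Line",
      file=buffer, sep="\n", end="\n--\n")

written = buffer.getvalue()
print(repr(written))  # 'This is my Bug.\nSecond Word\nSecond Line\n--\n'
```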
In [16]:
f2.close()