7. Text files

7.1. Script [fic_01]: Reading/writing a text file
The following script writes a text file and then reads it back:
# imports
import sys

# Creating and then sequentially processing a text file
# this file consists of lines in the format login:pwd:uid:gid:infos:dir:shell
# each line is stored in a dictionary in the format login => [uid, gid, infos, dir, shell]

# --------------------------------------------------------------------------
def display_info(dico: dict, key: str):
    # displays the value associated with key in the dictionary dico if it exists
    if key in dico.keys():
        # display the value associated with key
        print(f"{key} : {dico[key]}")
    else:
        # key is not a key in the dico dictionary
        print(f"The key [{key}] does not exist")


# main -----------------------------------------------

# Set the file name
FILE_NAME = "./data/infos.txt"

# Create and populate the text file
fic = None
try:
    # Open the file for writing (w=write)
    fic = open(FILE_NAME, "w")
    # Generate arbitrary content
    for i in range(1, 101):
        # one line
        line = f"login{i}:pwd{i}:uid{i}:gid{i}:infos{i}:dir{i}:shell{i}"
        # is written to the text file
        fic.write(f"{line}\n")
except IOError as error:
    print(f"Error processing file {FILE_NAME}: {error}")
    sys.exit()
finally:
    # close the file if it was opened
    if fic:
        fic.close()

# open it for reading
fic = None
try:
    # open the file for reading
    fic = open(FILE_NAME, "r")
    # empty dictionary at the start
    dico = {}
    # Each line is added to the dictionary [dico] in the format login => [uid, gid, infos, dir, shell]
    # Read the first line, removing leading and trailing whitespace
    line = fic.readline().strip()
    # while the line is not empty
    while line != '':
        # split the line into a list of fields
        info = line.split(":")
        # retrieve the login
        login = info[0]
        # remove the login and password fields
        info[0:2] = []
        # create an entry in the dictionary
        dico[login] = info
        # read next line
        line = fic.readline().strip()
except IOError as error:
    print(f"Error processing file {FILE_NAME}: {error}")
    sys.exit()
finally:
    # close the file if it was opened
    if fic:
        fic.close()

# use the dico dictionary
display_info(dico, "login10")
display_info(dico, "X")
Notes:
- line 28: opens the file for writing (w=write). If the file already exists, its contents are overwritten;
- lines 30–34: generate 100 lines in the text file;
- line 34: writes one line to the text file. The [write] method does not add a newline character, so it must be included in the text written;
- lines 35–37: handle any exception;
- line 37: aborts the script (after the [finally] block has executed);
- lines 38–41: in all cases, whether an error occurred or not, close the file if it is open;
- line 47: opens the file for reading (r=read);
- line 49: defines an empty dictionary;
- line 52: the [readline] method reads a line of text, including the end-of-line character. The [strip] method removes whitespace from both ends of the string: spaces, line breaks, form feeds, tabs, and a few others. Here, [line] will therefore not contain the end-of-line characters [\r\n] (Windows) or [\n] (Unix);
- line 54: the file is processed until an empty string is obtained. [readline] returns an empty string at the end of the file; because of [strip], a blank line in the middle of the file would also end the loop;
- lines 54–64: the text file is transferred to the dictionary [dico]. The key is the [login] field, and the value is the list of the fields [uid, gid, infos, dir, shell];
- lines 65–67: handle any exception;
- lines 68–71: close the file in all cases, whether an error occurred or not;
- lines 74–75: query the dictionary [dico];
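The same write-then-read sequence can also be sketched with the [with] statement, which closes the file automatically, even when an exception is raised. This is only a minimal sketch, not the script above: the helper names [write_users] and [read_users] and the use of a temporary directory are assumptions made for illustration.

```python
import os
import tempfile


def write_users(path: str, count: int) -> None:
    # the with statement closes the file even if write() raises
    with open(path, "w") as fic:
        for i in range(1, count + 1):
            fic.write(f"login{i}:pwd{i}:uid{i}:gid{i}:infos{i}:dir{i}:shell{i}\n")


def read_users(path: str) -> dict:
    dico = {}
    with open(path, "r") as fic:
        # iterating over the file yields one line at a time, newline included
        for line in fic:
            fields = line.strip().split(":")
            if len(fields) < 2:
                # skip blank or malformed lines instead of stopping
                continue
            # key = login, value = every field after the password
            dico[fields[0]] = fields[2:]
    return dico


# hypothetical location: a temporary directory instead of ./data
with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "infos.txt")
    write_users(file_name, 100)
    users = read_users(file_name)

print(len(users))
print(users["login10"])
```

Unlike the [while] loop on [readline], iterating over the file object does not stop at a blank line; the sketch skips such lines explicitly.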
The file [data/infos.txt]:
Screen output:
C:\Data\st-2020\dev\python\cours-2020\python3-flask-2020\venv\Scripts\python.exe C:/Data/st-2020/dev/python/cours-2020/python3-flask-2020/files/fic_01.py
login10 : ['uid10', 'gid10', 'infos10', 'dir10', 'shell10']
The key [X] does not exist
Process finished with exit code 0
7.2. Script [fic_02]: Handling UTF-8-encoded text files
In the rest of this document, we will be working exclusively with UTF-8 encoded text files. First, we will configure PyCharm:

- in [5-6]: select UTF-8 encoding for project files;
To create a UTF-8 encoded file, proceed as follows [fic_02]:
# imports
import codecs

# writing UTF-8 to a text file
# exceptions are not handled
file = codecs.open("./data/utf8.txt", "w", "utf8")
file.write("Hélène went to Basel to stay with her grandmother over the summer")
file.close()
Notes
- line 2: to handle file encoding, we import the [codecs] module;
- line 6: the [codecs.open] function is used like the built-in [open] function, with the encoding given as the third argument: the encoding to produce when writing, or the encoding of the existing file when reading. The [file] object obtained on line 6 is then used like a standard file;
- line 7: the text contains accented characters (é, è), whose byte representation depends on the character encoding used;
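In Python 3, the built-in [open] function also accepts an [encoding] argument, so the same file can be written without the [codecs] module. A minimal sketch, assuming a temporary directory instead of [./data]; the variable names are illustrative:

```python
import os
import tempfile

text = "Hélène went to Basel to stay with her grandmother over the summer"

# hypothetical location: a temporary directory instead of ./data
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "utf8.txt")
    # write the text as UTF-8
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # read it back, declaring the same encoding
    with open(path, "r", encoding="utf-8") as f:
        read_back = f.read()
    # size of the file on disk, in bytes
    size = os.path.getsize(path)

# the text survives the round trip
print(read_back == text)
# the file holds more bytes than the text has characters,
# because each accented character takes more than one byte in UTF-8
print(len(text), size)
```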
Results
When opening the [data/utf8.txt] file obtained (see line 6), the following result is obtained:

7.3. Script [fic_03]: handling text files encoded in ISO-8859-1
The script [fic_03] does the same thing as the script [fic_02] but encodes the text file in ISO-8859-1. We want to show the difference between the resulting files:
# imports
import codecs

# writing ISO-8859-1 to a text file
# we do not handle exceptions
file = codecs.open("./data/iso-8859-1.txt", "w", "iso-8859-1")
file.write("Hélène went to Basel to stay with her grandmother over the summer")
file.close()
When we open the [data/iso-8859-1.txt] file created on line 6, we get the following result:

Because we configured the project to work with UTF-8 files, PyCharm tried to open the [iso-8859-1.txt] file in UTF-8. It can see [1] that the file is not in UTF-8. It then suggests [2] reloading the file in a different encoding:

- in [3-5]: the file is reloaded using ISO-8859-1 encoding;

- in [6], the same file but displayed with a different encoding;
If we go back to the project settings:

- we see that in [6-7], PyCharm noted that the file [iso-8859-1.txt] should be opened with ISO-8859-1 encoding. This is therefore an exception to the rule [5];
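The difference between the two encodings can also be observed at the byte level. The sketch below works in memory with [str.encode] and [bytes.decode] and does not touch the files created above. It shows that ISO-8859-1 bytes are not valid UTF-8, which is presumably what PyCharm detected in [1]:

```python
text = "Hélène"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# UTF-8 uses two bytes for é and for è, ISO-8859-1 uses one byte for each
print(utf8_bytes)
print(latin1_bytes)
print(len(utf8_bytes), len(latin1_bytes))

# decoding ISO-8859-1 bytes as UTF-8 fails
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as error:
    decoded_ok = False
    print(f"Not valid UTF-8: {error}")

# decoding with the right encoding restores the text
print(latin1_bytes.decode("iso-8859-1") == text)
```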
7.4. Script [json_01]: Working with a JSON file
JSON stands for JavaScript Object Notation. As the name suggests, it is a text-based representation of JavaScript objects. Here, we will use it with Python objects.
The JSON file being managed [data/in.json] will look like this:

- In [2], we can see that the text content of the [in.json] file resembles a Python dictionary literal. PyCharm has formatted this text (Ctrl-Alt-L), but it would make no difference if it were on a single line: whitespace between JSON elements is not significant, as long as the text is syntactically valid JSON;
The script [json_01] shows how to use this file:
# imports
import codecs
import json
import sys

# reading/writing a JSON file
inFile = None
outFile = None
try:
    # Open the JSON file for reading
    inFile = codecs.open("./data/in.json", "r", "utf8")
    # Transfer the content to a dictionary
    data = json.load(inFile)
    # Display the loaded data
    print(f"data={data}, type(data)={type(data)}")
    limits = data['limits']
    print(f"limits={limits}, type(limits)={type(limits)}")
    print(f"limits[1]={limits[1]}, type(limits[1])={type(limits[1])}")
    # Transfer the [data] dictionary to a JSON file
    outFile = codecs.open("./data/out.json", "w", "utf8")
    json.dump(data, outFile)
except BaseException as error:
    # display the error and exit
    print(f"The following error occurred: {error}")
    sys.exit()
finally:
    # close any open files
    if inFile:
        inFile.close()
    if outFile:
        outFile.close()
Notes
- Line 3: To work with JSON, we import the [json] module;
- line 11: we will be working with JSON files encoded in UTF-8. Here, we open the file [data/in.json] using the [codecs] module;
- line 13: the [json.load] method reads the contents of the JSON file and stores them in the [data] variable. The type of this variable will be a dictionary;
- lines 15–18: to verify that we have indeed obtained a Python dictionary, we display some of its elements;
- lines 20–21: we perform the reverse operation: the dictionary [data] is written to a UTF-8 encoded file using the [json.dump] method;
- lines 22–25: handling any exceptions;
- lines 26-31: in any case, whether an error occurs or not, we close any files that may have been opened;
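The conversion performed by [json.load] follows fixed rules: a JSON object becomes a [dict], an array a [list], a number an [int] or a [float], [true]/[false] a [bool], and [null] becomes [None]. A short sketch using [json.loads], the string-based counterpart of [json.load]; the JSON text is a reduced, made-up excerpt, not the actual [in.json] file:

```python
import json

# made-up JSON text, for illustration only
json_text = '{"limits": [9964, 27519], "coeffR": [0, 0.14], "label": "demo", "active": true, "extra": null}'

# parse the JSON text into Python objects
data = json.loads(json_text)

# show the Python type obtained for each value
for key, value in data.items():
    print(f"{key}: {value!r} -> {type(value).__name__}")

# the reverse operation: json.dumps produces JSON text from the dictionary
round_trip = json.loads(json.dumps(data))
print(round_trip == data)
```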
Results
C:\Data\st-2020\dev\python\cours-2020\python3-flask-2020\venv\Scripts\python.exe C:/Data/st-2020/dev/python/cours-2020/python3-flask-2020/fichiers/json_01.py
data={'limits': [9964, 27519, 73779, 156244, 0], 'coeffR': [0, 0.14, 0.3, 0.41, 0.45], 'coeffN': [0, 1394.96, 5798, 13913.69, 20163.45], 'HALF-RATE_INCOME_LIMIT': 1551, 'SINGLE_PERSON_INCOME_LIMIT_FOR_REDUCTION': 21037, 'COUPLE_INCOME_LIMIT_FOR_REDUCTION': 42074, 'REDUCTION_VALUE_HALF_PORTION': 3797, 'SINGLE_TAX_DEDUCTION_THRESHOLD': 1196, 'COUPLE_TAX_DEDUCTION_THRESHOLD': 1970, 'COUPLE_TAX_THRESHOLD_FOR_DEDUCTION': 2627, 'SINGLE_TAX_LIMIT_FOR_DEDUCTION': 1595, 'MAX_10_PERCENT_DEDUCTION': 12502, 'MIN_10_PERCENT_DEDUCTION': 437}, type(data)=<class 'dict'>
limits=[9964, 27519, 73779, 156244, 0], type(limits)=<class 'list'>
limits[1]=27519, type(limits[1])=<class 'int'>
Process finished with exit code 0
- Lines 2–4 show that we have successfully retrieved the dictionary from the JSON file;
Now, let’s look at the contents of the [data/out.json] file:

The text in the file is on a single line. However, PyCharm recognizes JSON files, and we can format them, like Python files and others, using Ctrl-Alt-L. This gives us the following:

7.5. Script [json_02]: Handling JSON Files Encoded in UTF-8
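A JSON file encoded in UTF-8 can take two forms, depending on whether non-ASCII characters are escaped or written as is. The following script produces both: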
# imports
import codecs
import json
import sys

# dictionary
data = {'marié': 'oui', 'impôt': 1340}

# writing a JSON file
out_file1 = None
out_file2 = None
try:
    # Transfer the [data] dictionary to a JSON file
    out_file1 = codecs.open("./data/out1.json", "w", "utf8")
    json.dump(data, out_file1, ensure_ascii=True)
    # Write the [data] dictionary to a JSON file
    out_file2 = codecs.open("./data/out2.json", "w", "utf8")
    json.dump(data, out_file2, ensure_ascii=False)
except BaseException as error:
    # display the error and exit
    print(f"The following error occurred: {error}")
    sys.exit()
finally:
    # close files if they are open
    if out_file1:
        out_file1.close()
    if out_file2:
        out_file2.close()
…
- In this script, the [data] dictionary (line 7) is written to two JSON files (lines 14, 17);
- lines 14, 17: in both cases, a UTF-8 text file is created;
- line 15: when writing the dictionary, we pass the named parameter [ensure_ascii=True];
- line 18: when writing the dictionary, we pass the named parameter [ensure_ascii=False];
Here are the two resulting files:

- In the [out1.json] file, accented characters have been replaced by an escape sequence [\uXXXX] giving their Unicode code point in hexadecimal. This is referred to as "escaping." In the bytes of [out1.json], the character é in [marié] is represented by the six ASCII characters [\u00e9], one byte each;
- In the [out2.json] file, accented characters have been left as is: in the bytes of [out2.json], each one is represented by its UTF-8 encoding. For the character é in [marié], this is the two-byte sequence [c3 a9], rather than the six bytes used in [out1.json];
- it is the value of the [ensure_ascii] parameter of the [json.dump] method that determines the format used;
Some applications use "escaped" UTF-8 for their JSON files. In that case, the value [ensure_ascii=True] must be used. This value is actually the default. Therefore, if the [ensure_ascii] parameter is not used, we will be working with escaped UTF-8 JSON files.
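The effect of [ensure_ascii] can be checked in memory with [json.dumps], which returns the JSON text as a string instead of writing it to a file. A minimal sketch; the one-entry dictionary is chosen for illustration:

```python
import json

# illustrative dictionary with one accented key
data = {"marié": "oui"}

# escaped form: non-ASCII characters become \uXXXX sequences
escaped = json.dumps(data, ensure_ascii=True)
# literal form: non-ASCII characters are kept as is
literal = json.dumps(data, ensure_ascii=False)

print(escaped)
print(literal)

# number of bytes each form occupies once encoded in UTF-8
print(len(escaped.encode("utf-8")), len(literal.encode("utf-8")))

# ensure_ascii=True is the default
print(json.dumps(data) == escaped)
```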
The script continues as follows:
# imports
import codecs
import json
import sys

# dictionary
data = {'marié': 'oui', 'impôt': 1340}

…

# reading JSON files
in_file1 = None
in_file2 = None
try:
    # Transfer JSON file 1 to a dictionary
    in_file1 = codecs.open("./data/out1.json", "r", "utf8")
    dico1 = json.load(in_file1)
    # display
    print(f"dico1={dico1}")
    # Transfer JSON file 2 into a dictionary
    in_file2 = codecs.open("./data/out2.json", "r", "utf8")
    dico2 = json.load(in_file2)
    # display
    print(f"dico2={dico2}")
except BaseException as error:
    # display the error and exit
    print(f"The following error occurred: {error}")
    sys.exit()
finally:
    # close files if they are open
    if in_file1:
        in_file1.close()
    if in_file2:
        in_file2.close()
Notes
- lines 11–34: read the two files [out1.json, out2.json] and display the dictionary read in each case;
Results
Note that we did not have to tell the [json.load] function (lines 17, 22) whether the JSON text was escaped or not. [\uXXXX] escape sequences are part of the JSON syntax, so the parser decodes them itself. In both cases, we retrieve the correct dictionary.
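This can be reproduced without files. In the sketch below (the strings are chosen for illustration), [json.loads] receives the escaped and the unescaped form of the same JSON text and returns equal dictionaries:

```python
import json

# the same JSON text, escaped and unescaped
escaped_text = '{"mari\\u00e9": "oui"}'
literal_text = '{"marié": "oui"}'

# no option is needed to tell the parser which form it receives
dico1 = json.loads(escaped_text)
dico2 = json.loads(literal_text)

print(f"dico1={dico1}")
print(f"dico2={dico2}")
print(dico1 == dico2)
```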