7. Text files

7.1. Script [fic_01]: Reading/writing a text file

The following script illustrates an example of working with text files:


# imports
import sys


# Creating and then sequentially processing a text file
# this file consists of lines in the format login:pwd:uid:gid:infos:dir:shell
# each line is stored in a dictionary in the format login => uid:gid:infos:dir:shell

# --------------------------------------------------------------------------
def display_info(dico: dict, key: str):
    # displays the value associated with key in the dictionary dico if it exists
    if key in dico:
        # display the value associated with key
        print(f"{key} : {dico[key]}")
    else:
        # key is not a key in the dico dictionary
        print(f"The key [{key}] does not exist")


# main -----------------------------------------------
# Set the file name
FILE_NAME = "./data/infos.txt"

# Create and populate the text file
fic = None
try:
    # Open the file for writing (w=write)
    fic = open(FILE_NAME, "w")
    # Generate arbitrary content
    for i in range(1, 101):
        # one line
        line = f"login{i}:pwd{i}:uid{i}:gid{i}:infos{i}:dir{i}:shell{i}"
        # is written to the text file
        fic.write(f"{line}\n")
except IOError as error:
    print(f"Error processing file {FILE_NAME}: {error}")
    sys.exit()
finally:
    # close the file if it was opened
    if fic:
        fic.close()

# open it for reading
fic = None
try:
    # open the file for reading
    fic = open(FILE_NAME, "r")
    # empty dictionary at the start
    dico = {}
    # Each line is added to the dictionary [dico] in the format login => uid:gid:info:dir:shell
    # Read the first line, removing leading and trailing spaces
    line = fic.readline().strip()
    # while the line is not empty
    while line != '':
        # we put the line into an array
        info = line.split(":")
        # retrieve the login
        login = info[0]
        # remove the login and the password from the list
        info[0:2] = []
        # create an entry in the dictionary
        dico[login] = info
        # read next line
        line = fic.readline().strip()
except IOError as error:
    print(f"Error processing file {FILE_NAME}: {error}")
    sys.exit()
finally:
    # close the file if it was opened
    if fic:
        fic.close()

# use the dico dictionary
display_info(dico, "login10")
display_info(dico, "X")

Notes:

  • line 28: opens the file for writing (w=write). If the file already exists, it is overwritten;
  • lines 30–34: generate 100 lines in the text file;
  • line 34: writes a line to the text file. The [write] method does not add a newline character, so it must be included in the written text;
  • lines 35–37: handle any exception;
  • line 37: stops the script. [sys.exit] works by raising an exception, so the [finally] block still runs before the script ends;
  • lines 38–41: in all cases, whether an error occurs or not, close the file if it is open;
  • line 47: opens the file for reading (r=read);
  • line 49: defines an empty dictionary;
  • line 52: the [readline] method reads a line of text, including the end-of-line character. The [strip] method removes "spaces" from the beginning and end of the string. By "space," we mean whitespace characters: blanks, line breaks, page breaks, tabs, and a few others. So here, [line] will not contain the line break characters [\r\n] (Windows) or [\n] (Unix);
  • line 54: the file is processed until an empty line is encountered. [readline] returns an empty string at the end of the file, and a blank line in the file also becomes an empty string after [strip];
  • lines 54–64: the text file is transferred to the dictionary [dico]. The key is the [login] field, and the value consists of the [uid:gid:infos:dir:shell] fields;
  • lines 65–67: handle any exception;
  • lines 68–71: close the file in all cases, whether an error occurs or not;
  • lines 74–75: query the dictionary [dico];
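
The behaviour of [readline] and [strip] described in these notes can be checked on an in-memory file. The following is only a minimal sketch: [io.StringIO] stands in for a real file, and the content and the names [fake_file] and [logins] are made up for the illustration. It also shows a consequence of the loop condition: a blank line in the middle of the file ends the loop early.

```python
import io

# In-memory stand-in for a small text file (hypothetical content);
# the third line is blank on purpose.
fake_file = io.StringIO("login1:pwd1:uid1\nlogin2:pwd2:uid2\n\nlogin3:pwd3:uid3\n")

# readline keeps the end-of-line character
raw = fake_file.readline()
print(repr(raw))
# strip removes it, along with any other leading/trailing whitespace
print(repr(raw.strip()))

# same loop shape as in [fic_01], restarted from the beginning of the file
fake_file.seek(0)
logins = []
line = fake_file.readline().strip()
while line != '':
    logins.append(line.split(":")[0])
    line = fake_file.readline().strip()

# login3 is never reached: the blank line became '' after strip
print(logins)
```

If blank lines must be tolerated, one option is to test the raw result of [readline] (empty only at the end of the file) before calling [strip].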

The file [data/infos.txt]:

login1:pwd1:uid1:gid1:infos1:dir1:shell1
login2:pwd2:uid2:gid2:infos2:dir2:shell2
login3:pwd3:uid3:gid3:infos3:dir3:shell3
...
login99:pwd99:uid99:gid99:infos99:dir99:shell99
login100:pwd100:uid100:gid100:infos100:dir100:shell100

Screen output:


C:\Data\st-2020\dev\python\cours-2020\python3-flask-2020\venv\Scripts\python.exe C:/Data/st-2020/dev/python/cours-2020/python3-flask-2020/files/fic_01.py
login10 : ['uid10', 'gid10', 'infos10', 'dir10', 'shell10']
The key [X] does not exist

Process finished with exit code 0
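
The open / try / finally pattern used in [fic_01] can also be written with Python's [with] statement, which closes the file when the block ends, whether an error occurred or not. The sketch below is an alternative formulation, not the script from this chapter: it writes to a temporary directory instead of [./data], and the names [file_name] and [fields] are chosen for the illustration.

```python
import os
import tempfile

# hypothetical location: a temporary directory rather than ./data
file_name = os.path.join(tempfile.mkdtemp(), "infos.txt")

# writing: the file is closed automatically at the end of the with block
with open(file_name, "w") as fic:
    for i in range(1, 101):
        fic.write(f"login{i}:pwd{i}:uid{i}:gid{i}:infos{i}:dir{i}:shell{i}\n")

# reading: iterating over the file object yields one line at a time
dico = {}
with open(file_name, "r") as fic:
    for line in fic:
        fields = line.strip().split(":")
        # key = login, value = the fields after login and password
        dico[fields[0]] = fields[2:]

print(dico["login10"])
print(len(dico))
```

Iterating with [for line in fic] stops at the real end of the file, so no test on an empty string is needed.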

7.2. Script [fic_02]: Handling UTF-8-encoded text files

In the rest of this document, we will be working exclusively with UTF-8 encoded text files. First, we will configure PyCharm:

Image

  • in [5-6]: select UTF-8 encoding for project files;

To create a UTF-8 encoded file, proceed as follows (script [fic_02]):


# imports
import codecs

# writing UTF-8 to a text file
# exceptions are not handled
file = codecs.open("./data/utf8.txt", "w", "utf8")
file.write("Hélène went to Basel to stay with her grandmother over the summer")
file.close()

Notes

  • line 2: to handle file encoding, we import the [codecs] module;
  • line 6: the [codecs.open] function is used like the standard [open] function, with an additional argument giving the encoding to use when writing, or the file's existing encoding when reading. The [file] object obtained on line 6 is then used like a standard file;
  • line 7: accented characters were used, which usually have different representations depending on the character encoding used;
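
In Python 3, the built-in [open] function also accepts an [encoding] parameter, so the same file can be produced without the [codecs] module. The sketch below assumes a temporary directory instead of [./data]; the names [path], [text], and [read_back] are illustrative.

```python
import os
import tempfile

text = "Hélène went to Basel to stay with her grandmother over the summer"
# hypothetical location: a temporary directory rather than ./data
path = os.path.join(tempfile.mkdtemp(), "utf8.txt")

# write the text as UTF-8
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# read it back, stating the same encoding
with open(path, "r", encoding="utf-8") as f:
    read_back = f.read()

print(read_back == text)
```

Stating the encoding explicitly matters because the default used by [open] depends on the platform and its configuration.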

Results

Opening the resulting [data/utf8.txt] file (created on line 6) gives the following:

Image

7.3. Script [fic_03]: handling text files encoded in ISO-8859-1

The script [fic_03] does the same thing as the script [fic_02] but encodes the text file in ISO-8859-1. We want to show the difference between the resulting files:


# imports
import codecs

# writing ISO-8859-1 to a text file
# we do not handle exceptions
file = codecs.open("./data/iso-8859-1.txt", "w", "iso-8859-1")
file.write("Hélène went to Basel to stay with her grandmother over the summer")
file.close()

When we open the [data/iso-8859-1.txt] file created on line 6, we get the following result:

Image

Because we configured the project to work with UTF-8 files, PyCharm tried to open the [iso-8859-1.txt] file in UTF-8. It can see [1] that the file is not in UTF-8. It then suggests [2] reloading the file in a different encoding:

Image

  • in [3-5]: the file is reloaded using ISO-8859-1 encoding;

Image

  • in [6], the same file but displayed with a different encoding;

If we go back to the project settings:

Image

  • we see that in [6-7], PyCharm noted that the file [iso-8859-1.txt] should be opened with ISO-8859-1 encoding. This is therefore an exception to the rule [5];
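
The difference between the files produced by [fic_02] and [fic_03] can also be observed at the byte level, without an editor. The following sketch encodes only the word "Hélène" in both encodings and then tries to decode the ISO-8859-1 bytes as UTF-8, which is roughly the situation an editor faces when it opens such a file as UTF-8. The variable names are illustrative.

```python
text = "Hélène"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# each accented character takes 2 bytes in UTF-8, 1 byte in ISO-8859-1
print(utf8_bytes)
print(latin1_bytes)
print(len(utf8_bytes), len(latin1_bytes))

# the ISO-8859-1 bytes are not a valid UTF-8 sequence
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as error:
    decoded_ok = False
    print(f"not valid UTF-8: {error}")
```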

7.4. Script [json_01]: Working with a JSON file

JSON stands for JavaScript Object Notation. As the name suggests, it is a text-based representation of JavaScript objects. Here, we will use it with Python objects.

The JSON file being managed [data/in.json] will look like this:

Image

  • In [2], we can see that the text content of the [in.json] file could represent a Python dictionary. PyCharm has formatted (Ctrl-Alt-L) this text, but it would make no difference if it were on a single line: the layout of the text is irrelevant as long as it is syntactically valid JSON;
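
That the layout does not matter can be checked with [json.loads], which parses JSON from a string. The two texts below are made-up excerpts, not the actual content of [in.json]; they differ only in whitespace.

```python
import json

# hypothetical JSON text on a single line
compact = '{"limits": [9964, 27519], "coeffR": [0, 0.14]}'

# the same JSON text spread over several lines
indented = """
{
    "limits": [
        9964,
        27519
    ],
    "coeffR": [0, 0.14]
}
"""

d1 = json.loads(compact)
d2 = json.loads(indented)

# both texts produce the same dictionary
print(d1 == d2)
print(type(d1))
```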

The script [json_01] shows how to use this file:


# imports
import codecs
import json
import sys

# reading/writing a JSON file
inFile = None
outFile = None
try:
    # Open the JSON file for reading
    inFile = codecs.open("./data/in.json", "r", "utf8")
    # Transfer the content to a dictionary
    data = json.load(inFile)
    # Display the loaded data
    print(f"data={data}, type(data)={type(data)}")
    limits = data['limits']
    print(f"limits={limits}, type(limits)={type(limits)}")
    print(f"limits[1]={limits[1]}, type(limits[1])={type(limits[1])}")
    # Transfer the [data] dictionary to a JSON file
    outFile = codecs.open("./data/out.json", "w", "utf8")
    json.dump(data, outFile)
except BaseException as error:
    # display the error and exit
    print(f"The following error occurred: {error}")
    sys.exit()
finally:
    # close any open files
    if inFile:
        inFile.close()
    if outFile:
        outFile.close()

Notes

  • Line 3: To work with JSON, we import the [json] module;
  • line 11: we will be working with JSON files encoded in UTF-8. Here, we open the file [data/in.json] using the [codecs] module;
  • line 13: the [json.load] function reads the contents of the JSON file and stores them in the [data] variable. Because the top-level element of [in.json] is a JSON object, this variable is a dictionary;
  • lines 15–18: to verify that we have indeed obtained a Python dictionary, we display some of its elements;
  • lines 20–21: we perform the reverse operation: the dictionary [data] is written to a UTF-8 encoded file using the [json.dump] method;
  • lines 22–25: handling any exceptions;
  • lines 26-31: in any case, whether an error occurs or not, we close any files that may have been opened;
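
The round trip performed by [json.dump] and [json.load] can be reproduced in memory. This is a sketch under assumptions: [io.StringIO] replaces the two files, and the dictionary is invented for the example (it is not the content of [in.json]). It also shows that JSON and Python literals are close but not identical: [True] and [None] are written as [true] and [null].

```python
import io
import json

# hypothetical data mixing several value types
data = {"limits": [9964, 27519], "coeffR": [0, 0.14], "active": True, "note": None}

# dictionary -> JSON text
buffer = io.StringIO()
json.dump(data, buffer)
json_text = buffer.getvalue()
print(json_text)

# JSON text -> dictionary
buffer.seek(0)
loaded = json.load(buffer)
for key, value in loaded.items():
    print(f"{key}: {type(value).__name__}")

# the round trip gives back an equal dictionary
print(loaded == data)
```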

Results


C:\Data\st-2020\dev\python\cours-2020\python3-flask-2020\venv\Scripts\python.exe C:/Data/st-2020/dev/python/cours-2020/python3-flask-2020/fichiers/json_01.py
data={'limits': [9964, 27519, 73779, 156244, 0], 'coeffR': [0, 0.14, 0.3, 0.41, 0.45], 'coeffN': [0, 1394.96, 5798, 13913.69, 20163.45], 'HALF-RATE_INCOME_LIMIT': 1551, 'SINGLE_PERSON_INCOME_LIMIT_FOR_REDUCTION': 21037, 'COUPLE_INCOME_LIMIT_FOR_REDUCTION': 42074, 'REDUCTION_VALUE_HALF_PORTION': 3797, 'SINGLE_TAX_DEDUCTION_THRESHOLD': 1196, 'COUPLE_TAX_DEDUCTION_THRESHOLD': 1970, 'COUPLE_TAX_THRESHOLD_FOR_DEDUCTION': 2627, 'SINGLE_TAX_LIMIT_FOR_DEDUCTION': 1595, 'MAX_10_PERCENT_DEDUCTION': 12502, 'MIN_10_PERCENT_DEDUCTION': 437}, type(data)=<class 'dict'>
limits=[9964, 27519, 73779, 156244, 0], type(limits)=<class 'list'>
limits[1]=27519, type(limits[1])=<class 'int'>

Process finished with exit code 0

  • Lines 2–4 show that we have successfully retrieved the dictionary from the JSON file;

Now, let’s look at the contents of the [data/out.json] file:

Image

The text in the file is on a single line. However, PyCharm recognizes JSON files, and we can format them, just like Python files and others, using Ctrl-Alt-L. This gives us the following:

Image
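
A multi-line layout can also be requested directly from Python through the [indent] parameter of [json.dump] and [json.dumps]. The sketch below uses a small made-up dictionary; the exact layout produced may differ in detail from PyCharm's formatter.

```python
import json

# hypothetical data, smaller than the dictionary read from [in.json]
data = {"limits": [9964, 27519], "MIN_10_PERCENT_DEDUCTION": 437}

# default: everything on a single line
single_line = json.dumps(data)
# indent=2: one element per line, nested levels indented by 2 spaces
pretty = json.dumps(data, indent=2)

print(single_line)
print(pretty)
```

Both strings describe the same data; only the whitespace differs.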

7.5. Script [json_02]: Handling JSON Files Encoded in UTF-8

A JSON file encoded in UTF-8 can take two forms, depending on how non-ASCII characters are written. The following script produces both:


# imports
import codecs
import json
import sys

# dictionary
data = {'marié': 'yes', 'tax': 1340}

# writing a JSON file
out_file1 = None
out_file2 = None
try:
    # Transfer the [data] dictionary to a JSON file
    out_file1 = codecs.open("./data/out1.json", "w", "utf8")
    json.dump(data, out_file1, ensure_ascii=True)
    # Write the [data] dictionary to a JSON file
    out_file2 = codecs.open("./data/out2.json", "w", "utf8")
    json.dump(data, out_file2, ensure_ascii=False)
except BaseException as error:
    # display the error and exit
    print(f"The following error occurred: {error}")
    sys.exit()
finally:
    # close files if they are open
    if out_file1:
        out_file1.close()
    if out_file2:
        out_file2.close()

  • In this script, the [data] dictionary (line 7) is written to two JSON files (lines 14, 17);
  • lines 14, 17: in both cases, a UTF-8 text file is created;
  • line 15: when writing the dictionary, we use the parameter named [ensure_ascii=True];
  • line 18: when writing the dictionary, we use the parameter named [ensure_ascii=False];

Here are the two resulting files:

Image

  • In the [out1.json] file, accented characters have been replaced by a sequence of ASCII characters giving their Unicode code point. This is referred to as "escaping." In the binary content of [out1.json], the character é in [marié] is represented by the UTF-8 codes of the 6 characters [\u00e9] in succession, that is, 6 bytes;
  • In the [out2.json] file, accented characters have been left as is. In the binary content of [out2.json], each of these characters is represented by its own UTF-8 code (a single encoded character, rather than 6 for [out1.json]). For the character é in [marié], whose Unicode code point is [00e9], this is the 2-byte sequence [C3 A9];
  • it is the value of the [ensure_ascii] parameter of the [json.dump] method that determines the format used;

Some applications use "escaped" UTF-8 for their JSON files. In that case, the value [ensure_ascii=True] must be used. This value is actually the default. Therefore, if the [ensure_ascii] parameter is not used, we will be working with escaped UTF-8 JSON files.
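
The effect of [ensure_ascii] can be seen without creating files by using [json.dumps], which returns the JSON text as a string. The dictionary below is a reduced, hypothetical sample containing one accented character; the variable names are illustrative.

```python
import json

# hypothetical sample with one accented character
data = {"marié": "yes"}

escaped = json.dumps(data, ensure_ascii=True)
raw = json.dumps(data, ensure_ascii=False)

print(escaped)  # the é appears as the 6 characters \u00e9
print(raw)      # the é appears as is

# size of each text once encoded in UTF-8
escaped_bytes = escaped.encode("utf-8")
raw_bytes = raw.encode("utf-8")
print(len(escaped_bytes), len(raw_bytes))

# ensure_ascii=True is the default
print(json.dumps(data) == escaped)
```

The escaped form is 4 bytes longer here: 6 ASCII bytes for the escape sequence against 2 bytes for the UTF-8 encoding of é.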

The script continues as follows:


# imports
import codecs
import json
import sys

# dictionary
data = {'marié': 'yes', 'tax': 1340}



# reading JSON files
in_file1 = None
in_file2 = None
try:
    # Transfer JSON file 1 to a dictionary
    in_file1 = codecs.open("./data/out1.json", "r", "utf8")
    dico1 = json.load(in_file1)
    # display
    print(f"dico1={dico1}")
    # Transfer JSON file 2 into a dictionary
    in_file2 = codecs.open("./data/out2.json", "r", "utf8")
    dico2 = json.load(in_file2)
    # display
    print(f"dico2={dico2}")
except BaseException as error:
    # display the error and exit
    print(f"The following error occurred: {error}")
    sys.exit()
finally:
    # close files if they are open
    if in_file1:
        in_file1.close()
    if in_file2:
        in_file2.close()

Notes

  • lines 11–34: read the two files [out1.json, out2.json] and display the dictionary read in each case;

Results

C:\Data\st-2020\dev\python\cours-2020\python3-flask-2020\venv\Scripts\python.exe C:/Data/st-2020/dev/python/cours-2020/python3-flask-2020/files/json_02.py
dico1={'marié': 'yes', 'tax': 1340}
dico2={'marié': 'yes', 'tax': 1340}

Process finished with exit code 0

Note that we did not need to tell the [json.load] function (lines 17, 22) whether the JSON text was escaped or not. The [\uXXXX] escape sequences are part of the JSON syntax, so the parser decodes them itself, and in both cases we retrieve the correct dictionary.
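
The same observation can be reproduced on strings with [json.loads]. In this sketch, the two JSON texts are hypothetical reduced samples: one escaped, one not.

```python
import json

# escaped form: the backslash is doubled so that the Python string
# really contains the 6 characters \u00e9
escaped_text = '{"mari\\u00e9": "yes"}'
# unescaped form: the accented character appears as is
raw_text = '{"marié": "yes"}'

d1 = json.loads(escaped_text)
d2 = json.loads(raw_text)

# both texts yield the same dictionary
print(d1)
print(d2)
print(d1 == d2)
```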