7. Text files

7.1. Script [fic_01]: Reading/writing a text file
The following script shows how to write and then read a text file:
Notes:
- line 28: opens the file for writing (w=write). If the file already exists, it will be overwritten;
- lines 30–34: generate 100 lines in the text file;
- line 34: writes a line to the text file. The [write] method does not add a newline character, so it must be included in the text being written;
- lines 35–37: handle any exceptions;
- line 37: aborts the script (but only after the [finally] block has run);
- lines 38–41: in all cases, whether an error occurs or not, close the file if it is open;
- line 47: open the file for reading (r=read);
- line 49: creates an empty dictionary [dico];
- line 52: the [readline] method reads a line of text, including its end-of-line character. The [strip] method removes whitespace from the beginning and end of the string: spaces, tabs, line breaks, form feeds, and a few other characters. So here, [line] will not contain the end-of-line characters [\r\n] (Windows) or [\n] (Unix);
- line 54: the file is processed until an empty line is read. At end of file, [readline] returns an empty string. Note that a blank line inside the file also becomes an empty string after [strip], and would therefore stop the loop as well;
- lines 54–64: the contents of the text file are transferred to the dictionary [dico]. The key is the [login] field, and the value is the list of the fields [uid, gid, infos, dir, shell];
- lines 65–67: handle any exceptions;
- lines 68–71: close the file in all cases, whether an error occurs or not;
- lines 74–75: look up keys in the dictionary [dico];
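The script itself is not reproduced here. As a rough sketch of what the notes above describe, the following assumes each line of the file has the form `login:uid:gid:infos:dir:shell` (the real layout of [data/infos.txt] may differ), writes to a temporary folder instead of [data], and uses illustrative error messages:

```python
import os
import sys
import tempfile

# Assumption: line format "login:uid:gid:infos:dir:shell" (hypothetical).
path = os.path.join(tempfile.mkdtemp(), "infos.txt")

# --- write the text file ---
file = None
try:
    file = open(path, "w")              # "w": create, or overwrite if it exists
    for i in range(1, 101):             # 100 lines
        # write() adds no newline, so "\n" is appended explicitly
        file.write(f"login{i}:uid{i}:gid{i}:infos{i}:dir{i}:shell{i}\n")
except OSError as error:
    print(f"write error: {error}")
    sys.exit(1)                         # the finally block still runs first
finally:
    if file is not None:
        file.close()

# --- read it back into a dictionary ---
dico = {}
file = None
try:
    file = open(path, "r")              # "r": read
    line = file.readline().strip()      # readline keeps "\n"; strip removes it
    while line != "":                   # '' at end of file (or on a blank line)
        login, *rest = line.split(":")
        dico[login] = rest              # value: [uid, gid, infos, dir, shell]
        line = file.readline().strip()
except OSError as error:
    print(f"read error: {error}")
    sys.exit(2)
finally:
    if file is not None:
        file.close()

# --- query the dictionary ---
for key in ["login10", "X"]:
    if key in dico:
        print(f"{key} : {dico[key]}")
    else:
        print(f"the key [{key}] does not exist")
```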
The file [data/infos.txt]:
Screen output:
C:\Data\st-2020\dev\python\cours-2020\python3-flask-2020\venv\Scripts\python.exe C:/Data/st-2020/dev/python/cours-2020/python3-flask-2020/fichiers/fic_01.py
login10 : ['uid10', 'gid10', 'infos10', 'dir10', 'shell10']
la clé [X] n'existe pas
Process finished with exit code 0
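The [readline]/[strip] behavior described in the notes can be checked in isolation. This sketch uses an in-memory [io.StringIO] object as a stand-in for an open text file; the sample lines are hypothetical:

```python
import io

# io.StringIO plays the role of an open text file
text = io.StringIO("login1:uid1\n\nlogin3:uid3\n")

first = text.readline()              # 'login1:uid1\n': the newline is kept
blank = text.readline().strip()      # '': a blank line becomes an empty string
third = text.readline()              # 'login3:uid3\n'
at_eof = text.readline()             # '': end of file reached

# strip removes spaces, tabs, \r and \n at both ends
stripped = " \tlogin1:uid1\r\n".strip()

print(repr(first), repr(blank), repr(at_eof), repr(stripped))
```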
7.2. Script [fic_02]: Handling UTF-8-encoded text files
In the rest of this document, we will be working exclusively with UTF-8 encoded text files. First, we will configure PyCharm:

- in [5-6]: select UTF-8 encoding for project files;
To create a UTF-8 encoded file, proceed as follows (script [fic_02]):
Notes
- line 2: to handle file encoding, we import the [codecs] module;
- line 6: the [codecs.open] function is used like the built-in [open] function, but it lets you specify the encoding: the desired encoding when creating a file, or the file's existing encoding when reading it. Once opened, the [file] object obtained on line 6 is used like any other file object. (In Python 3, the built-in [open] function also accepts an [encoding] parameter.);
- line 7: the text contains accented characters, whose byte representation differs from one character encoding to another;
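A minimal sketch of the same idea, assuming a hypothetical sample sentence and a temporary folder in place of [data]. Reading the file back in binary mode shows the bytes actually written for each accented character:

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "utf8.txt")  # stand-in for data/utf8.txt

# write with an explicit encoding
file = codecs.open(path, "w", "utf-8")
try:
    file.write("Hélène est allée à la plage.\n")      # hypothetical sample text
finally:
    file.close()

# the raw bytes: é -> C3 A9, è -> C3 A8, à -> C3 A0 in UTF-8
with open(path, "rb") as raw:
    data = raw.read()
print(data)

# reading back requires the same encoding
file = codecs.open(path, "r", "utf-8")
try:
    text = file.read()
finally:
    file.close()
print(text)
```

The built-in `open(path, "w", encoding="utf-8")` would produce the same file.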
Results
Opening the resulting file [data/utf8.txt] (created on line 6) gives the following:

7.3. Script [fic_03]: Handling text files encoded in ISO-8859-1
The script [fic_03] does the same thing as [fic_02] but encodes the text file in ISO-8859-1. The goal is to show the difference between the two resulting files:
When we open the [data/iso-8859-1.txt] file created on line 6, we get the following:

Because the project is configured for UTF-8 files, PyCharm tries to open [iso-8859-1.txt] as UTF-8. It detects [1] that the file is not valid UTF-8 and offers [2] to reload it with a different encoding:

- in [3-5]: the file is reloaded using ISO-8859-1 encoding;

- in [6]: the same file, now displayed with the correct encoding;
If we go back to the project settings:

- in [6–7]: PyCharm has recorded that the file [iso-8859-1.txt] must be opened with ISO-8859-1 encoding. This is an exception to the project-wide rule [5];
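The difference between the two files can be reproduced in memory with [str.encode] and [bytes.decode]. The sample word is hypothetical. ISO-8859-1 uses one byte per accented character where UTF-8 uses two, and the ISO-8859-1 bytes do not form a valid UTF-8 sequence, which is the kind of mismatch PyCharm reports in [1]:

```python
text = "élève"  # hypothetical sample with two accented characters

utf8_bytes = text.encode("utf-8")         # b'\xc3\xa9l\xc3\xa8ve': 2 bytes per accent
latin1_bytes = text.encode("iso-8859-1")  # b'\xe9l\xe8ve': 1 byte per accent
print(len(utf8_bytes), len(latin1_bytes))  # 7 5

# decoding ISO-8859-1 bytes as UTF-8 fails
decode_failed = False
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as error:
    decode_failed = True
    print(f"not valid UTF-8: {error}")

# decoding with the right encoding gives the original text back
print(latin1_bytes.decode("iso-8859-1"))  # élève
```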
7.4. Script [json_01]: Working with a JSON file
JSON stands for JavaScript Object Notation. As the name suggests, it is a text-based representation of JavaScript objects. Here, we will use it with Python objects.
The JSON file used here, [data/in.json], looks like this:

- in [2]: the text content of [in.json] looks like a Python dictionary. PyCharm has formatted this text (Ctrl-Alt-L), but it would make no difference if it were all on a single line: the layout is irrelevant as long as the text is valid JSON;
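To illustrate that only the JSON syntax matters, not the layout, the sketch below parses the same object written compactly and over several lines. Two keys come from [in.json]; the keys [flag] and [missing] are hypothetical and show that a JSON file contains `true` and `null`, which Python turns into `True` and `None`:

```python
import json

compact = '{"PLAFOND_QF_DEMI_PART": 1551, "limites": [9964, 27519], "flag": true, "missing": null}'
pretty = """
{
    "PLAFOND_QF_DEMI_PART": 1551,
    "limites": [9964, 27519],
    "flag": true,
    "missing": null
}
"""

a = json.loads(compact)
b = json.loads(pretty)
print(a == b)  # True: whitespace and line breaks do not matter
print(a)       # {'PLAFOND_QF_DEMI_PART': 1551, 'limites': [9964, 27519], 'flag': True, 'missing': None}
```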
The script [json_01] shows how to use this file:
Notes
- line 3: to work with JSON, we import the [json] module;
- line 11: we will be working with JSON files encoded in UTF-8. Here, we open the file [data/in.json] using the [codecs] module;
- line 13: the [json.load] function parses the contents of the JSON file and returns the corresponding Python object, stored here in the [data] variable. Since the file contains a JSON object, [data] is a dictionary;
- lines 15–18: to verify that we have indeed obtained a Python dictionary, we display some of its elements;
- lines 20–21: we perform the reverse operation: the dictionary [data] is written to a UTF-8 encoded file with the [json.dump] function;
- lines 22–25: handle any exceptions;
- lines 26–31: whether or not an error occurs, any files that were opened are closed;
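A sketch of the [json_01] flow under stated assumptions: a small [in.json] holding only two of the real keys is created in a temporary folder, and [json.JSONDecodeError] (a subclass of [ValueError]) is caught alongside I/O errors. Line numbers will not match the original script:

```python
import codecs
import json
import os
import tempfile

folder = tempfile.mkdtemp()
in_path = os.path.join(folder, "in.json")    # stand-in for data/in.json
out_path = os.path.join(folder, "out.json")  # stand-in for data/out.json

# create a small in.json with a subset of the values shown in the output
with open(in_path, "w", encoding="utf-8") as f:
    f.write('{"limites": [9964, 27519, 73779, 156244, 0], "PLAFOND_QF_DEMI_PART": 1551}')

json_in = json_out = None
try:
    json_in = codecs.open(in_path, "r", "utf-8")
    data = json.load(json_in)                # JSON object -> dict
    limites = data["limites"]
    print(f"data={data}, type(data)={type(data)}")
    print(f"limites={limites}, type(limites)={type(limites)}")
    print(f"limites[1]={limites[1]}, type(limites[1])={type(limites[1])}")
    json_out = codecs.open(out_path, "w", "utf-8")
    json.dump(data, json_out)                # dict -> JSON text
except (OSError, ValueError) as error:       # JSONDecodeError is a ValueError
    print(f"error: {error}")
finally:
    for f in (json_in, json_out):
        if f is not None:
            f.close()
```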
Results
C:\Data\st-2020\dev\python\cours-2020\python3-flask-2020\venv\Scripts\python.exe C:/Data/st-2020/dev/python/cours-2020/python3-flask-2020/fichiers/json_01.py
data={'limites': [9964, 27519, 73779, 156244, 0], 'coeffR': [0, 0.14, 0.3, 0.41, 0.45], 'coeffN': [0, 1394.96, 5798, 13913.69, 20163.45], 'PLAFOND_QF_DEMI_PART': 1551, 'PLAFOND_REVENUS_CELIBATAIRE_POUR_REDUCTION': 21037, 'PLAFOND_REVENUS_COUPLE_POUR_REDUCTION': 42074, 'VALEUR_REDUC_DEMI_PART': 3797, 'PLAFOND_DECOTE_CELIBATAIRE': 1196, 'PLAFOND_DECOTE_COUPLE': 1970, 'PLAFOND_IMPOT_COUPLE_POUR_DECOTE': 2627, 'PLAFOND_IMPOT_CELIBATAIRE_POUR_DECOTE': 1595, 'ABATTEMENT_DIXPOURCENT_MAX': 12502, 'ABATTEMENT_DIXPOURCENT_MIN': 437}, type(data)=<class 'dict'>
limites=[9964, 27519, 73779, 156244, 0], type(limites)=<class 'list'>
limites[1]=27519, type(limites[1])=<class 'int'>
Process finished with exit code 0
- lines 2–4 of the output show that the dictionary was successfully retrieved from the JSON file;
Now, let’s look at the contents of the [data/out.json] file:

[json.dump] wrote the text on a single line. PyCharm recognizes JSON files, so, as with Python files, they can be formatted with Ctrl-Alt-L. This gives the following:
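Instead of reformatting in PyCharm, the file can be written already formatted: [json.dump] and [json.dumps] accept an [indent] parameter. A short sketch, using a subset of the real data:

```python
import json

data = {"limites": [9964, 27519], "PLAFOND_QF_DEMI_PART": 1551}

single_line = json.dumps(data)          # default: everything on one line
formatted = json.dumps(data, indent=4)  # one element per line, 4-space indent

print(single_line)
print(formatted)
```

With a file object, `json.dump(data, file, indent=4)` works the same way.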

7.5. Script [json_02]: Handling JSON Files Encoded in UTF-8
A JSON file encoded in UTF-8 can take two forms:
- in this script, the [data] dictionary (line 7) is written to two JSON files (lines 14 and 17);
- lines 14, 17: in both cases, a UTF-8 text file is created;
- line 15: the dictionary is written with the named parameter [ensure_ascii=True];
- line 18: the dictionary is written with the named parameter [ensure_ascii=False];
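A sketch of the first part of [json_02], assuming a hypothetical dictionary that contains the word [marié] and a temporary folder in place of [data]:

```python
import codecs
import json
import os
import tempfile

data = {"nom": "Dupont", "statut": "marié"}  # hypothetical dictionary

folder = tempfile.mkdtemp()
out1 = os.path.join(folder, "out1.json")
out2 = os.path.join(folder, "out2.json")

# both files are UTF-8 text files; only ensure_ascii differs
with codecs.open(out1, "w", "utf-8") as f:
    json.dump(data, f, ensure_ascii=True)   # é written as the escape \u00e9
with codecs.open(out2, "w", "utf-8") as f:
    json.dump(data, f, ensure_ascii=False)  # é written as é

with open(out1, encoding="utf-8") as f:
    print(f.read())  # {"nom": "Dupont", "statut": "mari\u00e9"}
with open(out2, encoding="utf-8") as f:
    print(f.read())  # {"nom": "Dupont", "statut": "marié"}
```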
Here are the two resulting files:

- in the [out1.json] file, accented characters have been replaced by an escape sequence giving their Unicode code point. This is called "escaping." Technically, in the binary content of [out1.json], the character é in [marié] is represented by the six ASCII characters [\u00e9], that is, six bytes;
- in the [out2.json] file, accented characters have been left as is. In the binary content of [out2.json], each one is represented directly by its UTF-8 encoding, a single sequence of one to four bytes rather than six separate characters. For the character é in [marié] (code point U+00E9), we find the two bytes [C3 A9];
- the value of the [ensure_ascii] parameter of the [json.dump] function determines which form is used;
Some applications expect escaped JSON files. In that case, use [ensure_ascii=True]. This is actually the default value, so if the [ensure_ascii] parameter is omitted, [json.dump] produces escaped JSON. Since escaped JSON contains only ASCII characters, it is also valid UTF-8.
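The byte counts above can be checked directly. This sketch serializes the single string [marié] with the default [ensure_ascii] and with [ensure_ascii=False], then encodes each result in UTF-8:

```python
import json

escaped = json.dumps("marié")                  # default: ensure_ascii=True
raw = json.dumps("marié", ensure_ascii=False)

print(escaped)                    # "mari\u00e9"
print(raw)                        # "marié"
print(escaped.encode("utf-8"))    # b'"mari\\u00e9"': é takes 6 bytes
print(raw.encode("utf-8"))        # b'"mari\xc3\xa9"': é takes 2 bytes
print("é".encode("utf-8").hex())  # c3a9
```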
The script continues as follows:
Notes
- lines 11–34: read the two files [out1.json, out2.json] and display the dictionary read in each case;
Results
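We did not need to tell the [json.load] function whether the JSON text to be read was escaped or not (lines 17, 22): in both cases, the correct dictionary is retrieved. This is expected: \uXXXX escapes are part of JSON's string syntax, so the parser always decodes them, whatever value of [ensure_ascii] was used when writing.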
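The same point in a minimal sketch, with hypothetical one-key JSON texts mimicking [out1.json] and [out2.json]:

```python
import json

escaped_text = '{"statut": "mari\\u00e9"}'  # escaped form, as in out1.json
plain_text = '{"statut": "marié"}'          # unescaped form, as in out2.json

d1 = json.loads(escaped_text)
d2 = json.loads(plain_text)
print(d1, d2, d1 == d2)  # {'statut': 'marié'} {'statut': 'marié'} True
```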