Jump Links
Open a File and Read Its Contents
Using the with Keyword to Close a File Automatically
Copy a File by Reading and Writing
Write to a Log File by Appending
Use Buffering to Control File Saving
Specify an Encoding for Proper UTF-8 Support
Handle Badly Formed Files Using the Errors Parameter
Python’s open function should be your first port of call when you’re looking to read the contents of a file. Give it a filename and you’ll get back a versatile object, letting you read and write data, both in plain text and binary formats.
These examples show just how flexible the function is, with support for different modes, buffering, encoding, and more.
Open a File and Read Its Contents
The open() function has quite a complicated signature, but in the simplest cases, you can use it to open a text file like this:
f = open(filename)
By default, Python opens this file in read mode, meaning that you can only read from it. This is ideal for configuration files, static data files, and ... It also makes this common case concise and easy to remember.
The open() function returns a file object that you can then use to carry out various tasks, including reading a file’s complete contents:
f = open("/usr/share/dict/words")
text = f.read()
print(text)
Note that this simple snippet will cause Python to raise a FileNotFoundError exception if the file does not exist.
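One way to cope with a missing file is to catch that exception. The following is a minimal sketch rather than a recommended pattern; the helper name read_text_or_none and the temporary path are made up for illustration:

```python
import os
import tempfile


def read_text_or_none(filename):
    """Return the file's text, or None if the file does not exist."""
    try:
        f = open(filename)
    except FileNotFoundError:
        return None
    text = f.read()
    f.close()
    return text


# A freshly created temporary directory is empty, so this path
# cannot refer to an existing file.
missing_path = os.path.join(tempfile.mkdtemp(), "missing.txt")
result = read_text_or_none(missing_path)
print(result)
```

Returning None is only one possible design; depending on your program, printing a friendly message or letting the exception propagate may be more appropriate.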
One file that is guaranteed to exist is the script you’re running, which Python makes available in the special __file__ variable. This makes it straightforward to write a script that prints its own source code, otherwise known as a quine:
f = open(__file__)
text = f.read()
print(text)
If you’re being really strict, this is not quite a quine because reading a file is considered cheating.
Using the with Keyword to Close a File Automatically
When Python opens a file, it allocates system resources to that file, which it then uses for future operations like read and write. If your program runs to the end, Python should clean up these resources, making them available to other processes that may run on your system. But this is not guaranteed, so you should always ensure resources are cleaned up correctly. The simplest way of doing so is by explicitly calling the close() method on your file when you know you’ve finished working with it:
f = open("/usr/share/dict/words")
text = f.read()
f.close()
Aim to close a file as soon as you can. For example, if you open a file, read its contents, and then process the contents, try to close the file as soon as you’ve read its contents. There’s no need to keep the file open while you process data that you’ve read into a variable.
But problems can still occur: what if an exception is raised before the call to close() runs? To get around this issue, you should use the with keyword. The file object acts as a context manager for the block that with wraps, which ensures the file is closed when the block ends, even if an error occurs inside it:
with open("/usr/share/dict/words") as f:
    text = f.read()
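To see that the file really is closed, you can inspect the file object's closed attribute after the block. This sketch uses a throwaway file in a temporary directory (an assumption made so that the example runs anywhere), and deliberately raises an error inside the second block:

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on
# any particular file existing on your system.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("hello\n")

# The file is closed as soon as the block ends...
with open(path) as f:
    text = f.read()
closed_after_with = f.closed

# ...even when the block exits because of an exception.
try:
    with open(path) as g:
        raise ValueError("something went wrong while processing")
except ValueError:
    pass
closed_after_error = g.closed

print(closed_after_with, closed_after_error)
```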
Copy a File by Reading and Writing
Various Python libraries provide ways of copying a file, so this example is purely illustrative. It demonstrates the use of open’s second argument, mode. This argument tells open how you intend to use the file. You can use any valid combination of these characters:
Character | Meaning |
---|---|
r | Read |
w | Write |
x | Create & write |
a | Append |
b | Binary |
t | Text |
+ | Update |
The default mode is rt—read a text file—which is why the first example in this article worked as you would expect. To copy the file you read from, you’ll need to open a second file using the w mode to write. You’ll also need to use the b mode for both to ensure the read and write operations respect binary data.
source = "./image.jpg"
target = "./a-copy-of-image.jpg"

with open(source, "rb") as src, open(target, "wb") as tgt:
    buffer = src.read()
    tgt.write(buffer)
The with block operates on both files, so it will automatically close them once it completes.
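Reading the whole file in one call keeps the example short, but it holds the entire file in memory. A possible variation, sketched here with a hypothetical copy_in_chunks helper and an arbitrary chunk size, copies the data piece by piece instead:

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # arbitrary size chosen for illustration


def copy_in_chunks(source, target, chunk_size=CHUNK_SIZE):
    """Copy source to target without loading the whole file into memory."""
    with open(source, "rb") as src, open(target, "wb") as tgt:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            tgt.write(chunk)


# Demonstrate on a throwaway binary file.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.bin")
target = os.path.join(workdir, "copy.bin")
with open(source, "wb") as f:
    f.write(bytes(range(256)) * 1000)

copy_in_chunks(source, target)

with open(source, "rb") as a, open(target, "rb") as b:
    identical = a.read() == b.read()
print(identical)
```

In real code, the standard library's shutil.copyfile() is usually the better choice; the loop above is only meant to show the rb and wb modes at work.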
Create a New Text File
You can also use the mode argument to create a new file, but protect any file of the same name that may already exist:
open(filename, "x")
If a file of the same name already exists, this open call will raise a FileExistsError exception. This is a good defensive measure that avoids the need to explicitly check for the file’s existence:
# Warning: don't do this!
import os.path

filename = "example.txt"

if os.path.isfile(filename):
    print("Sorry, file already exists")
else:
    with open(filename, "w") as f:
        # ...
Aside from having to write a bit less code, however, there’s an even better reason to use the “x” mode: it avoids a race condition. Consider the example above, which has one statement that checks if the file exists (if os.path.isfile(filename)) followed by another statement that opens it for writing (with open(filename, "w") as f). If another process creates or changes that file in between those two statements, disaster can occur.
Opening a file in create mode can avert disaster because a single statement is responsible for checking the file and opening it. Either it fails, and the existing file is protected, or it succeeds, and no other process can create another file of the same name in the meantime.
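In practice, you would probably pair create mode with an exception handler. Here is one way that might look; the create_new_file helper and the temporary path are assumptions for the sake of a runnable example:

```python
import os
import tempfile


def create_new_file(filename, text):
    """Write text to filename only if it does not already exist.

    Returns True if the file was created, False if it was already there.
    """
    try:
        with open(filename, "x") as f:
            f.write(text)
    except FileExistsError:
        return False
    return True


path = os.path.join(tempfile.mkdtemp(), "example.txt")
first = create_new_file(path, "original\n")      # creates the file
second = create_new_file(path, "replacement\n")  # refuses to overwrite it

with open(path) as f:
    contents = f.read()
print(first, second, contents)
```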
Write to a Log File by Appending
By default, a file that you open in write mode will be truncated first, so its contents will be overwritten. To append to a file instead, you can use append mode:
log = open("file.log", "a")
Again, logging in Python is better handled with the logging module, which takes care of many of the awkward details. However, as an example, you can log to a file using code similar to this:
def startup():
    print("Just a dummy")

def main():
    print("Doing the main thing")
    return 42

def log(msg):
    logfile.write(msg + "\n")

logfile = open("file.log", "w")

log("starting startup")
startup()
log("startup finished")

log("starting main")
ret = main()
log("main finished: " + str(ret))

logfile.close()
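To see what append mode itself does, it may help to open the same file more than once. In this sketch, a variant of log() that takes a path (a hypothetical signature, not the one used above) reopens the file in "a" mode for every message, and the earlier lines survive each time:

```python
import os
import tempfile


def log(path, msg):
    """Append one line to the log file, opening and closing it each time."""
    with open(path, "a") as logfile:
        logfile.write(msg + "\n")


log_path = os.path.join(tempfile.mkdtemp(), "file.log")
log(log_path, "first run")
log(log_path, "second run")  # does not wipe out the first line

with open(log_path) as f:
    lines = f.read().splitlines()
print(lines)
```

Reopening the file for every message is simple and safe, but it costs a little more than keeping one handle open for the life of the program.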
Use Buffering to Control File Saving
To avoid constantly reopening and saving the same file, the logging example uses a global variable and a long-lived file handle. This seems OK in a simple example, but in a real-life situation, your program may run indefinitely, and you may want to check that log file at any time. If you do, you may be surprised:
import random

while True:
    log(str(random.random()))
    input("Press Enter to continue")
This code mimics periodic logging from a long-running process. You can control its logging by pressing Enter when you want another line to be logged. However, if you run it, you should notice a big flaw: if you press Enter a few times and check the log file, you’ll see that nothing gets written to it!
The tail command—specifically, tail -f—can help you track changes to a log file in real time.
If you press Enter enough times, you should finally see some results because the output will have exceeded Python’s default buffer size. On my system, this is 8192 bytes, which will take a long time to accumulate.
Fortunately, for cases such as this, there’s the buffering argument, which lets you define the buffering policy. In the case of a log file, a great approach to take is line buffering, which writes the buffered text out to the file every time a newline is written.
Try running the previous example with a tiny tweak:
logfile = open("file.log", "w", 1)
The third argument, 1, specifies line buffering. With this enabled, you should see that the log file updates every time the log() function runs because it includes a trailing newline character when it writes to the file.
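You can check this behavior without pressing Enter repeatedly. The sketch below writes one line to a line-buffered file and reads it back through a second handle while the first is still open; it also shows flush(), which forces a write when you keep the default buffering. The file names are placeholders:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()

# Line buffering: the text reaches the file as soon as the newline is written.
line_path = os.path.join(workdir, "line-buffered.log")
line_log = open(line_path, "w", buffering=1)
line_log.write("written with line buffering\n")
with open(line_path) as reader:
    seen_line_buffered = reader.read()  # read while line_log is still open
line_log.close()

# Default buffering: call flush() yourself when the data must be visible.
default_path = os.path.join(workdir, "default-buffered.log")
default_log = open(default_path, "w")
default_log.write("written with default buffering\n")
default_log.flush()
with open(default_path) as reader:
    seen_after_flush = reader.read()
default_log.close()

print(repr(seen_line_buffered))
print(repr(seen_after_flush))
```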
Specify an Encoding for Proper UTF-8 Support
Character encoding is a complicated subject, but today’s dominance of UTF-8 means you’ll rarely need to worry about it. However, some legacy systems may use other encodings, and you never know what may crop up in the future. Mistakes with encoding can cause big problems.
UTF-16 is an alternative to UTF-8 that is more compact for certain types of text. Most text written in English is better suited to UTF-8, but text in some other languages, such as Chinese or Japanese, is usually smaller when stored as UTF-16. Most emoji take four bytes in either encoding, so they gain nothing from the switch.
If you try to read a UTF-16 file that you opened using the standard open approach, Python will typically raise a UnicodeDecodeError.
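A quick way to compare the two encodings is to encode a string both ways and count the bytes. The sample strings here are arbitrary, and the sketch uses the utf-16-le codec so that the two-byte byte order mark that Python's plain utf-16 codec adds does not skew the numbers:

```python
samples = {
    "English": "Hello, world",
    "Japanese": "こんにちは世界",
}

sizes = {}
for name, text in samples.items():
    utf8_size = len(text.encode("utf-8"))
    utf16_size = len(text.encode("utf-16-le"))
    sizes[name] = (utf8_size, utf16_size)
    print(f"{name}: {utf8_size} bytes as UTF-8, {utf16_size} bytes as UTF-16")
```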
The open function—with a few caveats—expects a UTF-8 encoding by default, so you’ll need to specify the encoding to open a UTF-16 file:
f = open("../utf16-file.txt", encoding="utf-16")
Using a named argument, you can leave the mode and buffering arguments as their defaults. Alternatively, you can supply default (or custom) values for them:
f = open("../utf16-file.txt", "r", -1, "utf-16")
Even if you’re opening a UTF-8 file, it’s a good idea to explicitly declare the encoding using this argument. Some operating systems (e.g., Windows) may use a non-UTF-8 encoding, so it’s wise not to rely on the default.
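The following sketch puts the encoding argument through a round trip. It writes a small UTF-16 file to a temporary directory (so it does not rely on ../utf16-file.txt existing), reads it back with the matching encoding, and then shows what happens when the wrong encoding is declared:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "utf16-file.txt")
text = "Hello, UTF-16"

# Write and read with the same, explicitly declared encoding.
with open(path, "w", encoding="utf-16") as f:
    f.write(text)
with open(path, encoding="utf-16") as f:
    round_trip = f.read()

# Declaring the wrong encoding fails as soon as the data is decoded.
try:
    with open(path, encoding="utf-8") as f:
        f.read()
    wrong_encoding_failed = False
except UnicodeDecodeError:
    wrong_encoding_failed = True

print(round_trip, wrong_encoding_failed)
```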
Handle Badly Formed Files Using the Errors Parameter
Although you should take care to specify an encoding, it may not always be possible. But the open function doesn’t have to fail in the case of encoding errors; you can use the errors argument to select its behavior from several options.
The default value is ‘strict’, which raises an exception, but you can ignore such errors altogether with ‘ignore’:
f = open("../utf16-file.txt", errors='ignore')
The downside is that you may now be dealing with corrupted data, without knowing it. Whether this is something you can deal with will depend on the nature of your data.
A common alternative is to replace invalid characters with a marker indicating that something is missing. When reading, Python uses the Unicode replacement character (U+FFFD), which is often displayed as a question mark inside a diamond. You might have spotted this behavior on the occasional web page that hasn’t specified its encoding properly. The ‘replace’ value for the errors argument will do exactly this.
Finally, the ‘backslashreplace’ value will replace each malformed character with its equivalent as a Python backslashed escape sequence. This may help you to debug the original cause of the problem, so it could be useful for testing or as part of a dev tools suite.
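To compare these options side by side, you can read the same malformed file with each one. This sketch creates its own test file containing a single byte (0xE9, which is "é" in Latin-1) that is not valid UTF-8 in this position; the read_with helper is made up for the example:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "badly-formed.txt")
with open(path, "wb") as f:
    f.write(b"caf\xe9 au lait\n")  # \xe9 on its own is not valid UTF-8


def read_with(errors):
    """Read the test file as UTF-8 using the given error handling policy."""
    with open(path, encoding="utf-8", errors=errors) as f:
        return f.read()


results = {}
for policy in ("ignore", "replace", "backslashreplace"):
    results[policy] = read_with(policy)
    print(policy, repr(results[policy]))

# The default policy, 'strict', raises an exception instead.
try:
    read_with("strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```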