Strings and Bytes
=================

There are two ways to work with text.

1.  Let :mod:`ctypes` implicitly convert your :class:`str` or :class:`bytes` to
    a |null terminated| char or wide char array before passing it to a C
    function which takes :c:`char *` or :c:`wchar_t *` arguments. This is
    slower (due to the conversion - although :ref:`this can be cached`) but
    more straightforward.
2.  Treat your text as an array, passing raw |pointers| to C. This is harder
    but much more efficient.

This page focuses on option :math:`1`. For option :math:`2` see
:ref:`Buffers and Arrays`.

Reading strings
---------------

Passing strings from Python is easy. We demonstrate it with an equivalent to
Python's :meth:`str.count`, with the simplification that the sub-string we are
counting is only one character. If you want to work with :class:`bytes`
instead, simply replace :c:`wchar_t` with :c:`char`.

.. literalinclude:: ../demos/strings/strings-demo.c
    :language: C
    :end-before: // ---
    :caption: strings-demo.c

Or a more C savvy person may prefer the equivalent but punchier code:

.. literalinclude:: ../demos/strings/strings-demo.c
    :language: C
    :start-after: // ---

The Python end is very no-nonsense. Let's compile the above::

    from cslug import CSlug
    slug = CSlug("strings-demo.c")

And run it::

    >>> slug.dll.count("hello", "l")
    2

Yay, it works!

Now that we've got it going, let's talk about the code. Notice the types of the
inputs to :c:`count()`\ : :c:`wchar_t *` and :c:`wchar_t`. :c:`wchar_t *`
accepts a :class:`str` of arbitrary length but :c:`wchar_t` accepts only a
single character :class:`str`. We could have used pointers for both arguments,
but using just :c:`wchar_t` adds an implicit check that our single character
argument is indeed singular::

    >>> slug.dll.count("This will break", "will")
    ctypes.ArgumentError: argument 2: <class 'TypeError'>: wrong type

You may also notice that we've avoided having to specify the string's length
anywhere.
Instead we just use :c:`text[i] != 0;` to tell us when to stop the for loop.
Here we are taking advantage of the fact that Python strings are
|null terminated|, so to find the end of a string we simply need to find the
*NULL* (integer 0) at the end.

There is a catch to doing this though. If our string contained nulls in it then
this function would exit prematurely. By default, Python won't allow us to make
this mistake::

    >>> slug.dll.count("This sentence \x00 contains \x00 Nulls.", "a")
    ctypes.ArgumentError: argument 1: <class 'ValueError'>: embedded null character

However if we force our way through...

>>> import ctypes
>>> slug.dll.count(ctypes.create_unicode_buffer("One z \x00 lots of zzzzzzzz"), "z")
1

If your string is likely to contain NULLs then pass the string length as a
separate parameter and use that to define your :c:`for` loops.

Caching the conversion overhead
...............................

When you pass a :class:`str` or :class:`bytes` to C you implicitly call
:func:`ctypes.create_unicode_buffer` or :func:`ctypes.create_string_buffer`,
performing a conversion or copy, before passing the result to C. If you pass
the same string to C multiple times then this conversion is repeated
redundantly. To avoid this, do the conversion yourself.

For example, this performs the conversion twice::

    a = "Imagine that this string is a lot longer than it actually is."
    slug.dll.count(a, "x")
    slug.dll.count(a, "y")

Whereas this performs only one conversion::

    a = "Imagine that this string is a lot longer than it actually is."
    a_buffer = ctypes.create_unicode_buffer(a)
    slug.dll.count(a_buffer, "x")
    slug.dll.count(a_buffer, "y")

Writing to strings
------------------

Writing to strings in place or to new strings is possible but not so
streamlined.

1.  In order to avoid the cacophony of memory issues that is creating and
    sharing buffers in C, strings should only be created in Python. To write to
    a string in C, create an empty one of the right length then give it to C to
    populate.
    This unfortunately means that you must know how long your string will be
    before you write it.
2.  As we've seen above, strings are converted to :mod:`ctypes` character
    arrays when passed to a C function. Writing to the converted one does not
    update the original and the converted array is discarded immediately after
    the function is complete, losing any changes the function made. To avoid
    this we must do the conversion explicitly.

We'll show these in our next example: a C function which outputs the reverse of
a :class:`str`:

.. literalinclude:: ../demos/strings/reverse.c
    :language: C
    :caption: reverse.c

Notice that the output string is an argument rather than a :c:`return` value.
This is in accordance with complication :math:`1` above.

Let's compile the C code:

.. literalinclude:: ../demos/strings/reverse.py
    :start-at: import ctypes
    :end-at: slug =

And give ourselves something to reverse:

.. literalinclude:: ../demos/strings/reverse.py
    :start-at: in_ =
    :end-at: in_ =

Before using our C function, we need to make it an output to populate. Because
of complication :math:`2`, this must be a :class:`ctypes.Array` instead of a
generic Python :class:`str`. (Try giving it a Python :class:`str` anyway to see
what happens).

.. literalinclude:: ../demos/strings/reverse.py
    :start-at: out =
    :end-at: slug.dll.reverse

::

    >>> out.value
    '.gnirts siht esreveR'
    >>> out.value == in_[::-1]
    True

Whenever you write a C function which requires weird handling in Python you
should write a wrapper function to keep the weirdness out of the way.

.. literalinclude:: ../demos/strings/reverse.py
    :pyobject: reverse

::

    >>> reverse(".esu ot reisae hcum si noitcnuf sihT")
    'This function is much easier to use.'

Null terminated or not null terminated?
---------------------------------------

.. highlight:: C

In C, strings are automatically null terminated if you define them with::

    char string[] = "literal";

or for unicode strings::

    wchar_t string[] = L"literal";

If you specify the length of the string then any *spare* characters are nulls::

    char string[4] = "hello";  // Array too short to fit "hello", truncated to "hell" with a build warning.
    char string[5] = "hello";  // Not null terminated.
    char string[6] = "hello";  // Null terminated.
    char string[7] = "hello";  // Double null terminated.

.. highlight:: python3

Similarly in :mod:`ctypes`, both :func:`~ctypes.create_string_buffer` and
:func:`~ctypes.create_unicode_buffer` append a null if the length is
unspecified::

    >>> ctypes.create_unicode_buffer("hello")[:]
    'hello\x00'
    >>> ctypes.create_unicode_buffer("hello\x00")[:]
    'hello\x00\x00'

And set any *spare* characters to ``'\x00'`` if the length is specified::

    >>> ctypes.create_unicode_buffer("hello", 4)[:]
    ValueError: string too long
    >>> ctypes.create_unicode_buffer("hello", 5)[:]
    'hello'
    >>> ctypes.create_unicode_buffer("hello", 6)[:]
    'hello\x00'
    >>> ctypes.create_unicode_buffer("hello", 7)[:]
    'hello\x00\x00'

In any other case, assume that strings are not null terminated unless the
documentation for the particular function you are using says that it writes
null-terminated strings.

.. note::

    The ``'\x00'`` character is an escape sequence (just like ``'\n'``), not a
    literal backslash followed by an x and two zeros.

    >>> len('\x00')
    1
    >>> print('invisible \x00 character')
    invisible  character

    In the unlikely event that you want to type it literally, use a double
    backslash or raw string::

        >>> print('\\x00')
        \x00
        >>> print(r'\x00')
        \x00
        >>> len('\\x00')
        4
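
The :mod:`ctypes` buffer behaviour described in this section can be checked
without compiling any C. The following is a minimal sketch using only the
standard library. The helper name ``describe_buffer`` is hypothetical, made up
for illustration, and is not part of :mod:`ctypes` or cslug. It contrasts
slicing a buffer, which returns every character including nulls, with the
buffer's ``value`` attribute, which stops at the first null much like a C loop
testing :c:`text[i] != 0`.

```python
import ctypes


def describe_buffer(text, size=None):
    """Hypothetical helper: summarise a ctypes wide character buffer.

    Returns a tuple of (buffer length, full contents, contents up to the
    first null). ``size`` is forwarded to ctypes.create_unicode_buffer();
    leaving it as None lets ctypes append one terminating null itself.
    """
    buffer = ctypes.create_unicode_buffer(text, size)
    # Slicing yields every element of the array, nulls included, whereas
    # .value stops at the first null it finds (or at the end of the array
    # if there is no null at all).
    return len(buffer), buffer[:], buffer.value


print(describe_buffer("hello"))
print(describe_buffer("hello", 7))
print(describe_buffer("One z \x00 lots of zzzzzzzz"))
```

For instance, ``describe_buffer("hello", 7)`` gives
``(7, 'hello\x00\x00', 'hello')``. The last call shows why embedded nulls are
a hazard: the ``value`` of that buffer is just ``'One z '`` even though the
full text is still stored in the array.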