PGConf.EU: The event 

PostgreSQL, often called Postgres, is a powerful, open-source object-relational database management system (ORDBMS) emphasizing extensibility and standards compliance. It was developed at the University of California, Berkeley, starting in the late 1980s, and has since evolved into a highly versatile and reliable database solution that supports a wide range of data types, including text, images, and video, making it a popular choice for developers and companies around the world.

PGConf.EU (PostgreSQL Conference Europe) is the premier European conference for the PostgreSQL community, offering an in-depth exploration of the world’s most advanced open-source relational database. This annual event is a vital forum for PostgreSQL users, developers, and enthusiasts to converge, sharing insights, experiences, and innovations. 

The conference spans several days and covers a range of topics:

Industry expertise: Keynotes and presentations by leading figures and experienced professionals in the PostgreSQL world provide valuable insights into the latest advancements, best practices, and the future of PostgreSQL.

Knowledge sharing: The conference offers diverse sessions catering to various experience levels. From beginner-friendly tutorials to advanced talks on specific functionalities and integrations, participants gain a comprehensive understanding of PostgreSQL’s capabilities. 

Interactive environment: The event extends beyond the sessions and training. Attendees actively engage with experts, discussing real-world scenarios and seeking guidance.

Networking hub: PGConf.EU fosters a dynamic environment where attendees can connect with peers, exchange ideas, and build valuable relationships. Dedicated networking sessions, social gatherings, and an exhibition hall brimming with PostgreSQL vendors and service providers create ample opportunities for collaboration and exploring potential business partnerships. 

Future of PostgreSQL: The conference also delves into discussions about the future of PostgreSQL, exploring upcoming trends, potential challenges, and the evolving landscape of database technology. 

This event serves as a catalyst for professional growth and innovation in the European PostgreSQL community, and for our DBA role here at La Redoute.

 

4 Most interesting sessions 

Next, we describe the sessions that stood out to us among the many on offer, either because we immediately learned something we could apply or because they challenged our status quo in an unexpected way.

Figure 1: Our DBA team members André Fonseca and Diogo Passadouro.

 

#1: Multi-threaded PostgreSQL? 

PostgreSQL currently works on a process-based architecture. This means each database connection runs as a separate process managed by the operating system. While the OS distributes these processes across available CPUs for efficiency, this session discussed potentially switching to a multi-threaded architecture. 
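
To see this process-per-connection model from inside the database, you can ask PostgreSQL for the operating-system process behind each session. A minimal sketch, assuming access to any PostgreSQL instance through psql or a similar client:

    -- Each client connection is served by its own backend process.
    SELECT pg_backend_pid();   -- OS process ID of the current session's backend

    -- List the backend processes currently serving client connections.
    SELECT pid, usename, state
    FROM pg_stat_activity
    WHERE backend_type = 'client backend';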

This session was important not just because it shed light on PostgreSQL’s technical evolution toward embracing multi-threading but also because it significantly highlighted the PostgreSQL community’s openness and collaborative spirit toward adapting and evolving the database system in alignment with modern computing paradigms. 

#2: Performance tricks you have never seen before 2.0 

This session focused on lesser-known techniques that could improve the performance of PostgreSQL databases. Some of these techniques are simple and quick, but they could really make a difference in our team’s day-to-day activities.

Preloading libraries

When a connection is created, it loads the libraries defined in the “session_preload_libraries” parameter. This can make the first call on that connection slower, since the library has to be loaded into memory. Alternatively, the libraries can be loaded once when the server starts, using the “shared_preload_libraries” parameter; however, this uses more memory.

In conclusion, initializing libraries comes with a cost, but preloading helps. 
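
As a rough illustration of the two approaches, the sketch below uses pg_stat_statements as an example library; any loadable extension library could take its place. Changing “shared_preload_libraries” only takes effect after a server restart, while “session_preload_libraries” is picked up by new sessions after a configuration reload:

    -- Load the library once at server start (always resident, requires a restart).
    ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

    -- Or load it at the start of each new session (each connection pays the cost once).
    ALTER SYSTEM SET session_preload_libraries = 'pg_stat_statements';

    -- A reload is enough for session_preload_libraries;
    -- shared_preload_libraries needs a full server restart.
    SELECT pg_reload_conf();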

Storing data 

Several parameters affect how much Write-Ahead Logging (WAL) is generated, and therefore how much disk activity occurs. WAL is a standard method for ensuring data integrity.

The parameter “wal_level” determines how much information is written to the WAL files. The configuration with the fewest writes to WAL files is “minimal”, which only writes the information required to recover from a crash or an immediate shutdown.

The configuration with the most writes to WAL files is “logical”, which adds information necessary to support logical decoding. 

One scenario in which this “wal_level” configuration greatly improves the performance (with some added risk) is when loading or updating big tables. We came across this when we refreshed a test environment database with the production data and applied the anonymization process. This process updates customer data (names, addresses, etc…). Behind the scenes, it updates huge tables. Setting this “wal_level” to minimal reduced the anonymization process duration by half. In this situation, the risk isn’t a major factor since it is a test environment. After this, we set the “wal_level” back to the original value. 
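
A minimal sketch of how such a temporary change might look, assuming superuser access to a test environment. Note that “wal_level” only changes after a server restart, and that “minimal” also requires replication to be disabled (max_wal_senders = 0):

    -- Check the current setting.
    SHOW wal_level;

    -- Reduce WAL generation for a heavy bulk load (test environments only).
    ALTER SYSTEM SET wal_level = 'minimal';
    ALTER SYSTEM SET max_wal_senders = 0;   -- required when wal_level = minimal

    -- Restart the server, run the bulk load or anonymization, then revert
    -- and restart again.
    ALTER SYSTEM SET wal_level = 'replica';
    ALTER SYSTEM RESET max_wal_senders;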

It was very interesting to see explanations of situations that can influence performance yet normally go unnoticed.

 

#3: It’s Not You, It’s Me: Breaking Up with Massive Tables via Partitioning 

Sometimes databases grow to very large volumes, up to many terabytes. Inside these databases there are tables holding huge amounts of data, which translates into slower queries and more time-consuming maintenance tasks.

There are mechanisms to reduce query execution time, such as creating indexes, but there comes a point where, no matter how well optimized the indexes are, queries become slower and maintenance tasks take longer to finish.

One solution is to partition big tables. This means that a table is broken up into several smaller tables within the same host. It is different from sharding, because sharding splits the data across several hosts.

With partitioned tables, if you query the table using the partition key, you can significantly reduce query execution time. Maintenance tasks, instead of running on a single huge table, run on each partition individually.

A major benefit is that this is completely transparent to the client. Data access works in the same way, so there is no need to change the data access queries. The only thing to be careful of is to partition the table on a key that is actually used in the queries. If a query has no condition filtering on the partition key, it is executed on all partitions, removing the benefit of partitioning.

An example of a partition key is a date range. Let’s imagine a table with customer data. We can create one partition per year of the customer’s date of birth: customers born in 1990 will be in one partition, customers born in 1995 in another, and so on, as sketched below.
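
A minimal sketch of this idea using PostgreSQL’s declarative partitioning, with a hypothetical customers table range-partitioned on the date of birth:

    -- Parent table: rows are routed to partitions by birth_date.
    CREATE TABLE customers (
        id         bigint NOT NULL,
        name       text   NOT NULL,
        birth_date date   NOT NULL
    ) PARTITION BY RANGE (birth_date);

    -- One partition per year of birth.
    CREATE TABLE customers_1990 PARTITION OF customers
        FOR VALUES FROM ('1990-01-01') TO ('1991-01-01');
    CREATE TABLE customers_1995 PARTITION OF customers
        FOR VALUES FROM ('1995-01-01') TO ('1996-01-01');

    -- A query filtering on the partition key only scans the relevant partition.
    SELECT * FROM customers
    WHERE birth_date BETWEEN '1990-03-01' AND '1990-06-30';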

 

#4: Why using Open Source PostgreSQL matters 

Open source refers to a software development model in which the source code is made available to the public, allowing anyone to consult, modify, and redistribute it. 

Although this topic focuses on PostgreSQL, it applies to any Open Source software. 

These are the most important aspects of Open Source:

Transparency: Since the source code is available, anyone can see how the software works step by step, which increases the trust and security of the code. 

Collaboration: Anyone can contribute to software development by adding features and fixing problems. This way, the code is not locked down to one person, group, or company. In addition, people with different skills can contribute to developing and improving PostgreSQL. 

Security: Security experts can audit and analyze the code to find vulnerabilities and even propose fixes. 

Cost: PostgreSQL is entirely free. There is no need to buy licenses, pay subscriptions, or cover any other software cost for PostgreSQL itself. The remaining cost is the server it runs on.

There is another advantage to Open Source. A company that owns the source code of a proprietary database engine may no longer exist in 10, 50, or 100 years. If a database still runs on that software and something needs to be fixed, it becomes impossible, because there is no access to the source code. If the code is completely open, there is always the possibility of understanding how it was built, changing it, fixing it, and compiling it again. With a closed-source database engine, you may in the future not even be able to access the data in the database because of incompatibilities between the code and newer system architectures. With open source, if an incompatibility appears, you can correct it and recompile, regaining access to the old database without losing the data inside it.

Future of PostgreSQL 

PostgreSQL has grown a lot in recent decades. The fact that it is Open Source and free for anyone anywhere in the world to try has been a major driver of this growth. The absence of license fees matters to both people and companies: some other solutions on the market are quite expensive in terms of licenses, while PostgreSQL has no license fee at all, which means real cost savings, especially when you run dozens, hundreds, or even thousands of databases. Being Open Source also means the software is not tied to a person, group, or company, as mentioned before: anyone can change, compile, and run the code, even far into the future.

Another attractive point is that anyone can develop extensions for PostgreSQL for specific use cases that out-of-the-box PostgreSQL software might not provide. Anyone can develop an extension without having to change the PostgreSQL code base. There has been a lot of investment not only by the community but also by companies like Microsoft to create more and better extensions for PostgreSQL. This possibility of creating extensions is only limited by the imagination. 

Artificial Intelligence is going to be a major focus over the next few years. Using PostgreSQL and pgvector, it is possible to build AI applications that go beyond simple keyword matching. pgvector is a PostgreSQL extension that unlocks PostgreSQL’s potential in the world of AI. By combining PostgreSQL and pgvector, it is possible to create chatbots, personalized search systems, recommendation systems, and more. In the talk “How I found my Pokémon cards thanks to Postgres: an AI journey” (YouTube session), the speaker showed how PostgreSQL can work with AI, and how PostgreSQL and pgvector can identify images and capture image details quickly and efficiently.
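
As a small illustration of what pgvector adds, the sketch below stores a few three-dimensional embeddings and retrieves the closest one by distance; a real application would store embeddings produced by an AI model, typically with hundreds or thousands of dimensions:

    -- Enable the pgvector extension.
    CREATE EXTENSION IF NOT EXISTS vector;

    -- A table holding embeddings (3 dimensions here purely for illustration).
    CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
    INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');

    -- Nearest-neighbour search by Euclidean distance (the <-> operator).
    SELECT id, embedding
    FROM items
    ORDER BY embedding <-> '[3,1,2]'
    LIMIT 1;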

 

Conclusion 

Having taken part in an event like this (this year there was a record attendance of 720 participants), where people from both personal and corporate backgrounds attended, from DBAs to DevOps engineers and developers, and where topics ranged from security and performance to AI, made it clear that PostgreSQL is growing at a fast pace and that it is no longer just a system for storing data, but much more than that.

Here at La Redoute, we learned a lot from the community in these few days. We got new ideas on how to improve our PostgreSQL databases and saw the direction PostgreSQL is taking in the coming years. We also improved our knowledge of PostgreSQL database administration, equipping ourselves with better skills to tackle complex challenges effectively. This helps us work more effectively on the team’s daily tasks and on all company projects, enhancing our efficiency and value both for the company and at a personal level.

Authors