Sunsetting PHP Faker
I'm sunsetting fzaninotto/Faker, a popular PHP library for generating fake data. Let me explain why, and what this means for the PHP developers who use it.
I started developing Faker in October 2011 because I needed to populate a database with fake data for one of my projects. I was very surprised by how many people used it and excited to become better known in the PHP community thanks to my contributions to this project.
Using the works of Flaubert and Lewis Carroll to fill web interfaces was very satisfying. Being able to build more realistic prototypes thanks to large amounts of good fake data also changed the way I build projects.
<?php
// use the factory to create a Faker\Generator instance
$faker = Faker\Factory::create();

// generate data by accessing properties
echo $faker->name;
  // 'Lucy Cechtelar';
echo $faker->address;
  // "426 Jordy Lodge
  // Cartwrightshire, SC 88120-6700"
echo $faker->text;
  // Dolores sit sint laboriosam dolorem culpa et autem. Beatae nam sunt fugit
  // et sit et mollitia sed.
  // Fuga deserunt tempora facere magni omnis. Omnis quia temporibus laudantium
  // sit minima sint.
Because it ships with data for 70+ languages, including entire novels, Faker is a heavy library - more than 3 MB.
Developers like to use Faker for automated testing, so their CI server downloads it whenever someone opens a PR. As a consequence, Faker is downloaded a lot.
According to Packagist, Faker has been downloaded 121M times. 121M x 3.3 MB = way too many bytes.
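To put a number on "way too many bytes", here is a rough back-of-the-envelope calculation using only the two figures quoted above. It assumes every download transfers the full 3.3 MB, which ignores compression and local caching, so treat the result as an upper bound rather than a measurement.

```php
<?php
// Rough estimate based on the two figures quoted in the text.
$downloads = 121000000;   // 121M downloads
$packageSizeMb = 3.3;     // size of one download, in MB

// Assumption: each download transfers the whole package (no cache, no compression).
$totalMb = $downloads * $packageSizeMb;
$totalTb = $totalMb / 1000000; // decimal units: 1 TB = 1,000,000 MB

printf("%.1f TB\n", $totalTb); // prints "399.3 TB"
```

That is roughly 400 TB of transfer for a library that most users only need a sliver of.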
I'm super concerned about the responsibility that we developers bear for climate change. I have estimated the carbon footprint of Faker using GreenFrame.io. Over the years, Faker has probably emitted more than 11 metric tons of CO2 equivalent. Being such a significant contributor to climate change kills me.
Most people only use 1 locale, so they only need a fraction of the library. Yet, because it's designed as a multilingual library, there is no alternative: they have to download the full 3 MB even if they only need 10 KB.
Faker should have been a small core library with no localized content. Then, developers from all over the world would have contributed localized packages to packagist, without requiring my supervision for merges. And users would just have to download the core and the locale they need.
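A minimal sketch of what that split could look like. None of these names exist in Faker: `LocaleProvider`, `CoreGenerator` and `FrenchProvider` are hypothetical, and the sample names are placeholders. The idea is only that the core ships a contract and a generator, while each locale lives in its own package.

```php
<?php
// Hypothetical design sketch, not Faker's actual API.

// The core package would ship only this contract and the generator below.
interface LocaleProvider
{
    /** @return string[] first names available in this locale */
    public function firstNames(): array;
}

final class CoreGenerator
{
    private $provider;

    public function __construct(LocaleProvider $provider)
    {
        $this->provider = $provider;
    }

    // Pick one first name at random from whatever locale was plugged in.
    public function firstName(): string
    {
        $names = $this->provider->firstNames();
        return $names[mt_rand(0, count($names) - 1)];
    }
}

// A separate, independently published package would contain only
// this class and its data.
final class FrenchProvider implements LocaleProvider
{
    public function firstNames(): array
    {
        return ['Emma', 'Gustave', 'Charles'];
    }
}

$faker = new CoreGenerator(new FrenchProvider());
echo $faker->firstName(), "\n";
```

With this layout, a user who only needs French data would install the core and one locale package, and the maintainer of the core would never have to review data in a language they don't speak.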
Another core design problem is seeding. To allow reproducible builds, the Faker generator accepts a seed. With the same seed, Faker will always generate the same fake data. Except... this is only valid if the corpus of fake data to choose from never varies. This forbids any significant change to an existing provider - like removal of outdated data or duplicate content. To keep seeding, Faker must avoid fixing its data.
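The problem is easy to reproduce outside of Faker. The sketch below is an illustrative stand-in for a provider, not Faker's actual code: `pickNames` is a hypothetical helper, and it assumes that picks are driven by PHP's seeded Mersenne Twister (`mt_srand` / `mt_rand`).

```php
<?php
// Illustrative stand-in for a data provider, not Faker's implementation.
function pickNames(array $corpus, int $seed, int $count): array
{
    mt_srand($seed); // same seed => same sequence of random numbers
    $picked = [];
    for ($i = 0; $i < $count; $i++) {
        $picked[] = $corpus[mt_rand(0, count($corpus) - 1)];
    }
    return $picked;
}

$corpus = ['Alice', 'Bob', 'Carol', 'Dave', 'Erin'];

$first  = pickNames($corpus, 1234, 5);
$second = pickNames($corpus, 1234, 5);
var_dump($first === $second); // bool(true): same seed, same corpus

// Now "fix" the data by removing an entry that the seeded run had selected.
$cleaned = array_values(array_diff($corpus, [$first[0]]));
$third   = pickNames($cleaned, 1234, 5);
var_dump($first === $third);  // bool(false): the removed name can no longer appear
```

Any test suite that relied on the first output would break after the cleanup, even though the seed never changed.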
But unfortunately, it's too late now - Faker needs a full rewrite to overcome these problems.
Most of the PRs I received on Faker created or improved localized content. Tons of fake names, addresses, phone numbers, etc. Most of these PRs used copyrighted content, copied from a public website. Faker is MIT-licensed and cannot contain copyrighted content. Also, these PRs were in a language I don't speak, so I had no way of knowing whether the data was good or bad. I ended up closing many PRs for copyright reasons and blindly merging the others.
Check this random PR for a glimpse of the misery that I lived as a maintainer.
A computer scientist contacted me a few years ago about a study he was working on. He analyzed thousands of open-source repositories and found that Faker had the lowest bus factor of all projects. That meant that Faker had a huge number of contributors from all over the world, and the maintainer (me) only committed a few changes a year - apart from merge commits, of course. To him, it was the sign that I was super good at delegating maintenance tasks. To me, it was the sign that I was super lame at putting my own touch on the project.
Also, I don't use PHP in my job anymore. In fact, I haven't written a line of PHP in 5 years. Maintaining a library in a language I am not good at isn't fun. And it's not good for the library itself.
A year ago, I realized that I needed someone to take the curse from me, or rather, to take over the Faker maintenance.
This has already happened to me in the past. I was the lead developer of Propel, an open-source ORM for PHP. When I realized I didn't have time to maintain it anymore, I handed it over to a developer that I knew personally, who did a great job maintaining it for about a year.
Then he handed it over to another developer. This second developer thought that the codebase was too bad and required a full rewrite. I insisted that a good library was a library with frequent releases, but I was no longer a maintainer. That was 6 years ago. The full rewrite is still ongoing, and the 2.0 version never came out.
To me, the handover was a failure. It's always heartbreaking to see something you've worked on so hard being spoiled by someone else.
As for Faker, I invited 2 new maintainers last year, who started modernizing the codebase. But as I explained above, maintaining Faker isn't fun. And also, we had different views on what's important for Faker, and this didn't help. As a result, they didn't contribute very much in the past year.
In other words, I suck at handing open-source projects over.
Finally, last week, the lack of activity in the project triggered a conversation on GitHub that led to an offer to move Faker to another organization with new maintainers.
I saw this as a hostile takeover - an offer to lose the reputation of a 25,000-star project, and to hand over the project to developers who have never contributed significantly in the past. Not the kind of offer I'm willing to accept.
But after all, why does Faker need to evolve? If it's been downloaded 121M times, it's probably because it's good enough. I mean, hundreds of thousands of Faker users can't be all wrong at the same time, right?
Maybe there is a way to step aside from the world of ever-changing software. Maybe there is a way to sit and enjoy watching the baby grow by itself.
Just kidding. PHP 8 will make the Faker codebase obsolete, and PHP developers want types. Leaving Faker as is will only make the problem worse.
So the only decision I can make is to retire.
That means I won't be accepting new PRs, I won't merge existing ones, I won't make new releases, and I won't take new maintainers. I'll disable notifications from the Faker repository altogether, and delete the archive of all the Faker-related messages that fill my inbox - another good move for the planet.
I understand this can be sad for all the contributors who donated their time and work to Faker, and I'm sorry for that. But I think it's in the PHP community's interest that I step aside.
I don't want to break the web, so of course, Faker will remain available on Packagist, and will continue to work as long as you don't upgrade your PHP version.
Faker fills an important need, so I am confident that someone else will soon publish a new library for generating fake data. I'm even surprised nobody has done it before.
It will be much better than Faker, and will evolve more quickly. If you're interested, you can start by forking Faker. You don't need my authorization - that's the beauty of open-source! Just be aware that you'll probably regret it in a few years ;)
So, do you want to have a big impact on the future of PHP development?
Long live open-source!
As I said earlier in this article, sunsetting Faker doesn't mean I won't contribute to open-source projects. If you liked Faker, you'll probably love the other project I'm currently working on: react-admin. Check out these demo apps that were built using react-admin: