To the FCC: Thoughts on Net Neutrality

The internet is a great equalizer. Like the printing press before it, it is the foundation upon which we share all our communication and knowledge. Giving priority access to some over others is the equivalent of constraining access to the alphabet or to language itself. You’re damning the less fortunate to remain illiterate.

Are you upset as well? Here’s something you can do. 

Here’s something else you can do.


    Leveling Up Your Development Skills with a Pinch of SysAdmin

     “If you’re developing properly, you shouldn’t have to worry about whether your site is running Apache, Nginx or anything else. Your code should just work.”
    - anon

    When developing a plugin or theme for use by the greater community, this is certainly true. Anything you put out there for someone else should absolutely not be dependent on the platform.

    However, if your code isn’t meant to be reused by other people (as a plugin or theme) but instead powers a service you run yourself on a platform you already have, then you want to make sure your code will run on your production server before pushing it there. That’s why I keep my servers running the same infrastructure. I have people who rely on my site running as expected.

    I develop on a Virtual Machine (VM). VMs are great because you control 100% of what is running on them.

    Developing on a Mac is fun because its core is Unix-like. Many developers will just use the PHP that is right there, or use MAMP. MAMP is a very good way to get started.

    A native setup like that works 99% of the time. But I’ve found that in a few edge cases, if you’re not developing in the same environment your production server runs on, debugging can get complicated… and I HATE debugging my production server live. If you do that, you might as well skip all the other layers and code commando, straight on production.

    Another benefit of running your own VM is that you learn what goes on under the hood. Sometimes your code isn’t the only thing responsible for speed and performance. If you’re on a shared host, there’s a lot you can’t do. Once you practice on a local development environment, you might just find that you’ve built up enough gumption to run your own live server yourself.

    The lowest tier on Digital Ocean is certainly comparable in price to any shared hosting. The benefits are nice, though. Want to try out Nginx instead of Apache? Sure! Want to use FastCGI instead of running PHP on top of Apache? Go for it! How about just a simple APC install? No arguing on the phone with customer support. Just do it!

    Doing this IS scary. The buck stops with YOU. Make sure you have proper fail-safes, backups, etc. Digital Ocean has daily backups, which is nice. But if you’re hacked, or you get a Reddit bump, you need to handle it yourself.

    Doing this also means that you will be competent enough to spin up multiple environments for testing, thus validating the quote at the beginning of the post.

    So how do I do that?

    • On my Mac I run VMware Fusion. I found it far superior to Parallels for running a local server.
    • Digital Ocean has wonderful tutorials for spinning up servers. Try the LAMP stack. Want to run Nginx? No problem. I recommend trying several of these, a few times. The cool thing with a VM is that you can delete it and begin again, as many times as you need. Get to a good place? Take a snapshot and roll on.
    • I deploy with Git, which basically consists of making sure your local server can SSH to your live server(s). Since most of the people who use the sites I manage handle the content themselves, I don’t have to worry about syncing databases, so I haven’t worked out a solution for that.
    • When I work with a team on a project I’ll typically have a staging server. It’s basically a clone of production, only without an easily accessible URL. We coordinate through a central repository and test on the staging server. When code is ready to be shipped, it’s pushed to production.

    Because I keep everything the same, I don’t have to worry about deployments. If it works locally and it works in staging, I can be confident that it’ll work in production.

    I’d love to hear your thoughts. Discuss:

     


      Handling a PHP unserialize offset error… and why it happens

      I recently discovered the importance of proper collation for database tables. I inherited a proprietary CMS to manage, and its default collation was latin1_swedish_ci. Apparently it’s because “The bloke who wrote it was co-head of a Swedish company”. The problem surfaced when a form on our site began getting submissions with foreign characters: the database collation couldn’t accept those characters and was saving them as question marks (?).

      “Serialization is the process of translating data structures or object state into a format that can be stored.” For example, the array:

      $returnValue = serialize(array('hello', 'world'));

      Will become:

      a:2:{i:0;s:5:"hello";i:1;s:5:"world";}

      This is what the above string means:

      • There is an array that is 2 in length. a:2.
      • The first item in the array has a key that is an integer with the value of 0. i:0.
      • The value for that item is a string that is 5 characters long, which is “hello”. s:5.
      • The second item in the array has a key that is an integer with the value of 1. i:1.
      • The value for that item is a string that is 5 characters long, which is “world”. s:5.

      An unserialize offset error can occur when the declared string length in the serialized data does not match the actual length of the string being stored. So, in the above example, that would look like this:

      a:2:{i:0;s:4:"hello";i:1;s:5:"world";}

      Notice the number ‘4’, while there are really 5 characters in the word ‘hello’.

      So the question is, why would the offset happen when a ? replaces a foreign character?

      To understand why, you need to dig into how UTF-8 works; then things become clear.

      The UTF-8 value of ‘?’ is the single byte ‘3f’, while the value for ‘Æ’ is the two bytes ‘c3 86’. ‘?’ therefore serializes to s:1:"?"; while ‘Æ’ serializes to s:2:"Æ"; (notice the 2 replacing the 1 in the string length). So here is what happens: when PHP serializes the data, it records the foreign character as two bytes long, but when the data is passed to MySQL and the table isn’t set up for UTF-8, the database converts the character to a ?, which is stored as a single byte. The recorded length is never updated, so when you go to unserialize the data you get an offset error.
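
      To see the failure in isolation, here’s a minimal reproduction you can run yourself (assuming the script itself is saved as UTF-8; the variable names are just for illustration):

      // 'Æ' is 2 bytes in UTF-8, '?' is 1 byte
      $good = serialize(array('Æ'));        // a:1:{i:0;s:2:"Æ";}
      $bad  = str_replace('Æ', '?', $good); // a:1:{i:0;s:2:"?";} -- length still claims 2 bytes
      var_dump(unserialize($good));         // array(1) { [0]=> string(2) "Æ" }
      var_dump(unserialize($bad));          // bool(false), plus the "Error at offset" notice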

      How to resolve the problem

      There are several articles that provide solutions. The most popular is to wrap the serialized data in base64_encode(). This prevents the data from getting corrupted, since base64 converts it to ASCII, which any collation can store.

      //to safely serialize
      $safe_string_to_store = base64_encode(serialize($multidimensional_array));
      
      //to unserialize...
      $array_restored_from_db = unserialize(base64_decode($encoded_serialized_string));

      If you don’t have access to your database, or don’t want to fool with it, this is a great solution. You can also set your table collation to a UTF-8 collation such as utf8_general_ci, and that should solve the problem as well (that’s what we did).

      But what if you already have bad data in your database, like we had, and you’re getting the horrid ‘Notice: unserialize() [function.unserialize]: Error at Offset’ error? When you get this notice, chances are you’re not getting all your data back either…

      Here’s what you do:

      $fixed_serialized_data = preg_replace_callback( '!s:(\d+):"(.*?)";!',
          function ( $match ) {
              // Keep the segment if the declared length is correct; otherwise recount it in bytes.
              return ( $match[1] == strlen( $match[2] ) ) ? $match[0] : 's:' . strlen( $match[2] ) . ':"' . $match[2] . '";';
          },
          $error_serialized_data );
      

      This will search out the strings, recount the length, and replace the string length with the correct value. Unfortunately it cannot recover what the original foreign character was, but at least the rest of your data will load.
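
      Then you unserialize the repaired string as usual; a quick sanity check doesn’t hurt (the variable names follow the snippets above):

      $array_restored_from_db = unserialize($fixed_serialized_data);
      if (false === $array_restored_from_db) {
          // The data is broken in some other way; time to inspect it by hand.
      }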

      I got the original code from StackOverflow, but since PHP 5.5 the /e modifier in preg_replace() has been deprecated, so the originally suggested statement will error out. That’s why I rewrote it with preg_replace_callback().


        Using a post-receive Git hook to mark a deployment in NewRelic

        I recently started monitoring my systems with NewRelic. Fantastic tool.

        One fun feature they provide is that you can mark in NewRelic’s dashboard when you’ve deployed new code. This way you can compare your site performance before and after the deploy.

        curl -H "x-api-key:YOUR_API_KEY_HERE" -d "deployment[app_name]=iMyFace.ly Production" -d "deployment[description]=This deployment was sent using curl" -d "deployment[changelog]=many hands make light work" -d "deployment[user]=Joe User" https://api.newrelic.com/deployments.xml

        Using Git’s post-receive hook is perfect for this, especially since I already use it to deploy my sites to the various servers.

        The only question I had was, how would I get the various variables from the post-receive hook into the curl statement?

        Well, here you go:

        # subject line of the latest commit
        description=$(git log -1 --pretty=format:%s)
        # committer name of the latest commit
        author=$(git log -1 --pretty=format:%cn)
        # tree hash of the latest commit
        revision=$(git log -1 --pretty=format:%T)

        Now you can do this:

        curl -H "x-api-key:YOUR_API_KEY_HERE" -d "deployment[app_name]=iMyFace.ly Production" -d "deployment[description]=$description" -d "deployment[user]=$author" -d "deployment[revision]=$revision" https://api.newrelic.com/deployments.xml

          What I Want to Hear About this Tuesday at the State of the Union Address

          I received an email from BarackObama.com asking me to fill out a one question survey.

          The survey question was:

          What issue are you most excited to hear about in the State of the Union?

          This was my response:

          The issue that has most dampened my enthusiasm for the President’s leadership is how much the NSA has been sabotaging the security of the internet.

          I understand that the President worries about our safety, and that the NSA is telling him that they are making things safer.

          Frankly, I don’t believe it is making us safer. It’s eroding the clear leadership the US has taken in moving the world forward technologically, and it’s threatening jobs by undermining the integrity of US tech companies. It also upset me greatly that the President focused mainly on phone record metadata; who uses the phone these days?


            Introducing Assets Manager for WordPress

            Note: if the links aren’t working properly, resave the pretty permalinks settings.

            Download

            Many of the companies my current place of employment interacts with have a higher level of security on their firewalls (they also tend to use IE7; such is life). Because of this, we were having issues sharing files with our constituents using the current industry file-sharing tools.

            To solve this problem I was tasked with creating a custom version of the corporate file-sharing webapps for internal use. This would solve the problems we were having: all the links would be hosted on our domain, so we wouldn’t have to worry about getting third parties’ domains whitelisted in other companies’ firewalls.

            I decided that WordPress would be the best tool to build this on. It already has wonderful custom post management abilities as well as built-in media management tools.

            I’m proud of what I built, so I got permission to release it to the WordPress community as a white-labeled plugin. Special thanks to @binmind for his extensive QA testing of the company’s plugin; his testing was crucial for developing the proof of concept and making sure everything worked as it should.

            Instead of releasing the plugin as-is, I decided to rebuild it from scratch. I’ve learnt a lot since building the original assets manager and wanted to harden the code base before releasing it to the public. Here are the results of my efforts.

            Features


            Path Obfuscation:

            When a file is uploaded to WordPress, you usually access it by linking directly to the location where the file is hosted on the server. Assets Manager creates a unique obfuscated link for the file instead. When a file is downloaded, it receives the name you supply.

            This does two things:

            1. You can’t figure out where the file is actually hosted, nor can you find other files based on some pattern. This is a security feature: since the links do not indicate anything about where the files live, or what they will be called when downloaded, you can’t guess where other files are stored.
            2. Files are never linked to directly; they are read and served. This is what makes #1 work. It also means that before the file is served, Assets Manager can check various things, like whether the user is logged in or whether the file has “expired”.

            When should this file expire?

            Because of #2 above, Assets Manager intercepts files before they are served to the user. This means you can decide when and how the file will be served. I’ve included the ability to set how long the file should last; if you see you’re running out of time, you can extend the expiration for as long as you wish. The expiration date is displayed next to the setting, so you know when the file will expire.
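
            I won’t reproduce the plugin’s internals here, but conceptually the serving step looks something like this rough sketch. The function name and meta keys are made up for illustration and are not the plugin’s actual API:

            // Rough, hypothetical sketch of reading and serving a file instead of linking to it.
            function am_maybe_serve_asset( $post_id ) {
                $enabled = (bool) get_post_meta( $post_id, 'am_enabled', true );  // link switched on?
                $secure  = (bool) get_post_meta( $post_id, 'am_secure', true );   // require login?
                $expires = (int) get_post_meta( $post_id, 'am_expires', true );   // expiration timestamp

                if ( ! $enabled || ( $secure && ! is_user_logged_in() ) || ( $expires && time() > $expires ) ) {
                    wp_die( 'This file is not available.' );
                }

                $name = get_post_meta( $post_id, 'am_download_name', true );      // the name you supplied
                header( 'Content-Type: ' . get_post_mime_type( $post_id ) );
                header( 'Content-Disposition: attachment; filename="' . $name . '"' );
                readfile( get_attached_file( $post_id ) );                        // the real path is never exposed
                exit;
            }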

            Enable this file?

            Same as the above feature. If you send out the wrong link, you can easily edit the settings and uncheck “Enabled”.

            Secure this file?

            I can also check whether a user is logged in before serving them the file. It doesn’t actually make the file secure: if someone downloads it, they can send it anywhere. It only secures the link to the file.

            Remove file

            When a file is removed it is not deleted; it can still be found in the media library. It is just detached from that asset set. You can delete it via the media library if you wish.

            Stats

            A basic hit count is recorded per file.

            Asset Set

            Each asset set is a custom post type, and the uploaded files are attached to that post. The URL for the asset set is obfuscated to protect its location. If it is linked to, though, it will be indexed, since bots can then find it by crawling the site.

            You can upload a set of files and share just the one link; that way, if you decide to change the files around later, you can. Only available files will be listed there, so if a file is “secure” and the user isn’t logged in, they won’t see it, nor will anyone see expired or disabled files.

            Future features I’m working on:

            • Sha1: If you upload a file that already exists, it will link the existing file to your post instead of keeping multiple copies (see the sketch after this list). I believe that WordPress should work this way in general, all filesystems for that matter. That’s a benefit of networks. Why keep doubles, unless you are intentionally backing up the information?
            • File replacement: After uploading and even sharing a file you’ll be able to replace the file behind the active link with a file of the same MIME type. This way if you make a typo you can fix it quickly and replace the file without sending out a new link.
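
            Here’s a rough sketch of how the Sha1 idea could work, with a hypothetical _file_sha1 meta key storing the hash of each upload (concept only, not shipped code):

            // Hypothetical sketch: reuse an existing attachment if the same file was uploaded before.
            function am_find_duplicate_attachment( $uploaded_file_path ) {
                $hash     = sha1_file( $uploaded_file_path );
                $existing = get_posts( array(
                    'post_type'      => 'attachment',
                    'post_status'    => 'inherit',
                    'posts_per_page' => 1,
                    'meta_key'       => '_file_sha1', // hypothetical meta key holding the file hash
                    'meta_value'     => $hash,
                ) );
                return $existing ? $existing[0]->ID : 0; // 0 means no duplicate was found
            }
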
            What do you think? If you have ideas or discover bugs, let me know.

              Code Is Poetry

              [Image: Code is Poetry]

              At the bottom of every page of wordpress.org is the above statement, and it’s not just an empty phrase.

              I learned what I know from digging into WordPress. It started with my breaking the site I was supposed to be managing (sorry, Karin). Many books, themes, plugins and years later, I can manage most any PHP site quite proficiently.

              No matter what I’m working on, I try to keep the above in mind. “Code Is Poetry.” If I can make a method more elegant and concise, I go for it.

              Because it has influenced me so much, I decided to put WordPress to a test and see whether the good people at WordPress hold to their own mantra.

              To do so, I installed the top CMS platforms in a local environment so I could compare their codebases and database structures with each other. I wasn’t very scientific about what counts as a “top” CMS; I pretty much Googled and made a list of the few that came up the most. I have not run any performance tests (I may do that for another post); this post is just about the structure of the code and the database. “Code is Poetry,” right? Here are my results.

              [Image: CMS file search]

              File count (CMSs in alphabetical order)
              Concrete5: 4006 files
              Drupal: 1065 files
              Joomla: 5083 files
              WordPress: 1062 files

              [Image: CMS folder search]

               Folder count
              Concrete5: 765
              Drupal: 136
              Joomla: 1233
              WordPress: 112

              Top level folders
              Concrete5: 20
              Drupal: 7
              Joomla: 17
              WordPress: 3

              Why This is Important

              A codebase, to a developer, is a lot like moving parts in electronics: the more there is, the more that can break. Less doesn’t necessarily mean better; a space shuttle is clearly better than a 747 and has far more moving parts. But to continue the analogy, an SSD is far superior to an HDD.

              Drupal and WordPress are neck and neck in the numbers, though WordPress is ahead by a hair, except for the top-level folder stat.

              The top-level folder stat is important, and WordPress wins hands-down here. Aside from appeasing my OCD tendencies, it matters because it indicates the overall clarity of the codebase’s structure, which has clear ramifications. Try upgrading WordPress: one click. Try upgrading Drupal… HA!

              The WordPress codebase is structured beautifully, with clear delineation between wp-includes, wp-admin, and wp-content. It’s clear what is where, and what is what. You do not have to read through the documentation to see where the core sits and where you can mess around. You cannot say this about the other CMS platforms.

              [Image: CMS folder breakdown]

              Now for the Databases: Table count
              Concrete5: 172
              Drupal: 72
              Joomla: 68
              WordPress: 11

              For more about the elegance of WordPress’ database read: How WordPress Works: Dissecting the Database.

              In conclusion, I don’t want, ever again, to hear about how bloated WordPress is.


                How WordPress Works: Dissecting the Database

                The WordPress Database

                There is beauty in the simplicity of WordPress’ database structure. All the functionality of posts, pages, custom posts, taxonomy, users and core settings are here. In 11 tables.

                For comparison, the almighty Drupal has 72 tables, Joomla has 68.

                All posts, pages and custom posts are saved in the `wp_posts` table. They are differentiated by the `post_type` column. Any additional data you need to save with your post (whatever the post_type is) can be stored in `wp_postmeta`.

                Metas are extremely powerful. You can extend everything in pretty much any way with them.

                Example: Your site manages the courses of an educational institute, so you create the post types ‘Course’ and ‘Lecturer’. You can describe each course and lecturer in `post_content`, but what if you need to store extra information about each that you’ll want to access easily? For a course, you might want to know the dates the course takes place. If you save that in `post_content`, as part of the other descriptive content, you will not be able to run queries on that information easily; you can’t sort it, pull it out for widgets, etc. That’s where meta comes in.

                [Image: wp_postmeta table]

                The meta tables (postmeta, commentmeta and usermeta) each have 4 columns: meta_id, post_id (or the equivalent), meta_key, and meta_value. Each post can have whatever extra meta you need, and it can be pulled out with a simple SELECT ... WHERE meta_key = 'X' query.
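
                For instance, with the Course example above, storing and retrieving the dates is just a couple of function calls (course_dates is simply an example key, and $course_id is the ID of a Course post):

                // Store the dates alongside the Course post, then pull them back out.
                update_post_meta( $course_id, 'course_dates', '2014-02-01 to 2014-06-15' );
                $dates = get_post_meta( $course_id, 'course_dates', true ); // true = return a single value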

                And that’s pretty much it. All of WordPress’s functionality is there. Comments, users, and posts all have their basic structure in their main table and all can be extended as much as needed through their meta.

                Taxonomy is somewhat more complicated; it requires 3 tables. wp_term_taxonomy stores the types of taxonomies: Categories, Tags, and any other custom taxonomy type you create will be here. The individual terms live in wp_terms, so if you have 3 categories and 15 tags on your site, each of those will be stored in wp_terms. wp_term_relationships links the terms to the posts they are attached to, keeping it all in order. Easy-peasy, right?
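
                To make the three-table dance concrete, here is roughly the join that ties them together; this is a simplified $wpdb sketch, not the exact query core runs:

                // Simplified sketch of how the three taxonomy tables relate ($post_id is the post in question).
                global $wpdb;
                $terms = $wpdb->get_results( $wpdb->prepare(
                    "SELECT t.name, tt.taxonomy
                       FROM {$wpdb->terms} t
                       JOIN {$wpdb->term_taxonomy} tt ON tt.term_id = t.term_id
                       JOIN {$wpdb->term_relationships} tr ON tr.term_taxonomy_id = tt.term_taxonomy_id
                      WHERE tr.object_id = %d",
                    $post_id
                ) );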

                The basic options of the WordPress install are in wp_options. The only table that is the odd one out is wp_links, a relic of installs past. Today all the link functionality could easily be implemented as a custom post type, but because WordPress cares about backwards compatibility, the table remains.

                That’s it. Lean and mean.

                One question that comes up about meta: doesn’t this mean a lot of extra queries hitting the database? It would, if not for WordPress’s caching system. Each time you call get_post_meta() you’re not necessarily hitting the database, so you’re good.
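
                In other words (course_dates and course_room are just example keys):

                // The first call loads and caches all the meta for the post; the second reads from memory.
                $dates = get_post_meta( $course_id, 'course_dates', true ); // may query the database
                $room  = get_post_meta( $course_id, 'course_room', true );  // served from the cache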

                So when people say that WordPress is “bloated” I’m not quite sure what they’re talking about.


                  WordPress Proposal: “Deep” Linking Taxonomies to Custom Posts

                  EDIT: A very awesome plugin that does this and much more, exists. Go check out Piklist.

                  Scenario

                  You are building a site for an educational institute. There are several requirements:

                  • Speakers – These are the people giving the courses. There could be different speakers for the same course, if there are too many students for one course or if it runs in different years.
                  • Courses – Each course could be unique, or it could be the same required course that every student needs to take to get through.
                  • Dates – The duration. If you’re dealing with conferences, it could be a single date. If it’s a course, it may be a time-frame.

                  Each of these could and should be a custom post type. And each would have its own custom taxonomy. Speakers should have a Department taxonomy. Courses should as well. Dates should have a Semester taxonomy.
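
                  Registering those is standard WordPress fare; here is a minimal sketch, with post type and taxonomy names chosen just for this example:

                  // Minimal sketch: a Speaker post type with its own Department taxonomy.
                  add_action( 'init', function () {
                      register_post_type( 'speaker', array(
                          'label'  => 'Speakers',
                          'public' => true,
                      ) );
                      register_taxonomy( 'department', 'speaker', array(
                          'label'        => 'Departments',
                          'hierarchical' => true,
                      ) );
                  } );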

                  Here’s where things get interesting. What if a Speaker had a taxonomy of Course, so all the lecturers of a specific Course could link themselves to that Course? Wouldn’t it make sense for both Courses and Dates to have the Semester taxonomy?

                  Proposed solution

                  In addition to a taxonomy term linking to all the other posts that share it, we would add the ability to link a taxonomy term to one specific custom post as well. This is similar to descriptions for categories; however, taxonomies do not have meta, while posts do.

                  This way, when you visit this educational institute’s site and you’re looking at a course but would like to see more about the speaker, you can either click a taxonomy link and see all the other courses tagged with that speaker, or click straight through to the post about the speaker.

                  The opposite linking works just as well. You’re looking at a speaker and would like to learn more about a course they teach. The course is already a taxonomy, so you could click and see all the other Speakers who are tagged with this course, i.e. all the Speakers who teach this. Or you could click through to the course itself.

                  Obviously this can be done already. Just not automatically, or easily.

                  How

                  If this were built as a plugin, I would create a look-up table linking the taxonomy ID to a post ID. If it were to be incorporated into core, I would extend `wp_term_taxonomy` with another column that would associate the taxonomy term with the specific custom post ID. A link could be generated with a function like `get_term_post_link()`.
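
                  As a plugin, the look-up could be as small as this sketch, with an option standing in for the look-up table and get_term_post_link() being the hypothetical function named above:

                  // Hypothetical sketch: map a term_taxonomy_id to the post that "owns" it.
                  function get_term_post_link( $term_taxonomy_id ) {
                      $map     = get_option( 'term_post_links', array() ); // term_taxonomy_id => post ID
                      $post_id = isset( $map[ $term_taxonomy_id ] ) ? $map[ $term_taxonomy_id ] : 0;
                      return $post_id ? get_permalink( $post_id ) : '';
                  }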

                  I think I’ll go ahead and write this plugin now…

                  EDIT: It exists!


                    How I Optimized My LAMP Server

                    I recently switched servers for this site. I moved from Media Temple to Digital Ocean. Think of Digital Ocean as AWS but faster, cheaper, and with great UX. I’ve been meaning to move there for a while, ever since I figured out how to manage my own LAMP stack.

                    One benefit of Digital Ocean is their fantastic documentation. So there isn’t much to figure out… But for someone who came from Front-end Development, it’s a bit intimidating to manage your own server. To tell you the truth, I’ve tried this move a few times, but the last time I set up a stack for this site I used SUSE Linux (I don’t know what I was thinking), and the site kept crashing.

                    Since then I’ve played with VMware, gotten comfortable with setting up my own development server, and moved to CentOS.

                    The missing link was optimizing Apache.

                    I’m a big fan of This Week in Startups and one of their sponsors is New Relic. If they say something is worth trying, I try it.

                    After switching to Digital Ocean I set up New Relic on the new site. Even though I had installed W3 Total Cache, New Relic was still giving me error warnings every 10-15 minutes. Frustrating! True, I AM running a WordPress multisite on the lowest tier, but none of the sites are high traffic. I should be able to do that.

                    Well, after digging into New Relic’s errors I saw that I was using 100% of my physical memory and 200% of my swap memory. BAD.

                    Then I found Jean-Sebastien Morisset’s check_httpd_limits.pl. WOW.

                    I updated my httpd.conf with his recommendations and look at the results:

                    [Image: Physical Memory - New Relic dashboard]

                    You can clearly see when the new settings took effect.

                    Here’s the site’s load average:

                    [Image: Load Average - New Relic dashboard]

                    Best part is, since these settings took effect, NO MORE ERROR WARNINGS FROM NEW RELIC!!!

                    So, if you read this, Jean-Sebastien: thanks for your wonderful tool! And New Relic, thank YOU for the excellent monitoring that pushed me to do this!