How I am planning to "Marie Kondo" our Sugar instance. What's your "decluttering" strategy?

In the 90ies (yes, before some of you were born, sigh) there was a phrase people liked to use when describing the attempts at keeping data clean and relevant: "herding cats". It certainly evoked the overwhelming and chaotic nature of the task. 

A more recent expression people use, generally in the context of cleaning and organizing one's household, is "to Marie Kondo" something.
It has a much more controlled and meditative connotation and I much prefer it to dealing with cats (I'm more of a dog person).

For those new to Marie Kondo she proposed a minimalist approach that allows you to gradually and effectively tackle clutter in a home.
Her idea is to pick one area at a time like a closet or shelf, remove everything into a pile, look at each thing you removed and if that thing does not "spark joy" then get rid of it. Then calmly move to the next area, thus gradually decluttering the whole house without being overwhelmed by the task.

This coming year, after 12 years of Sugar customizations that started in Sugar Professional v6, I plan to "Marie Kondo" our Sugar instance.


It is so easy in sugar to add a field, and sometimes as Sugar admins we are too quick to please our users.

My first phase will be to look at the custom fields we added to stock modules, one module at a time, and ask my version of Marie's question:

Does that data spark information?

That's not just a question for the Sugar Admin, that's a question that involves reviewing why we added that field in the first place and how does that data provide information and to whom.


But, before I even get into that rabbit hole... I wanted to make my life easier:

If there is no data or barely any data in that field to start with, surely it cannot be informative, can it?

To determine, for each module's custom fields, what percentage of the records do not have any/much information I built myself a little API

<?php
class percentageFieldUsageApi extends SugarApi
{
  public function registerApiRest() {
    return array(
      'percentageFieldUsage' => array(
        'reqType' => 'POST',
        'path' => array('percentageFieldUsage'),
        'pathVars' => array(),
        'method' => 'percentageFieldUsage',
        'shortHelp' => 'CustomWR: Given a module_name, return the % non-usage of each custom field (i.e. percentage of rows where that field is empty or null)',
        'longHelp' => '',
      ),
    );
  }
  function percentageFieldUsage($api, $args)
  {
    $counts = array();
    $this->requireArgs($args,array('module_name'));
    $module_name = $args['module_name'];
    $bean = BeanFactory::getBean($module_name);
    $field_defs[$module_name] = $bean->getFieldDefinitions();
    foreach($field_defs as $module=>$fields){
      foreach($fields as $field=>$properties){
        $field_name = $properties['name'];
        if($properties['type']!='link'  && $properties['source'] == 'custom_fields'){
          $table = strtolower($module_name)."_cstm";
          $query = "select avg({$field_name} is null ";
          if($properties['type']!='datetime' && $properties['type']!='date'){
            $query .= " or {$field_name} = '' ";
          }
          if($properties['type']=='multienum'){
            $query .= " or {$field_name} = '^^' ";
          }
          $query .= " ) * 100 as percent_empty from {$table};";
          $params = array();
          $conn = $GLOBALS['db']->getConnection();
          $stmt = $conn->executeQuery($query, $params);
          $result = $stmt->fetchAll();
          $counts[]=array('module' => $module,
                          'field' => $field_name,
                          'field_type' => $properties['type'],
                          'percent_empty' => $result[0]['percent_empty']
                         );
        }
      }
    }
    return($counts);
  }
}

This returns an array with module name, field name, type and %unused

I initially thought of running it for all modules but concluded that more control was better so the module is mandatory (you could easily extend this to loop through module names if desired and I may well do so in the future).


What I learned reenforced two important concepts:

Empty fields definitely do not "spark joy"

Just because you can does not mean you should! (i.e: don't be shy, always ask why the want it, and how they will use it)


So from here I will:

- review each of the quasi-empty fields, confirm they can be removed, and remove them.

- For all custom fields determine if any changes were made to Sugar that could replace the custom field and if so, move that information.

- For those fields that cannot be removed and are quasi-empty: document how they are used and determine how they can be filled

I would really like to hear your decluttering strategies! :)

And as always, your critique is welcome! 

FrancescaS

Parents
  • Hi  ,

    I've done data migrations multiple times, none at Sugar, so I just wanna provide an alternative approach but doesn't mean is the best.

    You're right, data cleansing can be a vigorous and brutal exercise but there are tools out there that can be used, I myself, used (free opensource OpenRefine, formally Freebase, here's a video of it in action), you'll be able to find outliers, missing values, removing duplicates, data types and consistency as well tons of insights in your data.

    In some projects I was involved in, we had to use AWS Data Analytics "power" to ingest data at scale to find anomalies but might be an overkill for your case, but glad to share if you think it's useful Slight smile.

    My point here is: if you write code yourself to understand your data, you will get results for the code you wrote, perhaps using a tool designed to identify data issues might be the right approach.

    Utilizing analytics to approach your data is the optimal strategy in my point of view.

    Hope this helps,
    Rafa

    SugarCRM | Principal Developer Advocate

  • Thank you  , 

    I will keep your suggestions in mind.

    Our sugar instance is the culmination of 12 years of customizations that were made by me in various versions of Sugar.

    This first phase is more about "cleansing" the customizations I've made over the years than the data itself.

    As I mentioned, one of the lessons learned over time is

    "Just because you can does not mean you should! "

    I am learning to say:

Reply Children
No Data