28 January 2015

API First for Administrators

A couple of Dreamforces ago, when Salesforce launched the Salesforce1 mobile platform, Marc Benioff discussed the idea of 'API first' in his keynote. API stands for Application Programming Interface, and it's the foundation for any application that interacts with data on a platform. It's a blinking cursor on the screen, waiting for action.

As I looked across the Dreamforce keynote room, I saw sporadic nods and acknowledgments. People appeared to get the idea that end-user applications like Salesforce1 could be built on top of the API. But for me, it was a profound transformation because it represented a very different way of thinking about administration.

API first is a way of designing and building features on a platform. It means that before I design a great user experience, I prioritize and design the underlying API that the user experience will be built upon. This allows an amazing amount of flexibility: not only can I build user experiences on top of the API, but so can ISVs (Independent Software Vendors, who market their applications on the Salesforce platform through the AppExchange) and integration specialists. The best part is that no data is left behind. Instead of areas of our application being land-locked by a user-interface-only experience, like View Setup Audit Trail, we start with the assumption that data can be exposed through the API and build from there.

Before API first, I tended to believe exclusively in the power of button-click administration. If there wasn't a user experience for me to click, it wasn't worth doing. Not only was it easier as an administrator to solve problems with a great user interface, but ongoing maintenance was easier with clicks, not code. A good example of this idea is workflow, which can be created without laying down a single line of Apex code. This is still true, even in a world where the API comes first.

What I realized was that API first didn't mean I had to give up my administrative superpowers to an insurmountable learning curve that is writing "code". It meant that I had more tools in my toolbelt for solving problems, not as an administrator or as a developer, but simply as a person with a problem to solve.

API first also disrupts the way I design and build features for administrators. An example of API first in the administrative world is profiles and permission sets, which are fundamental to managing user entitlements. These setup objects contain a wealth of information about what a user is entitled to do within the context of their organization. But reporting on, assigning, and managing these permission containers tends to be challenging.

From a button-click perspective, it seems like a solvable problem. I even tried a couple of times, with varying levels of success, building the Enhanced Profile List View and the Enhanced Profile User Interface. And while the user experience is arguably better, I found it to be a really tough problem to solve from a usability perspective.

The challenge comes from the scale and relationships that define a user's responsibilities within an organization. A single profile or permission set may have millions of permission values. A user may have one profile but multiple permission sets, adding an order of magnitude more possible permission values for any user. And while we often understand a profile or permission set as a black box with a name like "Sales Rep", at a granular level these containers can hold millions of different user rights that our reps need to do their job. Effectively, it's a Gordian Knot problem. And like Alexander the Great's sword that sliced the famous knot in half, as an administrator, my sword is the API.

Profile and Permission Set Schema

As a result of adding more API support for these objects, I've seen incredible administrative tools created in the community, like the PermComparator and the Permissioner. These tools address specific problems like profile comparison and permission set assignment. Even the permission set assignment tool within Setup is built using the same APIs that Arkus used to build the Permissioner.

I faced a similar problem with event monitoring. Event monitoring provides an easy-to-use, downloadable, file-based API for extracting the same application logs that Salesforce uses to support, audit, and optimize its own service. While an organization may have hundreds of thousands or even millions of CRM records, that scale is dwarfed by the sheer number of log events that users generate every day. One organization I've worked with generates approximately twenty million events per day, or roughly 3.6 billion events every six months.

Finding ways of integrating, processing, and visualizing this data is critical. "API first" now becomes "API only" due to the sheer size and bandwidth that this data consumes.

Event monitoring scale is managed by integrating these large data sets into both Salesforce and ISV tools. Whether it's extracting, transforming, and loading the data into the Salesforce Wave platform or into Splunk, customers need to work with billions of events. Because the data is exposed through the API first, you can choose whether to simply extract, transform, and load it into a tool, or to pre-aggregate it by rolling it up and exploring summarized aspects of the data. And once the data is in a more usable form, great user interfaces like Salesforce Wave or New Relic can take over.

While these are just examples of how API first works in an administrator's world, I've found that the greatest barrier to entry for an administrator working with the API is that it's difficult to start with a blinking cursor on the screen. Before jumping into a scripting language or Apex, I recommend getting your feet wet with a tool that uses the API to interact with data. Many administrators I know use Data Loader, SOQL Explorer, or an ISV tool like Jitterbit. I prefer the Workbench for a few key reasons:
  1. It's available on every platform because it's cloud based
  2. It has a Swiss Army knife's worth of important API features, including support for the SOAP, REST, Tooling, and Metadata APIs
  3. It has a simple and effective user experience that gets me one step closer to the blinking cursor
The second tool that becomes critical is the Salesforce API documentation. This documentation reads like Gray's Anatomy: exactly what you need when figuring out how far you can push the envelope with what you can actually do, but dense reading if you're just getting started. So an even better place to start is the SOQL (Salesforce Object Query Language) documentation. SOQL is just about asking questions and getting answers.

One thing that surprised me last week, while talking with a customer about how to get started with the API, was that they didn't even know which permissions to request from their administrator. The key permission you need is "API Enabled". You will also need at least view access to data. It's easy to start with "View All Data"; however, this elevated permission should be reduced to read access on just the objects you want to query as you identify the specific tasks you want to accomplish. And if you're doing anything with the REST API, like building an integration using cURL, I recommend asking your administrator for a connected app client id and secret. It's a great way to constrain API access to a specific application.
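To make the connected app suggestion concrete, here's a minimal sketch (all values are my own placeholders, not a working configuration) of exchanging a connected app's client id and secret, plus your credentials, for an access token using the OAuth 2.0 username-password flow:

```shell
#!/bin/bash
#exchange a connected app's client id/secret plus user credentials for an
#access token (OAuth 2.0 username-password flow); all values are placeholders
client_id="REPLACE_WITH_CONNECTED_APP_CLIENT_ID"
client_secret="REPLACE_WITH_CONNECTED_APP_CLIENT_SECRET"
username="admin@example.com"
password="passwordPlusSecurityToken"
token_url="https://login.salesforce.com/services/oauth2/token"

#uncomment the next lines to request a real token; the JSON response
#contains access_token and instance_url for subsequent REST calls
#curl "${token_url}" -d "grant_type=password" -d "client_id=${client_id}" \
#     -d "client_secret=${client_secret}" -d "username=${username}" \
#     -d "password=${password}"
echo "${token_url}"
```

The curl line is commented out so the sketch can be read without a real org; substitute your own values and uncomment it to try the exchange.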

Finally, come up with a list of questions and start simple, like 'What are the names of my accounts?' (SELECT Name FROM Account).

Example of a SOQL query in Workbench
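Once you're comfortable running the query in Workbench, the same question can be asked programmatically through the REST API's query endpoint. A minimal sketch, with placeholder instance and access token values that would come from a prior login:

```shell
#!/bin/bash
#run the sample SOQL query through the REST API query endpoint;
#instance and access_token are placeholders from a prior login
instance="na1"
access_token="REPLACE_WITH_ACCESS_TOKEN"
query="SELECT+Name+FROM+Account"   #SOQL with spaces URL-encoded

#uncomment the next line to execute against a real org
#curl "https://${instance}.salesforce.com/services/data/v32.0/query?q=${query}" \
#     -H "Authorization: Bearer ${access_token}"
echo "https://${instance}.salesforce.com/services/data/v32.0/query?q=${query}"
```

The response is JSON containing a records array with one Name value per account.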

API first is a disruptive way of administering any system. It doesn't mean that you can't be a button click administrator. Button click administration is still the easiest and best way to maintain the setup of your organization. It just means that, with the API, you might be able to solve a problem sooner while waiting for that perfect user interface to be built. 


Icons by DryIcons

05 January 2015

Simple Script for Loading Event Monitoring Data into the Salesforce Wave

In the past few weeks, I've had a number of customer calls asking how we can make it easier to get Event Monitoring data into the Salesforce Wave. In a previous blog post, Event Monitoring + Salesforce Wave = BFF, I talked about the best practices to ETL (extract, transform, and load) the data into Wave. This is helpful if you are an integration specialist who is used to building enterprise integrations with Salesforce. However, if you're just trying out Event Monitoring and Wave for the first time, you need something simple that gets you up and running as fast as possible.

To get customers up and running faster, I created a script that incorporates all of these best practices. You can download the script, elfWave.sh, from my Github repository to give it a try. Alternatively, you can use it to understand the data flow in case you want to create scripts using other languages or middleware integration tools.

There are a few gotchas that you should be aware of that are specific to this script and process:
  1. jq is a prerequisite - I recommend downloading it into your /bin directory.
  2. datasetutils.jar is a prerequisite for making it easy to upload your data into Wave. You can download the latest version from the Analytics Cloud Github repo, or download version 32.0.0 that I used from my Github repo.
  3. Event Monitoring and Wave need to be enabled in your org. The user downloading the Event Monitoring files needs the 'View Event Log Files' user permission on their profile or permission set, and the user uploading files to Analytics Cloud needs the 'Upload External Data to Analytics Cloud' user permission on their profile or permission set.
  4. When running the script, be sure it's executable (e.g. chmod +x elfWave.sh). If you didn't save the script in your /bin directory, start it by navigating to the directory where you saved it (e.g. cd ~/Downloads) and invoking it (e.g. sh elfWave.sh or ./elfWave.sh).
  5. The script has been tested on Mac OS X and Ubuntu Linux but may need some additional configuration if you run it on Windows (e.g. installing cygwin and curl).
Below is a breakdown of the more important parts of the script:

If you are not scheduling the script to run automatically, you should prompt the user for logins to both the Event Monitoring and Wave orgs. These will often be the same org; however, the script assumes they may be two different orgs.
#prompt the user to enter their username for the target (Wave) org
read -p "Please enter the Wave target org username (and press ENTER): " tUsername

#prompt the user to enter their password for the target (Wave) org
read -s -p "Please enter the Wave target org password (and press ENTER): " tPassword

#prompt the user to enter their username for the source (Event Monitoring) org
read -p "Please enter username for Event Monitoring source org (and press ENTER): " username

#prompt the user to enter their password for the source (Event Monitoring) org
read -s -p "Please enter password for Event Monitoring source org (and press ENTER): " password

#prompt the user to enter their instance end-point for the source (Event Monitoring) org
echo 
read -p "Please enter instance (e.g. na1) for the Event Monitoring source org loginURL (and press ENTER): " instance

#prompt the user to enter the date for the logs they want to download for the source (Event Monitoring) org
read -p "Please enter logdate (e.g. Yesterday, Last_Week, Last_n_Days:5) (and press ENTER): " day

You can choose whether to upload any single event type or all 28 types to Wave. You might want to try out just one event type (e.g. API or URI) to test it. However, if you don't choose anything, all 28 will be selected.
#prompt the user to enter the eventType they want to download for the source (Event Monitoring) orgs
printf 'What EventType do you want to download?\n'
printf '1. All 28 event types (Default)\n'
printf '2. API\n'
printf '3. ApexCallout\n'
printf '4. ApexExecution\n'
printf '5. ApexSoap\n'
printf '6. ApexTrigger\n'
printf '7. AsyncReportRun\n'
printf '8. BulkApi\n'
printf '9. ChangeSetOperation\n'
printf '10. ContentDistribution\n'
#….

read eventMenu

case $eventMenu in
     1)
          eventType=${eventType:-All}
          ;;
     2)
          eventType=${eventType:-API}
          ;;
     3)
          eventType=${eventType:-ApexCallout}
          ;; 
     4)
          eventType=${eventType:-ApexExecution}
          ;; 
     5)
          eventType=${eventType:-ApexSoap}
          ;;      
     6)
          eventType=${eventType:-ApexTrigger}
          ;; 
     7)
          eventType=${eventType:-AsyncReportRun}
          ;; 
     8)
          eventType=${eventType:-BulkApi}
          ;; 
     9)
          eventType=${eventType:-ChangeSetOperation}
          ;; 
     10)
          eventType=${eventType:-ContentDistribution}
          ;; 
     #…. 
     *)
          eventType=${eventType:-All}
          ;;
esac

echo ${eventType}
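Not shown in these excerpts is the step that populates ${elfs} and the ids, eventTypes, and logDates arrays the download loop relies on. A sketch of how that step might look (an assumption on my part; the inline sample JSON stands in for a real REST response, and the commented curl line shows the shape of the actual EventLogFile query):

```shell
#!/bin/bash
#sketch (assumption): populate ${elfs} and the parallel arrays used by the
#download loop; the real elfWave.sh queries the EventLogFile object via REST
#elfs=$(curl "https://${instance}.salesforce.com/services/data/v32.0/query?q=SELECT+Id,EventType,LogDate+FROM+EventLogFile" -H "Authorization: Bearer ${access_token}")

#sample of the JSON shape the query endpoint returns
elfs='{"records":[{"Id":"0AT000000000001","EventType":"API","LogDate":"2015-01-27T00:00:00.000+0000"},{"Id":"0AT000000000002","EventType":"URI","LogDate":"2015-01-27T00:00:00.000+0000"}]}'

#parse the response into parallel arrays: one entry per log file
ids=( $(echo "${elfs}" | jq -r ".records[].Id") )
eventTypes=( $(echo "${elfs}" | jq -r ".records[].EventType") )
logDates=( $(echo "${elfs}" | jq -r ".records[].LogDate" | cut -d'T' -f1) )

echo "${ids[0]} ${eventTypes[0]} ${logDates[0]}"
```

Each index i then identifies one log file: its record Id for the download URL, its event type for the directory name, and its log date for the file name.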

Download the files into the 'eventType-raw' directory, convert the TIMESTAMP from integer to date/time format, and store them in the eventType directory.
#loop through the array of results and download each file with the following naming convention: EventType.csv
for i in "${!ids[@]}"; do
    
    #make directory to store the files by date and separate out raw data from 
    #converted timezone data
    mkdir "${eventTypes[$i]}-raw"
    mkdir "${eventTypes[$i]}"

    #download files into the ${eventTypes[$i]}-raw directory
    curl "https://${instance}.salesforce.com/services/data/v32.0/sobjects/EventLogFile/${ids[$i]}/LogFile" -H "Authorization: Bearer ${access_token}" -H "X-PrettyPrint:1" -o "${eventTypes[$i]}-raw/${eventTypes[$i]}-${logDates[$i]}.csv" 

    #convert files into the ${eventTypes[$i]} directory for Salesforce Analytics
    awk -F ','  '{ if(NR==1) printf("%s\n",$0); else{ for(i=1;i<=NF;i++) { if(i>1&& i<=NF) printf("%s",","); if(i == 2) printf "\"%s-%s-%sT%s:%s:%sZ\"", substr($2,2,4),substr($2,6,2),substr($2,8,2),substr($2,10,2),substr($2,12,2),substr($2,14,2); else printf ("%s",$i);  if(i==NF) printf("\n")}}}' "${eventTypes[$i]}-raw/${eventTypes[$i]}-${logDates[$i]}.csv" > "${eventTypes[$i]}/${eventTypes[$i]}-${logDates[$i]}.csv"

done

Because there may be n days of logs, resulting in n files per event type, we merge the files into a single file per event type that will be uploaded into a single dataset with the event type's name. This simplifies the development of lenses and dashboards that trend events over a period of days.
#array of the unique event types in the query results
uEventTypes=( $(echo ${elfs} | jq -r ".records[].EventType" | uniq) )

#merge data into single CSV file
for j in "${uEventTypes[@]}"
do
    output_file="$j.csv"
    count=0

    for f in "$j"/*.csv
    do
        echo "merging file: $f into $output_file"
        if [ $count -eq 0 ]; then
            #first file: keep the header row
            awk -F ',' '{print $0}' "$f" 1>"$output_file"
        else
            #subsequent files: skip the header row
            awk -F ',' 'FNR>1 {print $0}' "$f" 1>>"$output_file"
        fi
        count=`expr $count + 1`
    done
    echo "number of input files merged into $output_file: $count"
done

Using the datasetutils java app, upload the transformed, merged CSV files into Wave.
#load CSV files to datasets in Wave
for i in *.csv; do
    #variables to specify file and dataset name
    eventFile="$i"
    eventName="${i%.csv}"

    #comment next line to test before uploading to Wave
    java -jar datasetutils-32.0.0.jar --action load --u ${tUsername} --p ${tPassword} --inputFile ${eventFile} --dataset ${eventName}
done

Finally, because we may want to repeat this process on a regular basis (daily, weekly, monthly, etc.), choose whether to remove the files locally or keep them around for long-term storage and audit purposes.
#prompt user to clean up data and directories
read -p "Do you want to delete data directories and files? (Y/N) " del

if [ "$del" = "Y" ] || [ "$del" = "y" ] || [ "$del" = "Yes" ] || [ "$del" = "yes" ]; then
    #clean up data directories
    for i in "${!uEventTypes[@]}"; do
        rm -r "${uEventTypes[$i]}-raw"
        rm -r "${uEventTypes[$i]}"
        rm "${uEventTypes[$i]}.csv"
    done
    rm -r "archive"
    echo "The files were removed."
elif [ "$del" = "N" ] || [ "$del" = "n" ] || [ "$del" = "No" ] || [ "$del" = "no" ]; then
    #leave data and directories for audit reasons
    echo "The files will remain."
fi

This script doesn't replace the need for an integration specialist to automate the ETL process; however, it should make it easier to try out both Event Monitoring and Wave.

08 December 2014

Event Monitoring + Salesforce Wave = BFF


At Dreamforce 2014, I led a session entitled Audit Analytics that described the integration between Event Monitoring and Project Wave.

Combining the two solutions is a no-brainer. Event Log Files generates CSV files on a daily basis; the Wave platform makes sense of CSVs for business analysts.

While you can watch the video at http://bit.ly/DF14AuditAnalytics, there are a couple of tips, tricks, and best practices I want to share when using Event Log Files with the Wave platform:
  1. Consider storage requirements. Event data isn't like CRM data - there's a lot more of it. One org I work with logs approximately twenty million rows of event data per day using Event Log Files. That's approximately 600 million rows per month, or 3.6 billion every half year. That means you will need to consider what data you import and how you manage that data over time.
  2. Understand your schema. There are tons of great use cases that Event Log Files solves; however, the secret sauce is understanding what's possible already. Download a sample of files and take a look in Excel, or run the head command in your terminal (e.g. head -n 2 VisualforceRequest-2014-10-21.csv), to get a sense of the kinds of lenses and dashboards you want to create. Read more about the possible field values in the Event Log File Field Lexicon blog posting.
  3. Convert the TIMESTAMP field in each log file to something that Wave can understand and trend in its timeline graphs. Event Log Files provides an integer-style TIMESTAMP (e.g. 20140925015802.062) rather than an ISO 8601 date format (e.g. 2014-09-25T01:58:02Z). I usually build this transformation into the download process. Read more in my Working with Timestamps in Event Log Files blog posting.
  4. De-normalize Ids into Name fields where possible. For instance, instead of uploading just USER_ID, also upload USER_NAME so the data is more human readable. If you don't do this before you upload the data, you can always use SAQL to help. Read more about using Pig and data pipelines to de-normalize data before importing it into Wave in the Hadoop and Pig Come to the Salesforce Platform with Data Pipelines blog posting.
  5. Merge files across days to reduce the number of datasets you have to manage (e.g. awk -F ',' 'FNR==1 && NR!=1 {next} {print}' new_* > merged_file.csv, which keeps only the first file's header row) rather than treating each day of log files as a new dataset.
  6. Import your data using the dataset loader from Github: https://github.com/forcedotcom/Analytics-Cloud-Dataset-Utils/releases. This is the easiest way to automate dataset creation and management.
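The timestamp conversion in tip 3 can be sketched in a few lines of bash. This is my own minimal example using substring expansion, not the awk transformation the elfWave.sh script uses:

```shell
#!/bin/bash
#convert an Event Log File TIMESTAMP (YYYYMMDDhhmmss.mmm) into the
#ISO 8601 format that Wave can trend on a timeline
ts="20140925015802.062"

iso="${ts:0:4}-${ts:4:2}-${ts:6:2}T${ts:8:2}:${ts:10:2}:${ts:12:2}Z"
echo "$iso"   #prints 2014-09-25T01:58:02Z
```

The same slicing logic applies whether you do it in bash, awk, or your ETL tool of choice.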
Combining large scale event data about the operational health of your organization with the power of an incredible visualization platform has the ability to change how you separate truth from fiction with your users.