08 December 2014

Event Monitoring + Salesforce Wave = BFF


At Dreamforce 2014, I led a session entitled Audit Analytics that described the integration between Event Monitoring and Project Wave.

Combining the two solutions is a no-brainer. Event Log Files generates CSV files on a daily basis, and the Wave platform makes sense of those CSVs for business analysts.

While you can watch the video at http://bit.ly/DF14AuditAnalytics, there are a few tips, tricks, and best practices I want to share when using Event Log Files with the Wave platform:
  1. Consider storage requirements. Event data isn't like CRM data - there's a lot more of it. One org I work with logs approximately twenty million rows of event data per day using Event Log Files. That's approximately 600 million rows per month or 3.6 billion every half year. That means you will need to consider what data you import and how you manage that data over time.
  2. Understand your schema. There are tons of great use cases that Event Log Files solves; however, the secret sauce is understanding what's already possible. Download a sample of files and take a look in Excel, or run the head command in your terminal (e.g. head -n 2 VisualforceRequest-2014-10-21.csv) to get a sense of the kinds of lenses and dashboards you want to create. Read more about the lexicon of possible field values in the Event Log File Field Lexicon blog posting.
  3. You should convert the TIMESTAMP field in each log file to something that Wave can understand and trend in its timeline graphs. Event Log Files provides a numeric TIMESTAMP (e.g. 20140925015802.062) rather than a date format (e.g. 2014-09-25T01:58:02Z). I usually build this transformation into the download process. Read more about this transformation process in my Working with Timestamps in Event Log Files blog posting.
  4. You should de-normalize Ids into Name fields where possible. For instance, instead of uploading just USER_ID, you should also upload USER_NAME so that it's more human readable. If you don't do this before you upload the data, you can always use SAQL to help normalize name fields. Read more about using pig and data pipelines to de-normalize data before importing it into Wave with the Hadoop and Pig come to the Salesforce Platform with Data Pipelines blog posting.
  5. Merge files across days to reduce the number of datasets you have to manage (e.g. awk -F ',' 'FNR > 1 {print $0}' new_* > merged_file.csv) rather than treating each day of log files as a new dataset; see the sketch after this list for a variant that keeps a single header row.
  6. Import your data using the dataset loader from Github: https://github.com/forcedotcom/Analytics-Cloud-Dataset-Utils/releases. This is the easiest way to automate dataset creation and management.
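A quick note on the merging tip above: that awk one-liner skips the first line of every file, so the merged CSV ends up with no header row at all. If your dataset still needs a header, a slight variant (just a sketch, assuming all the merged files share the same column layout) keeps the header from the first file only:

#keep the header from the first file (NR==1) and skip the header of every subsequent file (FNR>1)
awk -F ',' 'NR==1 || FNR > 1' new_* > merged_file.csv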
Combining large-scale event data about the operational health of your organization with the power of an incredible visualization platform can change how you separate truth from fiction with your users.

01 December 2014

Working with Timestamps in Event Log Files

An event in a log file records that something happened in our application at a point along a timeline of events.

As a result, every Event Log File contains a TIMESTAMP field which represents the time each event happened in GMT. This is useful for understanding when the event happened, for correlating user behavior during a time period, and for trending similar events over various periods of time.



The TIMESTAMP field in Event Log Files is stored as a number. This keeps storage costs down, since a formatted date string takes more space and can be more difficult to transform later. It can become a challenge, though, when importing Event Log File data into an analytics system that requires a different date time format, and there are a lot of different date time formats out there.

For instance, Salesforce Analytics Cloud's Wave platform accepts a variety of different date time formats.


This means that you will have to convert the TIMESTAMP field for each row within an Event Log File into something that Wave or any other analytics platform can interpret.

I usually convert the TIMESTAMP when I download the file; that way, everything happens in one step.

To convert it, I use a simple AWK script that Aakash Pradeep wrote, either in my download script or directly in my Mac terminal. It takes a downloaded file like Login.csv as input and creates a new file, substituting each TIMESTAMP field value with the right format:
awk -F ','  '{ if(NR==1) printf("%s\n",$0); else{ for(i=1;i<=NF;i++) { if(i>1&& i<=NF) printf("%s",","); if(i == 2) printf "\"%s-%s-%sT%s:%s:%sZ\"", substr($2,2,4),substr($2,6,2),substr($2,8,2),substr($2,10,2),substr($2,12,2),substr($2,14,2); else printf ("%s",$i);  if(i==NF) printf("\n")}}}' Login.csv  > new_Login1.csv
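To make the substitution concrete, here's what happens to the second column of a data row (assuming the raw file wraps each value in double quotes, which is why the substr offsets start at position 2):

#before (Login.csv):       "20140925015802.062"
#after (new_Login1.csv):   "2014-09-25T01:58:02Z"   (the fractional seconds are dropped)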

Date time formats can be a challenge in any system, and this utility gives me a quick and easy way of converting them into something I can use with my analytics platform of choice.

24 November 2014

Downloading Event Log Files using a Script

Event Monitoring, new in the Winter '15 release, enables use cases like adoption, user audit, troubleshooting, and performance profiling using an easy-to-download, file-based API to extract Salesforce app log data.

The most important part is making it easy to download the data so that you can integrate it with your analytics platform.

To help make it easy, I created a simple bash shell script to download these CSV (comma separated value) files to your local drive. It works best on Mac and Linux but can be made to work on Windows with a little elbow grease. You can try these scripts out at http://bit.ly/elfBash. The scripts do require a separate command line JSON processor called jq to parse the JSON that's returned by the REST API.
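If you don't already have jq, it's a single binary. On a Mac it's typically available through Homebrew (assuming Homebrew is installed), or you can download the binary directly from the jq site:

#install jq with Homebrew on a Mac
brew install jq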

It's not difficult to build these scripts using other languages such as Ruby, Perl, or Python. What's important is the data flow.

I prompt the user to enter their username and password (which is masked). This information can just as easily be stored in environment variables or encrypted so that you can automate the download on a daily basis using CRON or launchd schedulers.

#!/bin/bash
# Bash script to download EventLogFiles
# Pre-requisite: download - http://stedolan.github.io/jq/ to parse JSON

#prompt the user to enter their username (or hard-code it here for testing purposes)
read -p "Please enter username (and press ENTER): " username

#prompt the user to enter their password 
read -s -p "Please enter password (and press ENTER): " password

#prompt the user to enter their instance end-point 
echo 
read -p "Please enter instance (e.g. na1) for the loginURL (and press ENTER): " instance

#prompt the user to enter the date for the logs they want to download
read -p "Please enter logdate (e.g. Yesterday, Last_Week, Last_n_Days:5) (and press ENTER): " day

Once we have the credentials, we can log in using OAuth and get the access token.

#set access_token for OAuth flow 
#change client_id and client_secret to your own connected app - http://bit.ly/clientId
access_token=`curl https://${instance}.salesforce.com/services/oauth2/token -d "grant_type=password" -d "client_id=3MVG99OxTyEMCQ3hSja25qIUWtJCt6fADLrtDeTQA12.liLd5pGQXzLy9qjrph.UIv2UkJWtwt3TnxQ4KhuD" -d "client_secret=3427913731283473942" -d "username=${username}" -d "password=${password}" -H "X-PrettyPrint:1" | jq -r '.access_token'`
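Before going any further, it's worth checking that a token actually came back. This guard isn't part of the original script, just a defensive addition; jq -r prints the string null when .access_token is missing from the response:

#stop early if the login failed
if [ -z "${access_token}" ] || [ "${access_token}" = "null" ]; then
    echo "OAuth login failed - check the username, password, and connected app settings"
    exit 1
fi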

Then we can query EventLogFile to get the Ids necessary to download the files, along with the event type and log date we need to properly name the download directories and files.

#set elfs to the result of ELF query
elfs=`curl https://${instance}.salesforce.com/services/data/v31.0/query?q=Select+Id+,+EventType+,+LogDate+From+EventLogFile+Where+LogDate+=+${day} -H "Authorization: Bearer ${access_token}" -H "X-PrettyPrint:1"`

Using jq, we can parse the Id, EventType, and LogDate in order to create the directory and file names.

#set the three variables to the array of Ids, EventTypes, and LogDates which will be used when downloading the files into your directory
ids=( $(echo ${elfs} | jq -r ".records[].Id") )
eventTypes=( $(echo ${elfs} | jq -r ".records[].EventType") )
logDates=( $(echo ${elfs} | jq -r ".records[].LogDate" | sed 's/T.*//' ) )
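A quick sanity check (my addition, not part of the original script) confirms the query actually returned files before looping:

#bail out if the query returned no event log files for the requested day
if [ ${#ids[@]} -eq 0 ]; then
    echo "No event log files found for ${day}"
    exit 0
fi
echo "Found ${#ids[@]} event log files for ${day}"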

We create the directories to store the files. In this case, we download the raw data and then convert the timestamp to something our analytics platform will understand better.

Then we can iterate through each download, renaming it to the Event Type + Log Date so that we easily refer back to it later on. I also transform the Timestamp field to make it easier to import into an analytics platform like Project Wave from Salesforce Analytics Cloud.

#loop through the array of results and download each file with the following naming convention: EventType-LogDate.csv
for i in "${!ids[@]}"; do
    
    #make directory to store the files by date and separate out raw data from 
    #converted timezone data (-p keeps mkdir quiet when the directory already 
    #exists for another event type with the same log date)
    mkdir -p "${logDates[$i]}-raw"
    mkdir -p "${logDates[$i]}-tz"

    #download files into the logDate-raw directory
    curl "https://${instance}.salesforce.com/services/data/v31.0/sobjects/EventLogFile/${ids[$i]}/LogFile" -H "Authorization: Bearer ${access_token}" -H "X-PrettyPrint:1" -o "${logDates[$i]}-raw/${eventTypes[$i]}-${logDates[$i]}.csv" 

    #convert files into the logDate-tz directory for Salesforce Analytics
    awk -F ','  '{ if(NR==1) printf("%s\n",$0); else{ for(i=1;i<=NF;i++) { if(i>1&& i<=NF) printf("%s",","); if(i == 2) printf "\"%s-%s-%sT%s:%s:%sZ\"", substr($2,2,4),substr($2,6,2),substr($2,8,2),substr($2,10,2),substr($2,12,2),substr($2,14,2); else printf ("%s",$i);  if(i==NF) printf("\n")}}}' "${logDates[$i]}-raw/${eventTypes[$i]}-${logDates[$i]}.csv" > "${logDates[$i]}-tz/${eventTypes[$i]}-${logDates[$i]}.csv"

done

Downloading event log files is quick and efficient. You can try these scripts out at http://bit.ly/elfBash. Give it a try!
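And if you combine the environment variable approach above with cron, scheduling a daily pull is a one-line crontab entry. The paths below are just placeholders for wherever you keep the script and its log:

#run the download script every day at 05:00
0 5 * * * /path/to/elfDownload.sh >> /path/to/elfDownload.log 2>&1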