05 January 2015

Simple Script for Loading Event Monitoring Data into the Salesforce Wave

In the past few weeks, I've had a number of customer calls asking how we can make it easier to get Event Monitoring data into the Salesforce Wave. In a previous blog post, Event Monitoring + Salesforce Wave = BFF, I talked about best practices to ETL (extract, transform, and load) the data into Wave. That approach is helpful if you are an integration specialist who is used to building enterprise integrations with Salesforce. However, if you're just trying out Event Monitoring and Wave for the first time, you need something simple that gets you up and running as fast as possible.

To get customers up and running faster, I created a script that incorporates all of these best practices. You can download the script, elfWave.sh, from my GitHub repository and give it a try. Alternatively, you can use it to understand the data flow if you want to build your own scripts using other languages or middleware integration tools.

There are a few gotchas specific to this script and process that you should be aware of:
  1. jq is a prerequisite - I recommend downloading it into your /bin directory.
  2. datasetutils.jar is a prerequisite for making it easy to upload your data into Wave. You can download the latest version from the Analytics Cloud GitHub repo, or download version 32.0.0 that I used from my GitHub repo. (The preflight sketch after this list shows one way to check for both of these.)
  3. Event Monitoring and Wave need to be enabled in your org. The user downloading the Event Monitoring files needs the 'View Event Log Files' user permission on their profile or permission set, and the user uploading files to Analytics Cloud needs the 'Upload External Data to Analytics Cloud' user permission on their profile or permission set.
  4. When running the script, be sure you have execute rights (e.g. chmod +x elfWave.sh). If you didn't save the script in your /bin directory, start it by navigating to the directory where you saved it (e.g. cd ~/Downloads) and invoking it (e.g. sh elfWave.sh or ./elfWave.sh).
  5. The script has been tested on Mac OSX and Ubuntu Linux but may need some additional configuration if you run it on Windows (e.g. installing cygwin and curl).
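
Before diving in, you can check the first two prerequisites up front. Below is a minimal preflight sketch (not part of elfWave.sh itself), assuming jq should be on your PATH and datasetutils-32.0.0.jar should sit in the directory you run the script from:
#preflight sketch: confirm jq and the datasetutils jar are available before doing anything else
if ! command -v jq >/dev/null 2>&1; then
    echo "jq not found - install it (e.g. into /bin) before running elfWave.sh"
    exit 1
fi
if [ ! -f "datasetutils-32.0.0.jar" ]; then
    echo "datasetutils-32.0.0.jar not found in the current directory"
    exit 1
fi
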
Below is a breakdown of the more important parts of the script:

If you are not scheduling the script to run automatically, you should prompt the user for logins to both the Event Monitoring (source) and Wave (target) orgs. These are likely the same org; however, I'm allowing for the possibility that they are two different orgs.
#prompt the user to enter their username for the target (Wave) org
read -p "Please enter the Wave target org username (and press ENTER): " tUsername

#prompt the user to enter their password for the target (Wave) org
read -s -p "Please enter the Wave target org password (and press ENTER): " tPassword

#prompt the user to enter their username for the source (Event Monitoring) org
read -p "Please enter username for Event Monitoring source org (and press ENTER): " username

#prompt the user to enter their password for the source (Event Monitoring) org
read -s -p "Please enter password for Event Monitoring source org (and press ENTER): " password

#prompt the user to enter their instance end-point for the source (Event Monitoring) org
echo 
read -p "Please enter instance (e.g. na1) for the for Event Monitoring source org loginURL (and press ENTER): " instance

#prompt the user to enter the date for the logs they want to download for the source (Event Monitoring) org
read -p "Please enter logdate (e.g. Yesterday, Last_Week, Last_n_Days:5) (and press ENTER): " day

You can choose to upload any single event type or all 28 event types to Wave. You might want to start with just one event type (e.g. API or URI) to test it. If you don't choose anything, all 28 will be selected.
#prompt the user to enter the eventType they want to download for the source (Event Monitoring) org
printf 'What EventType do you want to download?\n'
printf '1. All 28 event types (Default)\n'
printf '2. API\n'
printf '3. ApexCallout\n'
printf '4. ApexExecution\n'
printf '5. ApexSoap\n'
printf '6. ApexTrigger\n'
printf '7. AsyncReportRun\n'
printf '8. BulkApi\n'
printf '9. ChangeSetOperation\n'
printf '10. ContentDistribution\n'
#….

read eventMenu

case $eventMenu in
     1)
          eventType=${eventType:-All}
          ;;
     2)
          eventType=${eventType:-API}
          ;;
     3)
          eventType=${eventType:-ApexCallout}
          ;; 
     4)
          eventType=${eventType:-ApexExecution}
          ;; 
     5)
          eventType=${eventType:-ApexSoap}
          ;;      
     6)
          eventType=${eventType:-ApexTrigger}
          ;; 
     7)
          eventType=${eventType:-AsyncReportRun}
          ;; 
     8)
          eventType=${eventType:-BulkApi}
          ;; 
     9)
          eventType=${eventType:-ChangeSetOperation}
          ;; 
     10)
          eventType=${eventType:-ContentDistribution}
          ;; 
     #…. 
     *)
          eventType=${eventType:-All}
          ;;
esac

echo ${eventType}
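
The loop below references ${access_token}, ${ids[@]}, ${eventTypes[@]}, ${logDates[@]}, and ${elfs}, which the full script populates by logging in to the source org and querying the EventLogFile sobject. A minimal sketch of that query step, assuming ${access_token} was already obtained from a prior login and that the 'All' case (no EventType filter) was selected, might look like this:
#query EventLogFile for the chosen date range and parse the results with jq (sketch only)
elfs=$(curl -s "https://${instance}.salesforce.com/services/data/v32.0/query?q=SELECT+Id+,+EventType+,+LogDate+FROM+EventLogFile+WHERE+LogDate+=+${day}" -H "Authorization: Bearer ${access_token}")

#arrays of log file ids, event types, and log dates (date portion only) used by the download loop
ids=( $(echo ${elfs} | jq -r ".records[].Id") )
eventTypes=( $(echo ${elfs} | jq -r ".records[].EventType") )
logDates=( $(echo ${elfs} | jq -r ".records[].LogDate" | cut -d'T' -f1) )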

Download the files into the 'eventType-raw' directory, convert the TIMESTAMP field from an integer format (e.g. 20150105123456) to a date/time format (e.g. 2015-01-05T12:34:56Z), and store the converted files in the 'eventType' directory.
#loop through the array of results and download each file with the following naming convention: EventType.csv
for i in "${!ids[@]}"; do
    
    #make directory to store the files by date and separate out raw data from 
    #converted timezone data
    mkdir "${eventTypes[$i]}-raw"
    mkdir "${eventTypes[$i]}"

    #download files into the ${eventTypes[$i]}-raw directory
    curl "https://${instance}.salesforce.com/services/data/v32.0/sobjects/EventLogFile/${ids[$i]}/LogFile" -H "Authorization: Bearer ${access_token}" -H "X-PrettyPrint:1" -o "${eventTypes[$i]}-raw/${eventTypes[$i]}-${logDates[$i]}.csv" 

    #convert files into the ${eventTypes[$i]} directory for Salesforce Analytics
    awk -F ','  '{ if(NR==1) printf("%s\n",$0); else{ for(i=1;i<=NF;i++) { if(i>1&& i<=NF) printf("%s",","); if(i == 2) printf "\"%s-%s-%sT%s:%s:%sZ\"", substr($2,2,4),substr($2,6,2),substr($2,8,2),substr($2,10,2),substr($2,12,2),substr($2,14,2); else printf ("%s",$i);  if(i==NF) printf("\n")}}}' "${eventTypes[$i]}-raw/${eventTypes[$i]}-${logDates[$i]}.csv" > "${eventTypes[$i]}/${eventTypes[$i]}-${logDates[$i]}.csv"

done

Because the date range may span 'n' days, resulting in 'n' files per event type, we merge the files into a single eventType file that is uploaded into a single dataset with the eventType name. This simplifies building lenses and dashboards that trend events over a period of days.
#array of the unique event types returned by the query
uEventTypes=( $(echo ${elfs} | jq -r ".records[].EventType" | uniq) )

#merge data into single CSV file
for j in "${uEventTypes[@]}"
do
    output_file="$j.csv"
    count=0

    for f in `ls $j/*.csv`
    do
        echo "merging file: $f into $output_file"
        if [ $count -eq 0 ]; then
            #first file: copy everything, including the header row
            awk -F ',' '{print $0}' $f 1>$output_file
        else
            #subsequent files: skip the header row (FNR>1)
            awk -F ',' 'FNR>1 {print $0}' $f 1>>$output_file
        fi
        count=`expr $count + 1`
        echo "number of input files: $count merged to output file: $output_file"
    done
done

Using the datasetutils java app, upload the transformed, merged CSV files into Wave.
#load CSV files to datasets in Wave
for i in `ls *.csv`; do
    #variables to specify file and dataset name
    eventFile=`echo $i`
    eventName=`echo $i | sed 's/\.csv//g'`
    
    #comment out the next line to test the script without uploading to Wave
    java -jar datasetutils-32.0.0.jar --action load --u ${tUsername} --p ${tPassword} --inputFile ${eventFile} --dataset ${eventName}
done

Finally, because we may want to repeat this process on a regular basis (e.g. daily, weekly, monthly, etc…), choose whether to remove the files locally or keep them around for long-term storage and audit purposes.
#prompt user to clean up data and directories
read -p "Do you want to delete data directories and files? (Y/N)" del 

if [ "$del" == "Y" ] || [ "$del" == "y" ] || [ "$del" == "Yes" ] || [ "$del" == "yes" ]; then
    #clean up data directories and files
    for i in "${!uEventTypes[@]}"; do
        rm -r "${uEventTypes[$i]}-raw"
        rm -r "${uEventTypes[$i]}"
        rm "${uEventTypes[$i]}.csv"
    done
    rm -r "archive"
    echo "The files were removed."
elif [ "$del" == "N" ] || [ "$del" == "n" ] || [ "$del" == "No" ] || [ "$del" == "no" ]; then
    #leave data and directories for audit reasons
    echo "The files will remain."
fi

This script doesn't replace the need for an integration specialist to automate the ETL process; however, it should make it easier to try out both Event Monitoring and Wave.
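
If you do end up running it on a schedule, one option is a cron entry. This is a hypothetical example (not part of the original script); it assumes you have modified elfWave.sh to read credentials and settings from a file rather than interactive prompts, and the script and log paths are placeholders:
#hypothetical crontab entry: run the script every day at 6:00 AM and append output to a log
0 6 * * * /path/to/elfWave.sh >> /path/to/elfWave.log 2>&1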
