Fixing Excel’s Sci Not Faux Pas with R

Encountered what I think is a pretty common excel problem at work today. A colleague showed me an excel spreadsheet that was reading warehouse locations as scientific notation. For example, location 05E03 was being read into excel as 5.00E+03 and if you tried to edit the cell or convert it to text, you’d be given ‘5000’ losing all the necessary information. Any location with an “E” in the middle gave us this problem. While this could probably be easily fixed with VBA, I opted to fix with R since my VBA skills are piss poor.

# Read data, identify target column, and write function to 'clean' the locations
data <- read.csv("DorisData.csv", stringsAsFactors=FALSE)
names(data)
doris <- function(x) {
  if (strsplit(x,'')[[1]][1] == "0"){
    helper = x    
  } else {
    helper = paste0("'","0", strsplit(x,"")[[1]][1], 
                             strsplit(x,"")[[1]][5], 
                             strsplit(x,"")[[1]][7], 
                             strsplit(x,"")[[1]][8])
  }
  return (helper)
}

Notice that I added an apostrophe before the string that was getting stuck in scientific notation. Without this… the problem would persist.

# Apply doris to each location
data['WLOC'] = apply(data['WLOC'], 1, doris)

# Write data back to csv
write.csv(data, 'dorisdata.csv')
}

After writing this back to csv and reading with excel the last step was to drop the ‘ from the E locations.

 "=if(left(A2,1)="0", A2, right(A2, 5)" 

Now locations such as 04B23 stayed as were and locations such as 05E23 read as 05E03 instead of 5.00E+03.

Leave a Reply

Your email address will not be published. Required fields are marked *