So you ~~inherited~~ took over some Tableau Workbooks and need to rework them into something more efficient and migrate to Tableau server. I am going to recreate teh Tableau Workbooks on Tableau server rather than just publishing as I found the calculated fields get published as a dimension. Other than calculated fields, I found there are group definition to take care of too and I have not look into sets yet. With multiple data sources and calculated fields and groups, I wanted get a complete list of these to work through.
🎶Meanwhile, some music for you to enjoy as you read. This piece is part of Vivaldi’s The Four Seasons, particularly the first of the 4 violin concertos, called Spring.
🎶让大家继续读的时候,欣赏一点音乐。这件作品是维瓦尔第的“四季”的一部分 - 4小提琴协奏曲中的第一首,名为“春天”。
Source | 来源: https://musopen.org/
https://soundcloud.com/alan-chang-563971986/four-concerti-grossi-4-vivaldi-op8-i-la-primavera-allegro-largo-allegro
Source | 来源: https://musopen.org/music/2213/antonio-vivaldi/the-four-seasons/
I did some research and found that Tableau Workbooks are basically saved as XML files. So I should be able to “look” into it. I found a page which gives a clue on handling the calculated fields but that doesn’t take care of the group definitions. So I studied the way it works for the calculated fields and rework a bit to cater for getting the information for calculated fields and group definitions. My program will produce two Excel files: one with the information for calculated fields and the other with the information for group definitions. It’s not perfect but here is the program which I wrote in Jupyter Notebook and downloaded as python file if anyone is interested.
#!/usr/bin/env python
# coding: utf-8
# This program has two parts:
# 1. Extract Tableau TWB Calculated Fields into an Excel file
# Reference :https://vizartpandey.com/how-to-extract-tableau-field-information-using-python-api/
#
#
# 2. Extract Tableau TWB Group definition into an Excel file
# Note: I decided to output to 2 worksheets because I wanted to look at them in different format
# Librarys import and setup
import easygui
import xml.etree.ElementTree as ET
import os
import pandas as pd
from IPython.display import Markdown, display, HTML
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)
# To prompt user for twb file, uncomment the next line and comment out the line after.
# file = easygui.fileopenbox(default="C:\projects\ace\twb\*.twb")
# Update your TWB filename to read
file = r"C:\Users\ace\Documents\My Tableau Repository\Workbooks\Tableau For Healthcare - Ambulatory Access.twb" # <--- change filename if necessary
#parse the twb file
tree = ET.parse(file)
root = tree.getroot()
# create a dictionary of name and tableau generated name
calc_dict = {}
for item in root.findall('.//column[@caption]'):
if item.find(".//calculation") is None:
continue
else:
calc_dict[item.attrib['name']] = '[' + item.attrib['caption'] + ']'
# list of calculation field name, tableau generated name, and calculation/formula
calc_list = []
for item in root.findall('.//column[@caption]'):
if item.find(".//calculation") is None:
continue
else:
if item.find(".//calculation[@formula]") is not None:
calc_caption = '[' + item.attrib['caption'] + ']'
calc_name = item.attrib['name']
calc_raw_formula = item.find(".//calculation").attrib['formula']
calc_comment = ''
calc_formula = ''
for line in calc_raw_formula.split('\r\n'):
if line.startswith('//'):
calc_comment = calc_comment + line + ' '
else:
calc_formula = calc_formula + line + ' '
for name, caption in calc_dict.items():
calc_formula = calc_formula.replace(name, caption)
calc_row = (calc_caption, calc_name, calc_formula, calc_comment)
calc_list.append(list(calc_row))
else:
pass
# convert the list of calculation into a data frame
data = calc_list
data = pd.DataFrame(data, columns=['Name', 'Remote Name', 'Formula', 'Comment'])
# remove duplicate rows from data frame
data = data.drop_duplicates(subset=None, keep='first', inplace=False)
# Note: This part does not handle the comments enter in the same line as the formula but works for me well enough and I'm not going to amend further.
# get the name of the file
# export to Excel
base = os.path.basename(file)
os.path.splitext(base)
filename = os.path.splitext(base)[0]
out_folder = 'C:\\projects\\ace\\output\\'
out_file = out_folder + filename + '_' + 'calculations.xlsx'
data.to_excel(out_file, 'Calculations')
display(f'Opening output file containinng calculations {out_file}')
os.startfile(out_file)
# list of group name, tableau generated name, and group definitions
group_list = []
for item in root.findall(".//column[@caption]"):
group_caption = item.attrib.get('caption')[0:]
group_name = item.attrib.get('name')[1:-1]
for group in item.findall(".//calculation[@column]"):
group_depend_field = group.attrib.get('column')[1:-1]
for bin in group.findall(".//bin[@default-name]"):
default_bin = bin.attrib.get('default-name')
bin_value = bin.attrib.get('value')[0:]
bin_items = []
for bin_item in bin.findall(".//value"):
bin_item_value = bin_item.text
bin_items.append(bin_item_value)
bin_items_dict = dict(enumerate(bin_items, 1))
group_row = (group_caption, group_name, group_depend_field, bin_value, default_bin, bin_items_dict)
group_list.append(list(group_row))
# convert the list of groups into a data frame
group_data = group_list
group_df = pd.DataFrame(group_data, columns=['Name', 'Remote Name', 'Dependent Field', 'Bin', 'Default', 'Items'])
# export to Excel
out_file = out_folder + filename + '_' + 'groups.xlsx'
group_df.to_excel(out_file, 'Groups')
display(f'Opening output file containinng groups {out_file}')
os.startfile(out_file)
请看我其他帖: @ace108